# Bugfix Requirements Document
## Introduction
Multiple compliance endpoints incorrectly key their queries by `compliance_uploads.id` (or by individual upload row) instead of by `compliance_uploads.report_date`. The compliance pipeline accepts one xlsx file per vertical (e.g., NTS_AEO, SDIT_CISO, TSI), so a single calendar date typically produces several `compliance_uploads` rows. Any query, aggregation, or "pick latest" logic that treats each upload as a distinct date, rather than grouping all uploads that share a `report_date`, produces duplicated, fragmented, or silently dropped data.
The originally reported defect (GitLab issue #12, reported by nkapur) was the "Active Findings Over Time" chart on the Compliance page showing 3 entries for 5/11 after STEAM uploaded three vertical data sets that day. Investigation found that the same root cause (keying by `upload_id` instead of `report_date`) affects `GET /trends`, `GET /waterfall` (route handler `GET /top-recurring`), `GET /category-trend`, and `GET /summary`, and that the `compliance_snapshots` block in `persistUpload()` has a related defect: it aggregates items without scoping them to the vertical of the upload being persisted. This spec covers fixes for all five.
## Bug Analysis
### Current Behavior (Defect)
#### /trends (originally reported)
1.1 WHEN multiple compliance uploads exist with the same `report_date` (due to per-vertical uploads) THEN the system returns one trend data point per upload row, producing duplicate x-axis entries on the chart
1.2 WHEN the chart renders multiple entries for the same date THEN the x-axis displays repeated date labels (e.g., three "05/11/25" entries) making the trend line misleading and unreadable
1.3 WHEN per-team counts are computed for duplicate-date uploads THEN the system counts items per individual `upload_id` rather than aggregating across all uploads sharing that date, resulting in fragmented per-team totals
#### /waterfall (route handler `GET /top-recurring`)
1.4 WHEN multiple compliance uploads exist with the same `report_date` THEN the underlying query `SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC` returns one row per upload and `computeWaterfall()` emits one bar per row, producing multiple bars stacked under the same date label
1.5 WHEN `computeWaterfall()` carries `start` forward across multiple rows that share a `report_date` THEN each per-vertical row's `new_count`/`recurring_count`/`resolved_count` deltas are applied sequentially as if they were separate cycles, so the running `start` and `end` totals for that date are wrong (they reflect the last row's running balance rather than the date-level aggregate)
#### /category-trend
1.6 WHEN multiple compliance uploads exist with the same `report_date` THEN the query grouped by `cu.id, cu.report_date, category` returns one row per (upload, category) pair, producing duplicated stacked bars per date when the chart is keyed on `report_date`
1.7 WHEN per-category counts are surfaced for a date with multiple uploads THEN counts are reported per-vertical instead of aggregated across all verticals sharing that `report_date`, so no row in the response represents the full date-level category total
#### /summary
1.8 WHEN multiple uploads exist for the latest `report_date` THEN the query `WHERE vertical IS NULL ORDER BY id DESC LIMIT 1` (with fallback to `vertical = 'NTS_AEO'`) selects a single upload for that date and discards the `summary_json` of all other verticals, silently dropping their data
1.9 WHEN the summary returned by `/summary` is compared against `/trends`, `/waterfall`, or `/category-trend` for the same latest date THEN the figures do not reconcile, because `/summary` reflects one vertical's upload while the other endpoints aggregate (or duplicate) across all verticals
#### `compliance_snapshots` creation in `persistUpload()`
1.10 WHEN `persistUpload()` computes per-vertical compliance stats THEN the query filters only `WHERE team IS NOT NULL` and groups by `team`, with no filter or grouping on `vertical`, so item counts pulled from `compliance_items` are aggregated across every vertical present in the table
1.11 WHEN the resulting per-team totals are written into `compliance_snapshots` for a single vertical's upload THEN the `total_devices`, `compliant`, and `non_compliant` columns reflect cross-vertical totals rather than the snapshotted vertical, corrupting the monthly snapshot record
### Expected Behavior (Correct)
#### /trends (originally reported)
2.1 WHEN multiple compliance uploads exist with the same `report_date` THEN the system SHALL aggregate their counts (`new_count`, `recurring_count`, `resolved_count`, `total_active`) into a single trend data point per unique date
2.2 WHEN the chart renders trend data THEN each unique `report_date` SHALL appear exactly once on the x-axis regardless of how many upload records exist for that date
2.3 WHEN per-team counts are computed for a date with multiple uploads THEN the system SHALL aggregate team item counts across all uploads sharing that `report_date`, producing a single per-team total per date
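The aggregation described in 2.1 through 2.3 could look roughly like the sketch below. It is illustrative only: the row shapes, the function name `aggregateTrends`, and the `team_counts` field are assumptions, not the actual route code. It also assumes `report_date` is an ISO `YYYY-MM-DD` string and that verticals do not overlap, so summing `total_active` across uploads is meaningful.

```typescript
// Hypothetical shapes: field names follow this spec, not the actual schema.
interface UploadRow {
  id: number;
  report_date: string; // assumed ISO "YYYY-MM-DD", so string order is date order
  new_count: number;
  recurring_count: number;
  resolved_count: number;
  total_active: number;
}

interface ItemRow {
  upload_id: number;
  team: string | null;
}

interface TrendPoint {
  report_date: string;
  new_count: number;
  recurring_count: number;
  resolved_count: number;
  total_active: number;
  team_counts: Record<string, number>;
}

function aggregateTrends(uploads: UploadRow[], items: ItemRow[]): TrendPoint[] {
  const byDate = new Map<string, TrendPoint>();
  const dateOfUpload = new Map<number, string>();

  // One accumulator per report_date, regardless of how many uploads share it.
  for (const u of uploads) {
    dateOfUpload.set(u.id, u.report_date);
    let point = byDate.get(u.report_date);
    if (point === undefined) {
      point = {
        report_date: u.report_date,
        new_count: 0,
        recurring_count: 0,
        resolved_count: 0,
        total_active: 0,
        team_counts: {},
      };
      byDate.set(u.report_date, point);
    }
    point.new_count += u.new_count;
    point.recurring_count += u.recurring_count;
    point.resolved_count += u.resolved_count;
    point.total_active += u.total_active;
  }

  // Per-team counts are keyed by the upload's date, not by upload_id.
  for (const item of items) {
    const date = dateOfUpload.get(item.upload_id);
    if (date === undefined || item.team === null) continue;
    const point = byDate.get(date);
    if (point === undefined) continue;
    point.team_counts[item.team] = (point.team_counts[item.team] ?? 0) + 1;
  }

  return Array.from(byDate.values()).sort((a, b) =>
    a.report_date < b.report_date ? -1 : a.report_date > b.report_date ? 1 : 0
  );
}
```

With a single upload per date, each accumulator receives exactly one row, so the output matches the per-upload values, which is the regression case in 3.1.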
#### /waterfall (route handler `GET /top-recurring`)
2.4 WHEN multiple compliance uploads exist with the same `report_date` THEN the system SHALL aggregate `new_count`, `recurring_count`, and `resolved_count` across all uploads sharing that `report_date` into a single per-date row before passing rows to `computeWaterfall()`
2.5 WHEN `computeWaterfall()` consumes the aggregated rows THEN it SHALL emit exactly one waterfall entry per unique `report_date` and the running `start`/`end` totals SHALL advance using each date's date-level aggregate deltas (not per-upload deltas)
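A rough sketch of 2.4 and 2.5 follows. The real `computeWaterfall()` is not reproduced in this document, so `computeWaterfallSketch` is a stand-in that assumes `end = start + new_count - resolved_count`; that formula, the type names, and `aggregateDeltasByDate` are all hypothetical. The point is only that rows are collapsed to one per `report_date` before the running total advances.

```typescript
// Hypothetical shapes; the actual computeWaterfall() input may differ.
interface DateDeltas {
  report_date: string; // assumed ISO "YYYY-MM-DD"
  new_count: number;
  recurring_count: number;
  resolved_count: number;
}

interface WaterfallEntry extends DateDeltas {
  start: number;
  end: number;
}

// Collapse per-upload rows into one row per report_date (requirement 2.4).
function aggregateDeltasByDate(rows: DateDeltas[]): DateDeltas[] {
  const byDate = new Map<string, DateDeltas>();
  for (const row of rows) {
    const acc = byDate.get(row.report_date);
    if (acc === undefined) {
      byDate.set(row.report_date, { ...row }); // copy so the input is not mutated
    } else {
      acc.new_count += row.new_count;
      acc.recurring_count += row.recurring_count;
      acc.resolved_count += row.resolved_count;
    }
  }
  return Array.from(byDate.values()).sort((a, b) =>
    a.report_date < b.report_date ? -1 : a.report_date > b.report_date ? 1 : 0
  );
}

// Stand-in for computeWaterfall(). ASSUMPTION: end = start + new - resolved.
function computeWaterfallSketch(rows: DateDeltas[]): WaterfallEntry[] {
  let start = 0;
  return rows.map((row) => {
    const end = start + row.new_count - row.resolved_count;
    const entry: WaterfallEntry = { ...row, start, end };
    start = end; // the next date starts where this date ended
    return entry;
  });
}
```

Calling `computeWaterfallSketch(aggregateDeltasByDate(rows))` yields one entry per date, whereas passing the raw per-upload rows yields one entry per upload, each with its own intermediate `start`.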
#### /category-trend
2.6 WHEN multiple compliance uploads exist with the same `report_date` THEN the query SHALL group by `cu.report_date, category` (without `cu.id` in the GROUP BY) and `SUM`/`COUNT` items across all uploads sharing the date, producing one row per (date, category) pair
2.7 WHEN per-category counts are returned for a date with multiple uploads THEN the `count` field SHALL be the sum of items in that category across every upload for that `report_date`
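The grouping in 2.6 and 2.7 can be illustrated in memory as below. This is a sketch that mirrors a `GROUP BY cu.report_date, category` over joined rows; it is not the actual query, and the names `CategoryItem`, `CategoryTrendRow`, and `categoryTrend` are hypothetical.

```typescript
// Hypothetical shapes for items joined to their upload's report_date.
interface CategoryItem {
  upload_id: number;
  category: string;
}

interface CategoryTrendRow {
  report_date: string;
  category: string;
  count: number;
}

function categoryTrend(
  uploadDates: Map<number, string>, // upload id -> report_date (assumed ISO string)
  items: CategoryItem[]
): CategoryTrendRow[] {
  const rows = new Map<string, CategoryTrendRow>();
  for (const item of items) {
    const date = uploadDates.get(item.upload_id);
    if (date === undefined) continue; // item whose upload is unknown
    // The grouping key is (report_date, category); the upload id is deliberately absent.
    const key = JSON.stringify([date, item.category]);
    const row = rows.get(key);
    if (row === undefined) {
      rows.set(key, { report_date: date, category: item.category, count: 1 });
    } else {
      row.count += 1;
    }
  }
  return Array.from(rows.values()).sort((a, b) => {
    if (a.report_date !== b.report_date) return a.report_date < b.report_date ? -1 : 1;
    if (a.category !== b.category) return a.category < b.category ? -1 : 1;
    return 0;
  });
}
```

Dropping the upload id from the key is the whole fix in this sketch: two uploads on the same date that both contain a given category contribute to one row instead of two.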
#### /summary
2.8 WHEN multiple uploads exist for the latest `report_date` THEN the system SHALL either (a) merge the `summary_json` of all uploads sharing that date into a single combined summary response, or (b) return a documented, well-defined selection (e.g., a named "primary" vertical) along with metadata indicating which uploads were considered, rather than silently picking one by `ORDER BY id DESC LIMIT 1`
2.9 WHEN the response is constructed for a date with multiple uploads THEN the `upload` field SHALL identify the set of uploads that contributed to the response (or, if a single representative is returned, the response SHALL include a flag/field indicating other uploads exist for the same date that were not merged)
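Option (a) of 2.8, combined with 2.9, might be sketched as follows. The structure of `summary_json` is not specified in this document, so the sketch assumes it holds an `entries` array and an `overall_scores` value, concatenates `entries`, and keeps `overall_scores` per upload rather than guessing how scores should be combined. All type and function names are hypothetical, and representing `upload` as an array is one possible reading of 2.9.

```typescript
// Hypothetical shapes: the real summary_json structure is not given in this spec.
interface SummaryUpload {
  id: number;
  report_date: string; // assumed ISO "YYYY-MM-DD"
  vertical: string | null;
  summary_json: { entries: unknown[]; overall_scores: unknown };
}

interface MergedSummary {
  report_date: string;
  entries: unknown[];
  // Scores are kept per upload id; how to combine them is left open here.
  overall_scores_by_upload: Record<string, unknown>;
  // Every upload that contributed to this response (requirement 2.9).
  upload: { id: number; vertical: string | null }[];
}

function mergeLatestSummary(uploads: SummaryUpload[]): MergedSummary | null {
  if (uploads.length === 0) return null;

  // Latest report_date across all uploads; "" sorts before any ISO date.
  const latest = uploads.reduce(
    (max, u) => (u.report_date > max ? u.report_date : max),
    ""
  );

  // filter() returns a new array, so sorting it leaves the input untouched.
  const sameDate = uploads
    .filter((u) => u.report_date === latest)
    .sort((a, b) => a.id - b.id);

  const merged: MergedSummary = {
    report_date: latest,
    entries: [],
    overall_scores_by_upload: {},
    upload: [],
  };
  for (const u of sameDate) {
    merged.entries.push(...u.summary_json.entries);
    merged.overall_scores_by_upload[String(u.id)] = u.summary_json.overall_scores;
    merged.upload.push({ id: u.id, vertical: u.vertical });
  }
  return merged;
}
```

If option (b) is chosen instead, the same `sameDate` list could be used to report which uploads exist for the date but were not merged.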
#### `compliance_snapshots` creation in `persistUpload()`
2.10 WHEN `persistUpload()` computes per-vertical compliance stats THEN the query SHALL filter `compliance_items` by the `vertical` of the upload being persisted (in addition to `team IS NOT NULL`) and group by `vertical, team`, so each snapshot row reflects only the items belonging to that vertical
2.11 WHEN snapshots are written into `compliance_snapshots` THEN the `total_devices`, `compliant`, and `non_compliant` values SHALL match the items belonging to the snapshotted vertical only and SHALL NOT be inflated by items from other verticals
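The vertical-scoped aggregation in 2.10 and 2.11 could be sketched like this. It assumes each item exposes a `vertical`, a nullable `team`, and a boolean `compliant` flag, and that one item corresponds to one device; none of these are confirmed by this document. `buildSnapshotRows` is a hypothetical helper, not the code inside `persistUpload()`, and the unrounded `compliance_pct` is likewise an assumption.

```typescript
// Hypothetical item shape; the real compliance_items columns may differ.
interface DeviceItem {
  vertical: string;
  team: string | null;
  compliant: boolean;
}

interface SnapshotRow {
  vertical: string;
  team: string;
  total_devices: number;
  compliant: number;
  non_compliant: number;
  compliance_pct: number;
}

function buildSnapshotRows(items: DeviceItem[], vertical: string): SnapshotRow[] {
  const byTeam = new Map<string, SnapshotRow>();
  for (const item of items) {
    // Scope to the upload's vertical in addition to the existing team filter.
    if (item.vertical !== vertical || item.team === null) continue;
    let row = byTeam.get(item.team);
    if (row === undefined) {
      row = {
        vertical,
        team: item.team,
        total_devices: 0,
        compliant: 0,
        non_compliant: 0,
        compliance_pct: 0,
      };
      byTeam.set(item.team, row);
    }
    row.total_devices += 1;
    if (item.compliant) {
      row.compliant += 1;
    } else {
      row.non_compliant += 1;
    }
  }
  // Every row has at least one device, so the division is safe.
  return Array.from(byTeam.values()).map((row) => ({
    ...row,
    compliance_pct: (row.compliant / row.total_devices) * 100,
  }));
}
```

When only one vertical is present in the input, the vertical filter removes nothing, so the per-team totals equal the unscoped totals, which corresponds to the regression case in 3.8.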
### Unchanged Behavior (Regression Prevention)
3.1 WHEN only one compliance upload exists per `report_date` (single-file upload workflow) THEN the system SHALL CONTINUE TO return that date's counts unchanged as a single trend data point
3.2 WHEN the chart displays trend data THEN the system SHALL CONTINUE TO show all existing data fields (`new_count`, `recurring_count`, `resolved_count`, `total_active`, per-team breakdowns) with correct values
3.3 WHEN no compliance uploads exist THEN the system SHALL CONTINUE TO return an empty trends array and the chart SHALL CONTINUE TO display the "no data" state
3.4 WHEN only one compliance upload exists per `report_date` THEN `GET /waterfall` SHALL CONTINUE TO emit one entry per date with the same `start`, `new_count`, `recurring_count`, `resolved_count`, and `end` fields and the same running-total semantics as before
3.5 WHEN only one compliance upload exists per `report_date` THEN `GET /category-trend` SHALL CONTINUE TO return one row per (date, category) pair with the same `report_date`, `category`, and `count` field shape as before
3.6 WHEN only one compliance upload exists for the latest `report_date` THEN `GET /summary` SHALL CONTINUE TO return the same `entries`, `overall_scores`, and `upload` shape as before, including the existing `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for selecting which upload's summary to surface
3.7 WHEN `/summary` is called with a `team` query parameter THEN the system SHALL CONTINUE TO filter `entries` by the requested team and SHALL CONTINUE TO reject teams not in `ALLOWED_TEAMS` with HTTP 400
3.8 WHEN `persistUpload()` writes a snapshot for a vertical that is the only vertical present in `compliance_items` for that month THEN the snapshot row's `total_devices`, `compliant`, `non_compliant`, and `compliance_pct` SHALL CONTINUE TO be identical to the pre-fix values (no behavioural change in the single-vertical case)
3.9 WHEN `persistUpload()` encounters an error during snapshot creation THEN the system SHALL CONTINUE TO log the error and complete the upload commit successfully (snapshot creation remains non-critical)
3.10 WHEN any of these endpoints are queried with no matching data (no uploads, no items for a vertical, no items in a category) THEN the system SHALL CONTINUE TO return the existing empty-state response shapes