# Bugfix Requirements Document

## Introduction

Multiple compliance endpoints incorrectly key their queries by `compliance_uploads.id` (or by individual upload row) instead of by `compliance_uploads.report_date`. The compliance pipeline accepts one xlsx file per vertical (e.g., NTS_AEO, SDIT_CISO, TSI), so a single calendar date typically produces several `compliance_uploads` rows. Any query, aggregation, or "pick latest" logic that treats each upload as a distinct date, instead of grouping all uploads sharing a `report_date`, produces duplicated, fragmented, or silently dropped data.

The originally reported defect (GitLab issue #12, reported by nkapur) was the "Active Findings Over Time" chart on the Compliance page showing 3 entries for 5/11 after STEAM uploaded three vertical data sets that day. Investigation found that the same root cause (keying by `upload_id` instead of `report_date`) affects `GET /trends`, `GET /waterfall` (route handler `GET /top-recurring`), `GET /category-trend`, `GET /summary`, and the `compliance_snapshots` block in `persistUpload()`. This spec covers fixes for all five.
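The root cause can be illustrated with a minimal sketch. It is not the project's code: the `UploadRow` shape, the `aggregateByReportDate` helper, and the sample numbers are all hypothetical, and only `report_date` and the count field names come from this spec. It shows three per-vertical uploads for one date collapsing into a single data point once rows are keyed by `report_date` rather than by upload `id`.

```typescript
// Hypothetical row shape; only report_date and the count names come from the spec.
interface UploadRow {
  id: number;
  report_date: string; // ISO date string, e.g. "2025-05-11"
  vertical: string;
  new_count: number;
  recurring_count: number;
  resolved_count: number;
  total_active: number;
}

type TrendPoint = Omit<UploadRow, "id" | "vertical">;

// Collapse per-upload rows into one point per report_date by summing counts.
function aggregateByReportDate(rows: UploadRow[]): TrendPoint[] {
  const byDate = new Map<string, TrendPoint>();
  for (const row of rows) {
    const point = byDate.get(row.report_date) ?? {
      report_date: row.report_date,
      new_count: 0,
      recurring_count: 0,
      resolved_count: 0,
      total_active: 0,
    };
    point.new_count += row.new_count;
    point.recurring_count += row.recurring_count;
    point.resolved_count += row.resolved_count;
    point.total_active += row.total_active;
    byDate.set(row.report_date, point);
  }
  // ISO date strings sort chronologically under plain string comparison.
  return Array.from(byDate.values()).sort((a, b) =>
    a.report_date.localeCompare(b.report_date),
  );
}

// Made-up sample data: three verticals uploaded on the same date, one a week later.
const uploads: UploadRow[] = [
  { id: 1, report_date: "2025-05-11", vertical: "NTS_AEO", new_count: 2, recurring_count: 5, resolved_count: 1, total_active: 7 },
  { id: 2, report_date: "2025-05-11", vertical: "SDIT_CISO", new_count: 1, recurring_count: 3, resolved_count: 0, total_active: 4 },
  { id: 3, report_date: "2025-05-11", vertical: "TSI", new_count: 0, recurring_count: 2, resolved_count: 2, total_active: 2 },
  { id: 4, report_date: "2025-05-18", vertical: "NTS_AEO", new_count: 1, recurring_count: 6, resolved_count: 1, total_active: 7 },
];

const trend = aggregateByReportDate(uploads);
console.log(trend.length); // 2 points instead of 4
```

With this sample data, the 2025-05-11 point carries `new_count` 3, `recurring_count` 10, `resolved_count` 3, and `total_active` 13. In the real endpoints the same grouping would more likely be done in SQL with `GROUP BY report_date`; the in-memory version is used here only to keep the sketch self-contained.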
## Bug Analysis

### Current Behavior (Defect)

#### /trends (originally reported)

1.1 WHEN multiple compliance uploads exist with the same `report_date` (due to per-vertical uploads) THEN the system returns one trend data point per upload row, producing duplicate x-axis entries on the chart

1.2 WHEN the chart renders multiple entries for the same date THEN the x-axis displays repeated date labels (e.g., three "05/11/25" entries) making the trend line misleading and unreadable

1.3 WHEN per-team counts are computed for duplicate-date uploads THEN the system counts items per individual `upload_id` rather than aggregating across all uploads sharing that date, resulting in fragmented per-team totals

#### /waterfall (route handler `GET /top-recurring`)

1.4 WHEN multiple compliance uploads exist with the same `report_date` THEN the underlying query `SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC` returns one row per upload and `computeWaterfall()` emits one bar per row, producing multiple bars stacked under the same date label

1.5 WHEN `computeWaterfall()` carries `start` forward across multiple rows that share a `report_date` THEN each per-vertical row's `new_count`/`recurring_count`/`resolved_count` deltas are applied sequentially as if they were separate cycles, so the running `start` and `end` totals for that date are wrong (they reflect the last row's running balance rather than the date-level aggregate)

#### /category-trend

1.6 WHEN multiple compliance uploads exist with the same `report_date` THEN the query grouped by `cu.id, cu.report_date, category` returns one row per (upload, category) pair, producing duplicated stacked bars per date when the chart is keyed on `report_date`

1.7 WHEN per-category counts are surfaced for a date with multiple uploads THEN counts are reported per-vertical instead of aggregated across all verticals sharing that `report_date`, so no row in the response represents the full date-level category total

#### /summary

1.8 WHEN multiple uploads exist for the latest `report_date` THEN the query `WHERE vertical IS NULL ORDER BY id DESC LIMIT 1` (with fallback to `vertical = 'NTS_AEO'`) selects a single upload for that date and discards the `summary_json` of all other verticals, silently dropping their data

1.9 WHEN the summary returned by `/summary` is compared against `/trends`, `/waterfall`, or `/category-trend` for the same latest date THEN the figures do not reconcile, because `/summary` reflects one vertical's upload while the other endpoints aggregate (or duplicate) across all verticals

#### `compliance_snapshots` creation in `persistUpload()`

1.10 WHEN `persistUpload()` computes per-vertical compliance stats THEN the query filters only `WHERE team IS NOT NULL` and groups by `team`, with no filter or grouping on `vertical`, so item counts pulled from `compliance_items` are aggregated across every vertical present in the table

1.11 WHEN the resulting per-team totals are written into `compliance_snapshots` for a single vertical's upload THEN the `total_devices`, `compliant`, and `non_compliant` columns reflect cross-vertical totals rather than the snapshotted vertical, corrupting the monthly snapshot record

### Expected Behavior (Correct)

#### /trends (originally reported)

2.1 WHEN multiple compliance uploads exist with the same `report_date` THEN the system SHALL aggregate their counts (new_count, recurring_count, resolved_count, total_active) into a single trend data point per unique date

2.2 WHEN the chart renders trend data THEN each unique `report_date` SHALL appear exactly once on the x-axis regardless of how many upload records exist for that date

2.3 WHEN per-team counts are computed for a date with multiple uploads THEN the system SHALL aggregate team item counts across all uploads sharing that `report_date`, producing a single per-team total per date

#### /waterfall (route handler `GET /top-recurring`)

2.4 WHEN multiple compliance uploads exist with the same `report_date` THEN the system SHALL aggregate `new_count`, `recurring_count`, and `resolved_count` across all uploads sharing that `report_date` into a single per-date row before passing rows to `computeWaterfall()`

2.5 WHEN `computeWaterfall()` consumes the aggregated rows THEN it SHALL emit exactly one waterfall entry per unique `report_date` and the running `start`/`end` totals SHALL advance using each date's date-level aggregate deltas (not per-upload deltas)

#### /category-trend

2.6 WHEN multiple compliance uploads exist with the same `report_date` THEN the query SHALL group by `cu.report_date, category` (without `cu.id` in the GROUP BY) and `SUM`/`COUNT` items across all uploads sharing the date, producing one row per (date, category) pair

2.7 WHEN per-category counts are returned for a date with multiple uploads THEN the `count` field SHALL be the sum of items in that category across every upload for that `report_date`

#### /summary

2.8 WHEN multiple uploads exist for the latest `report_date` THEN the system SHALL either (a) merge the `summary_json` of all uploads sharing that date into a single combined summary response, or (b) return a documented, well-defined selection (e.g., a named "primary" vertical) along with metadata indicating which uploads were considered, rather than silently picking one by `ORDER BY id DESC LIMIT 1`

2.9 WHEN the response is constructed for a date with multiple uploads THEN the `upload` field SHALL identify the set of uploads that contributed to the response (or, if a single representative is returned, the response SHALL include a flag/field indicating other uploads exist for the same date that were not merged)

#### `compliance_snapshots` creation in `persistUpload()`

2.10 WHEN `persistUpload()` computes per-vertical compliance stats THEN the query SHALL filter `compliance_items` by the `vertical` of the upload being persisted (in addition to `team IS NOT NULL`) and group by `vertical, team`, so each snapshot row reflects only the items belonging to that vertical

2.11 WHEN snapshots are written into `compliance_snapshots` THEN the `total_devices`, `compliant`, and `non_compliant` values SHALL match the items belonging to the snapshotted vertical only and SHALL NOT be inflated by items from other verticals

### Unchanged Behavior (Regression Prevention)

3.1 WHEN only one compliance upload exists per `report_date` (single-file upload workflow) THEN the system SHALL CONTINUE TO return that date's counts unchanged as a single trend data point

3.2 WHEN the chart displays trend data THEN the system SHALL CONTINUE TO show all existing data fields (new_count, recurring_count, resolved_count, total_active, per-team breakdowns) with correct values

3.3 WHEN no compliance uploads exist THEN the system SHALL CONTINUE TO return an empty trends array and the chart SHALL CONTINUE TO display the "no data" state

3.4 WHEN only one compliance upload exists per `report_date` THEN `GET /waterfall` SHALL CONTINUE TO emit one entry per date with the same `start`, `new_count`, `recurring_count`, `resolved_count`, and `end` fields and the same running-total semantics as before

3.5 WHEN only one compliance upload exists per `report_date` THEN `GET /category-trend` SHALL CONTINUE TO return one row per (date, category) pair with the same `report_date`, `category`, and `count` field shape as before

3.6 WHEN only one compliance upload exists for the latest `report_date` THEN `GET /summary` SHALL CONTINUE TO return the same `entries`, `overall_scores`, and `upload` shape as before, including the existing `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for selecting which upload's summary to surface

3.7 WHEN `/summary` is called with a `team` query parameter THEN the system SHALL CONTINUE TO filter `entries` by the requested team and SHALL CONTINUE TO reject teams not in `ALLOWED_TEAMS` with HTTP 400

3.8 WHEN `persistUpload()` writes a snapshot for a vertical that is the only vertical present in `compliance_items` for that month THEN the snapshot row's `total_devices`, `compliant`, `non_compliant`, and `compliance_pct` SHALL CONTINUE TO be identical to the pre-fix values (no behavioural change in the single-vertical case)

3.9 WHEN `persistUpload()` encounters an error during snapshot creation THEN the system SHALL CONTINUE TO log the error and complete the upload commit successfully (snapshot creation remains non-critical)

3.10 WHEN any of these endpoints are queried with no matching data (no uploads, no items for a vertical, no items in a category) THEN the system SHALL CONTINUE TO return the existing empty-state response shapes
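
The waterfall requirements (2.4, 2.5, and the regression guard in 3.4) can be sketched as a two-step pipeline: aggregate deltas per `report_date`, then compute running totals. This is an illustration under stated assumptions, not the existing implementation. `aggregateDeltas` and `runningWaterfall` are hypothetical names, `runningWaterfall` is only a stand-in for `computeWaterfall()`, and the formula `end = start + new_count - resolved_count` is an assumption, since this spec does not define how the running balance is derived. The sample numbers are made up.

```typescript
// Hypothetical shapes; field names follow the spec, everything else is assumed.
interface WaterfallInput {
  report_date: string; // ISO date string
  new_count: number;
  recurring_count: number;
  resolved_count: number;
}

interface WaterfallEntry extends WaterfallInput {
  start: number;
  end: number;
}

// Step 1 (requirement 2.4): one input row per report_date, deltas summed.
function aggregateDeltas(rows: WaterfallInput[]): WaterfallInput[] {
  const byDate = new Map<string, WaterfallInput>();
  for (const r of rows) {
    const acc = byDate.get(r.report_date) ?? {
      report_date: r.report_date,
      new_count: 0,
      recurring_count: 0,
      resolved_count: 0,
    };
    acc.new_count += r.new_count;
    acc.recurring_count += r.recurring_count;
    acc.resolved_count += r.resolved_count;
    byDate.set(r.report_date, acc);
  }
  return Array.from(byDate.values()).sort((a, b) =>
    a.report_date.localeCompare(b.report_date),
  );
}

// Step 2 (requirement 2.5): stand-in for computeWaterfall().
// ASSUMPTION: end = start + new_count - resolved_count.
function runningWaterfall(rows: WaterfallInput[], initial = 0): WaterfallEntry[] {
  let start = initial;
  return rows.map((r): WaterfallEntry => {
    const end = start + r.new_count - r.resolved_count;
    const entry = { ...r, start, end };
    start = end; // the next date starts where this one ended
    return entry;
  });
}

// Made-up data: one upload on 05-04, three per-vertical uploads on 05-11.
const rawRows: WaterfallInput[] = [
  { report_date: "2025-05-04", new_count: 10, recurring_count: 0, resolved_count: 0 },
  { report_date: "2025-05-11", new_count: 2, recurring_count: 4, resolved_count: 1 },
  { report_date: "2025-05-11", new_count: 1, recurring_count: 3, resolved_count: 1 },
  { report_date: "2025-05-11", new_count: 0, recurring_count: 2, resolved_count: 0 },
];

const perUpload = runningWaterfall(rawRows); // defect: one entry per upload row
const perDate = runningWaterfall(aggregateDeltas(rawRows)); // fix: one entry per date

// Regression guard (3.4): with one upload per date, aggregation changes nothing.
const single = rawRows.slice(0, 2);
const unchanged =
  JSON.stringify(runningWaterfall(single)) ===
  JSON.stringify(runningWaterfall(aggregateDeltas(single)));

console.log(perUpload.length, perDate.length, unchanged);
```

With this data the per-upload path emits four entries, three of them labelled 2025-05-11, while the per-date path emits two, and the 2025-05-11 entry runs from `start` 10 to `end` 11 using the summed deltas. Aggregating before the running-total step keeps `computeWaterfall()` itself untouched, which is one way to satisfy 3.4 without altering its semantics.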