Auto-sync .kiro/ from master (post-checkout hook)
@@ -0,0 +1 @@
{"specId": "7a1ca671-3974-49b1-8e83-023077e758d5", "workflowType": "requirements-first", "specType": "bugfix"}
99
.kiro/specs/compliance-duplicate-chart-entries/bugfix.md
Normal file
@@ -0,0 +1,99 @@
# Bugfix Requirements Document
## Introduction
Multiple compliance endpoints incorrectly key their queries by `compliance_uploads.id` (or by individual upload row) instead of by `compliance_uploads.report_date`. The compliance pipeline accepts one xlsx file per vertical (e.g., NTS_AEO, SDIT_CISO, TSI), so a single calendar date typically produces several `compliance_uploads` rows. Any query, aggregation, or "pick latest" logic that treats each upload as a distinct date — instead of grouping all uploads sharing a `report_date` — produces duplicated, fragmented, or silently dropped data.
The originally reported defect (GitLab issue #12, reported by nkapur) was the "Active Findings Over Time" chart on the Compliance page showing 3 entries for 5/11 after STEAM uploaded three vertical data sets that day. Investigation found that the same root cause — keying by `upload_id` instead of `report_date` — affects `GET /trends`, `GET /waterfall` (route handler `GET /top-recurring`), `GET /category-trend`, `GET /summary`, and the `compliance_snapshots` block in `persistUpload()`. This spec covers fixes for all five.
## Bug Analysis
### Current Behavior (Defect)
#### /trends (originally reported)
1.1 WHEN multiple compliance uploads exist with the same `report_date` (due to per-vertical uploads) THEN the system returns one trend data point per upload row, producing duplicate x-axis entries on the chart
1.2 WHEN the chart renders multiple entries for the same date THEN the x-axis displays repeated date labels (e.g., three "05/11/25" entries) making the trend line misleading and unreadable
1.3 WHEN per-team counts are computed for duplicate-date uploads THEN the system counts items per individual `upload_id` rather than aggregating across all uploads sharing that date, resulting in fragmented per-team totals
#### /waterfall (route handler `GET /top-recurring`)
1.4 WHEN multiple compliance uploads exist with the same `report_date` THEN the underlying query `SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC` returns one row per upload and `computeWaterfall()` emits one bar per row, producing multiple bars stacked under the same date label
1.5 WHEN `computeWaterfall()` carries `start` forward across multiple rows that share a `report_date` THEN each per-vertical row's `new_count`/`recurring_count`/`resolved_count` deltas are applied sequentially as if they were separate cycles, so the running `start` and `end` totals for that date are wrong (they reflect the last row's running balance rather than the date-level aggregate)
#### /category-trend
1.6 WHEN multiple compliance uploads exist with the same `report_date` THEN the query grouped by `cu.id, cu.report_date, category` returns one row per (upload, category) pair, producing duplicated stacked bars per date when the chart is keyed on `report_date`
1.7 WHEN per-category counts are surfaced for a date with multiple uploads THEN counts are reported per-vertical instead of aggregated across all verticals sharing that `report_date`, so no row in the response represents the full date-level category total
#### /summary
1.8 WHEN multiple uploads exist for the latest `report_date` THEN the query `WHERE vertical IS NULL ORDER BY id DESC LIMIT 1` (with fallback to `vertical = 'NTS_AEO'`) selects a single upload for that date and discards the `summary_json` of all other verticals, silently dropping their data
1.9 WHEN the summary returned by `/summary` is compared against `/trends`, `/waterfall`, or `/category-trend` for the same latest date THEN the figures do not reconcile, because `/summary` reflects one vertical's upload while the other endpoints aggregate (or duplicate) across all verticals
#### `compliance_snapshots` creation in `persistUpload()`
1.10 WHEN `persistUpload()` computes per-vertical compliance stats THEN the query filters only `WHERE team IS NOT NULL` and groups by `team`, with no filter or grouping on `vertical`, so item counts pulled from `compliance_items` are aggregated across every vertical present in the table
1.11 WHEN the resulting per-team totals are written into `compliance_snapshots` for a single vertical's upload THEN the `total_devices`, `compliant`, and `non_compliant` columns reflect cross-vertical totals rather than the snapshotted vertical, corrupting the monthly snapshot record
### Expected Behavior (Correct)
#### /trends (originally reported)
2.1 WHEN multiple compliance uploads exist with the same `report_date` THEN the system SHALL aggregate their counts (new_count, recurring_count, resolved_count, total_active) into a single trend data point per unique date
2.2 WHEN the chart renders trend data THEN each unique `report_date` SHALL appear exactly once on the x-axis regardless of how many upload records exist for that date
2.3 WHEN per-team counts are computed for a date with multiple uploads THEN the system SHALL aggregate team item counts across all uploads sharing that `report_date`, producing a single per-team total per date
#### /waterfall (route handler `GET /top-recurring`)
2.4 WHEN multiple compliance uploads exist with the same `report_date` THEN the system SHALL aggregate `new_count`, `recurring_count`, and `resolved_count` across all uploads sharing that `report_date` into a single per-date row before passing rows to `computeWaterfall()`
2.5 WHEN `computeWaterfall()` consumes the aggregated rows THEN it SHALL emit exactly one waterfall entry per unique `report_date` and the running `start`/`end` totals SHALL advance using each date's date-level aggregate deltas (not per-upload deltas)
#### /category-trend
2.6 WHEN multiple compliance uploads exist with the same `report_date` THEN the query SHALL group by `cu.report_date, category` (without `cu.id` in the GROUP BY) and `SUM`/`COUNT` items across all uploads sharing the date, producing one row per (date, category) pair
2.7 WHEN per-category counts are returned for a date with multiple uploads THEN the `count` field SHALL be the sum of items in that category across every upload for that `report_date`
#### /summary
2.8 WHEN multiple uploads exist for the latest `report_date` THEN the system SHALL either (a) merge the `summary_json` of all uploads sharing that date into a single combined summary response, or (b) return a documented, well-defined selection (e.g., a named "primary" vertical) along with metadata indicating which uploads were considered, rather than silently picking one by `ORDER BY id DESC LIMIT 1`
2.9 WHEN the response is constructed for a date with multiple uploads THEN the `upload` field SHALL identify the set of uploads that contributed to the response (or, if a single representative is returned, the response SHALL include a flag/field indicating other uploads exist for the same date that were not merged)
#### `compliance_snapshots` creation in `persistUpload()`
2.10 WHEN `persistUpload()` computes per-vertical compliance stats THEN the query SHALL filter `compliance_items` by the `vertical` of the upload being persisted (in addition to `team IS NOT NULL`) and group by `vertical, team`, so each snapshot row reflects only the items belonging to that vertical
2.11 WHEN snapshots are written into `compliance_snapshots` THEN the `total_devices`, `compliant`, and `non_compliant` values SHALL match the items belonging to the snapshotted vertical only and SHALL NOT be inflated by items from other verticals
### Unchanged Behavior (Regression Prevention)
3.1 WHEN only one compliance upload exists per `report_date` (single-file upload workflow) THEN the system SHALL CONTINUE TO return that date's counts unchanged as a single trend data point
3.2 WHEN the chart displays trend data THEN the system SHALL CONTINUE TO show all existing data fields (new_count, recurring_count, resolved_count, total_active, per-team breakdowns) with correct values
3.3 WHEN no compliance uploads exist THEN the system SHALL CONTINUE TO return an empty trends array and the chart SHALL CONTINUE TO display the "no data" state
3.4 WHEN only one compliance upload exists per `report_date` THEN `GET /waterfall` SHALL CONTINUE TO emit one entry per date with the same `start`, `new_count`, `recurring_count`, `resolved_count`, and `end` fields and the same running-total semantics as before
3.5 WHEN only one compliance upload exists per `report_date` THEN `GET /category-trend` SHALL CONTINUE TO return one row per (date, category) pair with the same `report_date`, `category`, and `count` field shape as before
3.6 WHEN only one compliance upload exists for the latest `report_date` THEN `GET /summary` SHALL CONTINUE TO return the same `entries`, `overall_scores`, and `upload` shape as before, including the existing `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for selecting which upload's summary to surface
3.7 WHEN `/summary` is called with a `team` query parameter THEN the system SHALL CONTINUE TO filter `entries` by the requested team and SHALL CONTINUE TO reject teams not in `ALLOWED_TEAMS` with HTTP 400
3.8 WHEN `persistUpload()` writes a snapshot for a vertical that is the only vertical present in `compliance_items` for that month THEN the snapshot row's `total_devices`, `compliant`, `non_compliant`, and `compliance_pct` SHALL CONTINUE TO be identical to the pre-fix values (no behavioural change in the single-vertical case)
3.9 WHEN `persistUpload()` encounters an error during snapshot creation THEN the system SHALL CONTINUE TO log the error and complete the upload commit successfully (snapshot creation remains non-critical)
3.10 WHEN any of these endpoints are queried with no matching data (no uploads, no items for a vertical, no items in a category) THEN the system SHALL CONTINUE TO return the existing empty-state response shapes
395
.kiro/specs/compliance-duplicate-chart-entries/design.md
Normal file
@@ -0,0 +1,395 @@
# Compliance Duplicate Chart Entries Bugfix Design
## Overview
Four compliance endpoints (`GET /trends`, `GET /top-recurring`, `GET /category-trend`, `GET /summary`) and the `compliance_snapshots` block inside `persistUpload()` share the same root cause: they key by `compliance_uploads.id` (one row per uploaded xlsx) instead of by `compliance_uploads.report_date` (the calendar date the report covers). Because the compliance pipeline accepts one xlsx per vertical (NTS_AEO, SDIT_CISO, TSI), a single `report_date` typically maps to several `compliance_uploads` rows, and any query that does not aggregate over `report_date` produces duplicated, fragmented, or silently dropped data.
The fix follows one pattern across the read endpoints: treat each unique `report_date` as a single unit. The count-style endpoints (`/trends`, `/top-recurring`, `/category-trend`) use `GROUP BY report_date` with `SUM`/`COUNT` aggregations so the result set has exactly one row per date. `/summary` keeps its existing primary-upload selection and adds a second query that lists the sibling uploads sharing the latest `report_date`. The `persistUpload()` snapshot block is fixed by adding a `vertical` filter so per-vertical snapshots are no longer cross-contaminated by other verticals' items.
The implementation is intentionally minimal: each fix changes one or two SQL statements plus, where needed, a small amount of JavaScript (the `teamMap` lookup in `/trends` and the extra response field in `/summary`). No frontend changes are required, because the chart components already key on `report_date` and will render correctly once the API returns one row per date.
## Glossary
- **Bug_Condition (C)**: The condition that triggers the bug — two or more rows in `compliance_uploads` share the same `report_date` (i.e., a multi-vertical upload day).
- **Property (P)**: The desired behavior when C holds — each affected endpoint returns exactly one entry per unique `report_date`, and the values aggregated across uploads for that date reconcile with the underlying `compliance_items` totals.
- **Preservation**: Behavior on dates with a single upload row, on the empty-data response shape, and on unrelated query parameters (e.g., `team` filter on `/summary`) — all must be byte-for-byte unchanged.
- **report_date**: `TEXT` column on `compliance_uploads` storing the reporting period the xlsx covers (e.g., `2025-05-11`). One date can have multiple upload rows when multiple verticals are uploaded for that date.
- **vertical**: `TEXT` column on `compliance_uploads` and `compliance_items` identifying which xlsx (NTS_AEO, SDIT_CISO, TSI) an upload or item belongs to. `NULL` indicates a legacy AEO-only upload.
- **persistUpload()**: Function in `backend/routes/compliance.js` (lines 81–192) that writes a parsed upload to the DB inside a transaction and then writes per-vertical snapshots into `compliance_snapshots`.
- **computeWaterfall(uploads)**: Pure helper in `backend/routes/compliance.js` (lines 235–243) that takes an ordered list of upload rows and emits one waterfall entry per row, carrying the running `start` forward.
## Bug Details
### Bug Condition
The bug manifests when two or more `compliance_uploads` rows share the same `report_date`. This happens whenever the operator uploads more than one vertical xlsx for the same reporting cycle (the documented multi-vertical workflow). Each of the five affected code paths then operates per upload row, or ignores the shared date entirely, instead of treating `report_date` as the unit of aggregation.
**Formal Specification:**
```
FUNCTION isBugCondition(uploads)
  INPUT: uploads, a list of compliance_uploads rows
  OUTPUT: boolean

  // The bug condition is triggered for any report_date that has more than one upload row
  GROUP uploads BY report_date INTO groups
  RETURN EXISTS group IN groups WHERE COUNT(group) > 1
END FUNCTION
```
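The predicate above can be sketched in JavaScript, the language of `backend/routes/compliance.js`. This is a minimal illustration rather than code from the repository; it assumes each upload row is a plain object with a `report_date` property.

```javascript
// Sketch of isBugCondition: true when any report_date appears on more than one row.
// Assumption: `uploads` is an array of plain objects with a `report_date` field.
function isBugCondition(uploads) {
  const seen = new Set();
  for (const { report_date } of uploads) {
    if (seen.has(report_date)) return true; // second row for the same date
    seen.add(report_date);
  }
  return false;
}
```

A fixture for the GitLab #12 scenario (three rows dated `2025-05-11`) would make this return `true`; a dataset with one row per date would return `false`.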
For a single endpoint response to be considered buggy, the API output must additionally fail one of the following invariants (the per-endpoint manifestation of the same root cause):
```
FUNCTION isBuggyResponse(endpoint, response)
  CASE endpoint OF
    '/trends':         RETURN COUNT(response.trends) != COUNT(DISTINCT report_date IN compliance_uploads)
    '/top-recurring':  RETURN COUNT(response.waterfall) != COUNT(DISTINCT report_date IN compliance_uploads)
    '/category-trend': RETURN EXISTS (date, category) WITH COUNT(*) > 1 IN response.categoryTrend
    '/summary':        RETURN response.upload represents only one of N>1 uploads sharing the latest report_date
                              AND no flag indicates other uploads exist for that date
    'persistUpload':   RETURN snapshots.total_devices > items_belonging_to_this_vertical_only
  END CASE
END FUNCTION
```
### Examples
The originally reported case (GitLab issue #12, 2025-05-11) and the four sibling manifestations:
- **`/trends`** — STEAM uploads three xlsx files for `2025-05-11` (one per vertical). The chart shows three "05/11/25" entries on the x-axis instead of one. Expected: a single 05/11/25 point whose `new_count`/`recurring_count`/`resolved_count`/`total_active` are the sums of the three uploads' counts.
- **`/top-recurring`** — Same three uploads. `computeWaterfall()` receives three rows for `2025-05-11` and emits three bars stacked on the same date. Worse, because `start` carries forward across rows, the second and third bars' `start` reflects the first/second row's `end`, so the three bars in aggregate misrepresent the date-level deltas. Expected: one bar for `2025-05-11` whose `new_count`/`recurring_count`/`resolved_count` are summed across the three uploads, and whose `start` carries from the previous date's `end`.
- **`/category-trend`** — Same three uploads, each with category-tagged items. The query groups by `(cu.id, cu.report_date, category)` and returns up to `3 × |categories|` rows for `2025-05-11`. The frontend stacks these as duplicated category bars per date. Expected: one row per `(2025-05-11, category)` pair with `count` summed across the three uploads.
- **`/summary`** — On `2025-05-11`, three uploads exist. The query `WHERE vertical IS NULL ORDER BY id DESC LIMIT 1` (with fallback to `vertical = 'NTS_AEO'`) silently picks one and the other two verticals' `summary_json` is dropped. Expected: either the response merges all three uploads' `entries` and `overall_scores`, or the response includes a `multi_vertical_uploads` array identifying the other uploads that exist for the same `report_date` so the caller knows the response is partial.
- **Edge case — `persistUpload()` snapshot** — When SDIT_CISO is being persisted on `2025-05-11`, the snapshot query reads `compliance_items WHERE team IS NOT NULL` with no `vertical` filter, so the resulting per-team `total_devices`/`compliant`/`non_compliant` counts include items that belong to NTS_AEO and TSI as well. Expected: the snapshot query filters by the upload's `vertical` and groups by `(vertical, team)`.
## Expected Behavior
### Preservation Requirements
**Unchanged Behaviors:**
- Single-upload-per-date dates (legacy AEO-only workflow): every endpoint returns the same numbers, in the same shape, in the same order as before the fix.
- Empty-data responses: `/trends` returns `{ trends: [] }`, `/top-recurring` returns `{ waterfall: [] }`, `/category-trend` returns `{ categoryTrend: [] }`, `/summary` returns `{ entries: [], overall_scores: {}, upload: null }`.
- `/summary` `team` query parameter: still filters `entries` server-side, still rejects non-`ALLOWED_TEAMS` values with HTTP 400.
- `/summary` `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for selecting which upload's `summary_json` to surface (only the additional metadata about sibling uploads is new).
- `persistUpload()` error handling: snapshot creation remains wrapped in a `try/catch` that logs but does not fail the upload commit.
- `compliance_snapshots` rows for months with only a single vertical present in `compliance_items`: identical values to the pre-fix output.
- Frontend chart components: no changes required. They already key on `report_date` and consume the existing response shapes.
**Scope:**
All endpoint inputs that do not involve `report_date` collisions (single-upload dates, empty datasets, error paths, query-parameter filtering) must be byte-for-byte identical to the pre-fix output. The fix only changes what happens when two or more `compliance_uploads` rows share a `report_date`.
## Hypothesized Root Cause
All five sites have the same shape of bug (keying by `id` instead of `report_date`) but with slightly different mechanics. They are listed explicitly here so the test plan can confirm or refute each one:
1. **`/trends` — per-row mapping over uploads.** The handler runs `SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC` and `.map()`s each row into a trend entry. Per-team counts are pre-aggregated by `upload_id` and looked up by `u.id`, so duplicate-date rows produce duplicate-date trend entries with split per-team counts.
2. **`/top-recurring` — `computeWaterfall()` receives per-row data.** The query is identical to `/trends`'s upload query and `computeWaterfall()` carries a stateful `start` forward across rows. Three rows for the same date become three bars whose `start`/`end` running totals are wrong relative to the date-level aggregate.
3. **`/category-trend` — `GROUP BY cu.id, cu.report_date, category`.** Including `cu.id` in the `GROUP BY` defeats date-level aggregation; one upload row's items get their own (date, category) group instead of summing into the date-level group.
4. **`/summary` — `ORDER BY id DESC LIMIT 1`.** The query selects a single representative upload for the latest date and discards every other upload sharing that date. This is a "select latest by row id" pattern that does not consider `report_date` ties.
5. **`persistUpload()` snapshot block — missing `vertical` filter.** The snapshot query reads `compliance_items WHERE team IS NOT NULL GROUP BY team` with no `vertical` predicate. The query was correct when there was one vertical (AEO-only legacy) and silently broke when the multi-vertical migration added a `vertical` column without updating this query.
The common structural cause is that the multi-vertical migration (`add_vcl_multi_vertical.js`) added a `vertical` column to `compliance_uploads` and `compliance_items` but did not audit existing read queries for the new "many uploads share a `report_date`" reality.
## Correctness Properties
Property 1: Bug Condition — `/trends` returns one entry per unique report_date
_For any_ set of `compliance_uploads` rows where two or more rows share a `report_date`, the response from `GET /trends` SHALL contain exactly one entry per unique `report_date`, with `new_count`, `recurring_count`, `resolved_count`, and `total_active` equal to the SUM of those columns over all uploads sharing that date, and per-team counts equal to the sum of `compliance_items` rows for that team across all those uploads.
**Validates: Requirements 2.1, 2.2, 2.3**
Property 2: Bug Condition — `/top-recurring` waterfall has one bar per unique report_date with correct running totals
_For any_ set of `compliance_uploads` rows where two or more rows share a `report_date`, the response from `GET /top-recurring` SHALL contain exactly one waterfall entry per unique `report_date`, the entry's `new_count`/`recurring_count`/`resolved_count` SHALL equal the sum of those columns over all uploads sharing that date, and the running invariant `entry[i].end == entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count` SHALL hold with `entry[i].start == entry[i-1].end` for adjacent entries (and `entry[0].start == 0`).
**Validates: Requirements 2.4, 2.5**
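Property 2 can be checked mechanically on a response. The sketch below is a hypothetical test helper (the name `checkWaterfallInvariant` is not from the codebase). It assumes each waterfall entry carries `report_date`, `start`, `new_count`, `recurring_count`, `resolved_count`, and `end`, and it encodes the running invariant exactly as Property 2 states it.

```javascript
// Hypothetical helper: returns true when a waterfall array satisfies Property 2.
// Assumption: entries are ordered by report_date and use the field names from requirement 3.4.
function checkWaterfallInvariant(waterfall) {
  const dates = new Set();
  let prevEnd = 0; // entry[0].start must be 0
  for (const e of waterfall) {
    if (dates.has(e.report_date)) return false; // more than one bar for a date
    dates.add(e.report_date);
    if (e.start !== prevEnd) return false; // start must chain from the previous end
    const expectedEnd = e.start + e.new_count + e.recurring_count - e.resolved_count;
    if (e.end !== expectedEnd) return false;
    prevEnd = e.end;
  }
  return true;
}
```

Run against the unfixed endpoint with a multi-vertical date, the duplicate-date check is the one expected to fail.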
Property 3: Bug Condition — `/category-trend` returns one row per (date, category)
_For any_ set of `compliance_uploads` and `compliance_items` rows, the response from `GET /category-trend` SHALL contain exactly one entry per unique `(report_date, category)` pair, and each entry's `count` SHALL equal the total number of `compliance_items` for that category across every upload sharing that `report_date`.
**Validates: Requirements 2.6, 2.7**
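The uniqueness half of Property 3 mirrors the `/category-trend` case of `isBuggyResponse`. A minimal sketch of that check follows; `hasDuplicateDateCategory` is a hypothetical helper name, and the entry shape `{ report_date, category, count }` is taken from the response shape described in this document.

```javascript
// Hypothetical helper: true when any (report_date, category) pair appears more than once.
function hasDuplicateDateCategory(categoryTrend) {
  const seen = new Set();
  for (const { report_date, category } of categoryTrend) {
    // JSON.stringify of the pair gives an unambiguous composite key.
    const key = JSON.stringify([report_date, category]);
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}
```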
Property 4: Bug Condition — `/summary` does not silently drop sibling uploads
_For any_ set of `compliance_uploads` rows where two or more rows share the latest `report_date`, the response from `GET /summary` SHALL either (a) include a merged view of all sibling uploads' `entries` and `overall_scores`, or (b) include a non-empty `multi_vertical_uploads` field listing the IDs and verticals of the other uploads for that date that were not used to populate the response. The response SHALL NOT silently drop sibling uploads.
**Validates: Requirements 2.8, 2.9**
Property 5: Bug Condition — `persistUpload()` snapshot reflects only the snapshotted vertical
_For any_ `persistUpload()` invocation with a non-NULL `vertical`, the rows written into `compliance_snapshots` for the current month SHALL have `total_devices`, `compliant`, and `non_compliant` values equal to the counts derived from `compliance_items` filtered to the snapshotted vertical only. No item from another vertical SHALL contribute to those counts.
**Validates: Requirements 2.10, 2.11**
Property 6: Preservation — Per-endpoint cross-date sums equal source-data totals
_For any_ set of uploads, summing `new_count` (and likewise `recurring_count`, `resolved_count`) across every entry in `GET /trends` SHALL equal the corresponding `SUM(new_count)` over `compliance_uploads` rows with a non-NULL `report_date`. Similarly, summing `count` across every entry in `GET /category-trend` SHALL equal `COUNT(*)` of `compliance_items` joined to `compliance_uploads` rows with a non-NULL `report_date`. This holds whether or not any date has duplicate uploads.
**Validates: Requirements 3.1, 3.2**
Property 7: Preservation — Single-upload-per-date dates are unchanged
_For any_ set of `compliance_uploads` where every `report_date` has exactly one row, the responses from `/trends`, `/top-recurring`, `/category-trend`, and `/summary` (and the `compliance_snapshots` rows written by `persistUpload()`) SHALL be identical to the pre-fix output for the same input. The fix SHALL NOT change behavior on the single-upload-per-date case.
**Validates: Requirements 3.1, 3.4, 3.5, 3.6, 3.8**
Property 8: Preservation — Empty-data and error-path responses are unchanged
_For any_ empty dataset (no uploads, no matching items, no items in a category), each affected endpoint SHALL return the same empty-state response shape as before the fix. `/summary` with a non-`ALLOWED_TEAMS` `team` parameter SHALL still respond `400`. `persistUpload()` snapshot errors SHALL still be caught and logged without failing the upload commit.
**Validates: Requirements 3.3, 3.7, 3.9, 3.10**
## Fix Implementation
### Changes Required
All changes are in `backend/routes/compliance.js`. No schema migration, no new column, no frontend change.
#### Fix 1: `GET /trends` — aggregate uploads and team counts by `report_date`
**Function**: `router.get('/trends', ...)` (around line 768)
**Specific Changes**:
1. Replace the `compliance_uploads` query so it groups by `report_date` and sums the count columns:
```sql
SELECT report_date,
       SUM(COALESCE(new_count, 0))::int AS new_count,
       SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
       SUM(COALESCE(resolved_count, 0))::int AS resolved_count,
       SUM(COALESCE(new_count, 0) + COALESCE(recurring_count, 0))::int AS total_active
FROM compliance_uploads
WHERE report_date IS NOT NULL
GROUP BY report_date
ORDER BY report_date ASC
```
2. Replace the per-team `compliance_items` query so it joins to `compliance_uploads` and groups by `(report_date, team)` instead of `(upload_id, team)`:
```sql
SELECT cu.report_date, ci.team, COUNT(ci.id)::int AS count
FROM compliance_items ci
JOIN compliance_uploads cu ON ci.upload_id = cu.id
WHERE ci.team IS NOT NULL AND cu.report_date IS NOT NULL
GROUP BY cu.report_date, ci.team
```
3. Change the `teamMap` keyed lookup from `teamMap[u.id]` to `teamMap[u.report_date]` and rebuild `trends` from the per-date upload rows.
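Step 3 might look roughly like the sketch below. The handler's real code is not shown in this document, so the function name `buildTrends` and the `teams` field on each trend entry are assumptions made for illustration; only the switch from an `id` key to a `report_date` key is taken from the text.

```javascript
// Sketch: combine the two per-date result sets into the trends array.
// uploadRows: rows from the GROUP BY report_date query on compliance_uploads.
// teamRows:   rows from the GROUP BY (report_date, team) query on compliance_items.
// Assumption: the per-team breakdown is exposed under a `teams` field.
function buildTrends(uploadRows, teamRows) {
  const teamMap = {};
  for (const { report_date, team, count } of teamRows) {
    if (!teamMap[report_date]) teamMap[report_date] = {};
    teamMap[report_date][team] = count; // keyed by date, not by upload id
  }
  return uploadRows.map((u) => ({
    report_date: u.report_date,
    new_count: u.new_count,
    recurring_count: u.recurring_count,
    resolved_count: u.resolved_count,
    total_active: u.total_active,
    teams: teamMap[u.report_date] || {},
  }));
}
```

Because both queries already return one row per date (or per date and team), the mapping stays a plain one-to-one `.map()` and produces exactly one entry per `report_date`.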
#### Fix 2: `GET /top-recurring` — aggregate uploads by `report_date` before passing to `computeWaterfall()`
**Function**: `router.get('/top-recurring', ...)` (around line 818)
**Specific Changes**:
1. Replace the query with the same `GROUP BY report_date` pattern used in `/trends` (without `id`, since `computeWaterfall()` only needs `report_date`, `new_count`, `recurring_count`, `resolved_count`):
```sql
SELECT report_date,
       SUM(COALESCE(new_count, 0))::int AS new_count,
       SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
       SUM(COALESCE(resolved_count, 0))::int AS resolved_count
FROM compliance_uploads
WHERE report_date IS NOT NULL
GROUP BY report_date
ORDER BY report_date ASC
```
2. `computeWaterfall()` itself does not change — it already advances `start` correctly when fed one row per date. The fix is purely in the SQL.
#### Fix 3: `GET /category-trend` — drop `cu.id` from `GROUP BY`
**Function**: `router.get('/category-trend', ...)` (around line 838)
**Specific Changes**:
1. Remove `cu.id` from the `GROUP BY` clause so the grouping is by `(report_date, category)` only:
```sql
SELECT cu.report_date,
       COALESCE(ci.category, 'Unknown') AS category,
       COUNT(ci.id)::int AS count
FROM compliance_uploads cu
JOIN compliance_items ci ON ci.upload_id = cu.id
WHERE cu.report_date IS NOT NULL
GROUP BY cu.report_date, COALESCE(ci.category, 'Unknown')
ORDER BY cu.report_date ASC, category ASC
```
2. The response shape (`{ categoryTrend: Array<{ report_date, category, count }> }`) does not change. Only the row count for multi-vertical dates changes (collapsing duplicates into sums).
#### Fix 4: `GET /summary` — disclose sibling uploads for the latest date
**Function**: `router.get('/summary', ...)` (around line 495)
**Specific Changes**:
1. Keep the existing `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for choosing the primary upload's `summary_json` (this preserves the legacy single-upload behavior).
2. After resolving `latestUpload`, run a second query to find sibling uploads sharing the same `report_date`:
```sql
SELECT id, vertical, uploaded_at
FROM compliance_uploads
WHERE report_date = $1 AND id != $2
ORDER BY id ASC
```
3. Add `multi_vertical_uploads` to the response when sibling uploads exist:
```javascript
res.json({
  entries,
  overall_scores: summary.overall_scores || {},
  upload: { id, report_date, uploaded_at },
  multi_vertical_uploads: siblings.map(s => ({ id: s.id, vertical: s.vertical, uploaded_at: s.uploaded_at })),
});
```
4. When no sibling uploads exist (single-upload-per-date case), `multi_vertical_uploads` is `[]` (or omitted — see open question in test plan).
This is the conservative option (b) from requirement 2.8 (return a documented selection plus metadata about siblings) rather than option (a), a full server-side merge. Option (b) is chosen because (i) the `summary_json` schema is per-vertical and merging would require reconciliation logic that doesn't currently exist, and (ii) the existing fallback selection (NTS_AEO) is the established representative for the legacy AEO chart on the Compliance page.
#### Fix 5: `persistUpload()` snapshot block — filter and group by `vertical`
**Function**: `persistUpload()` (lines 81–192), specifically the `verticalStats` query at line 157
**Specific Changes**:
1. Determine the upload's `vertical` (read it from the upload row immediately after the `RETURNING id` insert, or accept it as a parameter to `persistUpload()`).
2. Replace the `verticalStats` query with one that filters by the upload's `vertical` and groups by `(vertical, team)`:
```sql
SELECT vertical, team,
       COUNT(DISTINCT hostname)::int AS total_devices,
       COUNT(DISTINCT CASE WHEN status = 'resolved' THEN hostname END)::int AS compliant,
       COUNT(DISTINCT CASE WHEN status = 'active' THEN hostname END)::int AS non_compliant
FROM compliance_items
WHERE team IS NOT NULL AND vertical IS NOT DISTINCT FROM $1
GROUP BY vertical, team
```
|
||||
(`IS NOT DISTINCT FROM` handles the legacy `vertical IS NULL` case correctly, so AEO-only uploads keep their previous semantics.)
|
||||
3. The `INSERT ... ON CONFLICT (snapshot_month, vertical) DO UPDATE` already keys snapshots by `vertical`, so no change is required there. However, the `vertical` value passed in must come from the query result, not from `team AS vertical` (which conflates the team and vertical concepts).
|
||||
4. If the per-snapshot-row "vertical" identity needs to remain `team` for back-compat reasons, leave the `INSERT` mapping unchanged but ensure the underlying counts are filtered to the upload's actual `vertical`. Confirm via inspection of `compliance_snapshots` consumers (`/vcl/stats`) before finalising.
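The filter and grouping in step 2 can be modeled in memory, for example as a test oracle for the snapshot counts. The sketch below is an illustration under stated assumptions, not the code in `persistUpload()`: the function names `sameVertical` and `snapshotStatsFor` and the item shape `{ hostname, team, vertical, status }` are hypothetical, and JavaScript `null` stands in for SQL `NULL`.

```javascript
// Hypothetical in-memory model of the Fix 5 query; not the production code.
// Items are assumed to look like { hostname, team, vertical, status }.

// SQL's `a IS NOT DISTINCT FROM b` treats NULL as equal to NULL.
// Normalising undefined to null lets strict equality play the same role.
function sameVertical(a, b) {
  return (a ?? null) === (b ?? null);
}

function snapshotStatsFor(items, uploadVertical) {
  // After filtering to one vertical, GROUP BY (vertical, team) reduces to team.
  const groups = new Map(); // team -> sets of distinct hostnames
  for (const item of items) {
    if (item.team == null) continue; // WHERE team IS NOT NULL
    if (!sameVertical(item.vertical, uploadVertical)) continue;
    if (!groups.has(item.team)) {
      groups.set(item.team, { all: new Set(), resolved: new Set(), active: new Set() });
    }
    const g = groups.get(item.team);
    g.all.add(item.hostname); // COUNT(DISTINCT hostname)
    if (item.status === 'resolved') g.resolved.add(item.hostname);
    if (item.status === 'active') g.active.add(item.hostname);
  }
  return [...groups].map(([team, g]) => ({
    vertical: uploadVertical ?? null,
    team,
    total_devices: g.all.size,
    compliant: g.resolved.size,
    non_compliant: g.active.size,
  }));
}

// Illustrative data only.
const items = [
  { hostname: 'h1', team: 'STEAM', vertical: 'SDIT_CISO', status: 'active' },
  { hostname: 'h1', team: 'STEAM', vertical: 'SDIT_CISO', status: 'active' },
  { hostname: 'h2', team: 'STEAM', vertical: 'NTS_AEO', status: 'active' },
  { hostname: 'h3', team: 'STEAM', vertical: null, status: 'resolved' },
];
console.log(snapshotStatsFor(items, 'SDIT_CISO')); // only the SDIT_CISO rows are counted
console.log(snapshotStatsFor(items, null)); // legacy path: only the NULL-vertical row
```

Using sets keyed by hostname mirrors `COUNT(DISTINCT ...)`, so a hostname repeated within the same vertical and team is counted once.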
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Validation Approach
|
||||
|
||||
The bug condition is straightforward to construct: insert two `compliance_uploads` rows with the same `report_date` and matching `compliance_items`, then call each affected endpoint. The two-phase approach is to first run the tests against the unfixed code to confirm the duplication/silent-drop counterexamples, then run the same tests against the fixed code and add property-based tests that explore the input space more broadly.
|
||||
|
||||
### Exploratory Bug Condition Checking
|
||||
|
||||
**Goal**: Surface counterexamples that demonstrate each of the five manifestations BEFORE implementing the fix. Confirm or refute the root cause analysis for each endpoint independently — they share a structural cause but the SQL details differ.
|
||||
|
||||
**Test Plan**: Seed a clean test database with a fixture representing the original GitLab #12 scenario (three uploads for `2025-05-11`, one each for NTS_AEO, SDIT_CISO, TSI, with realistic `compliance_items`). Call each affected endpoint and assert the buggy invariants. Run on UNFIXED code first.
|
||||
|
||||
**Test Cases**:
|
||||
|
||||
1. **`/trends` Duplicate Date Test** — Insert three uploads for `2025-05-11` (verticals NTS_AEO, SDIT_CISO, TSI), each with distinct `new_count`/`recurring_count`/`resolved_count` and matching `compliance_items` per team. Call `GET /trends`. Assert `response.trends.filter(t => t.report_date === '2025-05-11').length === 1`. (will fail on unfixed code — returns 3)
|
||||
|
||||
2. **`/top-recurring` Duplicate Bar Test** — Same fixture. Call `GET /top-recurring`. Assert `response.waterfall.filter(w => w.date === '2025-05-11').length === 1` AND assert the running invariant `waterfall[i].end === waterfall[i].start + waterfall[i].new_count + waterfall[i].recurring_count - waterfall[i].resolved_count` holds for every `i`. (will fail on unfixed code — returns 3 bars and the running totals reflect mid-row state, not date-level aggregate)
|
||||
|
||||
3. **`/category-trend` Duplicate (date, category) Test** — Same fixture, plus items tagged with two categories (e.g., "Patching" and "Configuration"). Call `GET /category-trend`. Assert that for each `(report_date, category)` pair, `response.categoryTrend.filter(c => c.report_date === '2025-05-11' && c.category === 'Patching').length === 1`. (will fail on unfixed code — returns 3 rows per category)
|
||||
|
||||
4. **`/summary` Sibling Disclosure Test** — Same fixture (three uploads for `2025-05-11`, latest date). Call `GET /summary`. Assert either (a) the response merges `entries` from all three uploads, or (b) `response.multi_vertical_uploads.length === 2`. (will fail on unfixed code — silently picks one upload, the other two are dropped without any indication)
|
||||
|
||||
5. **`persistUpload()` Cross-Vertical Contamination Test** — Pre-populate `compliance_items` with rows from multiple verticals (e.g., NTS_AEO has 100 active items, SDIT_CISO has 50 active items). Call `persistUpload()` with a fresh SDIT_CISO upload. Read back the `compliance_snapshots` row for the current month and SDIT_CISO. Assert `total_devices` reflects only SDIT_CISO items, not the combined 150. (will fail on unfixed code — total includes both verticals)
|
||||
|
||||
6. **Edge Case — Single-Upload-Per-Date Regression Test** — Insert a fixture with a single upload per date for three dates. Call all four read endpoints and capture responses. Apply the fix, re-run, and assert response equality (byte-for-byte). (should pass on unfixed code; will pass on fixed code; protects the preservation property)
|
||||
|
||||
**Expected Counterexamples**:
|
||||
- `/trends` returns N trend entries for a date with N uploads (N > 1). Cause: per-row `.map()` over uploads instead of date-level aggregation.
|
||||
- `/top-recurring` returns N waterfall bars for a date with N uploads. Cause: same per-row pattern, plus `computeWaterfall()` carries `start` forward across the duplicate-date rows.
|
||||
- `/category-trend` returns N × |categories| rows for a date with N uploads. Cause: `cu.id` is in the `GROUP BY` clause.
|
||||
- `/summary` returns one upload's `summary_json` and silently drops siblings. Cause: `ORDER BY id DESC LIMIT 1` with no `report_date`-tie handling.
|
||||
- `persistUpload()` writes inflated `total_devices`. Cause: missing `WHERE vertical = $1` and `GROUP BY vertical, team` in the snapshot query.
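The `/trends` and `/top-recurring` counterexamples above come down to mapping over upload rows instead of aggregating by date. A minimal sketch of the difference, assuming upload rows shaped like `{ report_date, new_count, recurring_count, resolved_count }`; the helper name `aggregateUploadsByDate` and the ascending date order are assumptions for illustration, not taken from `compliance.js`:

```javascript
// Hypothetical model of date-level aggregation; the real fix does this in SQL.
function aggregateUploadsByDate(uploads) {
  const byDate = new Map();
  for (const u of uploads) {
    const row = byDate.get(u.report_date) ?? {
      report_date: u.report_date,
      new_count: 0,
      recurring_count: 0,
      resolved_count: 0,
    };
    row.new_count += u.new_count;
    row.recurring_count += u.recurring_count;
    row.resolved_count += u.resolved_count;
    byDate.set(u.report_date, row);
  }
  return [...byDate.values()]
    .sort((a, b) => a.report_date.localeCompare(b.report_date)) // assumed ordering
    .map(r => ({ ...r, total_active: r.new_count + r.recurring_count }));
}

// Illustrative fixture: one earlier date plus three uploads for 2025-05-11.
const uploads = [
  { report_date: '2025-05-04', vertical: 'NTS_AEO', new_count: 2, recurring_count: 1, resolved_count: 0 },
  { report_date: '2025-05-11', vertical: 'NTS_AEO', new_count: 1, recurring_count: 4, resolved_count: 2 },
  { report_date: '2025-05-11', vertical: 'SDIT_CISO', new_count: 2, recurring_count: 0, resolved_count: 1 },
  { report_date: '2025-05-11', vertical: 'TSI', new_count: 3, recurring_count: 1, resolved_count: 0 },
];

// Buggy shape: one entry per upload row.
const perRow = uploads.map(u => ({ report_date: u.report_date, new_count: u.new_count }));
console.log(perRow.filter(t => t.report_date === '2025-05-11').length);

// Fixed shape: one entry per report_date with summed counts.
const perDate = aggregateUploadsByDate(uploads);
console.log(perDate.filter(t => t.report_date === '2025-05-11'));
```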
|
||||
|
||||
### Fix Checking
|
||||
|
||||
**Goal**: Verify that for all inputs where the bug condition holds (any `report_date` shared by two or more uploads), each fixed endpoint produces the expected aggregated/disclosed result.
|
||||
|
||||
**Pseudocode:**
|
||||
```
|
||||
FOR ALL (uploads, items) WHERE EXISTS report_date d WITH COUNT(uploads WHERE report_date = d) > 1 DO
|
||||
trends_response := GET_trends_fixed(uploads, items)
|
||||
waterfall_response := GET_top_recurring_fixed(uploads, items)
|
||||
cattrend_response := GET_category_trend_fixed(uploads, items)
|
||||
summary_response := GET_summary_fixed(uploads, items)
|
||||
snapshot_rows := persistUpload_fixed(new_upload_for_some_vertical, items)
|
||||
|
||||
ASSERT one_entry_per_date(trends_response.trends)
|
||||
ASSERT one_entry_per_date(waterfall_response.waterfall) AND running_invariant_holds(waterfall_response.waterfall)
|
||||
ASSERT one_entry_per_date_category_pair(cattrend_response.categoryTrend)
|
||||
ASSERT siblings_disclosed(summary_response, uploads)
|
||||
ASSERT snapshots_filtered_to_vertical(snapshot_rows, new_upload.vertical, items)
|
||||
END FOR
|
||||
```
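The predicate names in the pseudocode above are left abstract. One possible way to write three of them in plain JavaScript is sketched here; the function names and the `dateKey` parameter are assumptions, while the field names (`date`, `report_date`, `category`, `start`, `end`, and the count fields) follow the test cases in this document.

```javascript
// Hypothetical helper predicates for the fix-checking assertions.

// True when no date appears more than once in the entries.
function oneEntryPerDate(entries, dateKey = 'report_date') {
  const dates = entries.map(e => e[dateKey]);
  return new Set(dates).size === dates.length;
}

// True when no (report_date, category) pair appears more than once.
function oneEntryPerDateCategoryPair(rows) {
  const keys = rows.map(r => JSON.stringify([r.report_date, r.category]));
  return new Set(keys).size === keys.length;
}

// True when each bar starts where the previous one ended (the first at 0)
// and end === start + new_count + recurring_count - resolved_count.
function runningInvariantHolds(waterfall) {
  return waterfall.every((e, i) => {
    const expectedStart = i === 0 ? 0 : waterfall[i - 1].end;
    return e.start === expectedStart &&
      e.end === e.start + e.new_count + e.recurring_count - e.resolved_count;
  });
}

// Illustrative waterfall with one bar per date.
const goodWaterfall = [
  { date: '2025-05-04', start: 0, new_count: 5, recurring_count: 0, resolved_count: 0, end: 5 },
  { date: '2025-05-11', start: 5, new_count: 3, recurring_count: 2, resolved_count: 4, end: 6 },
];
console.log(oneEntryPerDate(goodWaterfall, 'date') && runningInvariantHolds(goodWaterfall));
```

Passing `'date'` as the key matches the waterfall entries, which the test cases address as `w.date`, while the default fits the `/trends` entries.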
|
||||
|
||||
### Preservation Checking
|
||||
|
||||
**Goal**: Verify that for all inputs where the bug condition does NOT hold (every `report_date` has exactly one upload row), the fixed endpoints produce results identical to the original endpoints.
|
||||
|
||||
**Pseudocode:**
|
||||
```
|
||||
FOR ALL (uploads, items) WHERE FORALL report_date d, COUNT(uploads WHERE report_date = d) <= 1 DO
|
||||
ASSERT GET_trends_original(uploads, items) = GET_trends_fixed(uploads, items)
|
||||
ASSERT GET_top_recurring_original(uploads, items) = GET_top_recurring_fixed(uploads, items)
|
||||
ASSERT GET_category_trend_original(uploads, items) = GET_category_trend_fixed(uploads, items)
|
||||
ASSERT GET_summary_original(uploads, items) = GET_summary_fixed(uploads, items)
|
||||
ASSERT persistUpload_original(upload, items).snapshots = persistUpload_fixed(upload, items).snapshots
|
||||
END FOR
|
||||
```
|
||||
|
||||
**Testing Approach**: Property-based testing is the right fit for preservation checking here:
|
||||
- The single-upload-per-date input space is large (any number of dates, any combination of counts, any team distribution, any category mix, any vertical), and exhaustive enumeration is impractical.
|
||||
- The preservation property is a strict equality, which is well-suited to PBT shrinking (any counterexample is a small fixture demonstrating a behavior change).
|
||||
- The legacy AEO-only data shape (`vertical IS NULL`) must be exercised, which falls naturally out of generators that include null verticals.
|
||||
|
||||
**Test Plan**: Capture responses from the unfixed code on single-upload-per-date fixtures (snapshot tests). After applying the fix, re-run the same fixtures and assert equality. Then run a property-based generator that produces random single-upload-per-date scenarios and asserts the same equality.
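As a rough sketch of how the capture-and-compare step might be wired, the snippet below filters fixtures to the non-bug-condition space and compares serialized outputs. Everything here is hypothetical scaffolding: `isBugCondition`, `preservationFailures`, and the two stand-in handlers are illustrative names, and the stand-ins only mimic the per-row versus per-date shapes rather than the real route handlers.

```javascript
// Bug condition from this spec: some report_date is shared by two or more uploads.
function isBugCondition(uploads) {
  const seen = new Set();
  for (const u of uploads) {
    if (seen.has(u.report_date)) return true;
    seen.add(u.report_date);
  }
  return false;
}

// Returns the fixtures (outside the bug condition) on which the two
// implementations disagree, comparing serialized JSON for strict equality.
function preservationFailures(original, fixed, fixtures) {
  return fixtures
    .filter(f => !isBugCondition(f.uploads))
    .filter(f => JSON.stringify(original(f)) !== JSON.stringify(fixed(f)));
}

// Stand-in "original": one entry per upload row.
const perRow = f => f.uploads.map(u => ({ report_date: u.report_date, new_count: u.new_count }));

// Stand-in "fixed": one entry per report_date with summed counts.
const perDate = f => {
  const sums = new Map();
  for (const u of f.uploads) {
    sums.set(u.report_date, (sums.get(u.report_date) ?? 0) + u.new_count);
  }
  return [...sums].map(([report_date, new_count]) => ({ report_date, new_count }));
};

const fixtures = [
  { uploads: [{ report_date: '2025-04-01', new_count: 2 }, { report_date: '2025-04-08', new_count: 3 }] },
  { uploads: [{ report_date: '2025-05-11', new_count: 1 }, { report_date: '2025-05-11', new_count: 4 }] },
];
console.log(preservationFailures(perRow, perDate, fixtures).length);
```

Comparing `JSON.stringify` output is one way to approximate the byte-for-byte equality the plan asks for; it is sensitive to key order, which is intentional here.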
|
||||
|
||||
**Test Cases**:
|
||||
1. **Snapshot Equality — Empty State** — Empty `compliance_uploads`. All four endpoints return their documented empty-state shapes. Snapshot-test before and after the fix.
|
||||
2. **Snapshot Equality — Single AEO-Only Upload** — One upload with `vertical IS NULL`, classic legacy fixture. Capture pre-fix responses, apply fix, assert equality.
|
||||
3. **Snapshot Equality — Multiple Single-Upload Dates** — Five dates, one upload each, varied `vertical` values. Capture pre-fix responses, apply fix, assert equality.
|
||||
4. **`/summary` Team Filter Preservation** — Latest upload exists, `?team=STEAM` parameter is supplied. Assert `entries` is filtered to `team === 'STEAM'` rows. Assert non-`ALLOWED_TEAMS` value (e.g., `?team=OTHER`) returns HTTP 400.
|
||||
5. **`persistUpload()` Snapshot Equality — Single-Vertical Month** — Pre-populate `compliance_items` with rows from a single vertical only. Run `persistUpload()` for that vertical. Assert the resulting `compliance_snapshots` rows are identical pre-fix and post-fix.
|
||||
6. **Error Path Preservation** — Force a snapshot query failure (e.g., transient DB error). Assert `persistUpload()` still commits the upload and the error is logged but not surfaced to the caller.
|
||||
|
||||
### Unit Tests
|
||||
|
||||
- `/trends` aggregation: two uploads sharing a `report_date`, one upload alone for an earlier date. Assert response has 2 entries and `new_count` for the shared date equals the sum of the two uploads.
|
||||
- `/top-recurring` aggregation and running totals: same fixture as above. Assert 2 waterfall entries and the running `start`/`end` invariant.
|
||||
- `/category-trend` aggregation: two uploads sharing a `report_date`, items tagged with two categories. Assert one row per `(date, category)` pair with summed counts.
|
||||
- `/summary` sibling disclosure: three uploads sharing the latest date. Assert response shape matches the chosen disclosure approach (option (b)).
|
||||
- `/summary` team filter: same upload, with and without `?team=STEAM`.
|
||||
- `persistUpload()` per-vertical snapshot: items in two verticals, run upload for one, assert snapshots for that vertical do not include the other vertical's items.
|
||||
- `persistUpload()` legacy AEO-only path (`vertical IS NULL`): unchanged behavior.
|
||||
|
||||
### Property-Based Tests
|
||||
|
||||
- **`/trends` aggregation property** — Generate a random list of `(report_date, new_count, recurring_count, resolved_count)` upload tuples (with possible date collisions). Generate matching per-team item counts. Assert the response has exactly one entry per unique `report_date` AND for each entry, `new_count` equals the SUM of input `new_count`s for that date (likewise the other count fields and per-team counts).
|
||||
- **`/top-recurring` running invariant property** — Same generator. Assert the response has one bar per unique `report_date` AND for every adjacent pair of entries, `entry[i].start === entry[i-1].end`, AND `entry[i].end === entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count`.
|
||||
- **`/category-trend` total-conservation property** — Generate a random set of `compliance_items` and uploads. Assert that the sum of `count` over `response.categoryTrend` (e.g., `response.categoryTrend.reduce((sum, c) => sum + c.count, 0)`) equals the total number of `compliance_items` joined to uploads with a non-null `report_date`. This holds whether or not any date has multiple uploads.
|
||||
- **`/summary` sibling-disclosure property** — Generate a random set of uploads with possible duplicate `report_dates`. Pick the latest date. Assert that if any sibling upload exists for that date, the response contains a non-empty `multi_vertical_uploads` array referencing every sibling upload's id.
|
||||
- **`persistUpload()` vertical-isolation property** — Generate two non-empty disjoint sets of `compliance_items`, one per vertical. Insert both. Run `persistUpload()` for vertical A. Assert the resulting `compliance_snapshots` rows for vertical A reflect only set-A items (count of distinct hostnames matches).
|
||||
- **Cross-endpoint preservation property** — Generate any fixture where every `report_date` has exactly one upload row. Assert all five fixed code paths (the four read endpoints and the `persistUpload()` snapshot block) produce byte-for-byte identical results to the original code paths.
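The total-conservation property for `/category-trend` can be checked against a small reference model. The sketch below is one possible oracle under stated assumptions: the function name `categoryTrendModel` is hypothetical, items are assumed to reference uploads through `upload_id`, and a missing category is mapped to `'Unknown'` as in the Fix 3 grouping.

```javascript
// Hypothetical reference model: group items by (report_date, category) only,
// never by upload id, and skip items whose upload has no report_date.
function categoryTrendModel(uploads, items) {
  const dateById = new Map(uploads.map(u => [u.id, u.report_date]));
  const counts = new Map();
  for (const item of items) {
    const reportDate = dateById.get(item.upload_id);
    if (reportDate == null) continue; // no matching upload, or null report_date
    const category = item.category ?? 'Unknown';
    const key = JSON.stringify([reportDate, category]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([key, count]) => {
    const [report_date, category] = JSON.parse(key);
    return { report_date, category, count };
  });
}

// Illustrative data: two uploads share a date, a third has no report_date.
const uploads = [
  { id: 1, report_date: '2025-05-11' },
  { id: 2, report_date: '2025-05-11' },
  { id: 3, report_date: null },
];
const items = [
  { upload_id: 1, category: 'Patching' },
  { upload_id: 2, category: 'Patching' },
  { upload_id: 2, category: null },
  { upload_id: 3, category: 'Patching' },
];

const trend = categoryTrendModel(uploads, items);
const total = trend.reduce((sum, c) => sum + c.count, 0);
console.log(trend, total);
```

Because the model never keys on the upload id, the two `Patching` items from different uploads on the same date collapse into a single row, and the total still equals the number of items joined to dated uploads.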
|
||||
|
||||
### Integration Tests
|
||||
|
||||
- Full upload-to-chart flow: upload three xlsx files (one per vertical) with the same `report_date` via `POST /preview` + `POST /commit`, then call `/trends`, `/top-recurring`, `/category-trend`, `/summary` and verify all four return the expected aggregated/disclosed results.
|
||||
- Compliance Charts panel render: load `ComplianceChartsPanel.js` with a multi-vertical-day fixture and assert (via DOM snapshot) the x-axis shows each date exactly once on `Active Findings Over Time` and `Change per Report Cycle`.
|
||||
- Snapshot consumer regression: after running `persistUpload()` with the fix, call `/vcl/stats` (which reads `compliance_snapshots`) and verify per-vertical `compliance_pct` is unchanged from the pre-fix value when only one vertical's items are present, and is corrected when multiple verticals are present.
|
||||
|
||||
### Test Fixtures Required
|
||||
|
||||
The following fixtures are needed and can be reused across all five endpoints' tests:
|
||||
|
||||
1. **`fixture_empty`** — No `compliance_uploads`, no `compliance_items`. Used by the empty-state preservation tests.
|
||||
|
||||
2. **`fixture_single_upload_aeo_legacy`** — One `compliance_uploads` row with `vertical IS NULL`, `report_date = '2025-04-01'`, with ~20 `compliance_items` distributed across the four teams. Used by the legacy-path preservation tests.
|
||||
|
||||
3. **`fixture_single_upload_per_date`** — Five `compliance_uploads` rows, each with a distinct `report_date` (`2025-04-01` through `2025-05-01`), with `vertical` values `NTS_AEO`, `SDIT_CISO`, `TSI`, `NULL`, and `NTS_AEO` respectively (varied, but not all distinct). Used by the broader preservation tests and by `/category-trend` total-conservation.
|
||||
|
||||
4. **`fixture_multi_vertical_single_date`** — Three `compliance_uploads` rows all with `report_date = '2025-05-11'`, verticals NTS_AEO/SDIT_CISO/TSI, each with distinct `new_count`/`recurring_count`/`resolved_count` and 5–10 `compliance_items` per upload spanning multiple teams and categories. This is the canonical bug-condition fixture and reproduces the original GitLab #12 scenario.
|
||||
|
||||
5. **`fixture_mixed_history`** — Combination of `fixture_single_upload_per_date` and `fixture_multi_vertical_single_date` — multiple dates, some with single uploads, some with two or three. Used by the property-based tests as a realistic state-of-the-world fixture.
|
||||
|
||||
6. **`fixture_cross_vertical_items`** — Two non-empty disjoint sets of `compliance_items`, one tagged `vertical = 'NTS_AEO'` and one tagged `vertical = 'SDIT_CISO'`, sharing some hostnames between verticals to ensure the count-distinct logic is exercised. Used by the `persistUpload()` vertical-isolation tests.
|
||||
|
||||
7. **`fixture_pbt_generators`** — fast-check (or equivalent) arbitraries:
|
||||
- `arbReportDate`: ISO date string in a bounded range (e.g., last 90 days).
|
||||
- `arbVertical`: oneof `'NTS_AEO' | 'SDIT_CISO' | 'TSI' | null`.
|
||||
- `arbUpload`: `{ report_date, vertical, new_count, recurring_count, resolved_count }` with non-negative integer counts.
|
||||
- `arbItem`: `{ hostname, team in ALLOWED_TEAMS, category in {Patching, Configuration, Vulnerability, Other}, vertical, status in {active, resolved} }`.
|
||||
- `arbScenario`: `{ uploads: arbUpload[], items: arbItem[] }`, where items reference uploads via `upload_id` and dates can collide.
|
||||
179
.kiro/specs/compliance-duplicate-chart-entries/tasks.md
Normal file
@@ -0,0 +1,179 @@
|
||||
# Implementation Plan
|
||||
|
||||
## Overview
|
||||
|
||||
Five compliance code paths share the root cause "key by `compliance_uploads.id` instead of by `compliance_uploads.report_date`": `GET /trends`, `GET /top-recurring`, `GET /category-trend`, `GET /summary`, and the `compliance_snapshots` block inside `persistUpload()`. All fixes are contained to `backend/routes/compliance.js` and require no schema migration, no new column, and no frontend change.
|
||||
|
||||
The plan follows the bugfix workflow's exploratory methodology: a single property-based test file (`backend/__tests__/compliance-duplicate-chart-entries.property.test.js`) is written before any fix, with one test case per affected site demonstrating the bug condition, plus preservation cases observed on the unfixed code. Each fix is then implemented as its own task and verified by re-running the matching test case from the exploration suite. The plan ends with a regression checkpoint that re-runs the full preservation suite and the backend test suite.
|
||||
|
||||
## Task Dependency Graph
|
||||
|
||||
```json
|
||||
{
|
||||
"waves": [
|
||||
{ "id": 0, "tasks": ["1", "2"] },
|
||||
{ "id": 1, "tasks": ["3.1", "4.1", "5.1", "6.1", "7.1"] },
|
||||
{ "id": 2, "tasks": ["3.2", "4.2", "5.2", "6.2", "7.2"] },
|
||||
{ "id": 3, "tasks": ["8"] },
|
||||
{ "id": 4, "tasks": ["9"] }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Wave 0 establishes the test baseline: task 1 documents Property 1 (bug condition) failures on the unfixed code and task 2 captures Property 2 (preservation) baseline outputs. Wave 1 implements the five independent fixes in `backend/routes/compliance.js` (no inter-fix dependencies — each touches a different SQL statement). Wave 2 verifies each fix's slice of Property 1 now passes. Wave 3 re-runs the full Property 2 suite to confirm no regressions across the five sites. Wave 4 is the final checkpoint that runs the entire backend test suite.
|
||||
|
||||
## Tasks
|
||||
|
||||
- [x] 1. Write bug condition exploration property test
|
||||
- **Property 1: Bug Condition** - Multi-Vertical Date Aggregation Across Five Compliance Sites
|
||||
- **CRITICAL**: This test MUST FAIL on unfixed code — failure confirms the bug exists across all five sites
|
||||
- **DO NOT attempt to fix the test or the code when it fails**
|
||||
- **NOTE**: This test encodes the expected behavior — it will validate the fixes when it passes after implementation
|
||||
- **GOAL**: Surface counterexamples that demonstrate the bug exists for each of the five affected code paths (`/trends`, `/top-recurring`, `/category-trend`, `/summary`, `persistUpload()` snapshot block)
|
||||
- **Scoped PBT Approach**: Scope the property to the canonical bug-condition fixture (`fixture_multi_vertical_single_date` from design.md) — three uploads for `2025-05-11`, one each for `NTS_AEO`, `SDIT_CISO`, `TSI`, with distinct counts and matching items per upload — this reproduces the original GitLab #12 scenario deterministically
|
||||
- Bug Condition (from design.md): `EXISTS report_date d WHERE COUNT(compliance_uploads WHERE report_date = d) > 1`
|
||||
- Create `backend/__tests__/compliance-duplicate-chart-entries.property.test.js` using `fast-check` and the existing pg pool mock pattern from `vcl-compliance-reporting.property.test.js`
|
||||
- Test case 1.A — `/trends` duplicate-date counterexample: seed three uploads for `2025-05-11` (verticals NTS_AEO/SDIT_CISO/TSI), call `GET /trends`, assert `response.trends.filter(t => t.report_date === '2025-05-11').length === 1` AND `new_count` for that date equals the sum of the three uploads' `new_count` values (likewise `recurring_count`, `resolved_count`, `total_active`, and per-team counts)
|
||||
- Test case 1.B — `/top-recurring` duplicate-bar counterexample: same fixture, call `GET /top-recurring`, assert exactly one waterfall entry per unique `report_date` AND the running invariant `entry[i].end === entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count` holds for every `i` AND `entry[i].start === entry[i-1].end` for adjacent entries (with `entry[0].start === 0`)
|
||||
- Test case 1.C — `/category-trend` duplicate (date, category) counterexample: same fixture plus items tagged with two categories (`Patching` and `Configuration`), call `GET /category-trend`, assert `response.categoryTrend.filter(c => c.report_date === '2025-05-11' && c.category === 'Patching').length === 1` AND each entry's `count` equals the total `compliance_items` for that category across every upload sharing the date
|
||||
- Test case 1.D — `/summary` sibling-disclosure counterexample: same fixture (`2025-05-11` is the latest date), call `GET /summary`, assert either (a) `entries` is the merged view of all three uploads OR (b) `response.multi_vertical_uploads` is a non-empty array with `length === 2` listing the other two uploads' ids and verticals
|
||||
- Test case 1.E — `persistUpload()` cross-vertical contamination counterexample: pre-populate `compliance_items` with disjoint sets for two verticals (e.g., NTS_AEO has 100 active items, SDIT_CISO has 50 active items), call `persistUpload()` for a fresh SDIT_CISO upload, read back the `compliance_snapshots` row for the current month with `vertical = 'SDIT_CISO'`, assert `total_devices` reflects only SDIT_CISO items and is not inflated by NTS_AEO items
|
||||
- Wrap each test case in fast-check `fc.assert` against `arbScenario` from design.md (uploads with possibly colliding `report_date`, items referencing those uploads) so PBT also exercises larger random fixtures beyond the canonical 3-upload case
|
||||
- Add fixture builders in the test file matching the design's `fixture_multi_vertical_single_date` and `fixture_cross_vertical_items`
|
||||
- Run test on UNFIXED code: `npm run test:backend -- compliance-duplicate-chart-entries.property.test.js`
|
||||
- **EXPECTED OUTCOME**: All five test cases FAIL (this is correct — it proves each manifestation of the bug exists)
|
||||
- Document the counterexamples found (e.g., `/trends` returns 3 entries for `2025-05-11` instead of 1, `/summary` returns one upload's `summary_json` and silently drops the other two, `compliance_snapshots.total_devices` for SDIT_CISO equals 150 instead of 50)
|
||||
- Mark task complete when test is written, run, and the failures for all five test cases are documented
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 1.10, 1.11_
|
||||
|
||||
- [x] 2. Write preservation property tests (BEFORE implementing fix)
|
||||
- **Property 2: Preservation** - Single-Upload-Per-Date Behavior Unchanged Across Five Sites
|
||||
- **IMPORTANT**: Follow observation-first methodology — capture pre-fix outputs and assert equality post-fix
|
||||
- Bug Condition negation (from design.md): `FORALL report_date d, COUNT(compliance_uploads WHERE report_date = d) <= 1`
|
||||
- Add preservation test cases to `backend/__tests__/compliance-duplicate-chart-entries.property.test.js`
|
||||
- Observe baseline behavior on UNFIXED code using the design's preservation fixtures and capture exact response bodies (snapshot-test style)
|
||||
- Test case 2.A — Empty-state preservation (`fixture_empty`): no `compliance_uploads`, no `compliance_items`. Observe `GET /trends` returns `{ trends: [] }`, `GET /top-recurring` returns `{ waterfall: [] }`, `GET /category-trend` returns `{ categoryTrend: [] }`, `GET /summary` returns `{ entries: [], overall_scores: {}, upload: null }`. Capture and assert these exact shapes
|
||||
- Test case 2.B — Single AEO-legacy-upload preservation (`fixture_single_upload_aeo_legacy`): one upload with `vertical IS NULL`, `report_date = '2025-04-01'`, ~20 items across the four teams. Observe responses from all four read endpoints, capture them, and assert byte-for-byte equality
|
||||
- Test case 2.C — Multiple single-upload-per-date preservation (`fixture_single_upload_per_date`): five uploads on five distinct dates with varied `vertical` values. Observe responses from all four read endpoints and assert equality
|
||||
- Test case 2.D — `/summary` `team` query parameter preservation: with the latest upload present, assert `?team=STEAM` filters `entries` server-side AND `?team=OTHER` (non-`ALLOWED_TEAMS`) returns HTTP 400. Capture both responses
|
||||
- Test case 2.E — `persistUpload()` single-vertical-month preservation (`fixture_cross_vertical_items` reduced to one vertical): pre-populate `compliance_items` with rows from a single vertical only, run `persistUpload()` for that vertical, capture the resulting `compliance_snapshots` rows
|
||||
- Test case 2.F — `persistUpload()` snapshot error-path preservation: force a snapshot query failure (mock `pool.query` to reject on the snapshot statement only), assert the upload still commits and the error is logged but not surfaced (HTTP 200/201, no error response)
|
||||
- Property-based extension — Cross-endpoint preservation: use fast-check `arbScenario` constrained to scenarios where every `report_date` has exactly one upload row. Assert that for every generated scenario, all four endpoint responses on UNFIXED code match the captured-baseline shape and field-level equality holds (this generator covers the design's `fixture_pbt_generators.arbScenario` restricted to the non-bug-condition input space)
|
||||
- Run tests on UNFIXED code: `npm run test:backend -- compliance-duplicate-chart-entries.property.test.js`
|
||||
- **EXPECTED OUTCOME**: All preservation test cases PASS (this confirms the baseline behavior to preserve)
|
||||
- Mark task complete when tests are written, run, and passing on unfixed code
|
||||
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10_
|
||||
|
||||
- [x] 3. Fix `/trends` — aggregate uploads and team counts by `report_date`
|
||||
|
||||
- [x] 3.1 Rewrite the `/trends` upload and team queries to group by `report_date`
|
||||
- In `backend/routes/compliance.js` `router.get('/trends', ...)` (around line 768), replace the `compliance_uploads` query with the `GROUP BY report_date` SQL from design.md Fix 1, summing `new_count`, `recurring_count`, `resolved_count`, and `(new_count + recurring_count) AS total_active`
|
||||
- Replace the per-team `compliance_items` query with the `JOIN compliance_uploads` + `GROUP BY cu.report_date, ci.team` form from design.md Fix 1
|
||||
- Change the `teamMap` keyed lookup from `teamMap[u.id]` to `teamMap[u.report_date]` and rebuild `trends` from the per-date upload rows
|
||||
- _Bug_Condition: isBugCondition(uploads) where two or more compliance_uploads rows share a `report_date`_
|
||||
- _Expected_Behavior: GET /trends returns one entry per unique report_date with summed count fields and aggregated per-team counts (Property 1 from design)_
|
||||
- _Preservation: Single-upload-per-date dates produce identical responses; empty-data response remains `{ trends: [] }`; chart components require no changes_
|
||||
- _Requirements: 2.1, 2.2, 2.3, 3.1, 3.2, 3.3_
|
||||
|
||||
- [x] 3.2 Verify the `/trends` portion of bug condition exploration test now passes
|
||||
- **Property 1: Expected Behavior** - `/trends` Returns One Entry Per Unique report_date
|
||||
- **IMPORTANT**: Re-run the SAME test case 1.A from task 1 — do NOT write a new test
|
||||
- Run `npm run test:backend -- compliance-duplicate-chart-entries.property.test.js -t "/trends"`
|
||||
- **EXPECTED OUTCOME**: Test case 1.A PASSES (confirms `/trends` bug is fixed)
|
||||
- _Requirements: Property 1 (Validates 2.1, 2.2, 2.3) from design_
|
||||
|
||||
- [x] 4. Fix `/top-recurring` — aggregate uploads by `report_date` before passing to `computeWaterfall()`
|
||||
|
||||
- [x] 4.1 Rewrite the `/top-recurring` upload query to group by `report_date`
|
||||
- In `backend/routes/compliance.js` `router.get('/top-recurring', ...)` (around line 818), replace the query with the `GROUP BY report_date` SQL from design.md Fix 2, summing `new_count`, `recurring_count`, `resolved_count`
|
||||
- Leave `computeWaterfall()` unchanged — it already advances `start` correctly when fed one row per date; the fix is purely in the SQL
|
||||
- _Bug_Condition: isBugCondition(uploads) where two or more compliance_uploads rows share a `report_date`_
|
||||
- _Expected_Behavior: GET /top-recurring returns one waterfall entry per unique report_date with summed deltas; running invariant `entry[i].end === entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count` holds and `entry[i].start === entry[i-1].end` for adjacent entries (Property 2 from design)_
|
||||
- _Preservation: Single-upload-per-date waterfall is unchanged; empty-data response remains `{ waterfall: [] }`_
|
||||
- _Requirements: 2.4, 2.5, 3.4_
|
||||
|
||||
- [x] 4.2 Verify the `/top-recurring` portion of bug condition exploration test now passes
|
||||
- **Property 1: Expected Behavior** - `/top-recurring` Has One Bar Per Unique report_date With Correct Running Totals
|
||||
- **IMPORTANT**: Re-run the SAME test case 1.B from task 1 — do NOT write a new test
|
||||
- Run `npm run test:backend -- compliance-duplicate-chart-entries.property.test.js -t "/top-recurring"`
|
||||
- **EXPECTED OUTCOME**: Test case 1.B PASSES (confirms `/top-recurring` bug is fixed and the running invariant holds)
|
||||
- _Requirements: Property 2 (Validates 2.4, 2.5) from design_
|
||||
|
||||
- [x] 5. Fix `/category-trend` — drop `cu.id` from `GROUP BY`
|
||||
|
||||
- [x] 5.1 Rewrite the `/category-trend` query to group by `(report_date, category)` only
|
||||
- In `backend/routes/compliance.js` `router.get('/category-trend', ...)` (around line 838), replace the query with the SQL from design.md Fix 3 — remove `cu.id` from the `GROUP BY` so grouping is by `(cu.report_date, COALESCE(ci.category, 'Unknown'))` only
|
||||
- Leave the response shape `{ categoryTrend: Array<{ report_date, category, count }> }` unchanged
|
||||
- _Bug_Condition: isBugCondition(uploads) where two or more compliance_uploads rows share a `report_date`_
|
||||
- _Expected_Behavior: GET /category-trend returns one row per unique (report_date, category) pair with `count` equal to the total `compliance_items` for that category across every upload sharing the date (Property 3 from design)_
|
||||
- _Preservation: Single-upload-per-date rows are unchanged; empty-data response remains `{ categoryTrend: [] }`; total-conservation property holds across all dates (Property 6 from design)_
|
||||
- _Requirements: 2.6, 2.7, 3.5_
|
||||
|
||||
- [x] 5.2 Verify the `/category-trend` portion of bug condition exploration test now passes
|
||||
- **Property 1: Expected Behavior** - `/category-trend` Returns One Row Per (date, category)
|
||||
- **IMPORTANT**: Re-run the SAME test case 1.C from task 1 — do NOT write a new test
|
||||
- Run `npm run test:backend -- compliance-duplicate-chart-entries.property.test.js -t "/category-trend"`
|
||||
- **EXPECTED OUTCOME**: Test case 1.C PASSES (confirms `/category-trend` bug is fixed)
|
||||
- _Requirements: Property 3 (Validates 2.6, 2.7) from design_
|
||||
|
||||
- [x] 6. Fix `/summary` — disclose sibling uploads for the latest date
|
||||
|
||||
- [x] 6.1 Add sibling-upload disclosure to the `/summary` response
|
||||
- In `backend/routes/compliance.js` `router.get('/summary', ...)` (around line 495), keep the existing `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for selecting the primary upload's `summary_json` (preserves legacy single-upload behavior per requirement 3.6)
|
||||
- After resolving `latestUpload`, run the second query from design.md Fix 4 to find sibling uploads sharing the same `report_date`: `SELECT id, vertical, uploaded_at FROM compliance_uploads WHERE report_date = $1 AND id != $2 ORDER BY id ASC`
|
||||
- Add `multi_vertical_uploads` to the response, populated with `siblings.map(({ id, vertical, uploaded_at }) => ({ id, vertical, uploaded_at }))`; the field is `[]` when no siblings exist
|
||||
- Do not change the `team` query parameter handling, the `ALLOWED_TEAMS` HTTP 400 response, or the `entries`/`overall_scores`/`upload` shape
|
||||
- _Bug_Condition: isBugCondition(uploads) where two or more compliance_uploads rows share the latest `report_date`_
|
||||
- _Expected_Behavior: GET /summary either merges sibling uploads' entries OR exposes a non-empty `multi_vertical_uploads` array identifying the other uploads for the same `report_date`; sibling uploads are never silently dropped (Property 4 from design)_
|
||||
- _Preservation: Single-upload-per-date `/summary` shape is unchanged (the `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback still runs); `team` query parameter still filters entries and rejects non-`ALLOWED_TEAMS` with HTTP 400; empty-data response remains `{ entries: [], overall_scores: {}, upload: null }`_
|
||||
- _Requirements: 2.8, 2.9, 3.6, 3.7_
|
||||
|
||||
- [x] 6.2 Verify the `/summary` portion of bug condition exploration test now passes
|
||||
- **Property 1: Expected Behavior** - `/summary` Does Not Silently Drop Sibling Uploads
|
||||
- **IMPORTANT**: Re-run the SAME test case 1.D from task 1 — do NOT write a new test
|
||||
- Run `npm run test:backend -- compliance-duplicate-chart-entries.property.test.js -t "/summary"`
|
||||
- **EXPECTED OUTCOME**: Test case 1.D PASSES (confirms `/summary` discloses sibling uploads via `multi_vertical_uploads`)
|
||||
- _Requirements: Property 4 (Validates 2.8, 2.9) from design_
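
The sibling disclosure from task 6.1 can be sketched without a database. In this sketch `findSiblings` is a hypothetical stand-in for the second SQL query (same `report_date`, different `id`, ordered by `id`), `toMultiVerticalUploads` is a hypothetical name for the mapping step, and the upload rows are invented sample data.

```javascript
// Pure stand-in for:
//   SELECT id, vertical, uploaded_at FROM compliance_uploads
//   WHERE report_date = $1 AND id != $2 ORDER BY id ASC
function findSiblings(uploads, latestUpload) {
  return uploads
    .filter(u => u.report_date === latestUpload.report_date && u.id !== latestUpload.id)
    .sort((a, b) => a.id - b.id);
}

// Keep only the three disclosed fields for each sibling row.
function toMultiVerticalUploads(siblings) {
  return siblings.map(({ id, vertical, uploaded_at }) => ({ id, vertical, uploaded_at }));
}

// Sample rows (invented): three verticals uploaded on the latest date, one older upload.
const uploads = [
  { id: 10, vertical: 'NTS_AEO', report_date: '2025-05-11', uploaded_at: '2025-05-11T09:00:00Z' },
  { id: 12, vertical: 'TSI', report_date: '2025-05-11', uploaded_at: '2025-05-11T09:10:00Z' },
  { id: 11, vertical: 'SDIT_CISO', report_date: '2025-05-11', uploaded_at: '2025-05-11T09:05:00Z' },
  { id: 7, vertical: 'NTS_AEO', report_date: '2025-05-04', uploaded_at: '2025-05-04T09:00:00Z' },
];

const latestUpload = uploads[0]; // the primary upload picked by the existing fallback
const multi_vertical_uploads = toMultiVerticalUploads(findSiblings(uploads, latestUpload));
console.log(multi_vertical_uploads);
```

For a date with a single upload, `findSiblings` returns an empty array, so `multi_vertical_uploads` is `[]` and the rest of the `/summary` response is untouched.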
|
||||
|
||||
- [x] 7. Fix `persistUpload()` snapshot block — filter and group by `vertical`
|
||||
|
||||
- [x] 7.1 Rewrite the `verticalStats` query to filter by the upload's `vertical`
|
||||
- In `backend/routes/compliance.js` `persistUpload()` (lines 81–192), before the `verticalStats` query around line 157, capture the upload's `vertical`: either extend the upload insert's `RETURNING id` clause to also return `vertical`, or accept it as a `persistUpload()` parameter
|
||||
- Replace the `verticalStats` query with the SQL from design.md Fix 5: filter `WHERE team IS NOT NULL AND vertical IS NOT DISTINCT FROM $1` and group by `(vertical, team)`. The `IS NOT DISTINCT FROM` operator handles the legacy `vertical IS NULL` case so AEO-only uploads keep their previous semantics
|
||||
- Leave the existing `INSERT ... ON CONFLICT (snapshot_month, vertical) DO UPDATE` mapping as-is so `compliance_snapshots` consumers (`/vcl/stats`) continue to read the same column shape; only the underlying counts change
|
||||
- Keep snapshot creation wrapped in the existing `try/catch` so a snapshot failure is logged and does not fail the upload commit
|
||||
- _Bug_Condition: isBugCondition for `persistUpload()` holds when `compliance_items` contains rows for verticals other than the upload's vertical; the unfiltered query then inflates `total_devices`/`compliant`/`non_compliant`_
|
||||
- _Expected_Behavior: compliance_snapshots rows written by persistUpload() have `total_devices`, `compliant`, `non_compliant` derived only from compliance_items rows belonging to the snapshotted vertical; no item from another vertical contributes (Property 5 from design)_
|
||||
- _Preservation: Single-vertical months produce identical snapshot rows; the legacy `vertical IS NULL` AEO-only path is unchanged via `IS NOT DISTINCT FROM`; the snapshot try/catch error path is unchanged_
|
||||
- _Requirements: 2.10, 2.11, 3.8, 3.9_
|
||||
|
||||
- [x] 7.2 Verify the `persistUpload()` portion of bug condition exploration test now passes
|
||||
- **Property 1: Expected Behavior** - persistUpload() Snapshot Reflects Only the Snapshotted Vertical
|
||||
- **IMPORTANT**: Re-run the SAME test case 1.E from task 1 — do NOT write a new test
|
||||
- Run `npm run test:backend -- compliance-duplicate-chart-entries.property.test.js -t "persistUpload"`
|
||||
- **EXPECTED OUTCOME**: Test case 1.E PASSES (confirms snapshot rows are filtered to the snapshotted vertical only)
|
||||
- _Requirements: Property 5 (Validates 2.10, 2.11) from design_
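
The filter and grouping from task 7.1 can be modelled in memory to show what `IS NOT DISTINCT FROM` buys over plain `=`. This is a sketch under assumptions, not the SQL from design.md Fix 5: `isNotDistinctFrom` and `verticalStats` are hypothetical helpers, the `is_compliant` flag is an invented column standing in for however the real query decides compliance, and each item is counted as one device.

```javascript
// NULL-safe equality: unlike SQL `=`, NULL matches NULL here,
// which is the behaviour of `IS NOT DISTINCT FROM`.
function isNotDistinctFrom(a, b) {
  return (a ?? null) === (b ?? null);
}

// Model of: WHERE team IS NOT NULL AND vertical IS NOT DISTINCT FROM $1
//           GROUP BY vertical, team
function verticalStats(items, uploadVertical) {
  const stats = new Map();
  for (const item of items) {
    if (item.team == null) continue;
    if (!isNotDistinctFrom(item.vertical, uploadVertical)) continue;
    const vertical = item.vertical ?? null;
    const key = JSON.stringify([vertical, item.team]);
    const row = stats.get(key) ??
      { vertical, team: item.team, total_devices: 0, compliant: 0, non_compliant: 0 };
    row.total_devices += 1;
    // `is_compliant` is an assumed field used only for this illustration.
    if (item.is_compliant) row.compliant += 1;
    else row.non_compliant += 1;
    stats.set(key, row);
  }
  return [...stats.values()];
}

// Invented sample rows: two verticals plus one legacy row with no vertical.
const complianceItems = [
  { vertical: 'NTS_AEO', team: 'STEAM', is_compliant: true },
  { vertical: 'NTS_AEO', team: 'STEAM', is_compliant: false },
  { vertical: 'TSI', team: 'STEAM', is_compliant: true },
  { vertical: null, team: 'STEAM', is_compliant: true },
  { vertical: 'NTS_AEO', team: null, is_compliant: true },
];

const aeoStats = verticalStats(complianceItems, 'NTS_AEO');
const legacyStats = verticalStats(complianceItems, null);
console.log({ aeoStats, legacyStats });
```

The `TSI` row no longer contributes to the `NTS_AEO` snapshot, and passing `null` as the upload's vertical still selects the legacy rows, which is the case a plain `vertical = $1` comparison would silently drop.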
|
||||
|
||||
- [x] 8. Verify preservation tests still pass after all five fixes
|
||||
- **Property 2: Preservation** - Single-Upload-Per-Date Behavior Unchanged
|
||||
- **IMPORTANT**: Re-run the SAME tests from task 2 — do NOT write new tests
|
||||
- Run preservation property tests from task 2: `npm run test:backend -- compliance-duplicate-chart-entries.property.test.js -t "Preservation"`
|
||||
- Confirm all six preservation test cases (2.A–2.F) and the cross-endpoint property-based preservation extension still pass
|
||||
- **EXPECTED OUTCOME**: Tests PASS (confirms no regressions across the four read endpoints, the `/summary` `team` filter, the `persistUpload()` single-vertical-month path, and the snapshot error-path behavior)
|
||||
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10_
|
||||
|
||||
- [x] 9. Checkpoint — Ensure all tests pass
|
||||
- Run the full backend test suite: `npm run test:backend`
|
||||
- Confirm `compliance-duplicate-chart-entries.property.test.js` passes end-to-end (both Property 1 expected-behavior cases and Property 2 preservation cases)
|
||||
- Confirm pre-existing tests (`vcl-compliance-reporting.property.test.js`, `vcl-aggregated-burndown.property.test.js`, `vcl-aggregated-burndown.test.js`, `vcl-compliance-reporting.test.js`, `fp-submissions-cleanup.test.js`, etc.) still pass — none of these should be affected since the fix is contained to read queries and one snapshot write query in `compliance.js`
|
||||
- Spot-check the integration scenarios from design.md "Integration Tests": upload three xlsx files for the same `report_date` via `POST /preview` + `POST /commit`, then call `/trends`, `/top-recurring`, `/category-trend`, `/summary` and verify aggregated/disclosed responses; call `/vcl/stats` and verify per-vertical `compliance_pct` is correct
|
||||
- Ensure all tests pass; ask the user if questions arise
|
||||
|
||||
|
||||
## Notes
|
||||
|
||||
- All five fixes are contained to `backend/routes/compliance.js`. No database migration, no new column, no frontend change.
|
||||
- The Property 1 / Property 2 task numbering follows the bugfix workflow convention so the IDE hover-status indicator can track exploration tests against expected-behavior verification. Task 1 is the single Property 1 source; tasks 3.2 / 4.2 / 5.2 / 6.2 / 7.2 each re-run the relevant slice of Property 1 (NOT new tests) to confirm the matching fix lands correctly. Task 2 is the single Property 2 source; task 8 re-runs Property 2 in full to confirm no regressions.
|
||||
- The implementation tasks (3 through 7) are independent at the SQL level. Each can be reviewed and merged without waiting on the others, as long as task 1 and task 2 have run on the unfixed code first.
|
||||
- The `_Bug_Condition`, `_Expected_Behavior`, and `_Preservation` annotations on each fix sub-task reference the formal pseudocode in `design.md` Glossary and Bug Details sections.
|
||||
- `_Requirements: X.Y_` annotations cite clauses in `bugfix.md` Bug Analysis.
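
For readers without `design.md` at hand, the bug condition quoted in the read-endpoint annotations ("two or more compliance_uploads rows share a `report_date`") can be sketched as below. This is an illustration of that one sentence, not a copy of the formal pseudocode; `report_date` is assumed to be a comparable string such as `'2025-05-11'`, and the sample rows are invented.

```javascript
// Returns true when at least two upload rows share a report_date.
// Assumes report_date values are strings; Date objects would need
// normalising first because a Set compares objects by identity.
function isBugCondition(uploads) {
  const seen = new Set();
  for (const { report_date } of uploads) {
    if (seen.has(report_date)) return true;
    seen.add(report_date);
  }
  return false;
}

// Invented sample rows.
const perVerticalUploads = [
  { id: 1, vertical: 'NTS_AEO', report_date: '2025-05-11' },
  { id: 2, vertical: 'SDIT_CISO', report_date: '2025-05-11' },
  { id: 3, vertical: 'TSI', report_date: '2025-05-11' },
];
const singleUploadPerDate = [
  { id: 4, vertical: null, report_date: '2025-05-04' },
  { id: 5, vertical: null, report_date: '2025-05-11' },
];

console.log(isBugCondition(perVerticalUploads));  // true
console.log(isBugCondition(singleUploadPerDate)); // false
```

Inputs for which this returns `false` are the ones the Property 2 preservation tests are meant to cover.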
|
||||