cve-dashboard/.kiro/specs/compliance-duplicate-chart-entries/design.md
2026-05-19 15:01:25 -06:00

# Compliance Duplicate Chart Entries Bugfix Design
## Overview
Four compliance endpoints (`GET /trends`, `GET /top-recurring`, `GET /category-trend`, `GET /summary`) and the `compliance_snapshots` block inside `persistUpload()` share the same root cause: they assume one upload per reporting date. The endpoints key by `compliance_uploads.id` (one row per uploaded xlsx) instead of by `compliance_uploads.report_date` (the calendar date the report covers), and the snapshot block reads items without a `vertical` filter. Because the compliance pipeline accepts one xlsx per vertical (NTS_AEO, SDIT_CISO, TSI), a single `report_date` typically maps to several `compliance_uploads` rows, and any query that does not account for this produces duplicated, fragmented, or silently dropped data.
The fix follows one pattern: for the count-style endpoints, rewrite the SQL so the result set has exactly one row per unique `report_date` (`GROUP BY report_date` with `SUM` aggregations); for the latest-snapshot endpoint (`/summary`), keep the existing single-upload selection but disclose the sibling uploads that share its `report_date`. The `persistUpload()` snapshot block is fixed by adding a `vertical` filter so per-vertical snapshots are no longer cross-contaminated by other verticals' items.
The implementation is intentionally minimal: each fix changes one or two SQL statements, plus small JavaScript changes in `/trends` (the `teamMap` lookup) and `/summary` (the response object). No frontend changes are required: the chart components already key on `report_date` and will render correctly once the API returns one row per date.
## Glossary
- **Bug_Condition (C)**: The condition that triggers the bug — two or more rows in `compliance_uploads` share the same `report_date` (i.e., a multi-vertical upload day).
- **Property (P)**: The desired behavior when C holds — each affected endpoint returns exactly one entry per unique `report_date`, and the values aggregated across uploads for that date reconcile with the underlying `compliance_items` totals.
- **Preservation**: Behavior on dates with a single upload row, on the empty-data response shape, and on unrelated query parameters (e.g., `team` filter on `/summary`) — all must be byte-for-byte unchanged.
- **report_date**: `TEXT` column on `compliance_uploads` storing the reporting period the xlsx covers (e.g., `2025-05-11`). One date can have multiple upload rows when multiple verticals are uploaded for that date.
- **vertical**: `TEXT` column on `compliance_uploads` and `compliance_items` identifying which xlsx (NTS_AEO, SDIT_CISO, TSI) an upload or item belongs to. `NULL` indicates a legacy AEO-only upload.
- **persistUpload()**: Function in `backend/routes/compliance.js` (lines 81–192) that writes a parsed upload to the DB inside a transaction and then writes per-vertical snapshots into `compliance_snapshots`.
- **computeWaterfall(uploads)**: Pure helper in `backend/routes/compliance.js` (lines 235–243) that takes an ordered list of upload rows and emits one waterfall entry per row, carrying the running `start` forward.
## Bug Details
### Bug Condition
The bug manifests when two or more `compliance_uploads` rows share the same `report_date`. This happens whenever the operator uploads more than one vertical xlsx for the same reporting cycle (the documented multi-vertical workflow). Four of the five affected code paths emit one row per upload (or pick a single upload) instead of aggregating to one row per `report_date`; the fifth, the `persistUpload()` snapshot block, counts items from every vertical instead of only the one being persisted.
**Formal Specification:**
```
FUNCTION isBugCondition(uploads)
  INPUT: uploads, a list of compliance_uploads rows
  OUTPUT: boolean
  // The bug condition is triggered for any report_date that has more than one upload row
  GROUP uploads BY report_date INTO groups
  RETURN EXISTS group IN groups WHERE COUNT(group) > 1
END FUNCTION
```
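The predicate above can also be sketched as plain JavaScript, for example as a fixture sanity check in tests. This is an illustration, not code from `compliance.js`; it assumes upload rows arrive as plain objects with a `report_date` string.

```javascript
// Sketch of isBugCondition(): true when any report_date has more than one upload row.
// Assumes each upload is a plain object with a `report_date` string (e.g. '2025-05-11').
function isBugCondition(uploads) {
  const countsByDate = new Map();
  for (const u of uploads) {
    const n = (countsByDate.get(u.report_date) || 0) + 1;
    if (n > 1) return true; // a second row for the same date is enough
    countsByDate.set(u.report_date, n);
  }
  return false;
}

// Usage: three verticals uploaded for the same cycle trigger the condition.
const multiVerticalDay = [
  { id: 1, report_date: '2025-05-11', vertical: 'NTS_AEO' },
  { id: 2, report_date: '2025-05-11', vertical: 'SDIT_CISO' },
  { id: 3, report_date: '2025-05-11', vertical: 'TSI' },
];
console.log(isBugCondition(multiVerticalDay)); // true
console.log(isBugCondition([{ id: 4, report_date: '2025-04-01', vertical: null }])); // false
```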
For a single endpoint response to be considered buggy, the API output must additionally satisfy the per-endpoint failure check below (each case is one manifestation of the same root cause):
```
FUNCTION isBuggyResponse(endpoint, response)
  CASE endpoint OF
    '/trends':         RETURN COUNT(response.trends) != COUNT(DISTINCT report_date IN compliance_uploads)
    '/top-recurring':  RETURN COUNT(response.waterfall) != COUNT(DISTINCT report_date IN compliance_uploads)
    '/category-trend': RETURN EXISTS (date, category) WITH COUNT(*) > 1 IN response.categoryTrend
    '/summary':        RETURN response.upload represents only one of N>1 uploads sharing the latest report_date
                              AND no flag indicates other uploads exist for that date
    'persistUpload':   RETURN snapshots.total_devices > items_belonging_to_this_vertical_only
  END CASE
END FUNCTION
```
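The three count-style checks can likewise be written as a plain JavaScript helper, which may be handy as a test oracle. The function names and the `uploads` argument are illustrative assumptions; only the response fields (`trends`, `waterfall`, `categoryTrend`) and the `{ report_date, category, count }` row shape come from this design. `/summary` and `persistUpload()` are left out because their checks need database state.

```javascript
// Number of distinct report_date values among the upload rows.
function distinctDateCount(uploads) {
  return new Set(uploads.map((u) => u.report_date)).size;
}

// Sketch of isBuggyResponse() for the three count-style endpoints only.
function isBuggyResponse(endpoint, response, uploads) {
  switch (endpoint) {
    case '/trends':
      return response.trends.length !== distinctDateCount(uploads);
    case '/top-recurring':
      return response.waterfall.length !== distinctDateCount(uploads);
    case '/category-trend': {
      // Buggy when any (report_date, category) pair occurs more than once.
      const seen = new Set();
      for (const row of response.categoryTrend) {
        const key = `${row.report_date}|${row.category}`;
        if (seen.has(key)) return true;
        seen.add(key);
      }
      return false;
    }
    default:
      throw new Error(`no check sketched for ${endpoint}`);
  }
}

const uploads = [
  { id: 1, report_date: '2025-05-11' },
  { id: 2, report_date: '2025-05-11' },
];
console.log(isBuggyResponse('/trends', { trends: [{}, {}] }, uploads)); // true: two entries, one date
console.log(isBuggyResponse('/trends', { trends: [{}] }, uploads)); // false
```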
### Examples
The originally reported case (GitLab issue #12, 2025-05-11) and the four sibling manifestations:
- **`/trends`** — STEAM uploads three xlsx files for `2025-05-11` (one per vertical). The chart shows three "05/11/25" entries on the x-axis instead of one. Expected: a single 05/11/25 point whose `new_count`/`recurring_count`/`resolved_count`/`total_active` are the sums of the three uploads' counts.
- **`/top-recurring`** — Same three uploads. `computeWaterfall()` receives three rows for `2025-05-11` and emits three bars stacked on the same date. Worse, because `start` carries forward across rows, the second and third bars' `start` reflects the first/second row's `end`, so the three bars in aggregate misrepresent the date-level deltas. Expected: one bar for `2025-05-11` whose `new_count`/`recurring_count`/`resolved_count` are summed across the three uploads, and whose `start` carries from the previous date's `end`.
- **`/category-trend`** — Same three uploads, each with category-tagged items. The query groups by `(cu.id, cu.report_date, category)` and returns up to `3 × |categories|` rows for `2025-05-11`. The frontend stacks these as duplicated category bars per date. Expected: one row per `(2025-05-11, category)` pair with `count` summed across the three uploads.
- **`/summary`** — On `2025-05-11`, three uploads exist. The query `WHERE vertical IS NULL ORDER BY id DESC LIMIT 1` (with fallback to `vertical = 'NTS_AEO'`) silently picks one and the other two verticals' `summary_json` is dropped. Expected: either the response merges all three uploads' `entries` and `overall_scores`, or the response includes a `multi_vertical_uploads` array identifying the other uploads that exist for the same `report_date` so the caller knows the response is partial.
- **Edge case — `persistUpload()` snapshot** — When SDIT_CISO is being persisted on `2025-05-11`, the snapshot query reads `compliance_items WHERE team IS NOT NULL` with no `vertical` filter, so the resulting per-team `total_devices`/`compliant`/`non_compliant` counts include items that belong to NTS_AEO and TSI as well. Expected: the snapshot query filters by the upload's `vertical` and groups by `(vertical, team)`.
## Expected Behavior
### Preservation Requirements
**Unchanged Behaviors:**
- Single-upload-per-date dates (legacy AEO-only workflow): every endpoint returns the same numbers, in the same shape, in the same order as before the fix.
- Empty-data responses: `/trends` returns `{ trends: [] }`, `/top-recurring` returns `{ waterfall: [] }`, `/category-trend` returns `{ categoryTrend: [] }`, `/summary` returns `{ entries: [], overall_scores: {}, upload: null }`.
- `/summary` `team` query parameter: still filters `entries` server-side, still rejects non-`ALLOWED_TEAMS` values with HTTP 400.
- `/summary` `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for selecting which upload's `summary_json` to surface (only the additional metadata about sibling uploads is new).
- `persistUpload()` error handling: snapshot creation remains wrapped in a `try/catch` that logs but does not fail the upload commit.
- `compliance_snapshots` rows for months with only a single vertical present in `compliance_items`: identical values to the pre-fix output.
- Frontend chart components: no changes required. They already key on `report_date` and consume the existing response shapes.
**Scope:**
All endpoint inputs that do not involve `report_date` collisions (single-upload dates, empty datasets, error paths, query-parameter filtering) must be byte-for-byte identical to the pre-fix output. The fix only changes what happens when two or more `compliance_uploads` rows share a `report_date`.
## Hypothesized Root Cause
All five sites have the same shape of bug — keying by `id` instead of `report_date` — but with slightly different mechanics. Listing them explicitly so the test plan can confirm or refute each one:
1. **`/trends`: per-row mapping over uploads.** The handler runs `SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC` and `.map()`s each row into a trend entry. Per-team counts are pre-aggregated by `upload_id` and looked up by `u.id`, so duplicate-date rows produce duplicate-date trend entries with split per-team counts.
2. **`/top-recurring`: `computeWaterfall()` receives per-row data.** The query is identical to `/trends`'s upload query and `computeWaterfall()` carries a stateful `start` forward across rows. Three rows for the same date become three bars whose `start`/`end` running totals are wrong relative to the date-level aggregate.
3. **`/category-trend`: `GROUP BY cu.id, cu.report_date, category`.** Including `cu.id` in the `GROUP BY` defeats date-level aggregation; one upload row's items get their own (date, category) group instead of summing into the date-level group.
4. **`/summary`: `ORDER BY id DESC LIMIT 1`.** The query selects a single representative upload for the latest date and discards every other upload sharing that date. This is a "select latest by row id" pattern that does not consider `report_date` ties.
5. **`persistUpload()` snapshot block: missing `vertical` filter.** The snapshot query reads `compliance_items WHERE team IS NOT NULL GROUP BY team` with no `vertical` predicate. The query was correct when there was one vertical (AEO-only legacy) and silently broke when the multi-vertical migration added a `vertical` column without updating this query.
The common structural cause is that the multi-vertical migration (`add_vcl_multi_vertical.js`) added a `vertical` column to `compliance_uploads` and `compliance_items` but did not audit existing read queries for the new "many uploads share a `report_date`" reality.
## Correctness Properties
Property 1: Bug Condition — `/trends` returns one entry per unique report_date
_For any_ set of `compliance_uploads` rows where two or more rows share a `report_date`, the response from `GET /trends` SHALL contain exactly one entry per unique `report_date`, with `new_count`, `recurring_count`, `resolved_count`, and `total_active` equal to the SUM of those columns over all uploads sharing that date, and per-team counts equal to the sum of `compliance_items` rows for that team across all those uploads.
**Validates: Requirements 2.1, 2.2, 2.3**
Property 2: Bug Condition — `/top-recurring` waterfall has one bar per unique report_date with correct running totals
_For any_ set of `compliance_uploads` rows where two or more rows share a `report_date`, the response from `GET /top-recurring` SHALL contain exactly one waterfall entry per unique `report_date`, the entry's `new_count`/`recurring_count`/`resolved_count` SHALL equal the sum of those columns over all uploads sharing that date, and the running invariant `entry[i].end == entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count` SHALL hold with `entry[i].start == entry[i-1].end` for adjacent entries (and `entry[0].start == 0`).
**Validates: Requirements 2.4, 2.5**
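Property 2's running invariant is easy to check mechanically. A minimal sketch, assuming each waterfall entry is a plain object with numeric `start`, `end`, `new_count`, `recurring_count`, and `resolved_count` fields (the field names follow the property statement; the helper name is hypothetical):

```javascript
// Sketch of a Property 2 checker: each entry chains from the previous one and
// end == start + new_count + recurring_count - resolved_count.
function runningInvariantHolds(waterfall) {
  return waterfall.every((e, i) => {
    const expectedStart = i === 0 ? 0 : waterfall[i - 1].end; // entry[0].start == 0, else previous end
    const expectedEnd = e.start + e.new_count + e.recurring_count - e.resolved_count;
    return e.start === expectedStart && e.end === expectedEnd;
  });
}

const ok = [
  { start: 0, new_count: 10, recurring_count: 5, resolved_count: 3, end: 12 },
  { start: 12, new_count: 2, recurring_count: 4, resolved_count: 6, end: 12 },
];
const broken = [ok[0], { ...ok[1], start: 11 }]; // start does not chain from the previous end
console.log(runningInvariantHolds(ok), runningInvariantHolds(broken)); // true false
```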
Property 3: Bug Condition — `/category-trend` returns one row per (date, category)
_For any_ set of `compliance_uploads` and `compliance_items` rows, the response from `GET /category-trend` SHALL contain exactly one entry per unique `(report_date, category)` pair, and each entry's `count` SHALL equal the total number of `compliance_items` for that category across every upload sharing that `report_date`.
**Validates: Requirements 2.6, 2.7**
Property 4: Bug Condition — `/summary` does not silently drop sibling uploads
_For any_ set of `compliance_uploads` rows where two or more rows share the latest `report_date`, the response from `GET /summary` SHALL either (a) include a merged view of all sibling uploads' `entries` and `overall_scores`, or (b) include a non-empty `multi_vertical_uploads` field listing the IDs and verticals of the other uploads for that date that were not used to populate the response. The response SHALL NOT silently drop sibling uploads.
**Validates: Requirements 2.8, 2.9**
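For option (b), a checker in the spirit of `siblings_disclosed` might look like the following. It assumes the response shape proposed in Fix 4 (`upload.id`, `upload.report_date`, and `multi_vertical_uploads[].id`); the function name is hypothetical, and option (a) would need a different check.

```javascript
// Sketch: every other upload sharing the surfaced upload's report_date must be listed
// in multi_vertical_uploads (option (b) of Property 4).
function siblingsDisclosed(response, uploads) {
  if (!response.upload) return uploads.length === 0; // empty state only when there are no uploads
  const { id, report_date } = response.upload;
  const siblingIds = uploads
    .filter((u) => u.report_date === report_date && u.id !== id)
    .map((u) => u.id);
  const disclosed = new Set((response.multi_vertical_uploads || []).map((s) => s.id));
  return siblingIds.every((sid) => disclosed.has(sid));
}

const uploads = [
  { id: 1, report_date: '2025-05-11', vertical: 'NTS_AEO' },
  { id: 2, report_date: '2025-05-11', vertical: 'SDIT_CISO' },
  { id: 3, report_date: '2025-05-11', vertical: 'TSI' },
];
const upload = { id: 1, report_date: '2025-05-11' };
console.log(siblingsDisclosed({ upload, multi_vertical_uploads: [{ id: 2 }, { id: 3 }] }, uploads)); // true
console.log(siblingsDisclosed({ upload }, uploads)); // false: siblings silently dropped
```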
Property 5: Bug Condition — `persistUpload()` snapshot reflects only the snapshotted vertical
_For any_ `persistUpload()` invocation with a non-NULL `vertical`, the rows written into `compliance_snapshots` for the current month SHALL have `total_devices`, `compliant`, and `non_compliant` values equal to the counts derived from `compliance_items` filtered to the snapshotted vertical only. No item from another vertical SHALL contribute to those counts.
**Validates: Requirements 2.10, 2.11**
Property 6: Preservation — Per-endpoint cross-date sums equal source-data totals
_For any_ set of uploads, summing `new_count` (and likewise `recurring_count`, `resolved_count`) across every entry in `GET /trends` SHALL equal the corresponding `SUM(new_count)` over `compliance_uploads` rows with a non-null `report_date`. Similarly, summing `count` across every entry in `GET /category-trend` SHALL equal `COUNT(*)` of `compliance_items` joined to those `compliance_uploads` rows. This holds whether or not any date has duplicate uploads.
**Validates: Requirements 3.1, 3.2**
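The `/trends` half of Property 6 reduces to comparing column sums. A sketch, assuming trend entries and upload rows are plain objects and treating a missing count as `0` (as `COALESCE` does in the fixed query); rows with a null `report_date` are excluded to match the `WHERE report_date IS NOT NULL` filter in Fix 1. The helper names and sample numbers are illustrative.

```javascript
const COUNT_FIELDS = ['new_count', 'recurring_count', 'resolved_count'];

// Sum one column, treating null/undefined as 0 like COALESCE(col, 0).
function sumField(rows, field) {
  return rows.reduce((acc, r) => acc + (r[field] || 0), 0);
}

// True when every count column in /trends adds up to the same total as the source uploads.
function trendsConserveTotals(trends, uploads) {
  const dated = uploads.filter((u) => u.report_date != null);
  return COUNT_FIELDS.every((f) => sumField(trends, f) === sumField(dated, f));
}

const uploads = [
  { report_date: '2025-05-11', new_count: 10, recurring_count: 5, resolved_count: 3 },
  { report_date: '2025-05-11', new_count: 4, recurring_count: 2, resolved_count: 1 },
  { report_date: null, new_count: 99, recurring_count: 0, resolved_count: 0 }, // excluded
];
const fixedTrends = [{ report_date: '2025-05-11', new_count: 14, recurring_count: 7, resolved_count: 4 }];
console.log(trendsConserveTotals(fixedTrends, uploads)); // true
```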
Property 7: Preservation — Single-upload-per-date dates are unchanged
_For any_ set of `compliance_uploads` where every `report_date` has exactly one row, the responses from `/trends`, `/top-recurring`, `/category-trend`, and `/summary` (and the `compliance_snapshots` rows written by `persistUpload()`) SHALL be identical to the pre-fix output for the same input. The fix SHALL NOT change behavior on the single-upload-per-date case.
**Validates: Requirements 3.1, 3.4, 3.5, 3.6, 3.8**
Property 8: Preservation — Empty-data and error-path responses are unchanged
_For any_ empty dataset (no uploads, no matching items, no items in a category), each affected endpoint SHALL return the same empty-state response shape as before the fix. `/summary` with a non-`ALLOWED_TEAMS` `team` parameter SHALL still respond `400`. `persistUpload()` snapshot errors SHALL still be caught and logged without failing the upload commit.
**Validates: Requirements 3.3, 3.7, 3.9, 3.10**
## Fix Implementation
### Changes Required
All changes are in `backend/routes/compliance.js`. No schema migration, no new column, no frontend change.
#### Fix 1: `GET /trends` — aggregate uploads and team counts by `report_date`
**Function**: `router.get('/trends', ...)` (around line 768)
**Specific Changes**:
1. Replace the `compliance_uploads` query so it groups by `report_date` and sums the count columns:
```sql
SELECT report_date,
SUM(COALESCE(new_count, 0))::int AS new_count,
SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
SUM(COALESCE(resolved_count, 0))::int AS resolved_count,
SUM(COALESCE(new_count, 0) + COALESCE(recurring_count, 0))::int AS total_active
FROM compliance_uploads
WHERE report_date IS NOT NULL
GROUP BY report_date
ORDER BY report_date ASC
```
2. Replace the per-team `compliance_items` query so it joins to `compliance_uploads` and groups by `(report_date, team)` instead of `(upload_id, team)`:
```sql
SELECT cu.report_date, ci.team, COUNT(ci.id)::int AS count
FROM compliance_items ci
JOIN compliance_uploads cu ON ci.upload_id = cu.id
WHERE ci.team IS NOT NULL AND cu.report_date IS NOT NULL
GROUP BY cu.report_date, ci.team
```
3. Change the `teamMap` keyed lookup from `teamMap[u.id]` to `teamMap[u.report_date]` and rebuild `trends` from the per-date upload rows.
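Step 3 amounts to re-keying the per-team lookup. A minimal in-memory sketch of the assembly, assuming `uploadRows` and `teamRows` are the results of the two queries above. The per-team output field name `teams` and the team name `TEAM_B` are hypothetical, because the existing `/trends` entry shape is not reproduced in this document.

```javascript
// Sketch: build /trends entries from date-level rows.
function buildTrends(uploadRows, teamRows) {
  // Per-team counts keyed by report_date (previously keyed by upload id).
  const teamMap = {};
  for (const { report_date, team, count } of teamRows) {
    if (!teamMap[report_date]) teamMap[report_date] = {};
    teamMap[report_date][team] = count;
  }
  // One trend entry per date-level upload row.
  return uploadRows.map((u) => ({
    report_date: u.report_date,
    new_count: u.new_count,
    recurring_count: u.recurring_count,
    resolved_count: u.resolved_count,
    total_active: u.total_active,
    teams: teamMap[u.report_date] || {}, // hypothetical field name
  }));
}

const trends = buildTrends(
  [{ report_date: '2025-05-11', new_count: 20, recurring_count: 8, resolved_count: 6, total_active: 28 }],
  [
    { report_date: '2025-05-11', team: 'STEAM', count: 7 },
    { report_date: '2025-05-11', team: 'TEAM_B', count: 3 },
  ],
);
console.log(trends.length, trends[0].teams); // 1 { STEAM: 7, TEAM_B: 3 }
```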
#### Fix 2: `GET /top-recurring` — aggregate uploads by `report_date` before passing to `computeWaterfall()`
**Function**: `router.get('/top-recurring', ...)` (around line 818)
**Specific Changes**:
1. Replace the query with the same `GROUP BY report_date` pattern used in `/trends`, dropping the `id` column the old query selected and omitting `total_active`, since `computeWaterfall()` only needs `report_date`, `new_count`, `recurring_count`, and `resolved_count`:
```sql
SELECT report_date,
SUM(COALESCE(new_count, 0))::int AS new_count,
SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
SUM(COALESCE(resolved_count, 0))::int AS resolved_count
FROM compliance_uploads
WHERE report_date IS NOT NULL
GROUP BY report_date
ORDER BY report_date ASC
```
2. `computeWaterfall()` itself does not change — it already advances `start` correctly when fed one row per date. The fix is purely in the SQL.
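To see why aggregating before `computeWaterfall()` is sufficient, the sketch below replays the GitLab #12 scenario in memory. Both functions are hypothetical stand-ins: `computeWaterfallSketch()` is written to satisfy Property 2's invariant rather than copied from `compliance.js`, and `aggregateByDate()` mimics the Fix 2 query. The count values are invented.

```javascript
// Hypothetical stand-in for computeWaterfall(): follows Property 2's invariant.
function computeWaterfallSketch(rows) {
  let start = 0;
  return rows.map((r) => {
    const end = start + r.new_count + r.recurring_count - r.resolved_count;
    const entry = {
      date: r.report_date,
      start,
      end,
      new_count: r.new_count,
      recurring_count: r.recurring_count,
      resolved_count: r.resolved_count,
    };
    start = end; // carry the running total into the next row
    return entry;
  });
}

// In-memory equivalent of the Fix 2 query: SUM the counts per report_date (missing counts
// count as 0, like COALESCE), sorted ascending. Assumes every row has a report_date.
function aggregateByDate(uploads) {
  const byDate = new Map();
  for (const u of uploads) {
    if (!byDate.has(u.report_date)) {
      byDate.set(u.report_date, { report_date: u.report_date, new_count: 0, recurring_count: 0, resolved_count: 0 });
    }
    const acc = byDate.get(u.report_date);
    acc.new_count += u.new_count || 0;
    acc.recurring_count += u.recurring_count || 0;
    acc.resolved_count += u.resolved_count || 0;
  }
  // ISO date strings sort correctly with plain string comparison.
  return [...byDate.values()].sort((a, b) => (a.report_date < b.report_date ? -1 : a.report_date > b.report_date ? 1 : 0));
}

const sameDay = [
  { report_date: '2025-05-11', vertical: 'NTS_AEO', new_count: 10, recurring_count: 5, resolved_count: 3 },
  { report_date: '2025-05-11', vertical: 'SDIT_CISO', new_count: 4, recurring_count: 2, resolved_count: 1 },
  { report_date: '2025-05-11', vertical: 'TSI', new_count: 6, recurring_count: 1, resolved_count: 2 },
];
const perRow = computeWaterfallSketch(sameDay); // unfixed shape: 3 bars for one date
const perDate = computeWaterfallSketch(aggregateByDate(sameDay)); // fixed shape: 1 bar
console.log(perRow.length, perDate.length, perDate[0].end); // 3 1 22
```

Both versions finish at the same running total (22), so the fix changes how many bars appear per date, not where the series ends.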
#### Fix 3: `GET /category-trend` — drop `cu.id` from `GROUP BY`
**Function**: `router.get('/category-trend', ...)` (around line 838)
**Specific Changes**:
1. Remove `cu.id` from the `GROUP BY` clause so the grouping is by `(report_date, category)` only:
```sql
SELECT cu.report_date,
COALESCE(ci.category, 'Unknown') AS category,
COUNT(ci.id)::int AS count
FROM compliance_uploads cu
JOIN compliance_items ci ON ci.upload_id = cu.id
WHERE cu.report_date IS NOT NULL
GROUP BY cu.report_date, COALESCE(ci.category, 'Unknown')
ORDER BY cu.report_date ASC, category ASC
```
2. The response shape (`{ categoryTrend: Array<{ report_date, category, count }> }`) does not change. Only the row count for multi-vertical dates changes (collapsing duplicates into sums).
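The effect of dropping `cu.id` can be illustrated in memory: the old query's rows (one per upload, date, and category) collapse into one row per `(report_date, category)` with summed counts, which is equivalent to counting the items per date directly. The helper name and sample counts are made up for illustration, and JavaScript string comparison only approximates PostgreSQL's collation-based `ORDER BY category`.

```javascript
// Sketch: collapse per-upload category rows into one row per (report_date, category).
function collapseCategoryTrend(perUploadRows) {
  const byKey = new Map();
  for (const { report_date, category, count } of perUploadRows) {
    const key = `${report_date}|${category}`;
    if (!byKey.has(key)) byKey.set(key, { report_date, category, count: 0 });
    byKey.get(key).count += count; // SUM over uploads sharing the date
  }
  const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
  // ORDER BY report_date ASC, category ASC (code-unit order, not DB collation)
  return [...byKey.values()].sort((a, b) => cmp(a.report_date, b.report_date) || cmp(a.category, b.category));
}

// Old-query shape: three uploads on one date, two categories each (counts are invented).
const perUploadRows = [
  { report_date: '2025-05-11', category: 'Patching', count: 4 },
  { report_date: '2025-05-11', category: 'Configuration', count: 2 },
  { report_date: '2025-05-11', category: 'Patching', count: 3 },
  { report_date: '2025-05-11', category: 'Configuration', count: 1 },
  { report_date: '2025-05-11', category: 'Patching', count: 5 },
  { report_date: '2025-05-11', category: 'Configuration', count: 6 },
];
const collapsed = collapseCategoryTrend(perUploadRows);
console.log(collapsed.map((r) => `${r.category}: ${r.count}`).join(', ')); // Configuration: 9, Patching: 12
```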
#### Fix 4: `GET /summary` — disclose sibling uploads for the latest date
**Function**: `router.get('/summary', ...)` (around line 495)
**Specific Changes**:
1. Keep the existing `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for choosing the primary upload's `summary_json` (this preserves the legacy single-upload behavior).
2. After resolving `latestUpload`, run a second query to find sibling uploads sharing the same `report_date`:
```sql
SELECT id, vertical, uploaded_at
FROM compliance_uploads
WHERE report_date = $1 AND id != $2
ORDER BY id ASC
```
3. Add `multi_vertical_uploads` to the response when sibling uploads exist:
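```javascript
res.json({
  entries,
  overall_scores: summary.overall_scores || {},
  upload: {
    id: latestUpload.id,
    report_date: latestUpload.report_date,
    uploaded_at: latestUpload.uploaded_at,
  },
  // siblings: rows returned by the sibling query above (same report_date, different id)
  multi_vertical_uploads: siblings.map(s => ({ id: s.id, vertical: s.vertical, uploaded_at: s.uploaded_at })),
});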
```
4. When no sibling uploads exist (single-upload-per-date case), `multi_vertical_uploads` is either `[]` or omitted. Emitting `[]` changes the single-upload response shape and would conflict with the byte-for-byte preservation requirement (Property 7); omitting the field keeps that requirement intact. This choice should be settled before the preservation tests are written.
This is the conservative option (b) from requirement 2.8 (return a documented selection plus metadata about siblings) rather than option (a), a full server-side merge. Option (b) is chosen because (i) the `summary_json` schema is per-vertical, so merging would require reconciliation logic that doesn't currently exist, and (ii) the existing fallback selection (NTS_AEO) is the established representative for the legacy AEO chart on the Compliance page.
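The selection and sibling lookup can be modeled in memory to make the intended semantics concrete. This is a sketch, not the route handler: it assumes upload rows are plain objects with `id`, `report_date`, `vertical`, and `uploaded_at`, treats a `null` or `undefined` `vertical` as SQL `NULL`, and returns `primary: null` when neither the `NULL` nor the `NTS_AEO` lookup finds a row (how the real handler responds in that case is not specified here). The sample timestamps are invented.

```javascript
// Sketch of Fix 4: pick the primary upload via the existing fallback, then list siblings.
function selectPrimaryAndSiblings(uploads) {
  // Highest-id upload matching `pred`, or null (mirrors ORDER BY id DESC LIMIT 1).
  const newestWhere = (pred) =>
    uploads.filter(pred).reduce((best, u) => (best === null || u.id > best.id ? u : best), null);
  const primary =
    newestWhere((u) => u.vertical == null) || newestWhere((u) => u.vertical === 'NTS_AEO');
  if (!primary) return { primary: null, siblings: [] };
  const siblings = uploads
    .filter((u) => u.report_date === primary.report_date && u.id !== primary.id)
    .sort((a, b) => a.id - b.id) // ORDER BY id ASC, as in the sibling query
    .map(({ id, vertical, uploaded_at }) => ({ id, vertical, uploaded_at }));
  return { primary, siblings };
}

const gitlab12 = [
  { id: 1, report_date: '2025-05-11', vertical: 'NTS_AEO', uploaded_at: '2025-05-12T09:00:00Z' },
  { id: 2, report_date: '2025-05-11', vertical: 'SDIT_CISO', uploaded_at: '2025-05-12T09:05:00Z' },
  { id: 3, report_date: '2025-05-11', vertical: 'TSI', uploaded_at: '2025-05-12T09:10:00Z' },
];
const { primary, siblings } = selectPrimaryAndSiblings(gitlab12);
console.log(primary.id, siblings.map((s) => s.id)); // 1 [ 2, 3 ]
```

On this fixture (no legacy `NULL` upload) the NTS_AEO row is chosen and the other two uploads are reported as siblings, which lines up with the `multi_vertical_uploads.length === 2` assertion in the test plan.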
#### Fix 5: `persistUpload()` snapshot block — filter and group by `vertical`
**Function**: `persistUpload()` (lines 81–192), specifically the `verticalStats` query at line 157
**Specific Changes**:
1. Determine the upload's `vertical` (read it from the upload row immediately after the `RETURNING id` insert, or accept it as a parameter to `persistUpload()`).
2. Replace the `verticalStats` query with one that filters by the upload's `vertical` and groups by `(vertical, team)`:
```sql
SELECT vertical, team,
COUNT(DISTINCT hostname)::int AS total_devices,
COUNT(DISTINCT CASE WHEN status = 'resolved' THEN hostname END)::int AS compliant,
COUNT(DISTINCT CASE WHEN status = 'active' THEN hostname END)::int AS non_compliant
FROM compliance_items
WHERE team IS NOT NULL AND vertical IS NOT DISTINCT FROM $1
GROUP BY vertical, team
```
(`IS NOT DISTINCT FROM` handles the legacy `vertical IS NULL` case correctly, so AEO-only uploads keep their previous semantics.)
3. The `INSERT ... ON CONFLICT (snapshot_month, vertical) DO UPDATE` already keys snapshots by `vertical`, so no change is required there. However, the `vertical` value passed in must come from the query result, not from `team AS vertical` (which conflates the team and vertical concepts).
4. If the per-snapshot-row "vertical" identity needs to remain `team` for back-compat reasons, leave the `INSERT` mapping unchanged but ensure the underlying counts are filtered to the upload's actual `vertical`. Confirm via inspection of `compliance_snapshots` consumers (`/vcl/stats`) before finalising.
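The semantics of the rewritten snapshot query, including the `IS NOT DISTINCT FROM` handling of legacy rows, can be modeled in memory. A sketch under these assumptions: items are plain objects, the legacy vertical is represented as `null` (not `undefined`), and the function name and sample hostnames are hypothetical. Items with a null `hostname` are skipped, matching how `COUNT(DISTINCT ...)` ignores `NULL`.

```javascript
// Sketch of the Fix 5 snapshot query evaluated over an array of compliance_items.
function snapshotStats(items, vertical) {
  const byTeam = new Map();
  for (const it of items) {
    // WHERE team IS NOT NULL AND vertical IS NOT DISTINCT FROM $1 (null === null is true in JS).
    if (it.team == null || it.vertical !== vertical) continue;
    if (it.hostname == null) continue; // COUNT(DISTINCT ...) ignores NULL hostnames
    if (!byTeam.has(it.team)) {
      byTeam.set(it.team, { all: new Set(), resolved: new Set(), active: new Set() });
    }
    const s = byTeam.get(it.team);
    s.all.add(it.hostname);
    if (it.status === 'resolved') s.resolved.add(it.hostname);
    if (it.status === 'active') s.active.add(it.hostname);
  }
  // GROUP BY vertical, team: every surviving row has the requested vertical.
  return [...byTeam].map(([team, s]) => ({
    vertical,
    team,
    total_devices: s.all.size,
    compliant: s.resolved.size,
    non_compliant: s.active.size,
  }));
}

const items = [
  { vertical: 'NTS_AEO', team: 'STEAM', hostname: 'host-a', status: 'active' },
  { vertical: 'NTS_AEO', team: 'STEAM', hostname: 'host-b', status: 'active' },
  { vertical: 'NTS_AEO', team: 'STEAM', hostname: 'host-c', status: 'active' },
  { vertical: 'SDIT_CISO', team: 'STEAM', hostname: 'host-a', status: 'active' },
  { vertical: 'SDIT_CISO', team: 'STEAM', hostname: 'host-d', status: 'resolved' },
];
console.log(snapshotStats(items, 'SDIT_CISO')[0].total_devices); // 2 (without the filter, 4 distinct hosts)
```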
## Testing Strategy
### Validation Approach
The bug condition is straightforward to construct: insert two `compliance_uploads` rows with the same `report_date` and matching `compliance_items`, then call each affected endpoint. The two-phase approach is to first run the tests against the unfixed code to confirm the duplication/silent-drop counterexamples, then run the same tests against the fixed code and add property-based tests that explore the input space more broadly.
### Exploratory Bug Condition Checking
**Goal**: Surface counterexamples that demonstrate each of the five manifestations BEFORE implementing the fix. Confirm or refute the root cause analysis for each endpoint independently — they share a structural cause but the SQL details differ.
**Test Plan**: Seed a clean test database with a fixture representing the original GitLab #12 scenario (three uploads for `2025-05-11`, one each for NTS_AEO, SDIT_CISO, TSI, with realistic `compliance_items`). Call each affected endpoint and assert the buggy invariants. Run on UNFIXED code first.
**Test Cases**:
1. **`/trends` Duplicate Date Test** — Insert three uploads for `2025-05-11` (verticals NTS_AEO, SDIT_CISO, TSI), each with distinct `new_count`/`recurring_count`/`resolved_count` and matching `compliance_items` per team. Call `GET /trends`. Assert `response.trends.filter(t => t.report_date === '2025-05-11').length === 1`. (will fail on unfixed code — returns 3)
2. **`/top-recurring` Duplicate Bar Test** — Same fixture. Call `GET /top-recurring`. Assert `response.waterfall.filter(w => w.date === '2025-05-11').length === 1` AND assert the running invariant `waterfall[i].end === waterfall[i].start + waterfall[i].new_count + waterfall[i].recurring_count - waterfall[i].resolved_count` holds for every `i`. (will fail on unfixed code — returns 3 bars and the running totals reflect mid-row state, not date-level aggregate)
3. **`/category-trend` Duplicate (date, category) Test** — Same fixture, plus items tagged with two categories (e.g., "Patching" and "Configuration"). Call `GET /category-trend`. Assert that for each `(report_date, category)` pair, `response.categoryTrend.filter(c => c.report_date === '2025-05-11' && c.category === 'Patching').length === 1`. (will fail on unfixed code — returns 3 rows per category)
4. **`/summary` Sibling Disclosure Test** — Same fixture (three uploads for `2025-05-11`, latest date). Call `GET /summary`. Assert either (a) the response merges `entries` from all three uploads, or (b) `response.multi_vertical_uploads.length === 2`. (will fail on unfixed code — silently picks one upload, the other two are dropped without any indication)
5. **`persistUpload()` Cross-Vertical Contamination Test** — Pre-populate `compliance_items` with rows from multiple verticals (e.g., NTS_AEO has 100 active items, SDIT_CISO has 50 active items). Call `persistUpload()` with a fresh SDIT_CISO upload. Read back the `compliance_snapshots` row for the current month and SDIT_CISO. Assert `total_devices` reflects only SDIT_CISO items, not the combined 150. (will fail on unfixed code — total includes both verticals)
6. **Edge Case — Single-Upload-Per-Date Regression Test** — Insert a fixture with a single upload per date for three dates. Call all four read endpoints and capture responses. Apply the fix, re-run, and assert response equality (byte-for-byte). (should pass on unfixed code; will pass on fixed code; protects the preservation property)
**Expected Counterexamples**:
- `/trends` returns N trend entries for a date with N uploads (N > 1). Cause: per-row `.map()` over uploads instead of date-level aggregation.
- `/top-recurring` returns N waterfall bars for a date with N uploads. Cause: same per-row pattern, plus `computeWaterfall()` carries `start` forward across the duplicate-date rows.
- `/category-trend` returns N × |categories| rows for a date with N uploads. Cause: `cu.id` is in the `GROUP BY` clause.
- `/summary` returns one upload's `summary_json` and silently drops siblings. Cause: `ORDER BY id DESC LIMIT 1` with no `report_date`-tie handling.
- `persistUpload()` writes inflated `total_devices`. Cause: the snapshot query has no `vertical` predicate (the fix adds `vertical IS NOT DISTINCT FROM $1`) and groups by `team` alone instead of `(vertical, team)`.
### Fix Checking
**Goal**: Verify that for all inputs where the bug condition holds (any `report_date` shared by two or more uploads), each fixed endpoint produces the expected aggregated/disclosed result.
**Pseudocode:**
```
FOR ALL (uploads, items) WHERE EXISTS report_date d WITH COUNT(uploads WHERE report_date = d) > 1 DO
  trends_response    := GET_trends_fixed(uploads, items)
  waterfall_response := GET_top_recurring_fixed(uploads, items)
  cattrend_response  := GET_category_trend_fixed(uploads, items)
  summary_response   := GET_summary_fixed(uploads, items)
  new_upload         := a fresh upload for some vertical
  snapshot_rows      := persistUpload_fixed(new_upload, items)

  ASSERT one_entry_per_date(trends_response.trends)
  ASSERT one_entry_per_date(waterfall_response.waterfall) AND running_invariant_holds(waterfall_response.waterfall)
  ASSERT one_entry_per_date_category_pair(cattrend_response.categoryTrend)
  ASSERT siblings_disclosed(summary_response, uploads)
  ASSERT snapshots_filtered_to_vertical(snapshot_rows, new_upload.vertical, items)
END FOR
```
### Preservation Checking
**Goal**: Verify that for all inputs where the bug condition does NOT hold (every `report_date` has exactly one upload row), the fixed endpoints produce results identical to the original endpoints.
**Pseudocode:**
```
FOR ALL (uploads, items) WHERE FORALL report_date d, COUNT(uploads WHERE report_date = d) <= 1 DO
ASSERT GET_trends_original(uploads, items) = GET_trends_fixed(uploads, items)
ASSERT GET_top_recurring_original(uploads, items) = GET_top_recurring_fixed(uploads, items)
ASSERT GET_category_trend_original(uploads, items) = GET_category_trend_fixed(uploads, items)
ASSERT GET_summary_original(uploads, items) = GET_summary_fixed(uploads, items)
ASSERT persistUpload_original(upload, items).snapshots = persistUpload_fixed(upload, items).snapshots
END FOR
```
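A dependency-free sketch of the preservation idea can be useful before a property-based testing library is wired in. It checks only the aggregation step in isolation (an in-memory model of `GROUP BY report_date` with `SUM`), not the HTTP endpoints, and the generator bounds (up to five dates, counts below 50) are arbitrary choices. `mulberry32` is a well-known small seeded PRNG, used here so any failing seed can be replayed.

```javascript
// Seeded PRNG (mulberry32) returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const COUNT_FIELDS = ['new_count', 'recurring_count', 'resolved_count'];

// In-memory model of GROUP BY report_date with SUM over the count columns.
function aggregateByDate(uploads) {
  const byDate = new Map();
  for (const u of uploads) {
    if (!byDate.has(u.report_date)) {
      byDate.set(u.report_date, { report_date: u.report_date, new_count: 0, recurring_count: 0, resolved_count: 0 });
    }
    const acc = byDate.get(u.report_date);
    for (const f of COUNT_FIELDS) acc[f] += u[f];
  }
  return [...byDate.values()].sort((a, b) => (a.report_date < b.report_date ? -1 : a.report_date > b.report_date ? 1 : 0));
}

// Random scenario with at most one upload per report_date (the preservation precondition).
function randomSingleUploadPerDate(rand) {
  const n = Math.floor(rand() * 6); // 0 to 5 dates
  const uploads = [];
  for (let i = 0; i < n; i++) {
    const upload = { report_date: `2025-04-${String(i + 1).padStart(2, '0')}` };
    for (const f of COUNT_FIELDS) upload[f] = Math.floor(rand() * 50);
    uploads.push(upload);
  }
  return uploads;
}

function preservationHolds(seed, runs = 200) {
  const rand = mulberry32(seed);
  for (let r = 0; r < runs; r++) {
    const uploads = randomSingleUploadPerDate(rand);
    // With no date collisions, aggregation should be the identity (same rows, same order).
    if (JSON.stringify(aggregateByDate(uploads)) !== JSON.stringify(uploads)) return false;
  }
  return true;
}

console.log(preservationHolds(12)); // true
```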
**Testing Approach**: Property-based testing is the right fit for preservation checking here:
- The single-upload-per-date input space is large (any number of dates, any combination of counts, any team distribution, any category mix, any vertical), and exhaustive enumeration is impractical.
- The preservation property is a strict equality, which suits PBT shrinking: any failing case shrinks to a small fixture that demonstrates the behavior change.
- The legacy AEO-only data shape (`vertical IS NULL`) must be exercised, which falls naturally out of generators that include null verticals.
**Test Plan**: Capture responses from the unfixed code on single-upload-per-date fixtures (snapshot tests). After applying the fix, re-run the same fixtures and assert equality. Then run a property-based generator that produces random single-upload-per-date scenarios and asserts the same equality.
**Test Cases**:
1. **Snapshot Equality — Empty State** — Empty `compliance_uploads`. All four endpoints return their documented empty-state shapes. Snapshot-test before and after the fix.
2. **Snapshot Equality — Single AEO-Only Upload** — One upload with `vertical IS NULL`, classic legacy fixture. Capture pre-fix responses, apply fix, assert equality.
3. **Snapshot Equality — Multiple Single-Upload Dates** — Five dates, one upload each, varied `vertical` values. Capture pre-fix responses, apply fix, assert equality.
4. **`/summary` Team Filter Preservation** — Latest upload exists, `?team=STEAM` parameter is supplied. Assert `entries` is filtered to `team === 'STEAM'` rows. Assert non-`ALLOWED_TEAMS` value (e.g., `?team=OTHER`) returns HTTP 400.
5. **`persistUpload()` Snapshot Equality — Single-Vertical Month** — Pre-populate `compliance_items` with rows from a single vertical only. Run `persistUpload()` for that vertical. Assert the resulting `compliance_snapshots` rows are identical pre-fix and post-fix.
6. **Error Path Preservation** — Force a snapshot query failure (e.g., transient DB error). Assert `persistUpload()` still commits the upload and the error is logged but not surfaced to the caller.
### Unit Tests
- `/trends` aggregation: two uploads sharing a `report_date`, one upload alone for an earlier date. Assert response has 2 entries and `new_count` for the shared date equals the sum of the two uploads.
- `/top-recurring` aggregation and running totals: same fixture as above. Assert 2 waterfall entries and the running `start`/`end` invariant.
- `/category-trend` aggregation: two uploads sharing a `report_date`, items tagged with two categories. Assert one row per `(date, category)` pair with summed counts.
- `/summary` sibling disclosure: three uploads sharing the latest date. Assert response shape matches the chosen disclosure approach (option (b)).
- `/summary` team filter: same upload, with and without `?team=STEAM`.
- `persistUpload()` per-vertical snapshot: items in two verticals, run upload for one, assert snapshots for that vertical do not include the other vertical's items.
- `persistUpload()` legacy AEO-only path (`vertical IS NULL`): unchanged behavior.
### Property-Based Tests
- **`/trends` aggregation property** — Generate a random list of `(report_date, new_count, recurring_count, resolved_count)` upload tuples (with possible date collisions). Generate matching per-team item counts. Assert the response has exactly one entry per unique `report_date` AND for each entry, `new_count` equals the SUM of input `new_count`s for that date (likewise the other count fields and per-team counts).
- **`/top-recurring` running invariant property** — Same generator. Assert the response has one bar per unique `report_date` AND for every adjacent pair of entries, `entry[i].start === entry[i-1].end`, AND `entry[i].end === entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count`.
- **`/category-trend` total-conservation property** — Generate a random set of `compliance_items` and uploads. Assert `SUM(response.categoryTrend.map(c => c.count)) === total number of compliance_items joined to non-null-report_date uploads`. This holds whether or not any date has multiple uploads.
- **`/summary` sibling-disclosure property** — Generate a random set of uploads with possible duplicate `report_date` values. Pick the latest date. Assert that if any sibling upload exists for that date, the response contains a non-empty `multi_vertical_uploads` array referencing every sibling upload's id.
- **`persistUpload()` vertical-isolation property** — Generate two non-empty disjoint sets of `compliance_items`, one per vertical. Insert both. Run `persistUpload()` for vertical A. Assert the resulting `compliance_snapshots` rows for vertical A reflect only set-A items (count of distinct hostnames matches).
- **Cross-endpoint preservation property** — Generate any fixture where every `report_date` has exactly one upload row. Assert that all five fixed code paths (the four endpoints and the `persistUpload()` snapshot block) produce results byte-for-byte identical to the original implementations.
### Integration Tests
- Full upload-to-chart flow: upload three xlsx files (one per vertical) with the same `report_date` via `POST /preview` + `POST /commit`, then call `/trends`, `/top-recurring`, `/category-trend`, `/summary` and verify all four return the expected aggregated/disclosed results.
- Compliance Charts panel render: load `ComplianceChartsPanel.js` with a multi-vertical-day fixture and assert (via DOM snapshot) the x-axis shows each date exactly once on `Active Findings Over Time` and `Change per Report Cycle`.
- Snapshot consumer regression: after running `persistUpload()` with the fix, call `/vcl/stats` (which reads `compliance_snapshots`) and verify per-vertical `compliance_pct` is unchanged from the pre-fix value when only one vertical's items are present, and is corrected when multiple verticals are present.
### Test Fixtures Required
The following fixtures are needed and can be reused across the tests for all five code paths (the four endpoints and the `persistUpload()` snapshot block):
1. **`fixture_empty`** — No `compliance_uploads`, no `compliance_items`. Used by the empty-state preservation tests.
2. **`fixture_single_upload_aeo_legacy`** — One `compliance_uploads` row with `vertical IS NULL`, `report_date = '2025-04-01'`, with ~20 `compliance_items` distributed across the four teams. Used by the legacy-path preservation tests.
3. **`fixture_single_upload_per_date`** — Five `compliance_uploads` rows, each with a distinct `report_date` (`2025-04-01` through `2025-05-01`), with `vertical` values `NTS_AEO`, `SDIT_CISO`, `TSI`, `NULL`, and `NTS_AEO` respectively. Used by the broader preservation tests and by `/category-trend` total-conservation.
4. **`fixture_multi_vertical_single_date`** — Three `compliance_uploads` rows all with `report_date = '2025-05-11'`, verticals NTS_AEO/SDIT_CISO/TSI, each with distinct `new_count`/`recurring_count`/`resolved_count` and 5 to 10 `compliance_items` per upload spanning multiple teams and categories. This is the canonical bug-condition fixture and reproduces the original GitLab #12 scenario.
5. **`fixture_mixed_history`** — Combination of `fixture_single_upload_per_date` and `fixture_multi_vertical_single_date` — multiple dates, some with single uploads, some with two or three. Used by the property-based tests as a realistic state-of-the-world fixture.
6. **`fixture_cross_vertical_items`** — Two non-empty, disjoint sets of `compliance_items` rows, one tagged `vertical = 'NTS_AEO'` and one tagged `vertical = 'SDIT_CISO'`. The rows are disjoint, but some hostnames appear in both sets so that the `COUNT(DISTINCT hostname)` logic is exercised. Used by the `persistUpload()` vertical-isolation tests.
7. **`fixture_pbt_generators`** — fast-check (or equivalent) arbitraries:
- `arbReportDate`: ISO date string in a bounded range (e.g., last 90 days).
- `arbVertical`: oneof `'NTS_AEO' | 'SDIT_CISO' | 'TSI' | null`.
- `arbUpload`: `{ report_date, vertical, new_count, recurring_count, resolved_count }` with non-negative integer counts.
- `arbItem`: `{ hostname, team in ALLOWED_TEAMS, category in {Patching, Configuration, Vulnerability, Other}, vertical, status in {active, resolved} }`.
- `arbScenario`: `{ uploads: arbUpload[], items: arbItem[] }`, where items reference uploads via `upload_id` and dates can collide.
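As a starting point for seeding, `fixture_multi_vertical_single_date` could be built programmatically. Everything in this sketch beyond the fixture description (the specific counts, the hostnames, the placeholder team `TEAM_B`, and the status pattern) is invented for illustration. It produces 5, 7, and 9 items for the three uploads, inside the stated range, and reuses hostnames so that `COUNT(DISTINCT hostname)` is exercised.

```javascript
// Hypothetical builder for fixture_multi_vertical_single_date (values are illustrative).
function buildMultiVerticalSingleDate() {
  const verticals = [
    { vertical: 'NTS_AEO', new_count: 6, recurring_count: 3, resolved_count: 1 },
    { vertical: 'SDIT_CISO', new_count: 4, recurring_count: 2, resolved_count: 2 },
    { vertical: 'TSI', new_count: 5, recurring_count: 1, resolved_count: 0 },
  ];
  const teams = ['STEAM', 'TEAM_B']; // TEAM_B is a placeholder, not a known ALLOWED_TEAMS value
  const categories = ['Patching', 'Configuration'];
  const uploads = [];
  const items = [];
  verticals.forEach((v, i) => {
    const uploadId = i + 1;
    uploads.push({ id: uploadId, report_date: '2025-05-11', ...v });
    const itemCount = 5 + 2 * i; // 5, 7, 9 items per upload
    for (let k = 0; k < itemCount; k++) {
      items.push({
        id: items.length + 1,
        upload_id: uploadId,
        vertical: v.vertical,
        hostname: `host-${k % 4}`, // repeated hostnames exercise COUNT(DISTINCT hostname)
        team: teams[k % teams.length],
        category: categories[(k >> 1) % categories.length],
        status: k % 3 === 0 ? 'resolved' : 'active',
      });
    }
  });
  return { uploads, items };
}

const fixture = buildMultiVerticalSingleDate();
console.log(fixture.uploads.length, fixture.items.length); // 3 21
```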