# Compliance Duplicate Chart Entries Bugfix Design

## Overview

Five code paths share the same root cause: four compliance endpoints (`GET /trends`, `GET /top-recurring`, `GET /category-trend`, `GET /summary`) and the `compliance_snapshots` block inside `persistUpload()`. Each keys by `compliance_uploads.id` (one row per uploaded xlsx) instead of by `compliance_uploads.report_date` (the calendar date the report covers). Because the compliance pipeline accepts one xlsx per vertical (NTS_AEO, SDIT_CISO, TSI), a single `report_date` typically maps to several `compliance_uploads` rows, and any query that does not aggregate over `report_date` produces duplicated, fragmented, or silently dropped data.

The fix follows the same pattern across the read endpoints: rewrite the SQL so the result set has exactly one row per unique `report_date` (or per `(report_date, category)` pair for `/category-trend`), using `GROUP BY report_date` with `SUM` aggregations for the count-style endpoints. `/summary` keeps its existing single-upload selection and additionally discloses any sibling uploads that share the latest `report_date`. The `persistUpload()` snapshot block is fixed by adding a `vertical` filter so per-vertical snapshots are no longer cross-contaminated by other verticals' items.

The implementation is intentionally minimal: each fix changes one or two SQL statements (plus, for `/trends`, the `teamMap` lookup, and for `/summary`, one added response field). No frontend changes are required: the chart components already key on `report_date` and will render correctly once the API returns one row per date.

## Glossary

- **Bug_Condition (C)**: The condition that triggers the bug: two or more rows in `compliance_uploads` share the same `report_date` (i.e., a multi-vertical upload day).
- **Property (P)**: The desired behavior when C holds: each affected endpoint returns exactly one entry per unique `report_date`, and the values aggregated across uploads for that date reconcile with the underlying `compliance_items` totals.
- **Preservation**: Behavior on dates with a single upload row, on the empty-data response shape, and on unrelated query parameters (e.g., the `team` filter on `/summary`). All of these must be byte-for-byte unchanged.
- **report_date**: `TEXT` column on `compliance_uploads` storing the reporting period the xlsx covers (e.g., `2025-05-11`). One date can have multiple upload rows when multiple verticals are uploaded for that date.
- **vertical**: `TEXT` column on `compliance_uploads` and `compliance_items` identifying which xlsx (NTS_AEO, SDIT_CISO, TSI) an upload or item belongs to. `NULL` indicates a legacy AEO-only upload.
- **persistUpload()**: Function in `backend/routes/compliance.js` (lines 81–192) that writes a parsed upload to the DB inside a transaction and then writes per-vertical snapshots into `compliance_snapshots`.
- **computeWaterfall(uploads)**: Pure helper in `backend/routes/compliance.js` (lines 235–243) that takes an ordered list of upload rows and emits one waterfall entry per row, carrying the running `start` forward.

## Bug Details

### Bug Condition

The bug manifests when two or more `compliance_uploads` rows share the same `report_date`. This happens whenever the operator uploads more than one vertical xlsx for the same reporting cycle (the documented multi-vertical workflow). The five affected code paths each produce one row per upload instead of aggregating to one row per `report_date`.

**Formal Specification:**
```
FUNCTION isBugCondition(uploads)
  INPUT: uploads (list of compliance_uploads rows)
  OUTPUT: boolean

  // The bug condition is triggered for any report_date that has more than one upload row
  GROUP uploads BY report_date INTO groups
  RETURN EXISTS group IN groups WHERE COUNT(group) > 1
END FUNCTION
```

For a single endpoint response to be considered buggy, the API output must additionally fail one of the following invariants (the per-endpoint manifestation of the same root cause):

```
FUNCTION isBuggyResponse(endpoint, response)
  CASE endpoint OF
    '/trends':         RETURN COUNT(response.trends) != COUNT(DISTINCT report_date IN compliance_uploads)
    '/top-recurring':  RETURN COUNT(response.waterfall) != COUNT(DISTINCT report_date IN compliance_uploads)
    '/category-trend': RETURN EXISTS (date, category) WITH COUNT(*) > 1 IN response.categoryTrend
    '/summary':        RETURN response.upload represents only one of N>1 uploads sharing the latest report_date
                              AND no flag indicates other uploads exist for that date
    'persistUpload':   RETURN snapshots.total_devices > items_belonging_to_this_vertical_only
  END CASE
END FUNCTION
```

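
The `isBugCondition` specification above can be sketched in JavaScript as follows. This is a minimal illustration, assuming each upload row is a plain object with a `report_date` string. The helper name `duplicateReportDates` is hypothetical and not part of `backend/routes/compliance.js`, and skipping rows with a missing `report_date` is an assumption borrowed from the `report_date IS NOT NULL` filters used in the fixes.

```javascript
// Hypothetical helper: returns the report_date values that have more than one upload row.
function duplicateReportDates(uploads) {
  const counts = new Map();
  for (const u of uploads) {
    if (u.report_date == null) continue; // assumption: undated rows cannot collide
    counts.set(u.report_date, (counts.get(u.report_date) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([date]) => date);
}

// Mirrors isBugCondition(uploads): true when any report_date is shared by 2+ rows.
function isBugCondition(uploads) {
  return duplicateReportDates(uploads).length > 0;
}

// Fixture modelled on the 2025-05-11 case described in this document.
const uploads = [
  { id: 1, report_date: '2025-05-04', vertical: null },
  { id: 2, report_date: '2025-05-11', vertical: 'NTS_AEO' },
  { id: 3, report_date: '2025-05-11', vertical: 'SDIT_CISO' },
  { id: 4, report_date: '2025-05-11', vertical: 'TSI' },
];

console.log(isBugCondition(uploads));       // true
console.log(duplicateReportDates(uploads)); // [ '2025-05-11' ]
```

Returning the colliding dates, rather than only a boolean, makes it easier for a test to name the date that triggered the condition.
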
### Examples

The originally reported case (GitLab issue #12, 2025-05-11) and the four sibling manifestations:

- **`/trends`**: STEAM uploads three xlsx files for `2025-05-11` (one per vertical). The chart shows three "05/11/25" entries on the x-axis instead of one. Expected: a single 05/11/25 point whose `new_count`/`recurring_count`/`resolved_count`/`total_active` are the sums of the three uploads' counts.

- **`/top-recurring`**: Same three uploads. `computeWaterfall()` receives three rows for `2025-05-11` and emits three bars stacked on the same date. Worse, because `start` carries forward across rows, the second bar's `start` is the first row's `end` and the third bar's `start` is the second row's `end`, so no single bar shows the date-level delta. Expected: one bar for `2025-05-11` whose `new_count`/`recurring_count`/`resolved_count` are summed across the three uploads, and whose `start` carries from the previous date's `end`.

- **`/category-trend`**: Same three uploads, each with category-tagged items. The query groups by `(cu.id, cu.report_date, category)` and returns up to `3 × |categories|` rows for `2025-05-11`. The frontend stacks these as duplicated category bars per date. Expected: one row per `(2025-05-11, category)` pair with `count` summed across the three uploads.

- **`/summary`**: On `2025-05-11`, three uploads exist. The query `WHERE vertical IS NULL ORDER BY id DESC LIMIT 1` (with fallback to `vertical = 'NTS_AEO'`) silently picks one, and the other two verticals' `summary_json` is dropped. Expected: either the response merges all three uploads' `entries` and `overall_scores`, or the response includes a `multi_vertical_uploads` array identifying the other uploads that exist for the same `report_date` so the caller knows the response is partial.

- **Edge case (`persistUpload()` snapshot)**: When SDIT_CISO is being persisted on `2025-05-11`, the snapshot query reads `compliance_items WHERE team IS NOT NULL` with no `vertical` filter, so the resulting per-team `total_devices`/`compliant`/`non_compliant` counts include items that belong to NTS_AEO and TSI as well. Expected: the snapshot query filters by the upload's `vertical` and groups by `(vertical, team)`.

## Expected Behavior

### Preservation Requirements

**Unchanged Behaviors:**
- Single-upload-per-date dates (legacy AEO-only workflow): every endpoint returns the same numbers, in the same shape, in the same order as before the fix.
- Empty-data responses: `/trends` returns `{ trends: [] }`, `/top-recurring` returns `{ waterfall: [] }`, `/category-trend` returns `{ categoryTrend: [] }`, `/summary` returns `{ entries: [], overall_scores: {}, upload: null }`.
- `/summary` `team` query parameter: still filters `entries` server-side, still rejects non-`ALLOWED_TEAMS` values with HTTP 400.
- `/summary` `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for selecting which upload's `summary_json` to surface (only the additional metadata about sibling uploads is new).
- `persistUpload()` error handling: snapshot creation remains wrapped in a `try/catch` that logs but does not fail the upload commit.
- `compliance_snapshots` rows for months with only a single vertical present in `compliance_items`: identical values to the pre-fix output.
- Frontend chart components: no changes required. They already key on `report_date` and consume the existing response shapes.

**Scope:**
All endpoint inputs that do not involve `report_date` collisions (single-upload dates, empty datasets, error paths, query-parameter filtering) must be byte-for-byte identical to the pre-fix output. The fix only changes what happens when two or more `compliance_uploads` rows share a `report_date`.

## Hypothesized Root Cause

All five sites have the same shape of bug (keying by `id` instead of `report_date`) but with slightly different mechanics. They are listed explicitly so the test plan can confirm or refute each one:

1. **`/trends`: per-row mapping over uploads.** The handler runs `SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC` and `.map()`s each row into a trend entry. Per-team counts are pre-aggregated by `upload_id` and looked up by `u.id`, so duplicate-date rows produce duplicate-date trend entries with split per-team counts.

2. **`/top-recurring`: `computeWaterfall()` receives per-row data.** The query is identical to the `/trends` upload query, and `computeWaterfall()` carries a stateful `start` forward across rows. Three rows for the same date become three bars whose `start`/`end` running totals are wrong relative to the date-level aggregate.

3. **`/category-trend`: `GROUP BY cu.id, cu.report_date, category`.** Including `cu.id` in the `GROUP BY` defeats date-level aggregation: each upload's items get their own (date, category) group instead of summing into the date-level group.

4. **`/summary`: `ORDER BY id DESC LIMIT 1`.** The query selects a single representative upload for the latest date and discards every other upload sharing that date. This is a "select latest by row id" pattern that does not consider `report_date` ties.

5. **`persistUpload()` snapshot block: missing `vertical` filter.** The snapshot query reads `compliance_items WHERE team IS NOT NULL GROUP BY team` with no `vertical` predicate. The query was correct when there was one vertical (AEO-only legacy) and silently broke when the multi-vertical migration added a `vertical` column without updating this query.

The common structural cause is that the multi-vertical migration (`add_vcl_multi_vertical.js`) added a `vertical` column to `compliance_uploads` and `compliance_items` but did not audit existing read queries for the new "many uploads share a `report_date`" reality.

## Correctness Properties

Property 1: Bug Condition - `/trends` returns one entry per unique report_date

_For any_ set of `compliance_uploads` rows where two or more rows share a `report_date`, the response from `GET /trends` SHALL contain exactly one entry per unique `report_date`, with `new_count`, `recurring_count`, `resolved_count`, and `total_active` equal to the SUM of those columns over all uploads sharing that date, and per-team counts equal to the sum of `compliance_items` rows for that team across all those uploads.

**Validates: Requirements 2.1, 2.2, 2.3**

Property 2: Bug Condition - `/top-recurring` waterfall has one bar per unique report_date with correct running totals

_For any_ set of `compliance_uploads` rows where two or more rows share a `report_date`, the response from `GET /top-recurring` SHALL contain exactly one waterfall entry per unique `report_date`, the entry's `new_count`/`recurring_count`/`resolved_count` SHALL equal the sum of those columns over all uploads sharing that date, and the running invariant `entry[i].end == entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count` SHALL hold with `entry[i].start == entry[i-1].end` for adjacent entries (and `entry[0].start == 0`).

**Validates: Requirements 2.4, 2.5**

Property 3: Bug Condition - `/category-trend` returns one row per (date, category)

_For any_ set of `compliance_uploads` and `compliance_items` rows, the response from `GET /category-trend` SHALL contain exactly one entry per unique `(report_date, category)` pair, and each entry's `count` SHALL equal the total number of `compliance_items` for that category across every upload sharing that `report_date`.

**Validates: Requirements 2.6, 2.7**

Property 4: Bug Condition - `/summary` does not silently drop sibling uploads

_For any_ set of `compliance_uploads` rows where two or more rows share the latest `report_date`, the response from `GET /summary` SHALL either (a) include a merged view of all sibling uploads' `entries` and `overall_scores`, or (b) include a non-empty `multi_vertical_uploads` field listing the IDs and verticals of the other uploads for that date that were not used to populate the response. The response SHALL NOT silently drop sibling uploads.

**Validates: Requirements 2.8, 2.9**

Property 5: Bug Condition - `persistUpload()` snapshot reflects only the snapshotted vertical

_For any_ `persistUpload()` invocation with a non-NULL `vertical`, the rows written into `compliance_snapshots` for the current month SHALL have `total_devices`, `compliant`, and `non_compliant` values equal to the counts derived from `compliance_items` filtered to the snapshotted vertical only. No item from another vertical SHALL contribute to those counts.

**Validates: Requirements 2.10, 2.11**

Property 6: Preservation - Per-endpoint cross-date sums equal source-data totals

_For any_ set of uploads, summing `new_count` (and likewise `recurring_count`, `resolved_count`) across every entry in `GET /trends` SHALL equal the corresponding `SUM(new_count)` over `compliance_uploads` rows with a non-NULL `report_date`. Similarly, summing `count` across every entry in `GET /category-trend` SHALL equal `COUNT(*)` of `compliance_items` joined to `compliance_uploads` rows with a non-NULL `report_date`. This holds whether or not any date has duplicate uploads.

**Validates: Requirements 3.1, 3.2**

Property 7: Preservation - Single-upload-per-date dates are unchanged

_For any_ set of `compliance_uploads` where every `report_date` has exactly one row, the responses from `/trends`, `/top-recurring`, `/category-trend`, and `/summary` (and the `compliance_snapshots` rows written by `persistUpload()`) SHALL be identical to the pre-fix output for the same input. The fix SHALL NOT change behavior on the single-upload-per-date case.

**Validates: Requirements 3.1, 3.4, 3.5, 3.6, 3.8**

Property 8: Preservation - Empty-data and error-path responses are unchanged

_For any_ empty dataset (no uploads, no matching items, no items in a category), each affected endpoint SHALL return the same empty-state response shape as before the fix. `/summary` with a non-`ALLOWED_TEAMS` `team` parameter SHALL still respond `400`. `persistUpload()` snapshot errors SHALL still be caught and logged without failing the upload commit.

**Validates: Requirements 3.3, 3.7, 3.9, 3.10**

## Fix Implementation

### Changes Required

All changes are in `backend/routes/compliance.js`. No schema migration, no new column, no frontend change.

#### Fix 1: `GET /trends` - aggregate uploads and team counts by `report_date`

**Function**: `router.get('/trends', ...)` (around line 768)

**Specific Changes**:
1. Replace the `compliance_uploads` query so it groups by `report_date` and sums the count columns:
```sql
SELECT report_date,
       SUM(COALESCE(new_count, 0))::int AS new_count,
       SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
       SUM(COALESCE(resolved_count, 0))::int AS resolved_count,
       SUM(COALESCE(new_count, 0) + COALESCE(recurring_count, 0))::int AS total_active
FROM compliance_uploads
WHERE report_date IS NOT NULL
GROUP BY report_date
ORDER BY report_date ASC
```
2. Replace the per-team `compliance_items` query so it joins to `compliance_uploads` and groups by `(report_date, team)` instead of `(upload_id, team)`:
```sql
SELECT cu.report_date, ci.team, COUNT(ci.id)::int AS count
FROM compliance_items ci
JOIN compliance_uploads cu ON ci.upload_id = cu.id
WHERE ci.team IS NOT NULL AND cu.report_date IS NOT NULL
GROUP BY cu.report_date, ci.team
```
3. Change the `teamMap` keyed lookup from `teamMap[u.id]` to `teamMap[u.report_date]` and rebuild `trends` from the per-date upload rows.

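
Step 3 might look roughly like the sketch below. It assumes `uploadRows` and `teamRows` are the result rows of the two queries above and that each trend entry carries its per-team counts under a `teams` object. The `teams` field name, the helper `buildTrends`, and the team name `TEAM_B` are illustrative assumptions, since the existing handler's exact response shape is not reproduced in this document.

```javascript
// Hypothetical helper: merges per-date upload aggregates with per-date team counts.
function buildTrends(uploadRows, teamRows) {
  // teamMap is keyed by report_date (the pre-fix code keyed it by upload id).
  const teamMap = {};
  for (const row of teamRows) {
    if (!teamMap[row.report_date]) teamMap[row.report_date] = {};
    teamMap[row.report_date][row.team] = row.count;
  }
  return uploadRows.map((u) => ({
    report_date: u.report_date,
    new_count: u.new_count,
    recurring_count: u.recurring_count,
    resolved_count: u.resolved_count,
    total_active: u.total_active,
    teams: teamMap[u.report_date] || {}, // assumed field name
  }));
}

// One aggregated row for the date, as the GROUP BY report_date query would return it.
const uploadRows = [
  { report_date: '2025-05-11', new_count: 6, recurring_count: 9, resolved_count: 3, total_active: 15 },
];
const teamRows = [
  { report_date: '2025-05-11', team: 'STEAM', count: 10 },
  { report_date: '2025-05-11', team: 'TEAM_B', count: 5 },
];

const trends = buildTrends(uploadRows, teamRows);
console.log(trends.length);         // 1
console.log(trends[0].teams.STEAM); // 10
```

Because both inputs are already aggregated per date, the mapping stays one-to-one and no de-duplication is needed in JavaScript.
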
#### Fix 2: `GET /top-recurring` - aggregate uploads by `report_date` before passing to `computeWaterfall()`

**Function**: `router.get('/top-recurring', ...)` (around line 818)

**Specific Changes**:
1. Replace the query with the same `GROUP BY report_date` pattern used in `/trends` (without `id`, since `computeWaterfall()` only needs `report_date`, `new_count`, `recurring_count`, `resolved_count`):
```sql
SELECT report_date,
       SUM(COALESCE(new_count, 0))::int AS new_count,
       SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
       SUM(COALESCE(resolved_count, 0))::int AS resolved_count
FROM compliance_uploads
WHERE report_date IS NOT NULL
GROUP BY report_date
ORDER BY report_date ASC
```
2. `computeWaterfall()` itself does not change: it already advances `start` correctly when fed one row per date. The fix is purely in the SQL.

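
To illustrate why feeding one row per date is sufficient, the sketch below uses a stand-in for `computeWaterfall()` written only from the invariants stated in Property 2 (`start` carried from the previous `end`, `end = start + new_count + recurring_count - resolved_count`). The real helper may differ in field names and details. `computeWaterfallSketch` and `aggregateByDate` are hypothetical names, and the counts in the fixture are made up.

```javascript
// Stand-in for computeWaterfall(): one output entry per input row, running start/end.
function computeWaterfallSketch(rows) {
  let start = 0;
  return rows.map((r) => {
    const end = start + r.new_count + r.recurring_count - r.resolved_count;
    const entry = {
      date: r.report_date,
      start,
      new_count: r.new_count,
      recurring_count: r.recurring_count,
      resolved_count: r.resolved_count,
      end,
    };
    start = end; // carried into the next row
    return entry;
  });
}

// In-memory equivalent of the GROUP BY report_date query above.
function aggregateByDate(rows) {
  const byDate = new Map();
  for (const r of rows) {
    const acc = byDate.get(r.report_date) ||
      { report_date: r.report_date, new_count: 0, recurring_count: 0, resolved_count: 0 };
    acc.new_count += r.new_count || 0; // `|| 0` plays the role of COALESCE(..., 0)
    acc.recurring_count += r.recurring_count || 0;
    acc.resolved_count += r.resolved_count || 0;
    byDate.set(r.report_date, acc);
  }
  return [...byDate.values()].sort((a, b) => a.report_date.localeCompare(b.report_date));
}

const perUploadRows = [
  { report_date: '2025-05-04', new_count: 5, recurring_count: 0, resolved_count: 0 },
  { report_date: '2025-05-11', new_count: 2, recurring_count: 1, resolved_count: 1 },
  { report_date: '2025-05-11', new_count: 3, recurring_count: 0, resolved_count: 2 },
  { report_date: '2025-05-11', new_count: 1, recurring_count: 2, resolved_count: 0 },
];

const before = computeWaterfallSketch(perUploadRows);                  // per-upload input
const after = computeWaterfallSketch(aggregateByDate(perUploadRows)); // per-date input
console.log(before.length, after.length); // 4 2
console.log(after[1].start, after[1].end); // 5 11
```

Under these assumptions the final `end` is the same in both runs (11). What changes is the number of bars and the intermediate `start` values shown for `2025-05-11`.
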
#### Fix 3: `GET /category-trend` - drop `cu.id` from `GROUP BY`

**Function**: `router.get('/category-trend', ...)` (around line 838)

**Specific Changes**:
1. Remove `cu.id` from the `GROUP BY` clause so the grouping is by `(report_date, category)` only:
```sql
SELECT cu.report_date,
       COALESCE(ci.category, 'Unknown') AS category,
       COUNT(ci.id)::int AS count
FROM compliance_uploads cu
JOIN compliance_items ci ON ci.upload_id = cu.id
WHERE cu.report_date IS NOT NULL
GROUP BY cu.report_date, COALESCE(ci.category, 'Unknown')
ORDER BY cu.report_date ASC, category ASC
```
2. The response shape (`{ categoryTrend: Array<{ report_date, category, count }> }`) does not change. Only the row count for multi-vertical dates changes (collapsing duplicates into sums).

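
For reasoning about expected test values, the grouping above can be mirrored in memory. The sketch below is an assumption-laden model, not the handler: `categoryTrendSketch` is a hypothetical name, rows are plain objects, and `localeCompare` only approximates the database's sort order.

```javascript
// In-memory model of the fixed query: inner join, non-NULL report_date,
// NULL category mapped to 'Unknown', grouped by (report_date, category).
function categoryTrendSketch(uploads, items) {
  const dateByUpload = new Map(uploads.map((u) => [u.id, u.report_date]));
  const counts = new Map();
  for (const it of items) {
    const date = dateByUpload.get(it.upload_id);
    if (date == null) continue; // no matching upload, or upload without a report_date
    const category = it.category ?? 'Unknown';
    const key = JSON.stringify([date, category]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .map(([key, count]) => {
      const [report_date, category] = JSON.parse(key);
      return { report_date, category, count };
    })
    .sort((a, b) =>
      a.report_date.localeCompare(b.report_date) || a.category.localeCompare(b.category));
}

const uploads = [
  { id: 1, report_date: '2025-05-11' },
  { id: 2, report_date: '2025-05-11' },
  { id: 3, report_date: '2025-05-11' },
];
const items = [
  { upload_id: 1, category: 'Patching' },
  { upload_id: 1, category: 'Configuration' },
  { upload_id: 2, category: 'Patching' },
  { upload_id: 3, category: null },
];

const rows = categoryTrendSketch(uploads, items);
console.log(rows.map((r) => `${r.category}:${r.count}`).join(', '));
// Configuration:1, Patching:2, Unknown:1
```

The counts sum to the number of joined items (4 here), which is the conservation check described in Property 6.
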
#### Fix 4: `GET /summary` - disclose sibling uploads for the latest date

**Function**: `router.get('/summary', ...)` (around line 495)

**Specific Changes**:
1. Keep the existing `vertical IS NULL` → `vertical = 'NTS_AEO'` fallback for choosing the primary upload's `summary_json` (this preserves the legacy single-upload behavior).
2. After resolving `latestUpload`, run a second query to find sibling uploads sharing the same `report_date`:
```sql
SELECT id, vertical, uploaded_at
FROM compliance_uploads
WHERE report_date = $1 AND id != $2
ORDER BY id ASC
```
3. Add `multi_vertical_uploads` to the response when sibling uploads exist:
```javascript
res.json({
  entries,
  overall_scores: summary.overall_scores || {},
  upload: { id, report_date, uploaded_at },
  multi_vertical_uploads: siblings.map(s => ({ id: s.id, vertical: s.vertical, uploaded_at: s.uploaded_at })),
});
```
4. When no sibling uploads exist (single-upload-per-date case), `multi_vertical_uploads` is either `[]` or omitted. This remains an open question: omitting the field is the option that keeps the single-upload response byte-for-byte identical to the pre-fix output, as Property 7 requires.

This is the conservative option (b) from requirement 2.8 (return a documented selection plus metadata about siblings) rather than option (a), a full server-side merge. Option (b) is chosen because (i) the `summary_json` schema is per-vertical and merging would require reconciliation logic that doesn't currently exist, and (ii) the existing fallback selection (NTS_AEO) is the established representative for the legacy AEO chart on the Compliance page.

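
If the field is omitted when there are no siblings, the response construction could be factored as in the sketch below. This is only one possible resolution of the `[]`-versus-omitted question, and `buildSummaryResponse` is a hypothetical helper rather than existing code. The argument names follow the snippet in step 3.

```javascript
// Hypothetical helper: builds the /summary body, adding multi_vertical_uploads
// only when sibling uploads exist, so the single-upload shape stays untouched.
function buildSummaryResponse(entries, summary, latestUpload, siblings) {
  const { id, report_date, uploaded_at } = latestUpload;
  const body = {
    entries,
    overall_scores: summary.overall_scores || {},
    upload: { id, report_date, uploaded_at },
  };
  if (siblings.length > 0) {
    body.multi_vertical_uploads = siblings.map((s) => ({
      id: s.id,
      vertical: s.vertical,
      uploaded_at: s.uploaded_at,
    }));
  }
  return body;
}

const latest = { id: 2, report_date: '2025-05-11', uploaded_at: '2025-05-12T08:00:00Z' };
const single = buildSummaryResponse([], {}, latest, []);
const multi = buildSummaryResponse([], {}, latest, [
  { id: 3, vertical: 'SDIT_CISO', uploaded_at: '2025-05-12T08:05:00Z' },
  { id: 4, vertical: 'TSI', uploaded_at: '2025-05-12T08:10:00Z' },
]);

console.log('multi_vertical_uploads' in single);  // false
console.log(multi.multi_vertical_uploads.length); // 2
```

The trade-off is that callers must treat a missing field as "no siblings". Returning `[]` would be simpler for consumers but changes the serialized single-upload response.
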
#### Fix 5: `persistUpload()` snapshot block - filter and group by `vertical`

**Function**: `persistUpload()` (lines 81–192), specifically the `verticalStats` query at line 157

**Specific Changes**:
1. Determine the upload's `vertical` (read it from the upload row immediately after the `RETURNING id` insert, or accept it as a parameter to `persistUpload()`).
2. Replace the `verticalStats` query with one that filters by the upload's `vertical` and groups by `(vertical, team)`:
```sql
SELECT vertical, team,
       COUNT(DISTINCT hostname)::int AS total_devices,
       COUNT(DISTINCT CASE WHEN status = 'resolved' THEN hostname END)::int AS compliant,
       COUNT(DISTINCT CASE WHEN status = 'active' THEN hostname END)::int AS non_compliant
FROM compliance_items
WHERE team IS NOT NULL AND vertical IS NOT DISTINCT FROM $1
GROUP BY vertical, team
```
(`IS NOT DISTINCT FROM` handles the legacy `vertical IS NULL` case correctly, so AEO-only uploads keep their previous semantics.)
3. The `INSERT ... ON CONFLICT (snapshot_month, vertical) DO UPDATE` already keys snapshots by `vertical`, so no change is required there. However, the `vertical` value passed in must come from the query result, not from `team AS vertical` (which conflates the team and vertical concepts).
4. If the per-snapshot-row "vertical" identity needs to remain `team` for back-compat reasons, leave the `INSERT` mapping unchanged but ensure the underlying counts are filtered to the upload's actual `vertical`. Confirm via inspection of `compliance_snapshots` consumers (`/vcl/stats`) before finalising.

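
`IS NOT DISTINCT FROM` behaves like an equality test that treats `NULL` as an ordinary comparable value. The in-memory sketch below mirrors the query above under that reading, assuming item rows are plain objects with `vertical`, `team`, `hostname`, and `status` fields. `snapshotStats` and `sameVertical` are hypothetical helpers used only for illustration.

```javascript
// Null-safe equality, mirroring SQL's IS NOT DISTINCT FROM for this column.
const sameVertical = (a, b) => (a ?? null) === (b ?? null);

// Models the fixed verticalStats query: distinct hostnames per team,
// restricted to the given vertical (which may be null for legacy uploads).
function snapshotStats(items, vertical) {
  const groups = new Map(); // team -> sets of distinct hostnames
  for (const it of items) {
    if (it.team == null || !sameVertical(it.vertical, vertical)) continue;
    if (!groups.has(it.team)) {
      groups.set(it.team, { all: new Set(), resolved: new Set(), active: new Set() });
    }
    if (it.hostname == null) continue; // COUNT(DISTINCT ...) ignores NULLs
    const g = groups.get(it.team);
    g.all.add(it.hostname);
    if (it.status === 'resolved') g.resolved.add(it.hostname);
    if (it.status === 'active') g.active.add(it.hostname);
  }
  return [...groups].map(([team, g]) => ({
    vertical: vertical ?? null,
    team,
    total_devices: g.all.size,
    compliant: g.resolved.size,
    non_compliant: g.active.size,
  }));
}

const items = [
  { vertical: 'NTS_AEO', team: 'STEAM', hostname: 'host-a', status: 'active' },
  { vertical: 'SDIT_CISO', team: 'STEAM', hostname: 'host-b', status: 'active' },
  { vertical: 'SDIT_CISO', team: 'STEAM', hostname: 'host-c', status: 'resolved' },
  { vertical: null, team: 'STEAM', hostname: 'host-d', status: 'active' },
];

const ciso = snapshotStats(items, 'SDIT_CISO');
const legacy = snapshotStats(items, null);
console.log(ciso[0].total_devices, ciso[0].compliant, ciso[0].non_compliant); // 2 1 1
console.log(legacy[0].total_devices); // 1
```

A plain `vertical = $1` predicate would match no rows when `$1` is `NULL`, which is why the null-safe comparison matters for the legacy path.
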
## Testing Strategy

### Validation Approach

The bug condition is straightforward to construct: insert two `compliance_uploads` rows with the same `report_date` and matching `compliance_items`, then call each affected endpoint. The two-phase approach is to first run the tests against the unfixed code to confirm the duplication/silent-drop counterexamples, then run the same tests against the fixed code and add property-based tests that explore the input space more broadly.

### Exploratory Bug Condition Checking

**Goal**: Surface counterexamples that demonstrate each of the five manifestations BEFORE implementing the fix. Confirm or refute the root cause analysis for each endpoint independently: they share a structural cause but the SQL details differ.

**Test Plan**: Seed a clean test database with a fixture representing the original GitLab #12 scenario (three uploads for `2025-05-11`, one each for NTS_AEO, SDIT_CISO, TSI, with realistic `compliance_items`). Call each affected endpoint and assert the buggy invariants. Run on UNFIXED code first.

**Test Cases**:

1. **`/trends` Duplicate Date Test**: Insert three uploads for `2025-05-11` (verticals NTS_AEO, SDIT_CISO, TSI), each with distinct `new_count`/`recurring_count`/`resolved_count` and matching `compliance_items` per team. Call `GET /trends`. Assert `response.trends.filter(t => t.report_date === '2025-05-11').length === 1`. (will fail on unfixed code: returns 3)

2. **`/top-recurring` Duplicate Bar Test**: Same fixture. Call `GET /top-recurring`. Assert `response.waterfall.filter(w => w.date === '2025-05-11').length === 1` AND assert the running invariant `waterfall[i].end === waterfall[i].start + waterfall[i].new_count + waterfall[i].recurring_count - waterfall[i].resolved_count` holds for every `i`. (will fail on unfixed code: returns 3 bars and the running totals reflect mid-row state, not the date-level aggregate)

3. **`/category-trend` Duplicate (date, category) Test**: Same fixture, plus items tagged with two categories (e.g., "Patching" and "Configuration"). Call `GET /category-trend`. Assert that each `(report_date, category)` pair appears exactly once, e.g. `response.categoryTrend.filter(c => c.report_date === '2025-05-11' && c.category === 'Patching').length === 1`. (will fail on unfixed code: returns 3 rows per category)

4. **`/summary` Sibling Disclosure Test**: Same fixture (three uploads for `2025-05-11`, latest date). Call `GET /summary`. Assert either (a) the response merges `entries` from all three uploads, or (b) `response.multi_vertical_uploads.length === 2`. (will fail on unfixed code: silently picks one upload, the other two are dropped without any indication)

5. **`persistUpload()` Cross-Vertical Contamination Test**: Pre-populate `compliance_items` with rows from multiple verticals (e.g., NTS_AEO has 100 active items and SDIT_CISO has 50 active items, each on a distinct hostname). Call `persistUpload()` with a fresh SDIT_CISO upload. Read back the `compliance_snapshots` row for the current month and SDIT_CISO. Assert `total_devices` reflects only SDIT_CISO items, not the combined 150. (will fail on unfixed code: total includes both verticals)

6. **Edge Case: Single-Upload-Per-Date Regression Test**: Insert a fixture with a single upload per date for three dates. Call all four read endpoints and capture responses. Apply the fix, re-run, and assert response equality (byte-for-byte). (should pass on unfixed code; will pass on fixed code; protects the preservation property)

**Expected Counterexamples**:
- `/trends` returns N trend entries for a date with N uploads (N > 1). Cause: per-row `.map()` over uploads instead of date-level aggregation.
- `/top-recurring` returns N waterfall bars for a date with N uploads. Cause: same per-row pattern, plus `computeWaterfall()` carries `start` forward across the duplicate-date rows.
- `/category-trend` returns N × |categories| rows for a date with N uploads. Cause: `cu.id` is in the `GROUP BY` clause.
- `/summary` returns one upload's `summary_json` and silently drops siblings. Cause: `ORDER BY id DESC LIMIT 1` with no `report_date`-tie handling.
- `persistUpload()` writes inflated `total_devices`. Cause: the snapshot query has no `vertical` predicate (`vertical IS NOT DISTINCT FROM $1`) and no `GROUP BY vertical, team`.

### Fix Checking

**Goal**: Verify that for all inputs where the bug condition holds (any `report_date` shared by two or more uploads), each fixed endpoint produces the expected aggregated/disclosed result.

**Pseudocode:**
```
FOR ALL (uploads, items) WHERE EXISTS report_date d WITH COUNT(uploads WHERE report_date = d) > 1 DO
  trends_response    := GET_trends_fixed(uploads, items)
  waterfall_response := GET_top_recurring_fixed(uploads, items)
  cattrend_response  := GET_category_trend_fixed(uploads, items)
  summary_response   := GET_summary_fixed(uploads, items)
  snapshot_rows      := persistUpload_fixed(new_upload_for_some_vertical, items)

  ASSERT one_entry_per_date(trends_response.trends)
  ASSERT one_entry_per_date(waterfall_response.waterfall) AND running_invariant_holds(waterfall_response.waterfall)
  ASSERT one_entry_per_date_category_pair(cattrend_response.categoryTrend)
  ASSERT siblings_disclosed(summary_response, uploads)
  ASSERT snapshots_filtered_to_vertical(snapshot_rows, new_upload.vertical, items)
END FOR
```

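
The assertion predicates named in the pseudocode could be written as small pure functions, as sketched below. The camel-cased names are hypothetical, the waterfall date field is taken to be `date` (as in the exploratory test cases), and `siblingsDisclosed` encodes option (b) only.

```javascript
// True when no date value appears twice; `key` names the date field of each entry.
function oneEntryPerDate(entries, key = 'report_date') {
  const dates = entries.map((e) => e[key]);
  return new Set(dates).size === dates.length;
}

// True when every (report_date, category) pair appears at most once.
function oneEntryPerDateCategoryPair(entries) {
  const pairs = entries.map((e) => JSON.stringify([e.report_date, e.category]));
  return new Set(pairs).size === pairs.length;
}

// Checks start/end chaining and the per-entry arithmetic from Property 2.
function runningInvariantHolds(waterfall) {
  return waterfall.every((w, i) => {
    const expectedStart = i === 0 ? 0 : waterfall[i - 1].end;
    return w.start === expectedStart &&
      w.end === w.start + w.new_count + w.recurring_count - w.resolved_count;
  });
}

// Option (b): every other upload on the response's report_date must be listed.
function siblingsDisclosed(summaryResponse, uploads) {
  if (!summaryResponse.upload) return true; // empty state has nothing to disclose
  const { id, report_date } = summaryResponse.upload;
  const expected = uploads
    .filter((u) => u.report_date === report_date && u.id !== id)
    .map((u) => u.id);
  const listed = new Set((summaryResponse.multi_vertical_uploads || []).map((s) => s.id));
  return expected.every((siblingId) => listed.has(siblingId));
}

const waterfall = [
  { date: '2025-05-04', start: 0, new_count: 5, recurring_count: 0, resolved_count: 0, end: 5 },
  { date: '2025-05-11', start: 5, new_count: 6, recurring_count: 3, resolved_count: 3, end: 11 },
];
console.log(oneEntryPerDate(waterfall, 'date')); // true
console.log(runningInvariantHolds(waterfall));   // true
```

Keeping the predicates free of database access lets the same functions serve both the example-based tests and the property-based tests.
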
### Preservation Checking

**Goal**: Verify that for all inputs where the bug condition does NOT hold (every `report_date` has exactly one upload row), the fixed endpoints produce results identical to the original endpoints.

**Pseudocode:**
```
FOR ALL (uploads, items) WHERE FORALL report_date d, COUNT(uploads WHERE report_date = d) <= 1 DO
  ASSERT GET_trends_original(uploads, items) = GET_trends_fixed(uploads, items)
  ASSERT GET_top_recurring_original(uploads, items) = GET_top_recurring_fixed(uploads, items)
  ASSERT GET_category_trend_original(uploads, items) = GET_category_trend_fixed(uploads, items)
  ASSERT GET_summary_original(uploads, items) = GET_summary_fixed(uploads, items)
  ASSERT persistUpload_original(upload, items).snapshots = persistUpload_fixed(upload, items).snapshots
END FOR
```

**Testing Approach**: Property-based testing is the right fit for preservation checking here:
- The single-upload-per-date input space is large (any number of dates, any combination of counts, any team distribution, any category mix, any vertical), and exhaustive enumeration is impractical.
- The preservation property is a strict equality, which is well-suited to PBT shrinking (any counterexample is a small fixture demonstrating a behavior change).
- The legacy AEO-only data shape (`vertical IS NULL`) must be exercised, which falls naturally out of generators that include null verticals.

**Test Plan**: Capture responses from the unfixed code on single-upload-per-date fixtures (snapshot tests). After applying the fix, re-run the same fixtures and assert equality. Then run a property-based generator that produces random single-upload-per-date scenarios and asserts the same equality.

**Test Cases**:
1. **Snapshot Equality (Empty State)**: Empty `compliance_uploads`. All four endpoints return their documented empty-state shapes. Snapshot-test before and after the fix.
2. **Snapshot Equality (Single AEO-Only Upload)**: One upload with `vertical IS NULL`, classic legacy fixture. Capture pre-fix responses, apply fix, assert equality.
3. **Snapshot Equality (Multiple Single-Upload Dates)**: Five dates, one upload each, varied `vertical` values. Capture pre-fix responses, apply fix, assert equality.
4. **`/summary` Team Filter Preservation**: Latest upload exists, `?team=STEAM` parameter is supplied. Assert `entries` is filtered to `team === 'STEAM'` rows. Assert non-`ALLOWED_TEAMS` value (e.g., `?team=OTHER`) returns HTTP 400.
5. **`persistUpload()` Snapshot Equality (Single-Vertical Month)**: Pre-populate `compliance_items` with rows from a single vertical only. Run `persistUpload()` for that vertical. Assert the resulting `compliance_snapshots` rows are identical pre-fix and post-fix.
6. **Error Path Preservation**: Force a snapshot query failure (e.g., transient DB error). Assert `persistUpload()` still commits the upload and the error is logged but not surfaced to the caller.

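
The shape of the preservation property can be illustrated without a database or a testing library. The sketch below is a dependency-free randomized check on a deliberately reduced model (one count column, in-memory rows). `perRow`, `perDate`, `randomSingleUploadFixture`, and `checkPreservation` are hypothetical names, and the small seeded generator stands in for what a property-based testing library with shrinking would normally provide.

```javascript
// Small seeded PRNG (mulberry32) so a failing run can be reproduced.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pre-fix behaviour in miniature: one output entry per upload row.
const perRow = (rows) => rows.map(({ report_date, new_count }) => ({ report_date, new_count }));

// Post-fix behaviour in miniature: one output entry per report_date, counts summed.
function perDate(rows) {
  const sums = new Map();
  for (const r of rows) sums.set(r.report_date, (sums.get(r.report_date) || 0) + r.new_count);
  return [...sums].map(([report_date, new_count]) => ({ report_date, new_count }));
}

// Generates 0..9 dates in ascending order with exactly one row per date.
function randomSingleUploadFixture(rand) {
  const n = Math.floor(rand() * 10);
  const rows = [];
  for (let day = 1; day <= n; day += 1) {
    rows.push({
      report_date: `2025-05-${String(day).padStart(2, '0')}`,
      new_count: Math.floor(rand() * 50),
    });
  }
  return rows;
}

// Returns the first fixture on which the two behaviours differ, if any.
function checkPreservation(runs = 200, seed = 42) {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i += 1) {
    const rows = randomSingleUploadFixture(rand);
    if (JSON.stringify(perRow(rows)) !== JSON.stringify(perDate(rows))) {
      return { ok: false, counterexample: rows };
    }
  }
  return { ok: true };
}

console.log(checkPreservation()); // { ok: true }
```

In the real suite the two sides of the comparison would be captured pre-fix responses and post-fix responses for the same seeded database fixture, not in-memory functions.
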
### Unit Tests

- `/trends` aggregation: two uploads sharing a `report_date`, one upload alone for an earlier date. Assert response has 2 entries and `new_count` for the shared date equals the sum of the two uploads.
- `/top-recurring` aggregation and running totals: same fixture as above. Assert 2 waterfall entries and the running `start`/`end` invariant.
- `/category-trend` aggregation: two uploads sharing a `report_date`, items tagged with two categories. Assert one row per `(date, category)` pair with summed counts.
- `/summary` sibling disclosure: three uploads sharing the latest date. Assert response shape matches the chosen disclosure approach (option (b)).
- `/summary` team filter: same upload, with and without `?team=STEAM`.
- `persistUpload()` per-vertical snapshot: items in two verticals, run upload for one, assert snapshots for that vertical do not include the other vertical's items.
- `persistUpload()` legacy AEO-only path (`vertical IS NULL`): unchanged behavior.

### Property-Based Tests
|
||
|
||
- **`/trends` aggregation property** — Generate a random list of `(report_date, new_count, recurring_count, resolved_count)` upload tuples (with possible date collisions). Generate matching per-team item counts. Assert the response has exactly one entry per unique `report_date` AND for each entry, `new_count` equals the SUM of input `new_count`s for that date (likewise the other count fields and per-team counts).
|
||
- **`/top-recurring` running invariant property** — Same generator. Assert the response has one bar per unique `report_date` AND for every adjacent pair of entries, `entry[i].start === entry[i-1].end`, AND `entry[i].end === entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count`.
- **`/category-trend` total-conservation property** — Generate a random set of `compliance_items` and uploads. Assert that the sum of `count` over `response.categoryTrend` equals the number of `compliance_items` joined to uploads whose `report_date` is non-null. This holds whether or not any date has multiple uploads.
- **`/summary` sibling-disclosure property** — Generate a random set of uploads with possible duplicate `report_date` values. Pick the latest date. Assert that if any sibling upload exists for that date, the response contains a non-empty `multi_vertical_uploads` array referencing every sibling upload's id.
- **`persistUpload()` vertical-isolation property** — Generate two non-empty disjoint sets of `compliance_items`, one per vertical. Insert both. Run `persistUpload()` for vertical A. Assert the resulting `compliance_snapshots` rows for vertical A reflect only set-A items (count of distinct hostnames matches).
- **Cross-endpoint preservation property** — Generate any fixture where every `report_date` has exactly one upload row. Assert all five fixed endpoints produce byte-for-byte identical results to the original endpoints.

### Integration Tests

- Full upload-to-chart flow: upload three xlsx files (one per vertical) with the same `report_date` via `POST /preview` + `POST /commit`, then call `/trends`, `/top-recurring`, `/category-trend`, `/summary` and verify all four return the expected aggregated/disclosed results.
- Compliance Charts panel render: load `ComplianceChartsPanel.js` with a multi-vertical-day fixture and assert (via DOM snapshot) the x-axis shows each date exactly once on `Active Findings Over Time` and `Change per Report Cycle`.
- Snapshot consumer regression: after running `persistUpload()` with the fix, call `/vcl/stats` (which reads `compliance_snapshots`) and verify per-vertical `compliance_pct` is unchanged from the pre-fix value when only one vertical's items are present, and is corrected when multiple verticals are present.

### Test Fixtures Required

The following fixtures are needed and can be reused across all five endpoints' tests:

1. **`fixture_empty`** — No `compliance_uploads`, no `compliance_items`. Used by the empty-state preservation tests.
2. **`fixture_single_upload_aeo_legacy`** — One `compliance_uploads` row with `vertical IS NULL`, `report_date = '2025-04-01'`, with ~20 `compliance_items` distributed across the four teams. Used by the legacy-path preservation tests.
3. **`fixture_single_upload_per_date`** — Five `compliance_uploads` rows, each with a distinct `report_date` (`2025-04-01` through `2025-05-01`), with `vertical` values assigned from the list `NTS_AEO`, `SDIT_CISO`, `TSI`, `NULL`, `NTS_AEO` (so `NTS_AEO` appears twice, on different dates). Used by the broader preservation tests and by `/category-trend` total-conservation.
4. **`fixture_multi_vertical_single_date`** — Three `compliance_uploads` rows all with `report_date = '2025-05-11'`, verticals NTS_AEO/SDIT_CISO/TSI, each with distinct `new_count`/`recurring_count`/`resolved_count` and 5–10 `compliance_items` per upload spanning multiple teams and categories. This is the canonical bug-condition fixture and reproduces the original GitLab #12 scenario.
5. **`fixture_mixed_history`** — Combination of `fixture_single_upload_per_date` and `fixture_multi_vertical_single_date` — multiple dates, some with single uploads, some with two or three. Used by the property-based tests as a realistic state-of-the-world fixture.
6. **`fixture_cross_vertical_items`** — Two non-empty sets of `compliance_items` with no rows in common, one tagged `vertical = 'NTS_AEO'` and one tagged `vertical = 'SDIT_CISO'`. Some hostnames appear in both verticals so that the count-distinct logic is exercised. Used by the `persistUpload()` vertical-isolation tests.
7. **`fixture_pbt_generators`** — fast-check (or equivalent) arbitraries:
   - `arbReportDate`: ISO date string in a bounded range (e.g., last 90 days).
   - `arbVertical`: oneof `'NTS_AEO' | 'SDIT_CISO' | 'TSI' | null`.
   - `arbUpload`: `{ report_date, vertical, new_count, recurring_count, resolved_count }` with non-negative integer counts.
   - `arbItem`: `{ hostname, team in ALLOWED_TEAMS, category in {Patching, Configuration, Vulnerability, Other}, vertical, status in {active, resolved} }`.
   - `arbScenario`: `{ uploads: arbUpload[], items: arbItem[] }`, where items reference uploads via `upload_id` and dates can collide.