cve-dashboard/.kiro/specs/compliance-duplicate-chart-entries/design.md
2026-05-19 15:01:25 -06:00


Compliance Duplicate Chart Entries Bugfix Design

Overview

Four compliance endpoints (GET /trends, GET /top-recurring, GET /category-trend, GET /summary) and the compliance_snapshots block inside persistUpload(), five code paths in total, share the same root cause: they key by compliance_uploads.id (one row per uploaded xlsx) instead of by compliance_uploads.report_date (the calendar date the report covers). Because the compliance pipeline accepts one xlsx per vertical (NTS_AEO, SDIT_CISO, TSI), a single report_date typically maps to several compliance_uploads rows, and any query that does not aggregate over report_date produces duplicated, fragmented, or silently dropped data.

The fix follows one pattern for the three count-style endpoints (/trends, /top-recurring, /category-trend): rewrite the SQL so the result set has exactly one row per unique report_date, using GROUP BY report_date with SUM or COUNT aggregations. GET /summary keeps its existing single-upload selection and additionally discloses any sibling uploads that share the latest report_date. The persistUpload() snapshot block is fixed by adding a vertical filter so per-vertical snapshots are no longer cross-contaminated by other verticals' items.

The implementation is intentionally minimal: each fix changes one or two SQL statements plus, at most, a few lines of surrounding JavaScript (the teamMap lookup in /trends and one added response field in /summary). No frontend changes are required because the chart components already key on report_date and will render correctly once the API returns one row per date.

Glossary

  • Bug_Condition (C): The condition that triggers the bug — two or more rows in compliance_uploads share the same report_date (i.e., a multi-vertical upload day).
  • Property (P): The desired behavior when C holds — each affected endpoint returns exactly one entry per unique report_date, and the values aggregated across uploads for that date reconcile with the underlying compliance_items totals.
  • Preservation: Behavior on dates with a single upload row, on the empty-data response shape, and on unrelated query parameters (e.g., team filter on /summary) — all must be byte-for-byte unchanged.
  • report_date: TEXT column on compliance_uploads storing the reporting period the xlsx covers (e.g., 2025-05-11). One date can have multiple upload rows when multiple verticals are uploaded for that date.
  • vertical: TEXT column on compliance_uploads and compliance_items identifying which xlsx (NTS_AEO, SDIT_CISO, TSI) an upload or item belongs to. NULL indicates a legacy AEO-only upload.
  • persistUpload(): Function in backend/routes/compliance.js (lines 81–192) that writes a parsed upload to the DB inside a transaction and then writes per-vertical snapshots into compliance_snapshots.
  • computeWaterfall(uploads): Pure helper in backend/routes/compliance.js (lines 235–243) that takes an ordered list of upload rows and emits one waterfall entry per row, carrying the running start forward.

Bug Details

Bug Condition

The bug manifests when two or more compliance_uploads rows share the same report_date. This happens whenever the operator uploads more than one vertical xlsx for the same reporting cycle (the documented multi-vertical workflow). The five affected code paths each produce one row per upload instead of aggregating to one row per report_date.

Formal Specification:

FUNCTION isBugCondition(uploads)
  INPUT: uploads — list of compliance_uploads rows
  OUTPUT: boolean

  // The bug condition is triggered for any report_date that has more than one upload row
  GROUP uploads BY report_date INTO groups
  RETURN EXISTS group IN groups WHERE COUNT(group) > 1
END FUNCTION
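
As a minimal sketch, the predicate above could be expressed in JavaScript for use in test fixtures. The row shape ({ id, report_date, vertical }) is assumed from the glossary, and skipping rows without a report_date is an assumption made here to match the WHERE report_date IS NOT NULL filter used by the fixed queries.

```javascript
// Sketch of isBugCondition over in-memory upload rows.
// Assumed row shape: { id, report_date, vertical }.
function isBugCondition(uploads) {
  const seen = new Set();
  for (const u of uploads) {
    if (u.report_date == null) continue; // assumption: undated rows never collide
    if (seen.has(u.report_date)) return true; // second row for the same date
    seen.add(u.report_date);
  }
  return false;
}

const multiVerticalDay = [
  { id: 1, report_date: '2025-05-11', vertical: 'NTS_AEO' },
  { id: 2, report_date: '2025-05-11', vertical: 'SDIT_CISO' },
  { id: 3, report_date: '2025-05-11', vertical: 'TSI' },
];
const singleUploadDays = [
  { id: 4, report_date: '2025-04-01', vertical: null },
  { id: 5, report_date: '2025-04-08', vertical: 'NTS_AEO' },
];

console.log(isBugCondition(multiVerticalDay)); // true
console.log(isBugCondition(singleUploadDays)); // false
```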

For a single endpoint response to be considered buggy, the API output must additionally fail one of the following invariants (the per-endpoint manifestation of the same root cause):

FUNCTION isBuggyResponse(endpoint, response)
  CASE endpoint OF
    '/trends':         RETURN COUNT(response.trends)        != COUNT(DISTINCT report_date IN compliance_uploads)
    '/top-recurring':  RETURN COUNT(response.waterfall)     != COUNT(DISTINCT report_date IN compliance_uploads)
    '/category-trend': RETURN EXISTS (date, category) WITH COUNT(*) > 1 IN response.categoryTrend
    '/summary':        RETURN response.upload represents only one of N>1 uploads sharing the latest report_date
                              AND no flag indicates other uploads exist for that date
    'persistUpload':   RETURN snapshots.total_devices > items_belonging_to_this_vertical_only
  END CASE
END FUNCTION

Examples

The originally reported case (GitLab issue #12, 2025-05-11) and the four sibling manifestations:

  • /trends — STEAM uploads three xlsx files for 2025-05-11 (one per vertical). The chart shows three "05/11/25" entries on the x-axis instead of one. Expected: a single 05/11/25 point whose new_count/recurring_count/resolved_count/total_active are the sums of the three uploads' counts.

  • /top-recurring — Same three uploads. computeWaterfall() receives three rows for 2025-05-11 and emits three bars stacked on the same date. Worse, because start carries forward across rows, the second and third bars' start reflects the first/second row's end, so the three bars in aggregate misrepresent the date-level deltas. Expected: one bar for 2025-05-11 whose new_count/recurring_count/resolved_count are summed across the three uploads, and whose start carries from the previous date's end.

  • /category-trend — Same three uploads, each with category-tagged items. The query groups by (cu.id, cu.report_date, category) and returns up to 3 × |categories| rows for 2025-05-11. The frontend stacks these as duplicated category bars per date. Expected: one row per (2025-05-11, category) pair with count summed across the three uploads.

  • /summary — On 2025-05-11, three uploads exist. The query WHERE vertical IS NULL ORDER BY id DESC LIMIT 1 (with fallback to vertical = 'NTS_AEO') silently picks one and the other two verticals' summary_json is dropped. Expected: either the response merges all three uploads' entries and overall_scores, or the response includes a multi_vertical_uploads array identifying the other uploads that exist for the same report_date so the caller knows the response is partial.

  • Edge case — persistUpload() snapshot — When SDIT_CISO is being persisted on 2025-05-11, the snapshot query reads compliance_items WHERE team IS NOT NULL with no vertical filter, so the resulting per-team total_devices/compliant/non_compliant counts include items that belong to NTS_AEO and TSI as well. Expected: the snapshot query filters by the upload's vertical and groups by (vertical, team).
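
The /top-recurring example can be made concrete with a small worked calculation. The sketch below uses computeWaterfallSketch, a hypothetical stand-in written only from the running invariant stated in this document (end = start + new_count + recurring_count - resolved_count); the real computeWaterfall() and its exact entry field names are not reproduced here, and the counts are invented for illustration.

```javascript
// Hypothetical stand-in for computeWaterfall(): one entry per input row,
// carrying the running start forward.
function computeWaterfallSketch(rows) {
  let start = 0;
  return rows.map((r) => {
    const end = start + r.new_count + r.recurring_count - r.resolved_count;
    const entry = { report_date: r.report_date, start, end };
    start = end; // the next row starts where this one ended
    return entry;
  });
}

const earlier = { report_date: '2025-05-04', new_count: 10, recurring_count: 0, resolved_count: 0 };

// Unfixed input: one row per upload, three rows for 2025-05-11.
const perUploadRows = [
  earlier,
  { report_date: '2025-05-11', new_count: 5, recurring_count: 2, resolved_count: 1 },
  { report_date: '2025-05-11', new_count: 3, recurring_count: 1, resolved_count: 0 },
  { report_date: '2025-05-11', new_count: 4, recurring_count: 0, resolved_count: 2 },
];

// Fixed input: the three rows summed into one row for the date.
const perDateRows = [
  earlier,
  { report_date: '2025-05-11', new_count: 12, recurring_count: 3, resolved_count: 3 },
];

const buggy = computeWaterfallSketch(perUploadRows);
const fixed = computeWaterfallSketch(perDateRows);

console.log(buggy.slice(1).map((e) => [e.start, e.end])); // [[10, 16], [16, 20], [20, 22]]
console.log(fixed.slice(1).map((e) => [e.start, e.end])); // [[10, 22]]
```

Both inputs finish at the same end value, which is why the duplication is easy to miss in totals and only shows up as extra bars with intermediate start values.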

Expected Behavior

Preservation Requirements

Unchanged Behaviors:

  • Single-upload-per-date dates (legacy AEO-only workflow): every endpoint returns the same numbers, in the same shape, in the same order as before the fix.
  • Empty-data responses: /trends returns { trends: [] }, /top-recurring returns { waterfall: [] }, /category-trend returns { categoryTrend: [] }, /summary returns { entries: [], overall_scores: {}, upload: null }.
  • /summary team query parameter: still filters entries server-side, still rejects non-ALLOWED_TEAMS values with HTTP 400.
  • /summary vertical IS NULL → vertical = 'NTS_AEO' fallback for selecting which upload's summary_json to surface (only the additional metadata about sibling uploads is new).
  • persistUpload() error handling: snapshot creation remains wrapped in a try/catch that logs but does not fail the upload commit.
  • compliance_snapshots rows for months with only a single vertical present in compliance_items: identical values to the pre-fix output.
  • Frontend chart components: no changes required. They already key on report_date and consume the existing response shapes.

Scope: All endpoint inputs that do not involve report_date collisions (single-upload dates, empty datasets, error paths, query-parameter filtering) must be byte-for-byte identical to the pre-fix output. The fix only changes what happens when two or more compliance_uploads rows share a report_date.

Hypothesized Root Cause

All five sites have the same shape of bug — keying by id instead of report_date — but with slightly different mechanics. Listing them explicitly so the test plan can confirm or refute each one:

  1. /trends: per-row mapping over uploads. The handler runs SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC and .map()s each row into a trend entry. Per-team counts are pre-aggregated by upload_id and looked up by u.id, so duplicate-date rows produce duplicate-date trend entries with split per-team counts.

  2. /top-recurring: computeWaterfall() receives per-row data. The query is identical to /trends's upload query and computeWaterfall() carries a stateful start forward across rows. Three rows for the same date become three bars whose start/end running totals are wrong relative to the date-level aggregate.

  3. /category-trend: GROUP BY cu.id, cu.report_date, category. Including cu.id in the GROUP BY defeats date-level aggregation; each upload row's items get their own (date, category) group instead of summing into the date-level group.

  4. /summary: ORDER BY id DESC LIMIT 1. The query selects a single representative upload for the latest date and discards every other upload sharing that date. This is a "select latest by row id" pattern that does not consider report_date ties.

  5. persistUpload() snapshot block: missing vertical filter. The snapshot query reads compliance_items WHERE team IS NOT NULL GROUP BY team with no vertical predicate. The query was correct when there was one vertical (AEO-only legacy) and silently broke when the multi-vertical migration added a vertical column without updating this query.

The common structural cause is that the multi-vertical migration (add_vcl_multi_vertical.js) added a vertical column to compliance_uploads and compliance_items but did not audit existing read queries for the new "many uploads share a report_date" reality.

Correctness Properties

Property 1: Bug Condition — /trends returns one entry per unique report_date

For any set of compliance_uploads rows where two or more rows share a report_date, the response from GET /trends SHALL contain exactly one entry per unique report_date, with new_count, recurring_count, resolved_count, and total_active equal to the SUM of those columns over all uploads sharing that date, and per-team counts equal to the sum of compliance_items rows for that team across all those uploads.

Validates: Requirements 2.1, 2.2, 2.3

Property 2: Bug Condition — /top-recurring waterfall has one bar per unique report_date with correct running totals

For any set of compliance_uploads rows where two or more rows share a report_date, the response from GET /top-recurring SHALL contain exactly one waterfall entry per unique report_date, the entry's new_count/recurring_count/resolved_count SHALL equal the sum of those columns over all uploads sharing that date, and the running invariant entry[i].end == entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count SHALL hold with entry[i].start == entry[i-1].end for adjacent entries (and entry[0].start == 0).

Validates: Requirements 2.4, 2.5
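
The running invariant in Property 2 translates directly into a checker that tests can reuse. This is a sketch only; the entry field names (start, end, new_count, recurring_count, resolved_count) are taken from the property text, and runningInvariantHolds is a name introduced here for illustration.

```javascript
// Returns true when every entry satisfies
//   end === start + new_count + recurring_count - resolved_count
// and each start equals the previous entry's end (0 for the first entry).
function runningInvariantHolds(waterfall) {
  return waterfall.every((e, i) => {
    const expectedStart = i === 0 ? 0 : waterfall[i - 1].end;
    const expectedEnd = e.start + e.new_count + e.recurring_count - e.resolved_count;
    return e.start === expectedStart && e.end === expectedEnd;
  });
}
```

An empty waterfall passes trivially, which matches the empty-state response { waterfall: [] }.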

Property 3: Bug Condition — /category-trend returns one row per (date, category)

For any set of compliance_uploads and compliance_items rows, the response from GET /category-trend SHALL contain exactly one entry per unique (report_date, category) pair, and each entry's count SHALL equal the total number of compliance_items for that category across every upload sharing that report_date.

Validates: Requirements 2.6, 2.7

Property 4: Bug Condition — /summary does not silently drop sibling uploads

For any set of compliance_uploads rows where two or more rows share the latest report_date, the response from GET /summary SHALL either (a) include a merged view of all sibling uploads' entries and overall_scores, or (b) include a non-empty multi_vertical_uploads field listing the IDs and verticals of the other uploads for that date that were not used to populate the response. The response SHALL NOT silently drop sibling uploads.

Validates: Requirements 2.8, 2.9

Property 5: Bug Condition — persistUpload() snapshot reflects only the snapshotted vertical

For any persistUpload() invocation with a non-NULL vertical, the rows written into compliance_snapshots for the current month SHALL have total_devices, compliant, and non_compliant values equal to the counts derived from compliance_items filtered to the snapshotted vertical only. No item from another vertical SHALL contribute to those counts.

Validates: Requirements 2.10, 2.11

Property 6: Preservation — Per-endpoint cross-date sums equal source-data totals

For any set of uploads, summing new_count (and likewise recurring_count, resolved_count) across every entry in GET /trends SHALL equal the corresponding SUM(new_count) over compliance_uploads. Similarly, summing count across every entry in GET /category-trend SHALL equal COUNT(*) of compliance_items joined to compliance_uploads. This holds whether or not any date has duplicate uploads.

Validates: Requirements 3.1, 3.2

Property 7: Preservation — Single-upload-per-date dates are unchanged

For any set of compliance_uploads where every report_date has exactly one row, the responses from /trends, /top-recurring, /category-trend, and /summary (and the compliance_snapshots rows written by persistUpload()) SHALL be identical to the pre-fix output for the same input. The fix SHALL NOT change behavior on the single-upload-per-date case.

Validates: Requirements 3.1, 3.4, 3.5, 3.6, 3.8

Property 8: Preservation — Empty-data and error-path responses are unchanged

For any empty dataset (no uploads, no matching items, no items in a category), each affected endpoint SHALL return the same empty-state response shape as before the fix. /summary with a non-ALLOWED_TEAMS team parameter SHALL still respond 400. persistUpload() snapshot errors SHALL still be caught and logged without failing the upload commit.

Validates: Requirements 3.3, 3.7, 3.9, 3.10

Fix Implementation

Changes Required

All changes are in backend/routes/compliance.js. No schema migration, no new column, no frontend change.

Fix 1: GET /trends (aggregate uploads and per-team counts by report_date)

Function: router.get('/trends', ...) (around line 768)

Specific Changes:

  1. Replace the compliance_uploads query so it groups by report_date and sums the count columns:
    SELECT report_date,
           SUM(COALESCE(new_count, 0))::int       AS new_count,
           SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
           SUM(COALESCE(resolved_count, 0))::int  AS resolved_count,
           SUM(COALESCE(new_count, 0) + COALESCE(recurring_count, 0))::int AS total_active
    FROM compliance_uploads
    WHERE report_date IS NOT NULL
    GROUP BY report_date
    ORDER BY report_date ASC
    
  2. Replace the per-team compliance_items query so it joins to compliance_uploads and groups by (report_date, team) instead of (upload_id, team):
    SELECT cu.report_date, ci.team, COUNT(ci.id)::int AS count
    FROM compliance_items ci
    JOIN compliance_uploads cu ON ci.upload_id = cu.id
    WHERE ci.team IS NOT NULL AND cu.report_date IS NOT NULL
    GROUP BY cu.report_date, ci.team
    
  3. Change the teamMap keyed lookup from teamMap[u.id] to teamMap[u.report_date] and rebuild trends from the per-date upload rows.
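
A minimal sketch of step 3, assuming the two queries above have already returned their rows. The function name buildTrends and the team_counts field are hypothetical; the real handler's per-team response field is not shown in this document.

```javascript
// uploadRows: one row per report_date (query 1).
// teamRows:   one row per (report_date, team) (query 2).
function buildTrends(uploadRows, teamRows) {
  // Key the lookup by report_date instead of upload id.
  const teamMap = {};
  for (const r of teamRows) {
    if (!teamMap[r.report_date]) teamMap[r.report_date] = {};
    teamMap[r.report_date][r.team] = r.count;
  }
  return uploadRows.map((u) => ({
    report_date: u.report_date,
    new_count: u.new_count,
    recurring_count: u.recurring_count,
    resolved_count: u.resolved_count,
    total_active: u.total_active,
    team_counts: teamMap[u.report_date] || {}, // hypothetical field name
  }));
}

const trends = buildTrends(
  [{ report_date: '2025-05-11', new_count: 12, recurring_count: 3, resolved_count: 3, total_active: 15 }],
  [
    { report_date: '2025-05-11', team: 'STEAM', count: 7 },
    { report_date: '2025-05-04', team: 'STEAM', count: 2 },
  ],
);
console.log(trends.length); // 1
```

Because report_date is a TEXT column, it can be used as a plain object key without conversion.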

Fix 2: GET /top-recurring — aggregate uploads by report_date before passing to computeWaterfall()

Function: router.get('/top-recurring', ...) (around line 818)

Specific Changes:

  1. Replace the query with the same GROUP BY report_date pattern used in /trends (without id, since computeWaterfall() only needs report_date, new_count, recurring_count, resolved_count):
    SELECT report_date,
           SUM(COALESCE(new_count, 0))::int       AS new_count,
           SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
           SUM(COALESCE(resolved_count, 0))::int  AS resolved_count
    FROM compliance_uploads
    WHERE report_date IS NOT NULL
    GROUP BY report_date
    ORDER BY report_date ASC
    
  2. computeWaterfall() itself does not change — it already advances start correctly when fed one row per date. The fix is purely in the SQL.
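
For tests that need an expected value without a database, the GROUP BY report_date query can be mirrored in memory. The sketch below is a reference model only, not the handler's code; aggregateByReportDate is a name introduced here, and it relies on ISO-formatted date strings sorting correctly as plain strings.

```javascript
// In-memory mirror of:
//   SELECT report_date, SUM(COALESCE(x, 0)) ... WHERE report_date IS NOT NULL
//   GROUP BY report_date ORDER BY report_date ASC
function aggregateByReportDate(uploads) {
  const byDate = new Map();
  for (const u of uploads) {
    if (u.report_date == null) continue; // WHERE report_date IS NOT NULL
    let acc = byDate.get(u.report_date);
    if (!acc) {
      acc = { report_date: u.report_date, new_count: 0, recurring_count: 0, resolved_count: 0 };
      byDate.set(u.report_date, acc);
    }
    acc.new_count += u.new_count ?? 0; // COALESCE(new_count, 0)
    acc.recurring_count += u.recurring_count ?? 0;
    acc.resolved_count += u.resolved_count ?? 0;
  }
  // ISO dates (YYYY-MM-DD) order correctly under string comparison.
  return [...byDate.values()].sort((a, b) =>
    a.report_date < b.report_date ? -1 : a.report_date > b.report_date ? 1 : 0,
  );
}

const aggregated = aggregateByReportDate([
  { report_date: '2025-05-11', new_count: 5, recurring_count: 2, resolved_count: 1 },
  { report_date: '2025-05-04', new_count: 10, recurring_count: 0, resolved_count: 0 },
  { report_date: '2025-05-11', new_count: 3, recurring_count: null, resolved_count: 0 },
  { report_date: null, new_count: 99, recurring_count: 99, resolved_count: 99 },
]);
console.log(aggregated.map((r) => r.report_date)); // ['2025-05-04', '2025-05-11']
```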

Fix 3: GET /category-trend — drop cu.id from GROUP BY

Function: router.get('/category-trend', ...) (around line 838)

Specific Changes:

  1. Remove cu.id from the GROUP BY clause so the grouping is by (report_date, category) only:
    SELECT cu.report_date,
           COALESCE(ci.category, 'Unknown') AS category,
           COUNT(ci.id)::int AS count
    FROM compliance_uploads cu
    JOIN compliance_items ci ON ci.upload_id = cu.id
    WHERE cu.report_date IS NOT NULL
    GROUP BY cu.report_date, COALESCE(ci.category, 'Unknown')
    ORDER BY cu.report_date ASC, category ASC
    
  2. The response shape ({ categoryTrend: Array<{ report_date, category, count }> }) does not change. Only the row count for multi-vertical dates changes (collapsing duplicates into sums).
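
The same in-memory approach can model the /category-trend query for use as a test oracle. This is a sketch under the assumption that items carry upload_id and category as described above; categoryTrendModel is a hypothetical helper name, and JavaScript string ordering may differ from the database collation for non-ASCII category names.

```javascript
// In-memory mirror of the (report_date, category) grouping.
function categoryTrendModel(uploads, items) {
  const dateByUpload = new Map(uploads.map((u) => [u.id, u.report_date]));
  const groups = new Map();
  for (const it of items) {
    const date = dateByUpload.get(it.upload_id);
    if (date == null) continue; // inner join plus report_date IS NOT NULL
    const category = it.category ?? 'Unknown'; // COALESCE(ci.category, 'Unknown')
    const key = JSON.stringify([date, category]);
    const g = groups.get(key) || { report_date: date, category, count: 0 };
    g.count += 1;
    groups.set(key, g);
  }
  return [...groups.values()].sort((a, b) => {
    if (a.report_date !== b.report_date) return a.report_date < b.report_date ? -1 : 1;
    return a.category < b.category ? -1 : a.category > b.category ? 1 : 0;
  });
}

const sameDayUploads = [
  { id: 1, report_date: '2025-05-11' },
  { id: 2, report_date: '2025-05-11' },
  { id: 3, report_date: '2025-05-11' },
];
const sameDayItems = [
  { upload_id: 1, category: 'Patching' },
  { upload_id: 1, category: null },
  { upload_id: 2, category: 'Patching' },
  { upload_id: 2, category: 'Patching' },
  { upload_id: 3, category: 'Configuration' },
];
const categoryRows = categoryTrendModel(sameDayUploads, sameDayItems);
console.log(categoryRows.map((r) => [r.category, r.count]));
// [['Configuration', 1], ['Patching', 3], ['Unknown', 1]]
```

Summing count over the result gives 5, the number of input items, which is the total-conservation check described in Property 6.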

Fix 4: GET /summary — disclose sibling uploads for the latest date

Function: router.get('/summary', ...) (around line 495)

Specific Changes:

  1. Keep the existing vertical IS NULL → vertical = 'NTS_AEO' fallback for choosing the primary upload's summary_json (this preserves the legacy single-upload behavior).
  2. After resolving latestUpload, run a second query to find sibling uploads sharing the same report_date:
    SELECT id, vertical, uploaded_at
    FROM compliance_uploads
    WHERE report_date = $1 AND id != $2
    ORDER BY id ASC
    
  3. Add multi_vertical_uploads to the response when sibling uploads exist:
    res.json({
      entries,
      overall_scores: summary.overall_scores || {},
      upload: { id, report_date, uploaded_at },
      multi_vertical_uploads: siblings.map(s => ({ id: s.id, vertical: s.vertical, uploaded_at: s.uploaded_at })),
    });
    
  4. When no sibling uploads exist (single-upload-per-date case), multi_vertical_uploads is [] (or omitted — see open question in test plan).

This is the conservative option (b) from requirement 2.8 — return a documented selection plus metadata about siblings — rather than option (a) full server-side merge. Option (b) is chosen because (i) the summary_json schema is per-vertical and merging would require reconciliation logic that doesn't currently exist, and (ii) the existing fallback selection (NTS_AEO) is the established representative for the legacy AEO chart on the Compliance page.
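
Separating the response assembly into a pure function makes option (b) easy to unit test without a database. The sketch below assumes the handler has already resolved latestUpload, the sibling rows, the filtered entries, and the parsed summary_json; buildSummaryResponse is a hypothetical name, and always emitting an empty array when there are no siblings is one possible resolution of the open question, not a decision made by this design.

```javascript
// Assemble the /summary payload from already-fetched rows.
function buildSummaryResponse(latestUpload, siblings, entries, summary) {
  return {
    entries,
    overall_scores: summary.overall_scores || {},
    upload: {
      id: latestUpload.id,
      report_date: latestUpload.report_date,
      uploaded_at: latestUpload.uploaded_at,
    },
    // Assumption: [] rather than an omitted field when no siblings exist.
    multi_vertical_uploads: siblings.map((s) => ({
      id: s.id,
      vertical: s.vertical,
      uploaded_at: s.uploaded_at,
    })),
  };
}

const summaryResponse = buildSummaryResponse(
  { id: 1, report_date: '2025-05-11', uploaded_at: '2025-05-12T09:00:00Z', vertical: 'NTS_AEO' },
  [
    { id: 2, vertical: 'SDIT_CISO', uploaded_at: '2025-05-12T09:05:00Z' },
    { id: 3, vertical: 'TSI', uploaded_at: '2025-05-12T09:10:00Z' },
  ],
  [],
  {},
);
console.log(summaryResponse.multi_vertical_uploads.length); // 2
```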

Fix 5: persistUpload() snapshot block — filter and group by vertical

Function: persistUpload() (lines 81–192), specifically the verticalStats query at line 157

Specific Changes:

  1. Determine the upload's vertical (read it from the upload row immediately after the RETURNING id insert, or accept it as a parameter to persistUpload()).
  2. Replace the verticalStats query with one that filters by the upload's vertical and groups by (vertical, team):
    SELECT vertical, team,
           COUNT(DISTINCT hostname)::int AS total_devices,
           COUNT(DISTINCT CASE WHEN status = 'resolved' THEN hostname END)::int AS compliant,
           COUNT(DISTINCT CASE WHEN status = 'active'   THEN hostname END)::int AS non_compliant
    FROM compliance_items
    WHERE team IS NOT NULL AND vertical IS NOT DISTINCT FROM $1
    GROUP BY vertical, team
    
    (IS NOT DISTINCT FROM handles the legacy vertical IS NULL case correctly, so AEO-only uploads keep their previous semantics.)
  3. The INSERT ... ON CONFLICT (snapshot_month, vertical) DO UPDATE already keys snapshots by vertical, so no change is required there. However, the vertical value passed in must come from the query result, not from team AS vertical (which conflates the team and vertical concepts).
  4. If the per-snapshot-row "vertical" identity needs to remain team for back-compat reasons, leave the INSERT mapping unchanged but ensure the underlying counts are filtered to the upload's actual vertical. Confirm via inspection of compliance_snapshots consumers (/vcl/stats) before finalising.
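
The null-safe comparison is the subtle part of this fix, so a small in-memory model may help when writing the vertical-isolation tests. The sketch below mirrors the replacement query under the assumption that items carry hostname, team, vertical, and status; verticalStatsModel and sameVertical are names introduced here for illustration.

```javascript
// Mirrors IS NOT DISTINCT FROM: null matches null, otherwise plain equality.
const sameVertical = (a, b) => (a ?? null) === (b ?? null);

function verticalStatsModel(items, vertical) {
  const byTeam = new Map();
  for (const it of items) {
    if (it.team == null || !sameVertical(it.vertical, vertical)) continue;
    let s = byTeam.get(it.team);
    if (!s) {
      s = { all: new Set(), resolved: new Set(), active: new Set() };
      byTeam.set(it.team, s);
    }
    if (it.hostname == null) continue; // COUNT(DISTINCT ...) ignores NULLs
    s.all.add(it.hostname);
    if (it.status === 'resolved') s.resolved.add(it.hostname);
    if (it.status === 'active') s.active.add(it.hostname);
  }
  return [...byTeam].map(([team, s]) => ({
    vertical: vertical ?? null,
    team,
    total_devices: s.all.size,
    compliant: s.resolved.size,
    non_compliant: s.active.size,
  }));
}

const mixedItems = [
  { hostname: 'h1', team: 'STEAM', vertical: 'NTS_AEO', status: 'active' },
  { hostname: 'h2', team: 'STEAM', vertical: 'NTS_AEO', status: 'active' },
  { hostname: 'h1', team: 'STEAM', vertical: 'SDIT_CISO', status: 'active' },
  { hostname: 'h3', team: 'STEAM', vertical: 'SDIT_CISO', status: 'resolved' },
  { hostname: 'h4', team: 'STEAM', vertical: null, status: 'active' },
];
console.log(verticalStatsModel(mixedItems, 'SDIT_CISO'));
// [{ vertical: 'SDIT_CISO', team: 'STEAM', total_devices: 2, compliant: 1, non_compliant: 1 }]
```

Passing null as the vertical selects only the legacy row (h4), which is the behavior the parenthetical note about IS NOT DISTINCT FROM describes.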

Testing Strategy

Validation Approach

The bug condition is straightforward to construct: insert two compliance_uploads rows with the same report_date and matching compliance_items, then call each affected endpoint. The two-phase approach is to first run the tests against the unfixed code to confirm the duplication/silent-drop counterexamples, then run the same tests against the fixed code and add property-based tests that explore the input space more broadly.

Exploratory Bug Condition Checking

Goal: Surface counterexamples that demonstrate each of the five manifestations BEFORE implementing the fix. Confirm or refute the root cause analysis for each endpoint independently — they share a structural cause but the SQL details differ.

Test Plan: Seed a clean test database with a fixture representing the original GitLab #12 scenario (three uploads for 2025-05-11, one each for NTS_AEO, SDIT_CISO, TSI, with realistic compliance_items). Call each affected endpoint and assert the buggy invariants. Run on UNFIXED code first.

Test Cases:

  1. /trends Duplicate Date Test — Insert three uploads for 2025-05-11 (verticals NTS_AEO, SDIT_CISO, TSI), each with distinct new_count/recurring_count/resolved_count and matching compliance_items per team. Call GET /trends. Assert response.trends.filter(t => t.report_date === '2025-05-11').length === 1. (will fail on unfixed code — returns 3)

  2. /top-recurring Duplicate Bar Test — Same fixture. Call GET /top-recurring. Assert response.waterfall.filter(w => w.date === '2025-05-11').length === 1 AND assert the running invariant waterfall[i].end === waterfall[i].start + waterfall[i].new_count + waterfall[i].recurring_count - waterfall[i].resolved_count holds for every i. (will fail on unfixed code — returns 3 bars and the running totals reflect mid-row state, not date-level aggregate)

  3. /category-trend Duplicate (date, category) Test — Same fixture, plus items tagged with two categories (e.g., "Patching" and "Configuration"). Call GET /category-trend. Assert that for each (report_date, category) pair, response.categoryTrend.filter(c => c.report_date === '2025-05-11' && c.category === 'Patching').length === 1. (will fail on unfixed code — returns 3 rows per category)

  4. /summary Sibling Disclosure Test — Same fixture (three uploads for 2025-05-11, latest date). Call GET /summary. Assert either (a) the response merges entries from all three uploads, or (b) response.multi_vertical_uploads.length === 2. (will fail on unfixed code — silently picks one upload, the other two are dropped without any indication)

  5. persistUpload() Cross-Vertical Contamination Test — Pre-populate compliance_items with rows from multiple verticals (e.g., NTS_AEO has 100 active items, SDIT_CISO has 50 active items). Call persistUpload() with a fresh SDIT_CISO upload. Read back the compliance_snapshots row for the current month and SDIT_CISO. Assert total_devices reflects only SDIT_CISO items, not the combined 150. (will fail on unfixed code — total includes both verticals)

  6. Edge Case — Single-Upload-Per-Date Regression Test — Insert a fixture with a single upload per date for three dates. Call all four read endpoints and capture responses. Apply the fix, re-run, and assert response equality (byte-for-byte). (should pass on unfixed code; will pass on fixed code; protects the preservation property)

Expected Counterexamples:

  • /trends returns N trend entries for a date with N uploads (N > 1). Cause: per-row .map() over uploads instead of date-level aggregation.
  • /top-recurring returns N waterfall bars for a date with N uploads. Cause: same per-row pattern, plus computeWaterfall() carries start forward across the duplicate-date rows.
  • /category-trend returns N × |categories| rows for a date with N uploads. Cause: cu.id is in the GROUP BY clause.
  • /summary returns one upload's summary_json and silently drops siblings. Cause: ORDER BY id DESC LIMIT 1 with no report_date-tie handling.
  • persistUpload() writes inflated total_devices. Cause: the snapshot query has no vertical predicate (vertical IS NOT DISTINCT FROM $1) and does not GROUP BY vertical, team.

Fix Checking

Goal: Verify that for all inputs where the bug condition holds (any report_date shared by two or more uploads), each fixed endpoint produces the expected aggregated/disclosed result.

Pseudocode:

FOR ALL (uploads, items) WHERE EXISTS report_date d WITH COUNT(uploads WHERE report_date = d) > 1 DO
  trends_response       := GET_trends_fixed(uploads, items)
  waterfall_response    := GET_top_recurring_fixed(uploads, items)
  cattrend_response     := GET_category_trend_fixed(uploads, items)
  summary_response      := GET_summary_fixed(uploads, items)
  snapshot_rows         := persistUpload_fixed(new_upload_for_some_vertical, items)

  ASSERT one_entry_per_date(trends_response.trends)
  ASSERT one_entry_per_date(waterfall_response.waterfall) AND running_invariant_holds(waterfall_response.waterfall)
  ASSERT one_entry_per_date_category_pair(cattrend_response.categoryTrend)
  ASSERT siblings_disclosed(summary_response, uploads)
  ASSERT snapshots_filtered_to_vertical(snapshot_rows, new_upload.vertical, items)
END FOR
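
The assertion helpers named in the pseudocode could be sketched as follows. These are illustrative implementations, not existing project code; the dateField parameter is an assumption added because the field name used by waterfall entries is not pinned down in this document.

```javascript
// True when keyOf yields a distinct key for every entry.
function oneEntryPer(entries, keyOf) {
  const keys = entries.map(keyOf);
  return new Set(keys).size === keys.length;
}

const oneEntryPerDate = (entries, dateField = 'report_date') =>
  oneEntryPer(entries, (e) => e[dateField]);

const oneEntryPerDateCategoryPair = (entries) =>
  oneEntryPer(entries, (e) => JSON.stringify([e.report_date, e.category]));

// True when multi_vertical_uploads lists exactly the other uploads
// that share the selected upload's report_date.
function siblingsDisclosed(response, uploads) {
  if (!response.upload) return true; // empty-state response has nothing to disclose
  const byId = (a, b) => a - b;
  const expected = uploads
    .filter((u) => u.report_date === response.upload.report_date && u.id !== response.upload.id)
    .map((u) => u.id)
    .sort(byId);
  const actual = (response.multi_vertical_uploads || []).map((s) => s.id).sort(byId);
  return expected.length === actual.length && expected.every((id, i) => id === actual[i]);
}
```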

Preservation Checking

Goal: Verify that for all inputs where the bug condition does NOT hold (every report_date has exactly one upload row), the fixed endpoints produce results identical to the original endpoints.

Pseudocode:

FOR ALL (uploads, items) WHERE FORALL report_date d, COUNT(uploads WHERE report_date = d) <= 1 DO
  ASSERT GET_trends_original(uploads, items)        = GET_trends_fixed(uploads, items)
  ASSERT GET_top_recurring_original(uploads, items) = GET_top_recurring_fixed(uploads, items)
  ASSERT GET_category_trend_original(uploads, items) = GET_category_trend_fixed(uploads, items)
  ASSERT GET_summary_original(uploads, items)        = GET_summary_fixed(uploads, items)
  ASSERT persistUpload_original(upload, items).snapshots = persistUpload_fixed(upload, items).snapshots
END FOR

Testing Approach: Property-based testing is the right fit for preservation checking here:

  • The single-upload-per-date input space is large (any number of dates, any combination of counts, any team distribution, any category mix, any vertical), and exhaustive enumeration is impractical.
  • The preservation property is a strict equality, which is well-suited to PBT shrinking (any counterexample is a small fixture demonstrating a behavior change).
  • The legacy AEO-only data shape (vertical IS NULL) must be exercised, which falls naturally out of generators that include null verticals.

Test Plan: Capture responses from the unfixed code on single-upload-per-date fixtures (snapshot tests). After applying the fix, re-run the same fixtures and assert equality. Then run a property-based generator that produces random single-upload-per-date scenarios and asserts the same equality.

Test Cases:

  1. Snapshot Equality — Empty State — Empty compliance_uploads. All four endpoints return their documented empty-state shapes. Snapshot-test before and after the fix.
  2. Snapshot Equality — Single AEO-Only Upload — One upload with vertical IS NULL, classic legacy fixture. Capture pre-fix responses, apply fix, assert equality.
  3. Snapshot Equality — Multiple Single-Upload Dates — Five dates, one upload each, varied vertical values. Capture pre-fix responses, apply fix, assert equality.
  4. /summary Team Filter Preservation — Latest upload exists, ?team=STEAM parameter is supplied. Assert entries is filtered to team === 'STEAM' rows. Assert non-ALLOWED_TEAMS value (e.g., ?team=OTHER) returns HTTP 400.
  5. persistUpload() Snapshot Equality — Single-Vertical Month — Pre-populate compliance_items with rows from a single vertical only. Run persistUpload() for that vertical. Assert the resulting compliance_snapshots rows are identical pre-fix and post-fix.
  6. Error Path Preservation — Force a snapshot query failure (e.g., transient DB error). Assert persistUpload() still commits the upload and the error is logged but not surfaced to the caller.

Unit Tests

  • /trends aggregation: two uploads sharing a report_date, one upload alone for an earlier date. Assert response has 2 entries and new_count for the shared date equals the sum of the two uploads.
  • /top-recurring aggregation and running totals: same fixture as above. Assert 2 waterfall entries and the running start/end invariant.
  • /category-trend aggregation: two uploads sharing a report_date, items tagged with two categories. Assert one row per (date, category) pair with summed counts.
  • /summary sibling disclosure: three uploads sharing the latest date. Assert response shape matches the chosen disclosure approach (option (b)).
  • /summary team filter: same upload, with and without ?team=STEAM.
  • persistUpload() per-vertical snapshot: items in two verticals, run upload for one, assert snapshots for that vertical do not include the other vertical's items.
  • persistUpload() legacy AEO-only path (vertical IS NULL): unchanged behavior.

Property-Based Tests

  • /trends aggregation property — Generate a random list of (report_date, new_count, recurring_count, resolved_count) upload tuples (with possible date collisions). Generate matching per-team item counts. Assert the response has exactly one entry per unique report_date AND for each entry, new_count equals the SUM of input new_counts for that date (likewise the other count fields and per-team counts).
  • /top-recurring running invariant property — Same generator. Assert the response has one bar per unique report_date AND for every adjacent pair of entries, entry[i].start === entry[i-1].end, AND entry[i].end === entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count.
  • /category-trend total-conservation property — Generate a random set of compliance_items and uploads. Assert SUM(response.categoryTrend.map(c => c.count)) === total number of compliance_items joined to non-null-report_date uploads. This holds whether or not any date has multiple uploads.
  • /summary sibling-disclosure property — Generate a random set of uploads with possible duplicate report_dates. Pick the latest date. Assert that if any sibling upload exists for that date, the response contains a non-empty multi_vertical_uploads array referencing every sibling upload's id.
  • persistUpload() vertical-isolation property — Generate two non-empty disjoint sets of compliance_items, one per vertical. Insert both. Run persistUpload() for vertical A. Assert the resulting compliance_snapshots rows for vertical A reflect only set-A items (count of distinct hostnames matches).
  • Cross-endpoint preservation property — Generate any fixture where every report_date has exactly one upload row. Assert the four fixed endpoints and the fixed persistUpload() snapshot block produce byte-for-byte identical results to the original code.

Integration Tests

  • Full upload-to-chart flow: upload three xlsx files (one per vertical) with the same report_date via POST /preview + POST /commit, then call /trends, /top-recurring, /category-trend, /summary and verify all four return the expected aggregated/disclosed results.
  • Compliance Charts panel render: load ComplianceChartsPanel.js with a multi-vertical-day fixture and assert (via DOM snapshot) the x-axis shows each date exactly once on Active Findings Over Time and Change per Report Cycle.
  • Snapshot consumer regression: after running persistUpload() with the fix, call /vcl/stats (which reads compliance_snapshots) and verify per-vertical compliance_pct is unchanged from the pre-fix value when only one vertical's items are present, and is corrected when multiple verticals are present.

Test Fixtures Required

The following fixtures are needed and can be reused across the tests for all five code paths:

  1. fixture_empty — No compliance_uploads, no compliance_items. Used by the empty-state preservation tests.

  2. fixture_single_upload_aeo_legacy — One compliance_uploads row with vertical IS NULL, report_date = '2025-04-01', with ~20 compliance_items distributed across the four teams. Used by the legacy-path preservation tests.

  3. fixture_single_upload_per_date — Five compliance_uploads rows, each with a distinct report_date (2025-04-01 through 2025-05-01), with vertical values NTS_AEO, SDIT_CISO, TSI, NULL, and NTS_AEO respectively. Used by the broader preservation tests and by /category-trend total-conservation.

  4. fixture_multi_vertical_single_date — Three compliance_uploads rows all with report_date = '2025-05-11', verticals NTS_AEO/SDIT_CISO/TSI, each with distinct new_count/recurring_count/resolved_count and 5–10 compliance_items per upload spanning multiple teams and categories. This is the canonical bug-condition fixture and reproduces the original GitLab #12 scenario.

  5. fixture_mixed_history — Combination of fixture_single_upload_per_date and fixture_multi_vertical_single_date — multiple dates, some with single uploads, some with two or three. Used by the property-based tests as a realistic state-of-the-world fixture.

  6. fixture_cross_vertical_items — Two non-empty disjoint sets of compliance_items, one tagged vertical = 'NTS_AEO' and one tagged vertical = 'SDIT_CISO', sharing some hostnames between verticals to ensure the count-distinct logic is exercised. Used by the persistUpload() vertical-isolation tests.

  7. fixture_pbt_generators — fast-check (or equivalent) arbitraries:

    • arbReportDate: ISO date string in a bounded range (e.g., last 90 days).
    • arbVertical: oneof 'NTS_AEO' | 'SDIT_CISO' | 'TSI' | null.
    • arbUpload: { report_date, vertical, new_count, recurring_count, resolved_count } with non-negative integer counts.
    • arbItem: { hostname, team in ALLOWED_TEAMS, category in {Patching, Configuration, Vulnerability, Other}, vertical, status in {active, resolved} }.
    • arbScenario: { uploads: arbUpload[], items: arbItem[] }, where items reference uploads via upload_id and dates can collide.
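
Before wiring up fast-check, the shape of these generators can be prototyped with a dependency-free seeded generator. The sketch below is not a fast-check arbitrary and offers no shrinking; it only illustrates the scenario shape. The TEAMS list is a placeholder (only STEAM appears in this document; the real values live in ALLOWED_TEAMS), and the fixed three-date pool replaces the bounded date range purely to force collisions.

```javascript
// Small seeded PRNG (mulberry32-style) so scenarios are reproducible.
function makeRng(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

const pick = (rng, xs) => xs[Math.floor(rng() * xs.length)];
const intBetween = (rng, lo, hi) => lo + Math.floor(rng() * (hi - lo + 1));

const DATES = ['2025-05-04', '2025-05-11', '2025-05-18']; // small pool so dates collide
const VERTICALS = ['NTS_AEO', 'SDIT_CISO', 'TSI', null];
const TEAMS = ['STEAM', 'TEAM_B']; // placeholder, not the real ALLOWED_TEAMS
const CATEGORIES = ['Patching', 'Configuration', 'Vulnerability', 'Other'];
const STATUSES = ['active', 'resolved'];

function genScenario(seed) {
  const rng = makeRng(seed);
  const uploads = [];
  const items = [];
  const uploadCount = intBetween(rng, 1, 6);
  for (let id = 1; id <= uploadCount; id++) {
    const upload = {
      id,
      report_date: pick(rng, DATES),
      vertical: pick(rng, VERTICALS),
      new_count: intBetween(rng, 0, 20),
      recurring_count: intBetween(rng, 0, 20),
      resolved_count: intBetween(rng, 0, 20),
    };
    uploads.push(upload);
    const itemCount = intBetween(rng, 0, 5);
    for (let k = 0; k < itemCount; k++) {
      items.push({
        upload_id: id, // items reference their upload
        hostname: `host-${intBetween(rng, 1, 8)}`,
        team: pick(rng, TEAMS),
        category: pick(rng, CATEGORIES),
        vertical: upload.vertical,
        status: pick(rng, STATUSES),
      });
    }
  }
  return { uploads, items };
}
```

Calling genScenario with the same seed returns the same scenario, so a failing seed can be recorded and replayed while the fast-check version is being built.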