Compliance Duplicate Chart Entries Bugfix Design
Overview
Four compliance endpoints (GET /trends, GET /top-recurring, GET /category-trend, GET /summary) and the compliance_snapshots block inside persistUpload(), five code paths in total, share the same root cause: they key by compliance_uploads.id (one row per uploaded xlsx) instead of by compliance_uploads.report_date (the calendar date the report covers). Because the compliance pipeline accepts one xlsx per vertical (NTS_AEO, SDIT_CISO, TSI), a single report_date typically maps to several compliance_uploads rows, and any query that does not aggregate over report_date produces duplicated, fragmented, or silently dropped data.
The fix follows one pattern across the chart endpoints: rewrite the SQL so the result set has exactly one row per unique report_date (or per (report_date, category) pair for /category-trend), using GROUP BY report_date with SUM or COUNT aggregations. GET /summary keeps its existing upload selection and adds a query that discloses sibling uploads sharing the latest report_date. The persistUpload() snapshot block is fixed by adding a vertical filter so per-vertical snapshots are no longer cross-contaminated by other verticals' items.
The implementation is intentionally minimal: each fix changes a single SQL statement (and, in one case, a small JavaScript loop). No frontend changes are required — the chart components already key on report_date and will render correctly once the API returns one row per date.
Glossary
- Bug_Condition (C): The condition that triggers the bug: two or more rows in compliance_uploads share the same report_date (i.e., a multi-vertical upload day).
- Property (P): The desired behavior when C holds: each affected endpoint returns exactly one entry per unique report_date, and the values aggregated across uploads for that date reconcile with the underlying compliance_items totals.
- Preservation: Behavior on dates with a single upload row, on the empty-data response shape, and on unrelated query parameters (e.g., team filter on /summary). All must be byte-for-byte unchanged.
- report_date: TEXT column on compliance_uploads storing the reporting period the xlsx covers (e.g., 2025-05-11). One date can have multiple upload rows when multiple verticals are uploaded for that date.
- vertical: TEXT column on compliance_uploads and compliance_items identifying which xlsx (NTS_AEO, SDIT_CISO, TSI) an upload or item belongs to. NULL indicates a legacy AEO-only upload.
- persistUpload(): Function in backend/routes/compliance.js (lines 81–192) that writes a parsed upload to the DB inside a transaction and then writes per-vertical snapshots into compliance_snapshots.
- computeWaterfall(uploads): Pure helper in backend/routes/compliance.js (lines 235–243) that takes an ordered list of upload rows and emits one waterfall entry per row, carrying the running start forward.
Bug Details
Bug Condition
The bug manifests when two or more compliance_uploads rows share the same report_date. This happens whenever the operator uploads more than one vertical xlsx for the same reporting cycle (the documented multi-vertical workflow). The five affected code paths each produce one row per upload instead of aggregating to one row per report_date.
Formal Specification:
FUNCTION isBugCondition(uploads)
  INPUT: uploads, a list of compliance_uploads rows
  OUTPUT: boolean
  // The bug condition is triggered for any report_date that has more than one upload row
  GROUP uploads BY report_date INTO groups
  RETURN EXISTS group IN groups WHERE COUNT(group) > 1
END FUNCTION
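The predicate above translates almost directly into JavaScript. The following is a minimal sketch, assuming each upload is a plain object with a report_date field; skipping NULL dates is an added assumption that mirrors the WHERE report_date IS NOT NULL filter used by the fixed queries.

```javascript
// Sketch of isBugCondition(): true when any report_date appears on more
// than one upload row. Assumes uploads are plain objects with a
// `report_date` property.
function isBugCondition(uploads) {
  const seen = new Set();
  for (const { report_date } of uploads) {
    // Assumption: rows without a report_date are ignored, as in the fixed SQL.
    if (report_date == null) continue;
    if (seen.has(report_date)) return true;
    seen.add(report_date);
  }
  return false;
}
```

A test fixture can call this helper first to confirm that it really exercises the multi-vertical case before asserting on endpoint output.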
For a single endpoint response to be considered buggy, the API output must additionally fail one of the following invariants (the per-endpoint manifestation of the same root cause):
FUNCTION isBuggyResponse(endpoint, response)
  CASE endpoint OF
    '/trends':         RETURN COUNT(response.trends) != COUNT(DISTINCT report_date IN compliance_uploads)
    '/top-recurring':  RETURN COUNT(response.waterfall) != COUNT(DISTINCT report_date IN compliance_uploads)
    '/category-trend': RETURN EXISTS (date, category) WITH COUNT(*) > 1 IN response.categoryTrend
    '/summary':        RETURN response.upload represents only one of N>1 uploads sharing the latest report_date
                       AND no flag indicates other uploads exist for that date
    'persistUpload':   RETURN snapshots.total_devices > items_belonging_to_this_vertical_only
  END CASE
END FUNCTION
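Two of these invariants can be checked from the response alone, without querying the database. The sketch below is a response-only approximation (it looks for repeated keys rather than comparing against COUNT(DISTINCT report_date)); the helper names duplicateKeys, trendsLooksBuggy, and categoryTrendLooksBuggy are hypothetical.

```javascript
// Returns the keys that occur more than once in `rows`.
function duplicateKeys(rows, keyFn) {
  const counts = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([key]) => key);
}

// /trends is buggy if any report_date appears on more than one entry.
const trendsLooksBuggy = (response) =>
  duplicateKeys(response.trends, (t) => t.report_date).length > 0;

// /category-trend is buggy if any (report_date, category) pair repeats.
const categoryTrendLooksBuggy = (response) =>
  duplicateKeys(
    response.categoryTrend,
    (c) => JSON.stringify([c.report_date, c.category])
  ).length > 0;
```

Because these checks only detect duplicates, they complement rather than replace a comparison against the distinct dates stored in compliance_uploads.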
Examples
The originally reported case (GitLab issue #12, 2025-05-11) and the four sibling manifestations:
- /trends: STEAM uploads three xlsx files for 2025-05-11 (one per vertical). The chart shows three "05/11/25" entries on the x-axis instead of one. Expected: a single 05/11/25 point whose new_count/recurring_count/resolved_count/total_active are the sums of the three uploads' counts.
- /top-recurring: Same three uploads. computeWaterfall() receives three rows for 2025-05-11 and emits three bars stacked on the same date. Worse, because start carries forward across rows, the second and third bars' start reflects the first/second row's end, so the three bars in aggregate misrepresent the date-level deltas. Expected: one bar for 2025-05-11 whose new_count/recurring_count/resolved_count are summed across the three uploads, and whose start carries from the previous date's end.
- /category-trend: Same three uploads, each with category-tagged items. The query groups by (cu.id, cu.report_date, category) and returns up to 3 × |categories| rows for 2025-05-11. The frontend stacks these as duplicated category bars per date. Expected: one row per (2025-05-11, category) pair with count summed across the three uploads.
- /summary: On 2025-05-11, three uploads exist. The query WHERE vertical IS NULL ORDER BY id DESC LIMIT 1 (with fallback to vertical = 'NTS_AEO') silently picks one and the other two verticals' summary_json is dropped. Expected: either the response merges all three uploads' entries and overall_scores, or the response includes a multi_vertical_uploads array identifying the other uploads that exist for the same report_date so the caller knows the response is partial.
- Edge case, persistUpload() snapshot: When SDIT_CISO is being persisted on 2025-05-11, the snapshot query reads compliance_items WHERE team IS NOT NULL with no vertical filter, so the resulting per-team total_devices/compliant/non_compliant counts include items that belong to NTS_AEO and TSI as well. Expected: the snapshot query filters by the upload's vertical and groups by (vertical, team).
Expected Behavior
Preservation Requirements
Unchanged Behaviors:
- Single-upload-per-date dates (legacy AEO-only workflow): every endpoint returns the same numbers, in the same shape, in the same order as before the fix.
- Empty-data responses: /trends returns { trends: [] }, /top-recurring returns { waterfall: [] }, /category-trend returns { categoryTrend: [] }, /summary returns { entries: [], overall_scores: {}, upload: null }.
- /summary team query parameter: still filters entries server-side, still rejects non-ALLOWED_TEAMS values with HTTP 400.
- /summary vertical IS NULL → vertical = 'NTS_AEO' fallback for selecting which upload's summary_json to surface (only the additional metadata about sibling uploads is new).
- persistUpload() error handling: snapshot creation remains wrapped in a try/catch that logs but does not fail the upload commit.
- compliance_snapshots rows for months with only a single vertical present in compliance_items: identical values to the pre-fix output.
- Frontend chart components: no changes required. They already key on report_date and consume the existing response shapes.
Scope:
All endpoint inputs that do not involve report_date collisions (single-upload dates, empty datasets, error paths, query-parameter filtering) must be byte-for-byte identical to the pre-fix output. The fix only changes what happens when two or more compliance_uploads rows share a report_date.
Hypothesized Root Cause
All five sites have the same shape of bug — keying by id instead of report_date — but with slightly different mechanics. Listing them explicitly so the test plan can confirm or refute each one:
- /trends: per-row mapping over uploads. The handler runs SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC and .map()s each row into a trend entry. Per-team counts are pre-aggregated by upload_id and looked up by u.id, so duplicate-date rows produce duplicate-date trend entries with split per-team counts.
- /top-recurring: computeWaterfall() receives per-row data. The query is identical to the /trends upload query, and computeWaterfall() carries a stateful start forward across rows. Three rows for the same date become three bars whose start/end running totals are wrong relative to the date-level aggregate.
- /category-trend: GROUP BY cu.id, cu.report_date, category. Including cu.id in the GROUP BY defeats date-level aggregation: each upload row's items get their own (date, category) group instead of summing into the date-level group.
- /summary: ORDER BY id DESC LIMIT 1. The query selects a single representative upload for the latest date and discards every other upload sharing that date. This is a "select latest by row id" pattern that does not consider report_date ties.
- persistUpload() snapshot block: missing vertical filter. The snapshot query reads compliance_items WHERE team IS NOT NULL GROUP BY team with no vertical predicate. The query was correct when there was one vertical (AEO-only legacy) and silently broke when the multi-vertical migration added a vertical column without updating this query.
The common structural cause is that the multi-vertical migration (add_vcl_multi_vertical.js) added a vertical column to compliance_uploads and compliance_items but did not audit existing read queries for the new "many uploads share a report_date" reality.
Correctness Properties
Property 1: Bug Condition — /trends returns one entry per unique report_date
For any set of compliance_uploads rows where two or more rows share a report_date, the response from GET /trends SHALL contain exactly one entry per unique report_date, with new_count, recurring_count, resolved_count, and total_active equal to the SUM of those columns over all uploads sharing that date, and per-team counts equal to the sum of compliance_items rows for that team across all those uploads.
Validates: Requirements 2.1, 2.2, 2.3
Property 2: Bug Condition — /top-recurring waterfall has one bar per unique report_date with correct running totals
For any set of compliance_uploads rows where two or more rows share a report_date, the response from GET /top-recurring SHALL contain exactly one waterfall entry per unique report_date, the entry's new_count/recurring_count/resolved_count SHALL equal the sum of those columns over all uploads sharing that date, and the running invariant entry[i].end == entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count SHALL hold with entry[i].start == entry[i-1].end for adjacent entries (and entry[0].start == 0).
Validates: Requirements 2.4, 2.5
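The running invariant in Property 2 can be expressed as a small predicate over the waterfall array. This is a sketch only; it assumes each entry exposes start, end, new_count, recurring_count, and resolved_count as numbers, and the name runningInvariantHolds is hypothetical.

```javascript
// Checks that every entry satisfies
//   end === start + new_count + recurring_count - resolved_count
// and that each entry starts where the previous one ended (0 for the first).
function runningInvariantHolds(waterfall) {
  return waterfall.every((entry, i) => {
    const expectedStart = i === 0 ? 0 : waterfall[i - 1].end;
    const expectedEnd =
      entry.start + entry.new_count + entry.recurring_count - entry.resolved_count;
    return entry.start === expectedStart && entry.end === expectedEnd;
  });
}
```

An empty waterfall satisfies the invariant vacuously, which matches the empty-state response { waterfall: [] }.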
Property 3: Bug Condition — /category-trend returns one row per (date, category)
For any set of compliance_uploads and compliance_items rows, the response from GET /category-trend SHALL contain exactly one entry per unique (report_date, category) pair, and each entry's count SHALL equal the total number of compliance_items for that category across every upload sharing that report_date.
Validates: Requirements 2.6, 2.7
Property 4: Bug Condition — /summary does not silently drop sibling uploads
For any set of compliance_uploads rows where two or more rows share the latest report_date, the response from GET /summary SHALL either (a) include a merged view of all sibling uploads' entries and overall_scores, or (b) include a non-empty multi_vertical_uploads field listing the IDs and verticals of the other uploads for that date that were not used to populate the response. The response SHALL NOT silently drop sibling uploads.
Validates: Requirements 2.8, 2.9
Property 5: Bug Condition — persistUpload() snapshot reflects only the snapshotted vertical
For any persistUpload() invocation with a non-NULL vertical, the rows written into compliance_snapshots for the current month SHALL have total_devices, compliant, and non_compliant values equal to the counts derived from compliance_items filtered to the snapshotted vertical only. No item from another vertical SHALL contribute to those counts.
Validates: Requirements 2.10, 2.11
Property 6: Preservation — Per-endpoint cross-date sums equal source-data totals
For any set of uploads, summing new_count (and likewise recurring_count, resolved_count) across every entry in GET /trends SHALL equal the corresponding SUM(new_count) over compliance_uploads rows with a non-NULL report_date. Similarly, summing count across every entry in GET /category-trend SHALL equal COUNT(*) of compliance_items joined to compliance_uploads rows with a non-NULL report_date. This holds whether or not any date has duplicate uploads.
Validates: Requirements 3.1, 3.2
Property 7: Preservation — Single-upload-per-date dates are unchanged
For any set of compliance_uploads where every report_date has exactly one row, the responses from /trends, /top-recurring, /category-trend, and /summary (and the compliance_snapshots rows written by persistUpload()) SHALL be identical to the pre-fix output for the same input. The fix SHALL NOT change behavior on the single-upload-per-date case.
Validates: Requirements 3.1, 3.4, 3.5, 3.6, 3.8
Property 8: Preservation — Empty-data and error-path responses are unchanged
For any empty dataset (no uploads, no matching items, no items in a category), each affected endpoint SHALL return the same empty-state response shape as before the fix. /summary with a non-ALLOWED_TEAMS team parameter SHALL still respond 400. persistUpload() snapshot errors SHALL still be caught and logged without failing the upload commit.
Validates: Requirements 3.3, 3.7, 3.9, 3.10
Fix Implementation
Changes Required
All changes are in backend/routes/compliance.js. No schema migration, no new column, no frontend change.
Fix 1: GET /trends — aggregate uploads and team counts by report_date
Function: router.get('/trends', ...) (around line 768)
Specific Changes:
- Replace the compliance_uploads query so it groups by report_date and sums the count columns:

      SELECT report_date,
             SUM(COALESCE(new_count, 0))::int AS new_count,
             SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
             SUM(COALESCE(resolved_count, 0))::int AS resolved_count,
             SUM(COALESCE(new_count, 0) + COALESCE(recurring_count, 0))::int AS total_active
      FROM compliance_uploads
      WHERE report_date IS NOT NULL
      GROUP BY report_date
      ORDER BY report_date ASC

- Replace the per-team compliance_items query so it joins to compliance_uploads and groups by (report_date, team) instead of (upload_id, team):

      SELECT cu.report_date, ci.team, COUNT(ci.id)::int AS count
      FROM compliance_items ci
      JOIN compliance_uploads cu ON ci.upload_id = cu.id
      WHERE ci.team IS NOT NULL AND cu.report_date IS NOT NULL
      GROUP BY cu.report_date, ci.team

- Change the teamMap keyed lookup from teamMap[u.id] to teamMap[u.report_date] and rebuild trends from the per-date upload rows.
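The re-keyed lookup might look like the sketch below. The handler's actual trend entry shape is not shown in this document, so the teams property and the helper name buildTrends are assumptions for illustration; only the switch from u.id to u.report_date as the map key is taken from the fix description.

```javascript
// uploadRows: rows from the GROUP BY report_date query (one per date).
// teamRows:   rows from the per-team query, shaped { report_date, team, count }.
function buildTrends(uploadRows, teamRows) {
  // Key the per-team counts by report_date instead of upload id.
  const teamMap = {};
  for (const { report_date, team, count } of teamRows) {
    if (!teamMap[report_date]) teamMap[report_date] = {};
    teamMap[report_date][team] = count;
  }
  return uploadRows.map((u) => ({
    report_date: u.report_date,
    new_count: u.new_count,
    recurring_count: u.recurring_count,
    resolved_count: u.resolved_count,
    total_active: u.total_active,
    // Hypothetical field name; dates with no team rows get an empty object.
    teams: teamMap[u.report_date] || {},
  }));
}
```

Because both queries already return one row per date (or per date and team), the JavaScript side needs no further aggregation.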
Fix 2: GET /top-recurring — aggregate uploads by report_date before passing to computeWaterfall()
Function: router.get('/top-recurring', ...) (around line 818)
Specific Changes:
- Replace the query with the same GROUP BY report_date pattern used in /trends (without id, since computeWaterfall() only needs report_date, new_count, recurring_count, resolved_count):

      SELECT report_date,
             SUM(COALESCE(new_count, 0))::int AS new_count,
             SUM(COALESCE(recurring_count, 0))::int AS recurring_count,
             SUM(COALESCE(resolved_count, 0))::int AS resolved_count
      FROM compliance_uploads
      WHERE report_date IS NOT NULL
      GROUP BY report_date
      ORDER BY report_date ASC

- computeWaterfall() itself does not change. It already advances start correctly when fed one row per date. The fix is purely in the SQL.
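To illustrate why aggregating before the helper is sufficient, the sketch below pairs a hypothetical stand-in for computeWaterfall() (written from its glossary description and the Property 2 invariant, not from the real source) with an in-memory equivalent of the GROUP BY query. The names computeWaterfallSketch and aggregateByDate are assumptions.

```javascript
// Hypothetical stand-in for computeWaterfall(): one entry per input row,
// carrying the running `start` forward. The real helper may differ.
function computeWaterfallSketch(rows) {
  let start = 0;
  return rows.map((row) => {
    const end = start + row.new_count + row.recurring_count - row.resolved_count;
    const entry = {
      date: row.report_date,
      start,
      new_count: row.new_count,
      recurring_count: row.recurring_count,
      resolved_count: row.resolved_count,
      end,
    };
    start = end;
    return entry;
  });
}

// In-memory equivalent of the GROUP BY report_date query (illustration only).
function aggregateByDate(uploads) {
  const byDate = new Map();
  for (const u of uploads) {
    if (u.report_date == null) continue; // WHERE report_date IS NOT NULL
    const acc = byDate.get(u.report_date) ||
      { report_date: u.report_date, new_count: 0, recurring_count: 0, resolved_count: 0 };
    acc.new_count += u.new_count || 0;             // COALESCE(new_count, 0)
    acc.recurring_count += u.recurring_count || 0;
    acc.resolved_count += u.resolved_count || 0;
    byDate.set(u.report_date, acc);
  }
  return [...byDate.values()].sort((a, b) =>
    a.report_date < b.report_date ? -1 : a.report_date > b.report_date ? 1 : 0);
}

const uploads = [
  { report_date: '2025-05-04', new_count: 4, recurring_count: 0, resolved_count: 0 },
  { report_date: '2025-05-11', new_count: 2, recurring_count: 1, resolved_count: 1 },
  { report_date: '2025-05-11', new_count: 3, recurring_count: 0, resolved_count: 2 },
  { report_date: '2025-05-11', new_count: 1, recurring_count: 2, resolved_count: 0 },
];
const perUpload = computeWaterfallSketch(uploads);                // one bar per upload row
const perDate = computeWaterfallSketch(aggregateByDate(uploads)); // one bar per report_date
```

In this example both variants finish at the same running total, but only the per-date variant yields a single bar for 2025-05-11 whose start is the previous date's end.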
Fix 3: GET /category-trend — drop cu.id from GROUP BY
Function: router.get('/category-trend', ...) (around line 838)
Specific Changes:
- Remove cu.id from the GROUP BY clause so the grouping is by (report_date, category) only:

      SELECT cu.report_date,
             COALESCE(ci.category, 'Unknown') AS category,
             COUNT(ci.id)::int AS count
      FROM compliance_uploads cu
      JOIN compliance_items ci ON ci.upload_id = cu.id
      WHERE cu.report_date IS NOT NULL
      GROUP BY cu.report_date, COALESCE(ci.category, 'Unknown')
      ORDER BY cu.report_date ASC, category ASC

- The response shape ({ categoryTrend: Array<{ report_date, category, count }> }) does not change. Only the row count for multi-vertical dates changes (collapsing duplicates into sums).
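A small in-memory reference model of the corrected grouping can serve as a test oracle. The sketch below assumes uploads carry id and report_date and items carry upload_id and category; the function name categoryTrendModel is hypothetical, and plain string comparison is used for ordering, which may differ from the database collation for non-ASCII categories.

```javascript
// Reference model of the fixed /category-trend query:
// inner join items to uploads, drop NULL report_dates, default NULL
// categories to 'Unknown', and count per (report_date, category).
function categoryTrendModel(uploads, items) {
  const dateByUploadId = new Map(uploads.map((u) => [u.id, u.report_date]));
  const counts = new Map();
  for (const item of items) {
    const reportDate = dateByUploadId.get(item.upload_id);
    if (reportDate == null) continue; // no matching upload, or NULL report_date
    const category = item.category ?? 'Unknown';
    const key = JSON.stringify([reportDate, category]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return [...counts]
    .map(([key, count]) => {
      const [report_date, category] = JSON.parse(key);
      return { report_date, category, count };
    })
    .sort((a, b) => cmp(a.report_date, b.report_date) || cmp(a.category, b.category));
}
```

Summing count over the model's output gives the number of joined items, which is the conservation property checked later in the test plan.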
Fix 4: GET /summary — disclose sibling uploads for the latest date
Function: router.get('/summary', ...) (around line 495)
Specific Changes:
- Keep the existing vertical IS NULL → vertical = 'NTS_AEO' fallback for choosing the primary upload's summary_json (this preserves the legacy single-upload behavior).
- After resolving latestUpload, run a second query to find sibling uploads sharing the same report_date:

      SELECT id, vertical, uploaded_at
      FROM compliance_uploads
      WHERE report_date = $1 AND id != $2
      ORDER BY id ASC

- Add multi_vertical_uploads to the response when sibling uploads exist:

      res.json({
        entries,
        overall_scores: summary.overall_scores || {},
        upload: { id, report_date, uploaded_at },
        multi_vertical_uploads: siblings.map(s => ({ id: s.id, vertical: s.vertical, uploaded_at: s.uploaded_at })),
      });

- When no sibling uploads exist (single-upload-per-date case), multi_vertical_uploads is [] (or omitted; see open question in test plan).
This is the conservative option (b) from requirement 2.8 — return a documented selection plus metadata about siblings — rather than option (a) full server-side merge. Option (b) is chosen because (i) the summary_json schema is per-vertical and merging would require reconciliation logic that doesn't currently exist, and (ii) the existing fallback selection (NTS_AEO) is the established representative for the legacy AEO chart on the Compliance page.
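For tests that run without a database, the sibling lookup can be modelled in memory. This is a sketch under the assumption that upload rows are plain objects with id, report_date, vertical, and uploaded_at; findSiblingUploads is a hypothetical name.

```javascript
// In-memory equivalent of the sibling query: every other upload that
// shares latestUpload.report_date, ordered by id ascending.
function findSiblingUploads(latestUpload, uploads) {
  return uploads
    .filter((u) => u.report_date === latestUpload.report_date && u.id !== latestUpload.id)
    .sort((a, b) => a.id - b.id)
    .map(({ id, vertical, uploaded_at }) => ({ id, vertical, uploaded_at }));
}
```

For the three-upload fixture this returns two siblings, and for a single-upload date it returns an empty array.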
Fix 5: persistUpload() snapshot block — filter and group by vertical
Function: persistUpload() (lines 81–192), specifically the verticalStats query at line 157
Specific Changes:
- Determine the upload's vertical (read it from the upload row immediately after the RETURNING id insert, or accept it as a parameter to persistUpload()).
- Replace the verticalStats query with one that filters by the upload's vertical and groups by (vertical, team):

      SELECT vertical, team,
             COUNT(DISTINCT hostname)::int AS total_devices,
             COUNT(DISTINCT CASE WHEN status = 'resolved' THEN hostname END)::int AS compliant,
             COUNT(DISTINCT CASE WHEN status = 'active' THEN hostname END)::int AS non_compliant
      FROM compliance_items
      WHERE team IS NOT NULL AND vertical IS NOT DISTINCT FROM $1
      GROUP BY vertical, team

  (IS NOT DISTINCT FROM handles the legacy vertical IS NULL case correctly, so AEO-only uploads keep their previous semantics.)
- The INSERT ... ON CONFLICT (snapshot_month, vertical) DO UPDATE already keys snapshots by vertical, so no change is required there. However, the vertical value passed in must come from the query result, not from team AS vertical (which conflates the team and vertical concepts).
- If the per-snapshot-row "vertical" identity needs to remain team for back-compat reasons, leave the INSERT mapping unchanged but ensure the underlying counts are filtered to the upload's actual vertical. Confirm via inspection of compliance_snapshots consumers (/vcl/stats) before finalising.
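The filtered snapshot query can be mirrored by a reference model for use in the vertical-isolation tests. The sketch below assumes items are plain objects with hostname, team, vertical, and status; verticalStatsModel is a hypothetical name, and the null-safe comparison stands in for IS NOT DISTINCT FROM.

```javascript
// Reference model of the fixed verticalStats query.
function verticalStatsModel(items, uploadVertical) {
  const target = uploadVertical ?? null;
  // Null-safe equality, mirroring `vertical IS NOT DISTINCT FROM $1`.
  const sameVertical = (v) => (v ?? null) === target;
  const groups = new Map();
  for (const item of items) {
    if (item.team == null || !sameVertical(item.vertical)) continue;
    let g = groups.get(item.team);
    if (!g) {
      g = { total: new Set(), compliant: new Set(), nonCompliant: new Set() };
      groups.set(item.team, g);
    }
    if (item.hostname == null) continue; // COUNT(DISTINCT ...) ignores NULLs
    g.total.add(item.hostname);
    if (item.status === 'resolved') g.compliant.add(item.hostname);
    if (item.status === 'active') g.nonCompliant.add(item.hostname);
  }
  return [...groups].map(([team, g]) => ({
    vertical: target,
    team,
    total_devices: g.total.size,
    compliant: g.compliant.size,
    non_compliant: g.nonCompliant.size,
  }));
}
```

As in the SQL, a hostname that has both a resolved and an active item is counted once in total_devices but appears in both compliant and non_compliant.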
Testing Strategy
Validation Approach
The bug condition is straightforward to construct: insert two compliance_uploads rows with the same report_date and matching compliance_items, then call each affected endpoint. The two-phase approach is to first run the tests against the unfixed code to confirm the duplication/silent-drop counterexamples, then run the same tests against the fixed code and add property-based tests that explore the input space more broadly.
Exploratory Bug Condition Checking
Goal: Surface counterexamples that demonstrate each of the five manifestations BEFORE implementing the fix. Confirm or refute the root cause analysis for each endpoint independently — they share a structural cause but the SQL details differ.
Test Plan: Seed a clean test database with a fixture representing the original GitLab #12 scenario (three uploads for 2025-05-11, one each for NTS_AEO, SDIT_CISO, TSI, with realistic compliance_items). Call each affected endpoint and assert the buggy invariants. Run on UNFIXED code first.
Test Cases:
- /trends Duplicate Date Test: Insert three uploads for 2025-05-11 (verticals NTS_AEO, SDIT_CISO, TSI), each with distinct new_count/recurring_count/resolved_count and matching compliance_items per team. Call GET /trends. Assert response.trends.filter(t => t.report_date === '2025-05-11').length === 1. (will fail on unfixed code: returns 3)
- /top-recurring Duplicate Bar Test: Same fixture. Call GET /top-recurring. Assert response.waterfall.filter(w => w.date === '2025-05-11').length === 1 AND assert the running invariant waterfall[i].end === waterfall[i].start + waterfall[i].new_count + waterfall[i].recurring_count - waterfall[i].resolved_count holds for every i. (will fail on unfixed code: returns 3 bars and the running totals reflect mid-row state, not date-level aggregate)
- /category-trend Duplicate (date, category) Test: Same fixture, plus items tagged with two categories (e.g., "Patching" and "Configuration"). Call GET /category-trend. Assert that for each (report_date, category) pair, response.categoryTrend.filter(c => c.report_date === '2025-05-11' && c.category === 'Patching').length === 1. (will fail on unfixed code: returns 3 rows per category)
- /summary Sibling Disclosure Test: Same fixture (three uploads for 2025-05-11, latest date). Call GET /summary. Assert either (a) the response merges entries from all three uploads, or (b) response.multi_vertical_uploads.length === 2. (will fail on unfixed code: silently picks one upload, the other two are dropped without any indication)
- persistUpload() Cross-Vertical Contamination Test: Pre-populate compliance_items with rows from multiple verticals (e.g., NTS_AEO has 100 active items, SDIT_CISO has 50 active items). Call persistUpload() with a fresh SDIT_CISO upload. Read back the compliance_snapshots row for the current month and SDIT_CISO. Assert total_devices reflects only SDIT_CISO items, not the combined 150. (will fail on unfixed code: total includes both verticals)
- Edge Case, Single-Upload-Per-Date Regression Test: Insert a fixture with a single upload per date for three dates. Call all four read endpoints and capture responses. Apply the fix, re-run, and assert response equality (byte-for-byte). (should pass on unfixed code; will pass on fixed code; protects the preservation property)
Expected Counterexamples:
- /trends returns N trend entries for a date with N uploads (N > 1). Cause: per-row .map() over uploads instead of date-level aggregation.
- /top-recurring returns N waterfall bars for a date with N uploads. Cause: same per-row pattern, plus computeWaterfall() carries start forward across the duplicate-date rows.
- /category-trend returns N × |categories| rows for a date with N uploads. Cause: cu.id is in the GROUP BY clause.
- /summary returns one upload's summary_json and silently drops siblings. Cause: ORDER BY id DESC LIMIT 1 with no report_date-tie handling.
- persistUpload() writes inflated total_devices. Cause: missing vertical predicate (vertical IS NOT DISTINCT FROM $1) and GROUP BY vertical, team in the snapshot query.
Fix Checking
Goal: Verify that for all inputs where the bug condition holds (any report_date shared by two or more uploads), each fixed endpoint produces the expected aggregated/disclosed result.
Pseudocode:
FOR ALL (uploads, items) WHERE EXISTS report_date d WITH COUNT(uploads WHERE report_date = d) > 1 DO
  trends_response    := GET_trends_fixed(uploads, items)
  waterfall_response := GET_top_recurring_fixed(uploads, items)
  cattrend_response  := GET_category_trend_fixed(uploads, items)
  summary_response   := GET_summary_fixed(uploads, items)
  snapshot_rows      := persistUpload_fixed(new_upload_for_some_vertical, items)
  ASSERT one_entry_per_date(trends_response.trends)
  ASSERT one_entry_per_date(waterfall_response.waterfall) AND running_invariant_holds(waterfall_response.waterfall)
  ASSERT one_entry_per_date_category_pair(cattrend_response.categoryTrend)
  ASSERT siblings_disclosed(summary_response, uploads)
  ASSERT snapshots_filtered_to_vertical(snapshot_rows, new_upload.vertical, items)
END FOR
Preservation Checking
Goal: Verify that for all inputs where the bug condition does NOT hold (every report_date has exactly one upload row), the fixed endpoints produce results identical to the original endpoints.
Pseudocode:
FOR ALL (uploads, items) WHERE FORALL report_date d, COUNT(uploads WHERE report_date = d) <= 1 DO
  ASSERT GET_trends_original(uploads, items) = GET_trends_fixed(uploads, items)
  ASSERT GET_top_recurring_original(uploads, items) = GET_top_recurring_fixed(uploads, items)
  ASSERT GET_category_trend_original(uploads, items) = GET_category_trend_fixed(uploads, items)
  ASSERT GET_summary_original(uploads, items) = GET_summary_fixed(uploads, items)
  ASSERT persistUpload_original(upload, items).snapshots = persistUpload_fixed(upload, items).snapshots
END FOR
Testing Approach: Property-based testing is the right fit for preservation checking here:
- The single-upload-per-date input space is large (any number of dates, any combination of counts, any team distribution, any category mix, any vertical), and exhaustive enumeration is impractical.
- The preservation property is a strict equality, which is well-suited to PBT shrinking (any counterexample is a small fixture demonstrating a behavior change).
- The legacy AEO-only data shape (vertical IS NULL) must be exercised, which falls naturally out of generators that include null verticals.
Test Plan: Capture responses from the unfixed code on single-upload-per-date fixtures (snapshot tests). After applying the fix, re-run the same fixtures and assert equality. Then run a property-based generator that produces random single-upload-per-date scenarios and asserts the same equality.
Test Cases:
- Snapshot Equality (Empty State): Empty compliance_uploads. All four endpoints return their documented empty-state shapes. Snapshot-test before and after the fix.
- Snapshot Equality (Single AEO-Only Upload): One upload with vertical IS NULL, classic legacy fixture. Capture pre-fix responses, apply fix, assert equality.
- Snapshot Equality (Multiple Single-Upload Dates): Five dates, one upload each, varied vertical values. Capture pre-fix responses, apply fix, assert equality.
- /summary Team Filter Preservation: Latest upload exists, ?team=STEAM parameter is supplied. Assert entries is filtered to team === 'STEAM' rows. Assert non-ALLOWED_TEAMS value (e.g., ?team=OTHER) returns HTTP 400.
- persistUpload() Snapshot Equality (Single-Vertical Month): Pre-populate compliance_items with rows from a single vertical only. Run persistUpload() for that vertical. Assert the resulting compliance_snapshots rows are identical pre-fix and post-fix.
- Error Path Preservation: Force a snapshot query failure (e.g., transient DB error). Assert persistUpload() still commits the upload and the error is logged but not surfaced to the caller.
Unit Tests
- /trends aggregation: two uploads sharing a report_date, one upload alone for an earlier date. Assert response has 2 entries and new_count for the shared date equals the sum of the two uploads.
- /top-recurring aggregation and running totals: same fixture as above. Assert 2 waterfall entries and the running start/end invariant.
- /category-trend aggregation: two uploads sharing a report_date, items tagged with two categories. Assert one row per (date, category) pair with summed counts.
- /summary sibling disclosure: three uploads sharing the latest date. Assert response shape matches the chosen disclosure approach (option (b)).
- /summary team filter: same upload, with and without ?team=STEAM.
- persistUpload() per-vertical snapshot: items in two verticals, run upload for one, assert snapshots for that vertical do not include the other vertical's items.
- persistUpload() legacy AEO-only path (vertical IS NULL): unchanged behavior.
Property-Based Tests
- /trends aggregation property: Generate a random list of (report_date, new_count, recurring_count, resolved_count) upload tuples (with possible date collisions). Generate matching per-team item counts. Assert the response has exactly one entry per unique report_date AND for each entry, new_count equals the SUM of input new_counts for that date (likewise the other count fields and per-team counts).
- /top-recurring running invariant property: Same generator. Assert the response has one bar per unique report_date AND for every adjacent pair of entries, entry[i].start === entry[i-1].end, AND entry[i].end === entry[i].start + entry[i].new_count + entry[i].recurring_count - entry[i].resolved_count.
- /category-trend total-conservation property: Generate a random set of compliance_items and uploads. Assert SUM(response.categoryTrend.map(c => c.count)) === total number of compliance_items joined to non-null-report_date uploads. This holds whether or not any date has multiple uploads.
- /summary sibling-disclosure property: Generate a random set of uploads with possible duplicate report_dates. Pick the latest date. Assert that if any sibling upload exists for that date, the response contains a non-empty multi_vertical_uploads array referencing every sibling upload's id.
- persistUpload() vertical-isolation property: Generate two non-empty disjoint sets of compliance_items, one per vertical. Insert both. Run persistUpload() for vertical A. Assert the resulting compliance_snapshots rows for vertical A reflect only set-A items (count of distinct hostnames matches).
- Cross-endpoint preservation property: Generate any fixture where every report_date has exactly one upload row. Assert all five fixed code paths (the four endpoints and persistUpload()) produce byte-for-byte identical results to the original code.
Integration Tests
- Full upload-to-chart flow: upload three xlsx files (one per vertical) with the same report_date via POST /preview + POST /commit, then call /trends, /top-recurring, /category-trend, /summary and verify all four return the expected aggregated/disclosed results.
- Compliance Charts panel render: load ComplianceChartsPanel.js with a multi-vertical-day fixture and assert (via DOM snapshot) the x-axis shows each date exactly once on Active Findings Over Time and Change per Report Cycle.
- Snapshot consumer regression: after running persistUpload() with the fix, call /vcl/stats (which reads compliance_snapshots) and verify per-vertical compliance_pct is unchanged from the pre-fix value when only one vertical's items are present, and is corrected when multiple verticals are present.
Test Fixtures Required
The following fixtures are needed and can be reused across all five endpoints' tests:
- fixture_empty: No compliance_uploads, no compliance_items. Used by the empty-state preservation tests.
- fixture_single_upload_aeo_legacy: One compliance_uploads row with vertical IS NULL, report_date = '2025-04-01', with ~20 compliance_items distributed across the four teams. Used by the legacy-path preservation tests.
- fixture_single_upload_per_date: Five compliance_uploads rows, each with a distinct report_date (2025-04-01 through 2025-05-01), with vertical values drawn from {NTS_AEO, SDIT_CISO, TSI, NULL} (NTS_AEO appears twice). Used by the broader preservation tests and by /category-trend total-conservation.
- fixture_multi_vertical_single_date: Three compliance_uploads rows all with report_date = '2025-05-11', verticals NTS_AEO/SDIT_CISO/TSI, each with distinct new_count/recurring_count/resolved_count and 5–10 compliance_items per upload spanning multiple teams and categories. This is the canonical bug-condition fixture and reproduces the original GitLab #12 scenario.
- fixture_mixed_history: Combination of fixture_single_upload_per_date and fixture_multi_vertical_single_date: multiple dates, some with single uploads, some with two or three. Used by the property-based tests as a realistic state-of-the-world fixture.
- fixture_cross_vertical_items: Two non-empty disjoint sets of compliance_items, one tagged vertical = 'NTS_AEO' and one tagged vertical = 'SDIT_CISO', sharing some hostnames between verticals to ensure the count-distinct logic is exercised. Used by the persistUpload() vertical-isolation tests.
- fixture_pbt_generators: fast-check (or equivalent) arbitraries:
  - arbReportDate: ISO date string in a bounded range (e.g., last 90 days).
  - arbVertical: oneof 'NTS_AEO' | 'SDIT_CISO' | 'TSI' | null.
  - arbUpload: { report_date, vertical, new_count, recurring_count, resolved_count } with non-negative integer counts.
  - arbItem: { hostname, team in ALLOWED_TEAMS, category in {Patching, Configuration, Vulnerability, Other}, vertical, status in {active, resolved} }.
  - arbScenario: { uploads: arbUpload[], items: arbItem[] }, where items reference uploads via upload_id and dates can collide.
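To show the shape of such generators without depending on a library, the sketch below uses a small seeded PRNG as a stand-in for fast-check arbitraries. Everything here (the mulberry32-style generator, genUploads, sumNewCountByDate, checkProperty, and the three-date pool) is hypothetical scaffolding; in the real suite the property body would call the endpoint rather than the in-memory sum.

```javascript
// Small deterministic PRNG (mulberry32-style) so a failing run can be
// replayed from its seed. Returns floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const VERTICALS = ['NTS_AEO', 'SDIT_CISO', 'TSI', null];
const pick = (rand, xs) => xs[Math.floor(rand() * xs.length)];
const intUpTo = (rand, max) => Math.floor(rand() * (max + 1));

// Stand-in for arbUpload[]: dates come from a small pool so that
// report_date collisions are likely.
function genUploads(rand, n) {
  const dates = ['2025-05-04', '2025-05-11', '2025-05-18'];
  return Array.from({ length: n }, (_, i) => ({
    id: i + 1,
    report_date: pick(rand, dates),
    vertical: pick(rand, VERTICALS),
    new_count: intUpTo(rand, 20),
    recurring_count: intUpTo(rand, 20),
    resolved_count: intUpTo(rand, 20),
  }));
}

// In-memory model of SUM(new_count) GROUP BY report_date.
function sumNewCountByDate(uploads) {
  const totals = new Map();
  for (const u of uploads) {
    totals.set(u.report_date, (totals.get(u.report_date) || 0) + u.new_count);
  }
  return totals;
}

// Property: one total per distinct date, and totals conserve the source sum.
function checkProperty(runs, seed) {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const uploads = genUploads(rand, 1 + intUpTo(rand, 8));
    const totals = sumNewCountByDate(uploads);
    const distinctDates = new Set(uploads.map((u) => u.report_date));
    if (totals.size !== distinctDates.size) return false;
    const aggregated = [...totals.values()].reduce((a, b) => a + b, 0);
    const source = uploads.reduce((a, u) => a + u.new_count, 0);
    if (aggregated !== source) return false;
  }
  return true;
}
```

A library such as fast-check adds shrinking and richer combinators on top of this idea, which is why the test plan prefers it over a hand-rolled loop.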