47 KiB
Compliance Duplicate Failing Metrics Bugfix Design
Overview
Five compliance endpoints in backend/routes/compliance.js (GET /items, GET /items/:hostname, GET /vcl/stats, GET /mttr) and the compliance_snapshots block inside persistUpload() all share the same root cause: each one reads compliance_items with either no vertical filter or a filter that admits both legacy vertical IS NULL rows and multi-vertical vertical = 'NTS_AEO' rows, but does not deduplicate on (hostname, metric_id). When the same (hostname, metric_id) pair exists in two verticals (the documented multi-vertical workflow), the duplicate row distorts the response — it duplicates a metric chip in the UI, double-counts a device across teams, inflates aging buckets, inflates forecast row counts, and (in the snapshot block) lets a single hostname appear in both compliant and non_compliant columns of the same (snapshot_month, vertical) row.
The fix is uniform across endpoints: dedupe at the SQL layer using DISTINCT ON (hostname, metric_id) with a deterministic ORDER BY (highest seen_count, then most recent upload_id) so each unique violation contributes exactly one row to the aggregation, and rewrite the snapshot query so each hostname is classified by its MIN(status) (active wins over resolved) inside a CTE before the count. The groupByHostname helper and the /items/:hostname response builder also gain a defensive in-memory dedupe keyed by metric_id (or (metric_id, status) for the detail endpoint), so a duplicate row that slips through any unforeseen code path still cannot duplicate a chip in the UI.
The implementation is intentionally minimal: each fix changes a single SQL statement and (for two endpoints) a small JavaScript loop. No schema migration, no new column, no frontend change. The frontend ComplianceDetailPanel already keys metrics on metric_id for chip rendering and will render correctly once the API stops returning duplicate rows.
Glossary
- Bug_Condition (C): The condition that triggers the bug — two or more rows in
compliance_itemsshare the same(hostname, metric_id)pair across verticals (i.e., one row withvertical IS NULLand another withvertical = 'NTS_AEO', or two non-null verticals that both pass an endpoint'sWHEREclause). - Property (P): The desired behavior when C holds — each affected endpoint returns or counts each
(hostname, metric_id)exactly once, using the row with the highestseen_count(with most recentupload_idas tiebreak) as the representative. - Preservation: Behavior on rows where
(hostname, metric_id)is unique across verticals, on the empty-data response shape, and on unrelated query parameters (e.g.,teamandstatuson/items) — all must be byte-for-byte unchanged. - vertical:
TEXTcolumn oncompliance_itemsandcompliance_uploadsidentifying which xlsx (NTS_AEO, SDIT_CISO, TSI) the row originated from.NULLindicates a legacy AEO-only upload. The bug class is specifically about rows that share(hostname, metric_id)but differ invertical. - groupByHostname(rows, noteHostnames): Helper in
backend/routes/compliance.js(lines ~213–230) that flattens a list of joinedcompliance_itemsrows into one device object per hostname, pushing each row'smetric_idontofailing_metrics. It performs no deduplication. - bucketAgingItems(items): Helper in
backend/routes/compliance.js(lines ~234–254) that places each item into one of fourseen_countbuckets per team. It iterates rows directly, so duplicate rows produce double-counted buckets. - persistUpload(): Function in
backend/routes/compliance.js(lines ~81–192) that writes a parsed upload to the DB and then writes per-vertical rows intocompliance_snapshots. The snapshot query at the end of this function counts hostnames per team usingCOUNT(DISTINCT CASE WHEN status = 'X' THEN hostname END), which double-counts a hostname into both thecompliantandnon_compliantcolumns when the hostname has bothactiveandresolvedrows across verticals. - representative row: For a duplicated
(hostname, metric_id), the row chosen byDISTINCT ON (hostname, metric_id) ... ORDER BY hostname, metric_id, seen_count DESC, upload_id DESC. This is deterministic and aligns with the "use the row with the highestseen_countor most recentupload_id" rule from the requirements.
Bug Details
Bug Condition
The bug manifests when two or more rows in compliance_items share the same (hostname, metric_id) pair across different vertical values. This happens whenever the same device fails the same metric in more than one vertical's xlsx — including the originally reported case where a legacy vertical IS NULL AEO upload and a newer vertical = 'NTS_AEO' multi-vertical upload both contain STEAM-INTERSIGHT failing 7.1.1.
Formal Specification:
FUNCTION isBugCondition(items)
INPUT: items — list of compliance_items rows
OUTPUT: boolean
// The bug condition is triggered for any (hostname, metric_id) pair with more than one row
GROUP items BY (hostname, metric_id) INTO groups
RETURN EXISTS group IN groups WHERE COUNT(group) > 1
END FUNCTION
For a single endpoint response to be considered buggy, the API output must additionally fail one of the following invariants (the per-endpoint manifestation of the same root cause):
FUNCTION isBuggyResponse(endpoint, response)
CASE endpoint OF
'/items': RETURN EXISTS device IN response.devices WHERE
COUNT(device.failing_metrics) != COUNT(DISTINCT metric_id IN device.failing_metrics)
'/items/:hostname': RETURN EXISTS (metric_id, status) WITH COUNT(*) > 1 IN response.metrics
'/vcl/stats heavy_hitters': RETURN SUM(hh.non_compliant FOR hh IN response.heavy_hitters) >
COUNT(DISTINCT hostname WHERE has_active_violation)
'/vcl/stats forecast': RETURN response.vertical_breakdown[i].blockers < 0
FOR SOME i WITH duplicated (hostname, metric_id) AND non-null resolution_date
'/mttr': RETURN SUM(b.total FOR b IN response.aging) >
COUNT(DISTINCT (hostname, metric_id) WHERE status = 'active')
'persistUpload snapshot': RETURN EXISTS row IN compliance_snapshots WHERE
compliant + non_compliant > total_devices
END CASE
END FUNCTION
Examples
The originally reported case (GitLab issue #13, STEAM-INTERSIGHT / 172.16.30.40 / metric 7.1.1) and the four sibling manifestations:
-
/items—STEAM-INTERSIGHThas two active rows formetric_id = '7.1.1'(one withvertical IS NULL, one withvertical = 'NTS_AEO'). The handler'sWHEREclause admits both via(ci.vertical IS NULL OR ci.vertical = 'NTS_AEO'), andgroupByHostname()does an unconditionaldev.failing_metrics.push(...)per row. The device'sfailing_metricsarray contains two entries withmetric_id = '7.1.1'. Expected: one entry per uniquemetric_id. -
/items/:hostname— Same hostname, but the detail query has no vertical filter at all (justWHERE ci.hostname = $1). Both rows come back, the response builder mapsmetricRows.map(...)over both, and the frontend'sMetricChiprenders7.1.1twice in the "Failing Metrics" section. Expected: one entry per(metric_id, status)pair. -
/vcl/statsheavy-hitters and per-team totals — A device whoseteamdiffers between verticals (e.g.,STEAMin the legacy row andACCESS-ENGin the NTS_AEO row) is counted under both teams byCOUNT(DISTINCT hostname) ... GROUP BY team. The sum acrossheavy_hitters[*].non_compliantexceeds the dashboard'sstats.non_compliant. Expected: the hostname is assigned to exactly one team — the team from its representative row — and the per-team sums reconcile with the global non-compliant count. -
/vcl/statsforecast-burndown — For a team with one duplicated(hostname, metric_id)whoseresolution_dateis non-null in both rows, the forecast querySELECT resolution_date FROM compliance_items WHERE status = 'active' AND team = $1 AND resolution_date IS NOT NULLreturns two rows.forecastItems.lengthis 2, butteamNonCompliant(from the de-team-counted DISTINCT-hostname query) is 1, soblockers = 1 - 2 = -1. The route then clamps to 0, hiding the underlying inconsistency. Expected: forecast is deduped by(hostname, metric_id)so the count matchesteamNonCompliant's scoping andblockersis non-negative. -
/mttr— Same duplicated(hostname, metric_id).bucketAgingItems()receives both rows, increments the bucket twice. The team total forSTEAM(or whichever team appears in the duplicate row) is inflated. Expected: each unique(hostname, metric_id)contributes to exactly one bucket using a single representativeseen_count. -
Edge case —
persistUpload()snapshot — A hostname has two rows: one withstatus = 'resolved'(legacy vertical) and one withstatus = 'active'(NTS_AEO vertical). The snapshot queryCOUNT(DISTINCT CASE WHEN status = 'resolved' THEN hostname END) AS compliant, COUNT(DISTINCT CASE WHEN status = 'active' THEN hostname END) AS non_compliantcounts the hostname once in
compliantand once innon_compliant, socompliant + non_compliant > total_devices. Expected: the hostname is classified by its worst-case status (active wins over resolved) inside a CTE so it appears in exactly one column.
Expected Behavior
Preservation Requirements
Unchanged Behaviors:
- Rows where
(hostname, metric_id)is unique across verticals: every endpoint returns the same numbers, in the same shape, in the same order as before the fix. - Empty-data responses:
/itemsreturns{ devices: [], team, status },/items/:hostnamereturns404for unknown hostnames,/vcl/statsreturns its zero-state shape,/mttrreturns{ aging: [] }. /itemsteamandstatusquery parameters: still validated againstALLOWED_TEAMSand['active', 'resolved'], still reject invalid values with HTTP 400./items(ci.vertical IS NULL OR ci.vertical = 'NTS_AEO')predicate: kept as-is so AEO-only legacy data continues to surface alongside NTS_AEO multi-vertical data. The fix only adds dedup on top, it does not narrow the set of admitted verticals./items/:hostnameordering bystatus DESC, metric_id: unchanged soactivemetrics remain listed beforeresolvedones./vcl/statsdonut categorization (blocked/in_progressbyMAX(resolution_date)per hostname): already deduped byGROUP BY hostnameand stays unchanged./vcl/statsglobalstats.compliant/stats.non_compliant: already useCOUNT(DISTINCT hostname)and stay unchanged.- Frontend components (
ComplianceDetailPanel,ComplianceCharts,CompliancePage): no changes required. They already render one chip permetric_id; the bug was purely an upstream data issue. persistUpload()error handling: snapshot creation remains wrapped intry/catchthat logs but does not fail the upload commit.
Scope:
All endpoint inputs that do not involve cross-vertical duplicate (hostname, metric_id) rows must be byte-for-byte identical to the pre-fix output. The fix only changes what happens when two or more compliance_items rows share (hostname, metric_id) across verticals.
Hypothesized Root Cause
All five sites have the same shape of bug — missing dedup on (hostname, metric_id) after the multi-vertical migration admitted two-row scenarios — but with slightly different mechanics. Listing them explicitly so the test plan can confirm or refute each one:
-
/items—groupByHostname()pushes every row. The handler'sWHEREclause(ci.vertical IS NULL OR ci.vertical = 'NTS_AEO')admits both verticals' rows.groupByHostname()then iterates each row and unconditionally callsdev.failing_metrics.push({ metric_id, ... }). There is noSet/Mapkeyed onmetric_id, so a duplicate row produces a duplicate metric in the array. -
/items/:hostname— no vertical filter, no dedup. The detail query isWHERE ci.hostname = $1with noverticalpredicate at all, so every vertical's row for that hostname is returned. The response builder doesmetricRows.map(r => ({ ...r }))— there is no dedup step. Every duplicate row becomes a duplicate entry inmetrics. -
/vcl/statsheavy-hitters and per-team totals — team chosen per row, not per hostname. TheGROUP BY team ... COUNT(DISTINCT hostname)query is correct for "how many distinct hostnames does each team see," but a hostname with rows under two different teams (becauseteamdiffers across verticals) is counted in both groups. The dashboard's globalstats.non_compliant(a singleCOUNT(DISTINCT hostname)with no team scoping) does not matchSUM(heavy_hitters[*].non_compliant). -
/vcl/statsforecast-burndown — duplicate rows inflate forecast count. The querySELECT resolution_date FROM compliance_items WHERE status = 'active' AND team = $1 AND resolution_date IS NOT NULLreturns one row percompliance_itemsrow, not one row per(hostname, metric_id). The downstreamblockers = teamNonCompliant - forecastItems.lengthcalculation can go negative becauseforecastItems.lengthis inflated relative to the dedupedteamNonCompliant. -
/mttr—bucketAgingItems()iterates rows directly. The querySELECT seen_count, team FROM compliance_items WHERE status = 'active'returns every active row, andbucketAgingItems()doesfor (const item of items) { buckets[label].total += 1; ... }. There is noSetkeyed on(hostname, metric_id), so each duplicate row increments its bucket twice. -
persistUpload()snapshot —CASE WHEN status =double-counts. The snapshot queryCOUNT(DISTINCT CASE WHEN status = 'resolved' THEN hostname END) AS compliant, COUNT(DISTINCT CASE WHEN status = 'active' THEN hostname END) AS non_compliantcounts a hostname in
compliantif any of its rows isresolvedAND innon_compliantif any of its rows isactive. With duplicate rows that disagree on status across verticals, the same hostname lands in both columns.
The common structural cause is that the multi-vertical migration (add_vcl_multi_vertical.js) added a vertical column to compliance_items but did not retrofit existing read queries — which assumed (hostname, metric_id) was effectively a unique key — to either dedupe explicitly or scope to a single vertical.
Correctness Properties
Property 1: Bug Condition — /items failing_metrics array contains at most one entry per metric_id
For any set of compliance_items rows including cross-vertical duplicates of (hostname, metric_id), the response from GET /items SHALL contain, for every device in response.devices, exactly one entry per unique metric_id in device.failing_metrics. Formally: device.failing_metrics.length == new Set(device.failing_metrics.map(m => m.metric_id)).size.
Validates: Requirements 2.1
Property 2: Bug Condition — /items/:hostname returns one metric per (metric_id, status)
For any device with cross-vertical duplicate (hostname, metric_id) rows, the response from GET /items/:hostname SHALL contain exactly one entry per unique (metric_id, status) pair in response.metrics. Each entry's seen_count SHALL equal the maximum seen_count across the duplicate rows for that pair.
Validates: Requirements 2.2, 2.3
Property 3: Bug Condition — /vcl/stats per-team device counts equal COUNT(DISTINCT hostname)
For any set of compliance_items rows including cross-vertical duplicates where a hostname's team differs across verticals, the response from GET /vcl/stats SHALL satisfy SUM(heavy_hitters[*].non_compliant) == stats.non_compliant. Each hostname SHALL be assigned to exactly one team — the team from its representative row (highest seen_count, then most recent upload_id) — regardless of how many verticals contain rows for that hostname.
Validates: Requirements 2.5
Property 4: Bug Condition — /vcl/stats forecast-burndown is deduped by (hostname, metric_id) and blockers is non-negative
For any team with cross-vertical duplicate (hostname, metric_id) rows where both rows have a non-null resolution_date, the deduped forecast row count for that team SHALL contribute at most one entry per unique (hostname, metric_id), and blockers = teamNonCompliant - dedupedForecastCount SHALL be >= 0 (no clamp required to satisfy the invariant).
Validates: Requirements 2.6
Property 5: Bug Condition — /mttr aging buckets count each unique (hostname, metric_id) exactly once
For any set of active compliance_items rows including cross-vertical duplicates, the response from GET /mttr SHALL satisfy SUM(aging[*].total) == COUNT(DISTINCT (hostname, metric_id) WHERE status = 'active'). Each unique (hostname, metric_id) SHALL be bucketed exactly once using a single representative seen_count.
Validates: Requirements 2.7
Property 6: Bug Condition — persistUpload() snapshot rows satisfy compliant + non_compliant <= total_devices
For any persistUpload() invocation, every row written into compliance_snapshots for the current (snapshot_month, vertical) pair SHALL satisfy compliant + non_compliant <= total_devices. A hostname with both active and resolved rows for the same team SHALL be classified as non_compliant (active wins over resolved) and SHALL appear in exactly one of the two columns.
Validates: Requirements 2.4
Property 7: Preservation — non-duplicated rows are unchanged
For any set of compliance_items rows where every (hostname, metric_id) is unique across verticals, the responses from /items, /items/:hostname, /vcl/stats, /mttr, and the compliance_snapshots rows written by persistUpload() SHALL be identical to the pre-fix output for the same input. The fix SHALL NOT change behavior on the unique-(hostname, metric_id) case.
Validates: Requirements 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7
Property 8: Preservation — seen_count, first_seen, and last_seen are correctly aggregated for the representative row
For any duplicated (hostname, metric_id), the surviving entry SHALL carry seen_count = MAX(seen_count) across the duplicates, first_seen = MIN(first_seen) across the duplicates, and last_seen = MAX(last_seen) across the duplicates. Active/resolved status separation SHALL be preserved (a metric still appears in the active section if any of its duplicate rows is active).
Validates: Requirements 3.2, 3.3
Fix Implementation
Changes Required
All changes are in backend/routes/compliance.js. No schema migration, no new column, no frontend change. SQL-level dedup is preferred wherever possible because it avoids materialising duplicates into Node memory and centralises the representative-row policy at the data layer.
Fix 1: GET /items — DISTINCT ON (hostname, metric_id) in SQL, defensive dedupe in groupByHostname
File: backend/routes/compliance.js
Function: router.get('/items', ...) (around line 535) and helper groupByHostname (around line 213)
Specific Changes:
- Rewrite the items query to use
DISTINCT ON (hostname, metric_id)so each unique violation contributes exactly one row, choosing the representative row by highestseen_countand most recentupload_id:TheSELECT DISTINCT ON (ci.hostname, ci.metric_id) ci.hostname, ci.ip_address, ci.device_type, ci.team, ci.metric_id, ci.metric_desc, ci.category, ci.status, ci.seen_count, fu.report_date AS first_seen, lu.report_date AS last_seen, ru.report_date AS resolved_on FROM compliance_items ci LEFT JOIN compliance_uploads fu ON ci.first_seen_upload_id = fu.id LEFT JOIN compliance_uploads lu ON ci.upload_id = lu.id LEFT JOIN compliance_uploads ru ON ci.resolved_upload_id = ru.id WHERE ci.team = $1 AND ci.status = $2 AND (ci.vertical IS NULL OR ci.vertical = 'NTS_AEO') ORDER BY ci.hostname, ci.metric_id, ci.seen_count DESC, ci.upload_id DESCORDER BYlead expressions match theDISTINCT ONcolumns; the trailing expressions select the representative. - Add a defensive in-memory dedupe in
groupByHostname()keyed onmetric_idper device, so any future code path that bypasses the SQL dedupe still cannot duplicate a chip:Thefunction groupByHostname(rows, noteHostnames) { const deviceMap = {}; for (const row of rows) { if (!deviceMap[row.hostname]) { deviceMap[row.hostname] = { hostname: row.hostname, ip_address: row.ip_address || '', device_type: row.device_type || '', team: row.team || '', status: row.status, failing_metrics: [], _seenMetricIds: new Set(), seen_count: row.seen_count || 1, first_seen: row.first_seen || null, last_seen: row.last_seen || null, resolved_on: row.resolved_on || null, has_notes: noteHostnames.has(row.hostname), }; } const dev = deviceMap[row.hostname]; if (!dev._seenMetricIds.has(row.metric_id)) { dev._seenMetricIds.add(row.metric_id); dev.failing_metrics.push({ metric_id: row.metric_id, metric_desc: row.metric_desc || '', category: row.category || '' }); } if ((row.seen_count || 1) > dev.seen_count) dev.seen_count = row.seen_count; if (row.first_seen && (!dev.first_seen || row.first_seen < dev.first_seen)) dev.first_seen = row.first_seen; if (row.last_seen && (!dev.last_seen || row.last_seen > dev.last_seen)) dev.last_seen = row.last_seen; } return Object.values(deviceMap).map(({ _seenMetricIds, ...dev }) => dev); }_seenMetricIdsset is stripped before the device is returned so the response shape is unchanged.
Fix 2: GET /items/:hostname — DISTINCT ON (metric_id, status) in SQL
File: backend/routes/compliance.js
Function: router.get('/items/:hostname', ...) (around line 575)
Specific Changes:
- Rewrite the metrics query to dedupe on
(metric_id, status)so a metric that appears as bothactiveandresolvedis preserved (one row each), but cross-vertical duplicates of the same(metric_id, status)collapse to one:SELECT DISTINCT ON (ci.metric_id, ci.status) ci.metric_id, ci.metric_desc, ci.category, ci.status, ci.ip_address, ci.device_type, ci.team, ci.seen_count, ci.extra_json, ci.resolution_date, ci.remediation_plan, fu.report_date AS first_seen, fu.uploaded_at AS first_seen_at, lu.report_date AS last_seen, lu.uploaded_at AS last_seen_at, ru.report_date AS resolved_on FROM compliance_items ci LEFT JOIN compliance_uploads fu ON ci.first_seen_upload_id = fu.id LEFT JOIN compliance_uploads lu ON ci.upload_id = lu.id LEFT JOIN compliance_uploads ru ON ci.resolved_upload_id = ru.id WHERE ci.hostname = $1 ORDER BY ci.metric_id, ci.status, ci.seen_count DESC, ci.upload_id DESC - The existing post-query sort (
status DESCfor grouping, thenmetric_id) is rebuilt in JavaScript on the deduped result to preserve the response ordering:(Note: the original SQL usedmetricRows.sort((a, b) => { if (a.status !== b.status) return b.status.localeCompare(a.status); // 'resolved' < 'active' alphabetically; we want 'active' first return a.metric_id.localeCompare(b.metric_id); });ORDER BY ci.status DESC, ci.metric_id, which placedresolvedbeforeactivebecause ofDESCon the alphabetic order. The post-fix sort reproduces that exact ordering on the deduped rows.) - The
identitylookupmetricRows.find(r => r.status === 'active') || metricRows[0]is unchanged — it still picks the active representative forip_address,device_type,team,resolution_date, andremediation_plan.
Fix 3: GET /vcl/stats heavy-hitters and per-team totals — dedupe to one team per hostname via CTE
File: backend/routes/compliance.js
Function: router.get('/vcl/stats', ...) (around line 990, the teamRows query)
Specific Changes:
- Replace the heavy-hitters query with a CTE that first picks one representative row per hostname (using the same
DISTINCT ON (hostname)policy: highestseen_count, then most recentupload_id), then groups by the representative's team:Because the CTE already collapses each hostname to one row,WITH device_team AS ( SELECT DISTINCT ON (hostname) hostname, COALESCE(team, 'Unknown') AS team, resolution_date FROM compliance_items WHERE status = 'active' ORDER BY hostname, seen_count DESC, upload_id DESC ) SELECT team, COUNT(DISTINCT hostname)::int AS non_compliant, MAX(resolution_date) AS compliance_date FROM device_team GROUP BY team ORDER BY COUNT(DISTINCT hostname) DESCCOUNT(DISTINCT hostname)per team is equivalent toCOUNT(*)per team, butDISTINCTis kept for defensive symmetry. - The dashboard's existing
stats.non_compliantquery (alsoCOUNT(DISTINCT hostname)over all active rows, no team scoping) is unchanged — but now the propertySUM(heavy_hitters[*].non_compliant) == stats.non_compliantholds because both numerators and denominators agree on "one hostname, one team." - The per-team-total query inside the
for (const teamRow of teamRows)loop (SELECT COUNT(DISTINCT hostname) AS total FROM compliance_items WHERE COALESCE(team, 'Unknown') = $1) is similarly rewritten to use the samedevice_team-style CTE so the team'stotal_devicesmatches the team'snon_compliant. The simplest form:Note: the loop runs once per team, so this CTE is computed N times — acceptable given the small team count (4 teams). If profiling shows this is a hot path, hoist theWITH device_team AS ( SELECT DISTINCT ON (hostname) hostname, COALESCE(team, 'Unknown') AS team FROM compliance_items ORDER BY hostname, seen_count DESC, upload_id DESC ) SELECT COUNT(*)::int AS total FROM device_team WHERE team = $1device_teamCTE out of the loop and compute totals in a single grouped query.
Fix 4: GET /vcl/stats forecast-burndown — dedupe by (hostname, metric_id) in SQL
File: backend/routes/compliance.js
Function: router.get('/vcl/stats', ...) (around line 1015, the forecastItems query)
Specific Changes:
- Rewrite the forecast query to
DISTINCT ON (hostname, metric_id)so each unique violation contributes at most oneresolution_dateto the forecast bucketing:SELECT DISTINCT ON (hostname, metric_id) resolution_date FROM compliance_items WHERE status = 'active' AND COALESCE(team, 'Unknown') = $1 AND resolution_date IS NOT NULL ORDER BY hostname, metric_id, seen_count DESC, upload_id DESC - The downstream
blockers = teamNonCompliant - forecastItems.lengthcalculation is unchanged. With both sides now deduped consistently —teamNonCompliantfrom thedevice_teamCTE in Fix 3,forecastItems.lengthfrom the deduped query above — the difference is non-negative without any clamp. - The
Math.max(blockers, 0)clamp at the existing call site is left in place as a belt-and-braces safeguard. It SHOULD be a no-op after the fix; if a regression introduces inconsistency again, the property test for Property 4 will catch it.
Fix 5: GET /mttr — dedupe by (hostname, metric_id) in SQL
File: backend/routes/compliance.js
Function: router.get('/mttr', ...) (around line 824)
Specific Changes:
- Rewrite the query to
DISTINCT ON (hostname, metric_id)so each unique active violation contributes exactly one(seen_count, team)row tobucketAgingItems():SELECT DISTINCT ON (hostname, metric_id) COALESCE(seen_count, 1) AS seen_count, team FROM compliance_items WHERE status = 'active' ORDER BY hostname, metric_id, seen_count DESC, upload_id DESC bucketAgingItems()itself does not change. It already does the right thing when fed one row per unique violation.- SQL-level dedup is preferred over a JavaScript-side
Sethere because the helper is also called from/vcl/statsand changing its contract risks regressions in the other caller. Pushing the dedup into SQL keepsbucketAgingItems()pure and reusable.
Fix 6: persistUpload() snapshot block — classify hostnames via MIN(status) in a CTE
File: backend/routes/compliance.js
Function: persistUpload() (lines 81–192), specifically the verticalStats query at line 157
Specific Changes:
- Rewrite the snapshot query so each hostname is classified by its worst-case status (active wins over resolved) inside a CTE before counting. Because
'active' < 'resolved'lexicographically,MIN(status)returns'active'for a hostname that has any active row and'resolved'only if all rows are'resolved':Each hostname appears exactly once inWITH hostname_status AS ( SELECT team, hostname, MIN(status) AS status -- 'active' beats 'resolved' alphabetically FROM compliance_items WHERE team IS NOT NULL GROUP BY team, hostname ) SELECT team AS vertical, COUNT(*)::int AS total_devices, COUNT(*) FILTER (WHERE status = 'resolved')::int AS compliant, COUNT(*) FILTER (WHERE status = 'active')::int AS non_compliant FROM hostname_status GROUP BY teamhostname_status, socompliant + non_compliant == total_devicesis structurally guaranteed. - The downstream
INSERT ... ON CONFLICT (snapshot_month, vertical) DO UPDATEblock is unchanged. It already keys snapshots onvertical, and the per-team counts now satisfycompliant + non_compliant <= total_devices. - The
compliance_pctcalculationMath.round((vs.compliant / total) * 100 * 100) / 100is unchanged. It is now defended against the previous off-by-one: with the old query a hostname could be in bothcompliantandnon_compliant, double-counting itself incompliant's numerator while only contributing once tototal_devices's denominator. The fix removes that.
Note: Fix 6 is conceptually adjacent to but mechanically distinct from the
persistUploadfix incompliance-duplicate-chart-entries. That spec adds aWHERE vertical = $1filter to scope the snapshot to one vertical at a time. This spec adds thehostname_statusCTE so that within whichever vertical is snapshotted, each hostname is classified once. Both fixes are independently necessary; landing them together is preferred but each is correct on its own.
Testing Strategy
Validation Approach
The bug condition is straightforward to construct: insert two compliance_items rows for the same (hostname, metric_id) with different vertical values, then call each affected endpoint. The two-phase approach is to first run the tests against the unfixed code to confirm the duplication counterexamples, then run the same tests against the fixed code and add property-based tests that explore the input space more broadly.
Exploratory Bug Condition Checking
Goal: Surface counterexamples that demonstrate each of the six manifestations BEFORE implementing the fix. Confirm or refute the root cause analysis for each endpoint independently — they share a structural cause but the SQL details differ.
Test Plan: Seed a clean test database with a fixture representing the original GitLab #13 scenario (STEAM-INTERSIGHT failing 7.1.1 in both vertical IS NULL and vertical = 'NTS_AEO'), plus targeted variants for each sibling endpoint. Call each affected endpoint and assert the buggy invariants. Run on UNFIXED code first.
Test Cases:
-
/itemsDuplicate Failing Metric Test — Insert two rows for(STEAM-INTERSIGHT, 7.1.1): one withvertical IS NULL, status = 'active', team = 'STEAM', another withvertical = 'NTS_AEO', status = 'active', team = 'STEAM'. CallGET /items?team=STEAM&status=active. Assert thatresponse.devices[0].failing_metrics.filter(m => m.metric_id === '7.1.1').length === 1. (will fail on unfixed code — returns 2) -
/items/:hostnameDuplicate Metric Entry Test — Same fixture. CallGET /items/STEAM-INTERSIGHT. Assertresponse.metrics.filter(m => m.metric_id === '7.1.1' && m.status === 'active').length === 1. Assertresponse.metrics[0].seen_count === MAX(seen_count across the duplicate rows). (will fail on unfixed code — returns 2) -
/vcl/statsCross-Team Hostname Test — Insert two rows for(SOME-DEVICE, 7.1.1): one withteam = 'STEAM', vertical IS NULL, another withteam = 'ACCESS-ENG', vertical = 'NTS_AEO'. CallGET /vcl/stats. AssertSUM(heavy_hitters[*].non_compliant) === stats.non_compliant. (will fail on unfixed code — sum exceeds the global count by 1) -
/vcl/statsForecast Negative Blockers Test — Insert two rows for(SOME-DEVICE, 7.1.1)both withteam = 'STEAM', status = 'active', both withresolution_date = '2025-09-30', but different verticals. CallGET /vcl/stats. Assertvertical_breakdown.find(v => v.team === 'STEAM').blockers >= 0AND that the unclampedteamNonCompliant - dedupedForecastCountequals the reportedblockers. (will fail on unfixed code — clamped from-1to0, hiding the inconsistency; the unclamped check makes the failure visible) -
/mttrInflated Bucket Test — Insert two rows for(SOME-DEVICE, 7.1.1)withseen_count = 5, team = 'STEAM', status = 'active', different verticals. CompareSUM(aging[*].total)toCOUNT(DISTINCT (hostname, metric_id) WHERE status = 'active'). Assert equality. (will fail on unfixed code — bucket total exceeds distinct count by 1) -
persistUpload()Snapshot Inconsistency Test — Pre-populatecompliance_itemswith(SOME-DEVICE, 7.1.1)having onestatus = 'active'row and onestatus = 'resolved'row, bothteam = 'STEAM', different verticals. CallpersistUpload()with a no-op upload. Read back thecompliance_snapshotsrow for the current month andSTEAM. Assertcompliant + non_compliant <= total_devices. (will fail on unfixed code — the hostname is counted in both columns so the sum exceedstotal_devicesby 1) -
Edge Case — Single-Vertical Regression Test — Insert a fixture where every
(hostname, metric_id)is unique (no cross-vertical duplicates). Call all five read endpoints and capture responses. Apply the fix, re-run, and assert response equality (byte-for-byte). RunpersistUpload()on a single-vertical fixture and assert the resultingcompliance_snapshotsrows are identical pre-fix and post-fix. (should pass on unfixed code; will pass on fixed code; protects the preservation property)
Expected Counterexamples:
/itemsreturnsfailing_metricsarrays wherelength > Set(metric_ids).size. Cause:groupByHostname()pushes per row instead of per uniquemetric_id./items/:hostnamereturnsmetricsarrays where(metric_id, status)collisions exist. Cause: no vertical filter and no dedup in the detail query or response builder./vcl/statsreturnsheavy_hitterswhosenon_compliantvalues sum to more than the globalstats.non_compliant. Cause: a hostname'steamdiffers across verticals and the per-teamCOUNT(DISTINCT hostname)counts it under both teams./vcl/statsreportsblockersclamped to 0 when the unclamped expression is negative. Cause: the forecast query is not deduped on(hostname, metric_id)so its row count exceedsteamNonCompliant./mttrreturnsagingwhose total exceeds the distinct-violation count. Cause:bucketAgingItems()iterates rows directly with no dedup.persistUpload()writescompliance_snapshotsrows wherecompliant + non_compliant > total_devices. Cause: the snapshot query'sCASE WHEN status = 'X'pattern lets a hostname appear in both columns.
Fix Checking
Goal: Verify that for all inputs where the bug condition holds (any (hostname, metric_id) shared by two or more rows across verticals), each fixed endpoint produces the expected deduped result.
Pseudocode:
FOR ALL items WHERE EXISTS (h, m) WITH COUNT(items WHERE hostname = h AND metric_id = m) > 1 DO
items_response := GET_items_fixed(items)
detail_response := GET_items_hostname_fixed(items, some_hostname_with_dups)
stats_response := GET_vcl_stats_fixed(items)
mttr_response := GET_mttr_fixed(items)
snapshot_rows := persistUpload_fixed(no_op_upload, items)
ASSERT no_duplicate_metrics_per_device(items_response.devices)
ASSERT no_duplicate_metric_status_pairs(detail_response.metrics)
ASSERT sum_of_team_counts_equals_global(stats_response)
ASSERT forecast_blockers_non_negative(stats_response.vertical_breakdown)
ASSERT mttr_total_equals_distinct_violations(mttr_response.aging, items)
ASSERT snapshot_invariant_holds(snapshot_rows)
END FOR
Preservation Checking
Goal: Verify that for all inputs where the bug condition does NOT hold (every (hostname, metric_id) is unique across verticals), the fixed endpoints produce results identical to the original endpoints.
Pseudocode:
FOR ALL items WHERE FORALL (h, m), COUNT(items WHERE hostname = h AND metric_id = m) <= 1 DO
ASSERT GET_items_original(items) = GET_items_fixed(items)
ASSERT GET_items_hostname_original(items) = GET_items_hostname_fixed(items)
ASSERT GET_vcl_stats_original(items) = GET_vcl_stats_fixed(items)
ASSERT GET_mttr_original(items) = GET_mttr_fixed(items)
ASSERT persistUpload_original(items).snapshots = persistUpload_fixed(items).snapshots
END FOR
Testing Approach: Property-based testing is the right fit for preservation checking here:
- The unique-
(hostname, metric_id)input space is large (any number of hostnames, any combination of metrics, any team and vertical mix) and exhaustive enumeration is impractical. - The preservation property is strict equality, which is well-suited to PBT shrinking — any counterexample is a small fixture demonstrating a behavior change.
- The legacy AEO-only data shape (
vertical IS NULL) must be exercised, which falls naturally out of generators that includenullverticals.
Test Plan: Capture responses from the unfixed code on unique-(hostname, metric_id) fixtures (snapshot tests). After applying the fix, re-run the same fixtures and assert equality. Then run a property-based generator that produces random unique-key scenarios and asserts the same equality.
Test Cases:
- Snapshot Equality — Empty State — Empty
compliance_items./items?team=STEAMreturns{ devices: [], team: 'STEAM', status: 'active' }./items/:hostnamereturns 404./vcl/statsreturns its zero-state shape./mttrreturns{ aging: [] }.persistUpload()writes no snapshot rows. Snapshot-test before and after the fix. - Snapshot Equality — Single AEO-Only Items — Items with
vertical IS NULLonly. Capture pre-fix responses, apply fix, assert equality across all five endpoints. - Snapshot Equality — Multiple Unique Verticals — A mix of items with
vertical IS NULLandvertical = 'NTS_AEO', but no(hostname, metric_id)collision across verticals. Capture pre-fix responses, apply fix, assert equality. /itemsTeam and Status Filter Preservation — Active and resolved items mixed across teams. Assert?team=STEAM&status=active,?team=STEAM&status=resolved, and?team=ACCESS-ENG&status=activeeach return the same devices pre-fix and post-fix on a unique-key fixture. Assert non-ALLOWED_TEAMSvalue (e.g.,?team=OTHER) returns HTTP 400./items/:hostnameActive-Then-Resolved Ordering Preservation — A device with both active and resolved metrics. Assert the response'smetricsarray has allactiveentries before allresolvedentries, sorted bymetric_idwithin each group, identical to pre-fix./vcl/statsDonut Categorization Preservation — Active items with mixed null/non-nullresolution_date. Assertdonut.blockedanddonut.in_progresscounts match pre-fix on a unique-key fixture (this is already deduped viaGROUP BY hostnamein the existing query).persistUpload()Snapshot Equality — Single-Status-Per-Hostname Fixture — Pre-populatecompliance_itemsso every hostname has rows of only one status (all active or all resolved within a team). RunpersistUpload(). Assertcompliance_snapshotsrows are identical pre-fix and post-fix.
Test Fixtures Needed
The following fixtures must be reusable across unit, integration, and property-based tests. Place them under backend/__tests__/fixtures/compliance-duplicate-failing-metrics/ (or inline as factory functions in the test file, matching the convention used in vcl-compliance-reporting.test.js).
-
fixtureCrossVerticalDuplicateActive— A single hostname with two rows for the samemetric_id, bothstatus = 'active', bothteam = 'STEAM', one withvertical IS NULLand one withvertical = 'NTS_AEO'. Differentseen_count(e.g., 3 and 5) so the representative-row policy is exercised. Used by/items,/items/:hostname,/mttr, and the/vcl/statsforecast tests. -
fixtureCrossVerticalTeamMismatch— A single hostname with two active rows for the samemetric_idbut differentteamacross verticals (STEAMin legacy,ACCESS-ENGin NTS_AEO). Used by the/vcl/statsheavy-hitters and per-team-totals test (Property 3). -
fixtureCrossVerticalStatusMismatch— A single hostname with two rows for the samemetric_idand same team, but differentstatusacross verticals (activein legacy,resolvedin NTS_AEO). Used by thepersistUpload()snapshot test (Property 6). -
fixtureMultiHostnameMixed— A combination of unique-key hostnames and one duplicated(hostname, metric_id). Used to verify that the dedup applies only to the duplicates and leaves unique entries untouched. -
fixtureLegacyOnly— All rows withvertical IS NULL. Used for preservation checks that confirm legacy data flows are unchanged. -
fixtureNTSAEOOnly— All rows withvertical = 'NTS_AEO'. Used for preservation checks that confirm multi-vertical-only data flows are unchanged. -
fixtureForecastDuplicateResolutionDate— A duplicated(hostname, metric_id)where both rows have a non-nullresolution_date(same value). Used to verify Property 4 (blockers >= 0). -
fixtureSeenCountTiebreak— A duplicated(hostname, metric_id)where both rows have the sameseen_countbut differentupload_id. Used to verify theupload_id DESCtiebreaker in theDISTINCT ON ... ORDER BYpolicy.
Unit Tests
/items: insertfixtureCrossVerticalDuplicateActive, assertfailing_metrics.length === 1andseen_count === 5./items/:hostname: insertfixtureCrossVerticalDuplicateActive, assertmetrics.length === 1for(7.1.1, active)./items/:hostnameordering: insert a fixture with one active and one resolved metric, assertmetrics[0].status === 'active'andmetrics[1].status === 'resolved'./vcl/statsheavy-hitters: insertfixtureCrossVerticalTeamMismatch, assert hostname counted exactly once in the team derived from the representative row./vcl/statsforecast: insertfixtureForecastDuplicateResolutionDate, assertblockers >= 0andforecastItems.length === teamNonCompliant./mttr: insertfixtureCrossVerticalDuplicateActivewithseen_count = 5, assert exactly one increment in the4–6 cyclesbucket forSTEAM.persistUpload()snapshot: insertfixtureCrossVerticalStatusMismatch, runpersistUpload(), assert the resulting snapshot row hascompliant === 0,non_compliant === 1,total_devices === 1.groupByHostname()direct: pass a list with two rows for the same(hostname, metric_id), assert one entry infailing_metrics. This protects the helper's contract independently of the SQL dedup.
Property-Based Tests
Use fast-check (already in use in backend/__tests__/*.property.test.js). Generators should mix vertical IS NULL and vertical = 'NTS_AEO' rows, with controllable rates of (hostname, metric_id) collision.
- Property 1 —
/items: For any list ofcompliance_itemsrows including cross-vertical duplicates, assert that for every device in the response,device.failing_metrics.length === new Set(device.failing_metrics.map(m => m.metric_id)).size. - Property 2 —
/items/:hostname: For any duplicated(hostname, metric_id, status), assertresponse.metrics.filter(m => m.metric_id === target_metric && m.status === target_status).length === 1. Assert the surviving entry'sseen_countis the maximum across duplicates. - Property 3 —
/vcl/stats: For any input including team-mismatched cross-vertical duplicates, assertSUM(heavy_hitters[*].non_compliant) === stats.non_compliant. Generate inputs that vary the rate of team mismatch from 0% to 100%. - Property 4 —
/vcl/statsforecast: For any input including duplicated(hostname, metric_id)with non-null resolution dates, assertvertical_breakdown[*].blockers >= 0and assert the unclamped expression equalsteamNonCompliant - dedupedForecastCount. - Property 5 —
/mttr: For any input including cross-vertical duplicate active rows, assertSUM(aging[*].total) === COUNT(DISTINCT (hostname, metric_id) WHERE status = 'active')over the input. Per-team totals: assertSUM(aging[*][team])over each team equals the distinct count for that team. - Property 6 —
persistUpload()snapshot invariant: For any combination of cross-vertical status mismatches, assert every row written intocompliance_snapshotssatisfiescompliant + non_compliant <= total_devicesandcompliant + non_compliant === total_devices(equality, since every hostname is classified into exactly one column). - Property 7 — Preservation: For any input where every
(hostname, metric_id)is unique across verticals, the responses from all five endpoints SHALL equal the responses on the same input from the original (unfixed) implementations. Use fast-check's shrinking to surface the smallest counterexample if the property fails. - Property 8 — Representative-row policy: For any duplicated
(hostname, metric_id), the surviving entry'sseen_countSHALL equalmax(duplicates.seen_count), and ties SHALL be broken bymax(duplicates.upload_id).
Integration Tests
- Full flow: upload a legacy AEO xlsx, then upload an NTS_AEO multi-vertical xlsx that overlaps on at least one
(hostname, metric_id). Call/items,/items/:hostname,/vcl/stats,/mttr, and readcompliance_snapshotsfor the current month. Assert all six post-fix invariants hold. - Cross-page consistency: load
/vcl/statsand/items?team=STEAMfor the same backing data. Assert that the device count from/vcl/statsheavy-hitters forSTEAMequals the number of devices in/items?team=STEAM&status=active. - ComplianceDetailPanel render: load a device with cross-vertical duplicates via
/items/:hostname, renderComplianceDetailPanel, assert exactly oneMetricChipwith the duplicatedmetric_idis present in the "Failing Metrics" section. persistUpload()end-to-end: with cross-vertical status mismatch present, run a fresh upload and assert the newcompliance_snapshotsrow's invariant holds.