cve-dashboard/.kiro/specs/compliance-duplicate-chart-entries/bugfix.md
2026-05-19 15:01:25 -06:00

Bugfix Requirements Document

Introduction

Multiple compliance endpoints incorrectly key their queries by compliance_uploads.id (or by individual upload row) instead of by compliance_uploads.report_date. The compliance pipeline accepts one xlsx file per vertical (e.g., NTS_AEO, SDIT_CISO, TSI), so a single calendar date typically produces several compliance_uploads rows. Any query, aggregation, or "pick latest" logic that treats each upload as a distinct date — instead of grouping all uploads sharing a report_date — produces duplicated, fragmented, or silently dropped data.
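To check whether a database is affected, it can help to list the report dates that carry more than one upload. The following is a minimal sketch, assuming report_date is stored as an ISO-8601 text column and using an in-memory SQLite database as a stand-in for the real one; the sample rows are illustrative, not taken from production data.

```python
import sqlite3

# In-memory stand-in for the real database (assumption: report_date is ISO text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compliance_uploads ("
    "id INTEGER PRIMARY KEY, report_date TEXT, vertical TEXT)"
)
conn.executemany(
    "INSERT INTO compliance_uploads (report_date, vertical) VALUES (?, ?)",
    [
        ("2025-05-04", "NTS_AEO"),
        ("2025-05-11", "NTS_AEO"),
        ("2025-05-11", "SDIT_CISO"),
        ("2025-05-11", "TSI"),
    ],
)

# Dates with more than one upload are the ones that expose the defect.
shared_dates = conn.execute(
    """
    SELECT report_date, COUNT(*) AS upload_count
    FROM compliance_uploads
    GROUP BY report_date
    HAVING COUNT(*) > 1
    ORDER BY report_date
    """
).fetchall()
```

With the sample rows above, only 2025-05-11 is returned, with an upload count of 3.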

The originally reported defect (GitLab issue #12, reported by nkapur) was the "Active Findings Over Time" chart on the Compliance page showing 3 entries for 5/11 after STEAM uploaded three vertical data sets that day. Investigation found that the same root cause — keying by upload_id instead of report_date — affects GET /trends, GET /waterfall (route handler GET /top-recurring), GET /category-trend, GET /summary, and the compliance_snapshots block in persistUpload(). This spec covers fixes for all five.

Bug Analysis

Current Behavior (Defect)

/trends

1.1 WHEN multiple compliance uploads exist with the same report_date (due to per-vertical uploads) THEN the system returns one trend data point per upload row, producing duplicate x-axis entries on the chart

1.2 WHEN the chart renders multiple entries for the same date THEN the x-axis displays repeated date labels (e.g., three "05/11/25" entries) making the trend line misleading and unreadable

1.3 WHEN per-team counts are computed for duplicate-date uploads THEN the system counts items per individual upload_id rather than aggregating across all uploads sharing that date, resulting in fragmented per-team totals

/waterfall (route handler GET /top-recurring)

1.4 WHEN multiple compliance uploads exist with the same report_date THEN the underlying query SELECT id, report_date, ... FROM compliance_uploads ORDER BY report_date ASC returns one row per upload and computeWaterfall() emits one bar per row, producing multiple bars stacked under the same date label

1.5 WHEN computeWaterfall() carries start forward across multiple rows that share a report_date THEN each per-vertical row's new_count/recurring_count/resolved_count deltas are applied sequentially as if they were separate cycles, so the running start and end totals for that date are wrong (they reflect the last row's running balance rather than the date-level aggregate)

/category-trend

1.6 WHEN multiple compliance uploads exist with the same report_date THEN the query grouped by cu.id, cu.report_date, category returns one row per (upload, category) pair, producing duplicated stacked bars per date when the chart is keyed on report_date

1.7 WHEN per-category counts are surfaced for a date with multiple uploads THEN counts are reported per-vertical instead of aggregated across all verticals sharing that report_date, so no row in the response represents the full date-level category total

/summary

1.8 WHEN multiple uploads exist for the latest report_date THEN the query WHERE vertical IS NULL ORDER BY id DESC LIMIT 1 (with fallback to vertical = 'NTS_AEO') selects a single upload for that date and discards the summary_json of all other verticals, silently dropping their data

1.9 WHEN the summary returned by /summary is compared against /trends, /waterfall, or /category-trend for the same latest date THEN the figures do not reconcile, because /summary reflects one vertical's upload while the other endpoints aggregate (or duplicate) across all verticals

compliance_snapshots creation in persistUpload()

1.10 WHEN persistUpload() computes per-vertical compliance stats THEN the query filters only WHERE team IS NOT NULL and groups by team, with no filter or grouping on vertical, so item counts pulled from compliance_items are aggregated across every vertical present in the table

1.11 WHEN the resulting per-team totals are written into compliance_snapshots for a single vertical's upload THEN the total_devices, compliant, and non_compliant columns reflect cross-vertical totals rather than the snapshotted vertical, corrupting the monthly snapshot record

Expected Behavior (Correct)

/trends

2.1 WHEN multiple compliance uploads exist with the same report_date THEN the system SHALL aggregate their counts (new_count, recurring_count, resolved_count, total_active) into a single trend data point per unique date

2.2 WHEN the chart renders trend data THEN each unique report_date SHALL appear exactly once on the x-axis regardless of how many upload records exist for that date

2.3 WHEN per-team counts are computed for a date with multiple uploads THEN the system SHALL aggregate team item counts across all uploads sharing that report_date, producing a single per-team total per date
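Requirements 2.1 to 2.3 could be met with two date-keyed queries, sketched below. This is an illustration rather than the actual handler: it assumes the count columns live on compliance_uploads, that compliance_items links to an upload through an upload_id column, and that dates are ISO text. The team name TEAM_B and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compliance_uploads (
    id INTEGER PRIMARY KEY, report_date TEXT, vertical TEXT,
    new_count INTEGER, recurring_count INTEGER,
    resolved_count INTEGER, total_active INTEGER
);
-- Assumption: items reference their upload via upload_id.
CREATE TABLE compliance_items (
    id INTEGER PRIMARY KEY, upload_id INTEGER, team TEXT
);
INSERT INTO compliance_uploads VALUES
    (1, '2025-05-04', 'NTS_AEO',   4, 0, 0, 4),
    (2, '2025-05-11', 'NTS_AEO',   2, 3, 1, 5),
    (3, '2025-05-11', 'SDIT_CISO', 1, 2, 0, 3),
    (4, '2025-05-11', 'TSI',       0, 1, 2, 1);
INSERT INTO compliance_items (upload_id, team) VALUES
    (2, 'STEAM'), (2, 'STEAM'), (3, 'STEAM'), (4, 'TEAM_B');
""")

# One trend point per unique report_date (2.1, 2.2).
trend_rows = conn.execute("""
    SELECT report_date,
           SUM(new_count), SUM(recurring_count),
           SUM(resolved_count), SUM(total_active)
    FROM compliance_uploads
    GROUP BY report_date
    ORDER BY report_date ASC
""").fetchall()

# One per-team total per date, across every upload sharing that date (2.3).
team_rows = conn.execute("""
    SELECT cu.report_date, ci.team, COUNT(*) AS item_count
    FROM compliance_items ci
    JOIN compliance_uploads cu ON cu.id = ci.upload_id
    WHERE ci.team IS NOT NULL
    GROUP BY cu.report_date, ci.team
    ORDER BY cu.report_date, ci.team
""").fetchall()
```

The three 2025-05-11 uploads collapse into a single trend row, and the STEAM items from two different uploads are counted together.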

/waterfall (route handler GET /top-recurring)

2.4 WHEN multiple compliance uploads exist with the same report_date THEN the system SHALL aggregate new_count, recurring_count, and resolved_count across all uploads sharing that report_date into a single per-date row before passing rows to computeWaterfall()

2.5 WHEN computeWaterfall() consumes the aggregated rows THEN it SHALL emit exactly one waterfall entry per unique report_date and the running start/end totals SHALL advance using each date's date-level aggregate deltas (not per-upload deltas)
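The aggregate-then-compute order in 2.4 and 2.5 can be sketched as follows. Both functions are hypothetical stand-ins: compute_waterfall is not the real computeWaterfall(), and the running-total rule used here (end = start + new_count - resolved_count) is an assumption, since this spec does not state the formula. Dates are assumed to be ISO strings so that sorting them is chronological.

```python
def aggregate_by_report_date(rows):
    """Collapse per-upload rows into one row per report_date."""
    totals = {}
    for row in rows:
        bucket = totals.setdefault(
            row["report_date"],
            {"report_date": row["report_date"], "new_count": 0,
             "recurring_count": 0, "resolved_count": 0},
        )
        for key in ("new_count", "recurring_count", "resolved_count"):
            bucket[key] += row[key]
    # ISO date strings sort chronologically.
    return [totals[date] for date in sorted(totals)]


def compute_waterfall(rows):
    """Hypothetical stand-in; the end formula below is an assumption."""
    entries = []
    start = 0
    for row in rows:
        end = start + row["new_count"] - row["resolved_count"]
        entries.append({**row, "start": start, "end": end})
        start = end  # carry the balance forward to the next date
    return entries


upload_rows = [
    {"report_date": "2025-05-04", "new_count": 4, "recurring_count": 0, "resolved_count": 0},
    {"report_date": "2025-05-11", "new_count": 2, "recurring_count": 3, "resolved_count": 1},
    {"report_date": "2025-05-11", "new_count": 1, "recurring_count": 2, "resolved_count": 0},
    {"report_date": "2025-05-11", "new_count": 0, "recurring_count": 1, "resolved_count": 2},
]
entries = compute_waterfall(aggregate_by_report_date(upload_rows))
```

Because the three per-vertical rows are merged first, the waterfall advances once for 2025-05-11 using the date-level deltas instead of stepping through each upload as a separate cycle.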

/category-trend

2.6 WHEN multiple compliance uploads exist with the same report_date THEN the query SHALL group by cu.report_date, category (without cu.id in the GROUP BY) and SUM/COUNT items across all uploads sharing the date, producing one row per (date, category) pair

2.7 WHEN per-category counts are returned for a date with multiple uploads THEN the count field SHALL be the sum of items in that category across every upload for that report_date
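The effect of dropping cu.id from the GROUP BY can be seen by running both groupings side by side. This sketch assumes compliance_items carries upload_id and category columns; the category names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compliance_uploads (
    id INTEGER PRIMARY KEY, report_date TEXT, vertical TEXT
);
CREATE TABLE compliance_items (
    id INTEGER PRIMARY KEY, upload_id INTEGER, category TEXT
);
INSERT INTO compliance_uploads VALUES
    (1, '2025-05-11', 'NTS_AEO'),
    (2, '2025-05-11', 'SDIT_CISO'),
    (3, '2025-05-11', 'TSI');
INSERT INTO compliance_items (upload_id, category) VALUES
    (1, 'patching'), (1, 'patching'),
    (2, 'patching'), (2, 'encryption'),
    (3, 'encryption');
""")

# Defective grouping: one row per (upload, category) pair.
per_upload = conn.execute("""
    SELECT cu.id, cu.report_date, ci.category, COUNT(*) AS count
    FROM compliance_items ci
    JOIN compliance_uploads cu ON cu.id = ci.upload_id
    GROUP BY cu.id, cu.report_date, ci.category
""").fetchall()

# Expected grouping: one row per (date, category) pair.
per_date = conn.execute("""
    SELECT cu.report_date, ci.category, COUNT(*) AS count
    FROM compliance_items ci
    JOIN compliance_uploads cu ON cu.id = ci.upload_id
    GROUP BY cu.report_date, ci.category
    ORDER BY cu.report_date, ci.category
""").fetchall()
```

The per-upload grouping yields four rows for a single date, while the per-date grouping yields two rows whose counts sum across all three verticals.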

/summary

2.8 WHEN multiple uploads exist for the latest report_date THEN the system SHALL either (a) merge the summary_json of all uploads sharing that date into a single combined summary response, or (b) return a documented, well-defined selection (e.g., a named "primary" vertical) along with metadata indicating which uploads were considered, rather than silently picking one by ORDER BY id DESC LIMIT 1

2.9 WHEN the response is constructed for a date with multiple uploads THEN the upload field SHALL identify the set of uploads that contributed to the response (or, if a single representative is returned, the response SHALL include a flag/field indicating other uploads exist for the same date that were not merged)
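One possible shape for 2.8 and 2.9 is sketched below: select every upload for the latest report_date and report which uploads contributed. The response keys (uploads, summaries_by_vertical) and the contents of summary_json are assumptions made for illustration; the real response shape and the merge rules for entries and overall_scores are not defined in this spec.

```python
import json
import sqlite3


def build_summary_response(conn):
    """Hypothetical helper: gather all uploads for the latest report_date."""
    rows = conn.execute("""
        SELECT id, vertical, report_date, summary_json
        FROM compliance_uploads
        WHERE report_date = (SELECT MAX(report_date) FROM compliance_uploads)
        ORDER BY id
    """).fetchall()
    if not rows:
        return None  # the existing empty-state shape is left to the real handler
    return {
        "report_date": rows[0][2],
        # Every upload that was considered for this date (2.9).
        "uploads": [{"id": r[0], "vertical": r[1]} for r in rows],
        # Kept separate per vertical; a later duplicate vertical would overwrite.
        "summaries_by_vertical": {r[1]: json.loads(r[3]) for r in rows},
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compliance_uploads ("
    "id INTEGER PRIMARY KEY, vertical TEXT, report_date TEXT, summary_json TEXT)"
)
conn.executemany(
    "INSERT INTO compliance_uploads VALUES (?, ?, ?, ?)",
    [
        (1, "NTS_AEO", "2025-05-04", '{"total": 4}'),
        (2, "NTS_AEO", "2025-05-11", '{"total": 5}'),
        (3, "SDIT_CISO", "2025-05-11", '{"total": 3}'),
        (4, "TSI", "2025-05-11", '{"total": 1}'),
    ],
)
response = build_summary_response(conn)
```

Selecting by MAX(report_date) rather than ORDER BY id DESC LIMIT 1 means no vertical's upload is dropped before the merge-or-select decision is made.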

compliance_snapshots creation in persistUpload()

2.10 WHEN persistUpload() computes per-vertical compliance stats THEN the query SHALL filter compliance_items by the vertical of the upload being persisted (in addition to team IS NOT NULL) and group by vertical, team, so each snapshot row reflects only the items belonging to that vertical

2.11 WHEN snapshots are written into compliance_snapshots THEN the total_devices, compliant, and non_compliant values SHALL match the items belonging to the snapshotted vertical only and SHALL NOT be inflated by items from other verticals
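A vertical-scoped snapshot query along the lines of 2.10 might look like the sketch below. It assumes compliance_items has a vertical column, models compliance state with a hypothetical is_compliant flag, and treats each item as one device; the real column names and the month scoping of the snapshot are not shown in this spec.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumption: is_compliant is a hypothetical 0/1 flag.
CREATE TABLE compliance_items (
    id INTEGER PRIMARY KEY, vertical TEXT, team TEXT, is_compliant INTEGER
);
INSERT INTO compliance_items (vertical, team, is_compliant) VALUES
    ('NTS_AEO', 'STEAM', 1), ('NTS_AEO', 'STEAM', 0),
    ('TSI', 'STEAM', 1), ('TSI', 'STEAM', 1), ('TSI', 'STEAM', 0),
    ('TSI', NULL, 1);
""")


def snapshot_rows(conn, vertical):
    """Per-team stats restricted to the vertical being persisted."""
    return conn.execute(
        """
        SELECT vertical, team,
               COUNT(*) AS total_devices,
               SUM(is_compliant) AS compliant,
               SUM(1 - is_compliant) AS non_compliant
        FROM compliance_items
        WHERE team IS NOT NULL AND vertical = ?
        GROUP BY vertical, team
        ORDER BY team
        """,
        (vertical,),
    ).fetchall()


nts_rows = snapshot_rows(conn, "NTS_AEO")
tsi_rows = snapshot_rows(conn, "TSI")
```

Without the vertical filter, the STEAM row would report 5 devices for either upload; with it, NTS_AEO sees 2 and TSI sees 3.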

Unchanged Behavior (Regression Prevention)

3.1 WHEN only one compliance upload exists per report_date (single-file upload workflow) THEN the system SHALL CONTINUE TO return that date's counts unchanged as a single trend data point

3.2 WHEN the chart displays trend data THEN the system SHALL CONTINUE TO show all existing data fields (new_count, recurring_count, resolved_count, total_active, per-team breakdowns) with correct values

3.3 WHEN no compliance uploads exist THEN the system SHALL CONTINUE TO return an empty trends array and the chart SHALL CONTINUE TO display the "no data" state

3.4 WHEN only one compliance upload exists per report_date THEN GET /waterfall SHALL CONTINUE TO emit one entry per date with the same start, new_count, recurring_count, resolved_count, and end fields and the same running-total semantics as before

3.5 WHEN only one compliance upload exists per report_date THEN GET /category-trend SHALL CONTINUE TO return one row per (date, category) pair with the same report_date, category, and count field shape as before

3.6 WHEN only one compliance upload exists for the latest report_date THEN GET /summary SHALL CONTINUE TO return the same entries, overall_scores, and upload shape as before, including the existing fallback from vertical IS NULL to vertical = 'NTS_AEO' for selecting which upload's summary to surface

3.7 WHEN /summary is called with a team query parameter THEN the system SHALL CONTINUE TO filter entries by the requested team and SHALL CONTINUE TO reject teams not in ALLOWED_TEAMS with HTTP 400

3.8 WHEN persistUpload() writes a snapshot for a vertical that is the only vertical present in compliance_items for that month THEN the snapshot row's total_devices, compliant, non_compliant, and compliance_pct SHALL CONTINUE TO be identical to the pre-fix values (no behavioural change in the single-vertical case)

3.9 WHEN persistUpload() encounters an error during snapshot creation THEN the system SHALL CONTINUE TO log the error and complete the upload commit successfully (snapshot creation remains non-critical)

3.10 WHEN any of these endpoints are queried with no matching data (no uploads, no items for a vertical, no items in a category) THEN the system SHALL CONTINUE TO return the existing empty-state response shapes
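The single-upload and empty-state guarantees (3.1 and 3.3) lend themselves to a small regression check. The sketch below assumes the date-grouped trends query and an illustrative schema; make_db is a hypothetical test helper, not part of the codebase.

```python
import sqlite3

TRENDS_SQL = """
    SELECT report_date,
           SUM(new_count), SUM(recurring_count),
           SUM(resolved_count), SUM(total_active)
    FROM compliance_uploads
    GROUP BY report_date
    ORDER BY report_date ASC
"""


def make_db(rows):
    """Hypothetical helper: build an in-memory table from sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compliance_uploads ("
        "id INTEGER PRIMARY KEY, report_date TEXT, new_count INTEGER, "
        "recurring_count INTEGER, resolved_count INTEGER, total_active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO compliance_uploads "
        "(report_date, new_count, recurring_count, resolved_count, total_active) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


# One upload per date: grouping by report_date must leave the values unchanged.
single_upload_rows = [("2025-05-04", 4, 0, 0, 4), ("2025-05-11", 2, 3, 1, 5)]
single_result = make_db(single_upload_rows).execute(TRENDS_SQL).fetchall()

# No uploads: the grouped query yields no rows, so the trends array stays empty.
empty_result = make_db([]).execute(TRENDS_SQL).fetchall()
```

In the single-upload case each group contains exactly one row, so the sums equal the original per-upload values.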