# Bugfix Requirements Document

## Introduction

Several Compliance backend endpoints select from `compliance_items` without scoping to a single vertical, so when the same `(hostname, metric_id)` exists in both a legacy `vertical IS NULL` upload and a multi-vertical `vertical = 'NTS_AEO'` upload, the duplicate rows distort the response. The originally reported symptom is on the device-level violation view: hostname `STEAM-INTERSIGHT` (IP 172.16.30.40) shows metric `7.1.1` listed twice in its failing metrics list (GitLab issue #13, reported by nkapur). Investigation found the same duplication pattern in additional endpoints that drive per-team and per-vertical reporting.

This spec covers the full class of "duplicate `(hostname, metric_id)` rows across verticals" bugs in `backend/routes/compliance.js`. The affected surfaces are:

1. `GET /items` — failing metrics list per device (originally reported)
2. `GET /items/:hostname` — device detail metrics array (originally reported)
3. `persistUpload()` `compliance_snapshots` creation block — per-vertical compliant/non-compliant counts
4. `GET /vcl/stats` — heavy-hitters team counts, per-team device totals, and forecast-burndown row counts
5. `GET /mttr` — per-team aging bucket counts

**Root Cause:** Each affected query selects from `compliance_items` either with a vertical filter that admits both legacy and multi-vertical rows (`vertical IS NULL OR vertical = 'NTS_AEO'`) or with no vertical filter at all. Some queries dedupe at the hostname level via `COUNT(DISTINCT hostname)`, which protects against per-team device totals being inflated, but does not protect aggregations that depend on `(hostname, metric_id)` uniqueness, status uniqueness per hostname, or team uniqueness per hostname. The `groupByHostname` helper and the `/items/:hostname` query likewise have no deduplication at all, so every duplicate row becomes a duplicate metric in the response.

## Bug Analysis

### Current Behavior (Defect)

1.1 WHEN a device has compliance_items rows for the same (hostname, metric_id) pair across multiple verticals (e.g., one row with `vertical IS NULL` and another with `vertical = 'NTS_AEO'`) THEN the `/items` endpoint returns both rows and the `groupByHostname` function adds the same metric_id to `failing_metrics` multiple times

1.2 WHEN the `/items/:hostname` detail endpoint is called for a device that has compliance_items rows across multiple verticals THEN the system returns duplicate metric entries in the `metrics` array because the query has no vertical filter or deduplication

1.3 WHEN the ComplianceDetailPanel renders the metrics array for a device with duplicate entries THEN the same metric_id chip appears multiple times in the "Failing Metrics" section, confusing users about the actual number of distinct violations

1.4 WHEN `persistUpload()` builds per-team rows for `compliance_snapshots` and the same hostname has compliance_items rows in both a legacy `vertical IS NULL` upload and an `vertical = 'NTS_AEO'` upload with different statuses (e.g., `active` in one vertical and `resolved` in the other) THEN the snapshot query counts that hostname in BOTH the `compliant` and `non_compliant` columns for the team, inflating per-team totals and producing a row where `compliant + non_compliant > total_devices`

1.5 WHEN `/vcl/stats` computes the heavy-hitters table and per-team totals and the same hostname has rows in two verticals where the `team` column differs (e.g., `team = 'STEAM'` in the legacy row and `team = 'ACCESS-ENG'` in the NTS_AEO row) THEN the `COUNT(DISTINCT hostname)` aggregate counts the hostname under both team groups, double-counting the device across teams

1.6 WHEN `/vcl/stats` builds the forecast-burndown for a team by selecting `resolution_date` rows from compliance_items without `DISTINCT` AND the same `(hostname, metric_id)` has duplicate active rows across verticals, both with a non-null `resolution_date` THEN the forecast row count is inflated and the `blockers = teamNonCompliant - forecastItems.length` calculation can go negative or report a misleadingly low blocker count

1.7 WHEN `/mttr` selects `seen_count, team` from active compliance_items without deduplication AND the same (hostname, metric_id) has duplicate active rows across verticals THEN each duplicate row is bucketed independently in `bucketAgingItems`, inflating per-team aging totals for that team

### Expected Behavior (Correct)

2.1 WHEN a device has compliance_items rows for the same (hostname, metric_id) pair across multiple verticals THEN the `/items` endpoint SHALL return only one entry per unique (hostname, metric_id) combination in the `failing_metrics` array, using the row with the highest `seen_count` or most recent `upload_id` as the representative

2.2 WHEN the `/items/:hostname` detail endpoint is called for a device with rows across multiple verticals THEN the system SHALL return only one metric entry per unique (metric_id, status) combination, preferring the row with the highest `seen_count` or most recent data

2.3 WHEN the ComplianceDetailPanel renders the metrics for a device THEN each distinct metric_id SHALL appear exactly once in the "Failing Metrics" section regardless of how many underlying compliance_items rows exist for that metric across verticals

2.4 WHEN `persistUpload()` writes a per-team row to `compliance_snapshots` THEN the system SHALL count each unique hostname at most once across the (compliant, non_compliant) columns for that team, classifying a hostname as `non_compliant` if it has any active row in any vertical for the team and `compliant` only if all of its rows for the team are resolved, so that `compliant + non_compliant ≤ total_devices` always holds

2.5 WHEN `/vcl/stats` computes heavy-hitters and per-team totals THEN the system SHALL count each unique hostname under exactly one team — the team derived from the most recent (or otherwise canonical) compliance_items row for that hostname across all verticals — so that summing `non_compliant` across teams equals the total non-compliant device count

2.6 WHEN `/vcl/stats` builds the forecast-burndown for a team THEN the forecast row count SHALL be deduplicated by `(hostname, metric_id)` so that cross-vertical duplicate rows contribute at most one entry per unique violation, and `blockers = teamNonCompliant - dedupedForecastCount` SHALL never be negative

2.7 WHEN `/mttr` computes aging buckets per team THEN each unique (hostname, metric_id) active violation SHALL be bucketed exactly once using a single representative `seen_count` value, regardless of how many duplicate rows exist across verticals

### Unchanged Behavior (Regression Prevention)

3.1 WHEN a device has multiple distinct failing metric_ids (e.g., 7.1.1 and 7.2.1) THEN the system SHALL CONTINUE TO display each distinct metric_id separately in the failing metrics list

3.2 WHEN a device has both active and resolved entries for the same metric_id THEN the system SHALL CONTINUE TO show the metric in the appropriate section (active or resolved) based on its status

3.3 WHEN only one compliance upload exists per vertical for a device (no cross-vertical duplication) THEN the system SHALL CONTINUE TO display metrics unchanged with correct seen_count, first_seen, and last_seen values

3.4 WHEN the `/items` list endpoint is called with a team filter THEN the system SHALL CONTINUE TO return all devices for that team with their correct (now deduplicated) failing metrics and accurate seen_count values

3.5 WHEN `persistUpload()` builds `compliance_snapshots` for a team whose devices exist in only one vertical THEN per-team `total_devices`, `compliant`, `non_compliant`, and `compliance_pct` SHALL CONTINUE TO match their pre-fix values

3.6 WHEN `/vcl/stats` computes overall stats, donut categorization, heavy-hitters, per-team totals, and forecast-burndown for devices that exist in only one vertical THEN every field in the response SHALL CONTINUE TO match its pre-fix value

3.7 WHEN `/mttr` computes aging buckets for teams whose active items exist in only one vertical THEN per-team and total bucket counts SHALL CONTINUE TO match their pre-fix values