Add Kiro specs, hooks, and steering files — development tooling archive

Jordan Ramos
2026-05-01 21:30:05 +00:00
parent b9fa1281a9
commit 05d95309b4
90 changed files with 12312 additions and 0 deletions


@@ -0,0 +1 @@
{"specId": "e83a2e8f-4508-4669-9697-41219c8a7c71", "workflowType": "requirements-first", "specType": "feature"}


@@ -0,0 +1,364 @@
# Design Document: Compliance Schema Drift Check
## Overview
This feature adds schema drift detection to the compliance xlsx upload flow. When a user uploads a weekly NTS_AEO report, the backend extracts the xlsx structural schema (sheet names, column headers, metric values) and compares it against a shared parser configuration file. The comparison produces a categorised drift report with three severity levels: breaking (blocks upload), silent-miss (warns but allows proceeding), and cosmetic (informational). The frontend displays these findings in a new drift review phase inside the upload modal, inserted between the upload spinner and the existing diff preview.
The parser configuration dicts (`METRIC_CATEGORIES`, `CORE_COLS`, `SKIP_SHEETS`) currently defined inline in `parse_compliance_xlsx.py` are extracted into a shared JSON file (`backend/scripts/compliance_config.json`) that both the Python parser and the Node.js drift checker read. This establishes a single source of truth for parser configuration.
### Design Decisions
1. **Shared JSON config over database storage**: The parser config is a developer-maintained mapping, not user data. A JSON file is version-controllable, diffable, and readable by both Python and Node.js without additional dependencies.
2. **Python subprocess for schema extraction**: The existing `dump_xlsx_schema.py` already uses openpyxl to extract xlsx structure. We adapt this into a new `extract_xlsx_schema.py` script that the Node.js backend invokes as a subprocess, consistent with how `parse_compliance_xlsx.py` is already called.
3. **Node.js drift comparison logic**: The drift comparison is pure object comparison (sets of strings) with no xlsx parsing. Implementing it in Node.js avoids a second Python subprocess call and keeps the logic co-located with the route handler.
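4. **Graceful degradation**: If schema extraction or the drift comparison fails, the upload flow proceeds normally with `drift: null` and a `drift_error` message. The drift check is additive and must never block the existing workflow. The one exception is a missing or invalid `compliance_config.json`. The parser depends on the same file, so in that case the preview endpoint returns a 500 (see Error Handling).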
## Architecture
```mermaid
sequenceDiagram
participant User
participant Modal as ComplianceUploadModal
participant API as POST /api/compliance/preview
participant Schema as extract_xlsx_schema.py
participant Drift as driftChecker (Node.js)
participant Config as compliance_config.json
participant Parser as parse_compliance_xlsx.py
User->>Modal: Drops xlsx file
Modal->>API: POST /preview (multipart)
API->>Schema: spawn python3 extract_xlsx_schema.py <file>
Schema-->>API: JSON { sheets: [...] }
API->>Config: fs.readFileSync(compliance_config.json)
API->>Drift: compareSchemaToDrift(schema, config)
Drift-->>API: { breaking: [...], silent_miss: [...], cosmetic: [...] }
API->>Parser: spawn python3 parse_compliance_xlsx.py <file>
Parser->>Config: reads compliance_config.json
Parser-->>API: JSON { items, summary, ... }
API->>API: computeDiff(db, items)
API-->>Modal: { drift, diff, tempFile, ... }
alt drift has findings
Modal->>User: Show drift review phase
alt breaking findings exist
Modal->>User: Block "Continue to Preview"
else no breaking findings
User->>Modal: Click "Continue to Preview"
Modal->>User: Show diff preview
end
else no drift findings
Modal->>User: Show diff preview directly
end
```
### File Layout
```
backend/
  scripts/
    compliance_config.json    # NEW — shared parser config (single source of truth)
    extract_xlsx_schema.py    # NEW — extracts xlsx structure as JSON
    parse_compliance_xlsx.py  # MODIFIED — reads config from JSON file
    dump_xlsx_schema.py       # UNCHANGED — standalone diagnostic tool
  routes/
    compliance.js             # MODIFIED — drift check in /preview (calls helpers/driftChecker.js)
  helpers/
    driftChecker.js           # NEW — compareSchemaToDrift() and loadConfig()
frontend/
  src/components/pages/
    ComplianceUploadModal.js  # MODIFIED — new drift-review phase
```
## Components and Interfaces
### 1. Shared Parser Configuration (`compliance_config.json`)
```json
{
  "metric_categories": {
    "2.3.4i": "Vulnerability Management",
    "2.3.6i": "Vulnerability Management",
    "5.2.4": "Access & MFA"
  },
  "core_cols": [
    "Preferred - Hostname",
    "GRANITE - IPv4_Address",
    "GRANITE - Type",
    "Team",
    "Compliant",
    "Source_Network",
    "Vertical",
    "GRANITE - Equip_Inst_ID",
    "GRANITE - RESPONSIBLE_TEAM"
  ],
  "skip_sheets": ["Summary", "CMDB_9box", "Vulns", "Aging Dashboard"]
}
```
### 2. Schema Extractor (`extract_xlsx_schema.py`)
**Input**: File path as CLI argument.
**Output** (stdout JSON):
```json
{
  "sheets": [
    {
      "name": "Summary",
      "columns": ["Metric", "Non-Compliant", "..."],
      "metric_values": ["2.3.4i", "5.2.4", "..."]
    },
    {
      "name": "2.3.4i",
      "columns": ["Preferred - Hostname", "GRANITE - IPv4_Address", "..."]
    }
  ]
}
```
- Uses openpyxl in read-only mode.
- Extracts sheet names, first-row column headers per sheet, and unique metric values from the Summary sheet (header at row 4, data from row 5 onward).
- On error, returns `{ "error": "..." }` on stdout and exits with non-zero code.
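Here is a sketch of how the Node.js side might validate the extractor's output before handing it to the drift checker. The function name `parseSchemaOutput` is hypothetical, and the sketch assumes stdout and the exit code have already been collected from the subprocess.

```javascript
// Hypothetical helper: interpret the result of running extract_xlsx_schema.py.
// Throws on any failure so the caller can treat it as a drift check failure.
function parseSchemaOutput(stdout, exitCode) {
  let payload;
  try {
    payload = JSON.parse(stdout);
  } catch (err) {
    throw new Error(`Schema extractor returned invalid JSON (exit code ${exitCode}): ${err.message}`);
  }

  // The extractor reports failures as { "error": "..." } plus a non-zero exit.
  if (exitCode !== 0 || (payload && typeof payload.error === 'string')) {
    const detail = payload && payload.error ? payload.error : `exit code ${exitCode}`;
    throw new Error(`Schema extraction failed: ${detail}`);
  }

  if (!payload || !Array.isArray(payload.sheets)) {
    throw new Error('Schema extractor output is missing a "sheets" array');
  }
  return payload;
}
```

Throwing here, rather than returning a partial schema, lets the preview handler treat every extractor problem the same way: as a drift check failure.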
### 3. Drift Checker (`backend/helpers/driftChecker.js`)
**Function**: `compareSchemaToDrift(schema, config) => DriftReport`
**Parameters**:
- `schema` — object returned by `extract_xlsx_schema.py`
- `config` — object parsed from `compliance_config.json`
**Returns** (`DriftReport`):
```javascript
{
  breaking: [
    { severity: 'breaking', message: 'Detail sheet "2.3.4i" is missing core column "Team"', value: 'Team', sheet: '2.3.4i' }
  ],
  silent_miss: [
    { severity: 'silent_miss', message: 'Unknown metric "9.1.2" in Summary — not in metric_categories', value: '9.1.2', sheet: null }
  ],
  cosmetic: [
    { severity: 'cosmetic', message: 'New column "Extra_Field" in sheet "2.3.4i" — will be captured in extra_json', value: 'Extra_Field', sheet: '2.3.4i' }
  ]
}
```
**Drift rules**:
| Rule | Severity | Condition |
|---|---|---|
| Missing core column | `breaking` | A detail sheet (not in `skip_sheets`, present in xlsx) is missing a column from `core_cols` |
| Missing detail sheet | `breaking` | A sheet name in `metric_categories` (and not in `skip_sheets`) is absent from the xlsx |
| Unknown metric value | `silent_miss` | A metric value in the Summary sheet is not a key in `metric_categories` |
| Unknown sheet | `silent_miss` | An xlsx sheet is not in `skip_sheets` and not in `metric_categories` |
| New column in detail sheet | `cosmetic` | A detail sheet has columns not in `core_cols` |
| Stale metric category | `cosmetic` | A key in `metric_categories` does not appear in the Summary sheet's metric values |
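The rules in the table reduce to set operations. The block below is a minimal, illustrative version of `compareSchemaToDrift`, not the shipped module. It assumes:

- the XlsxSchema and ParserConfig shapes from Data Models;
- that the Summary sheet is literally named `"Summary"`;
- message wording of our own.

```javascript
// Minimal sketch of compareSchemaToDrift(). SUMMARY_SHEET is an assumption
// taken from the extractor output example.
const SUMMARY_SHEET = 'Summary';

function compareSchemaToDrift(schema, config) {
  const report = { breaking: [], silent_miss: [], cosmetic: [] };
  const add = (severity, message, value, sheet = null) =>
    report[severity].push({ severity, message, value, sheet });

  const skip = new Set(config.skip_sheets);
  const core = new Set(config.core_cols);
  const categories = Object.keys(config.metric_categories);
  const categorySet = new Set(categories);
  const present = new Set(schema.sheets.map((s) => s.name));
  const summary = schema.sheets.find((s) => s.name === SUMMARY_SHEET);
  const metricValues = new Set((summary && summary.metric_values) || []);

  // Detail sheets: every xlsx sheet that is not in skip_sheets.
  for (const sheet of schema.sheets) {
    if (skip.has(sheet.name)) continue;
    const cols = new Set(sheet.columns);
    for (const col of core) {
      if (!cols.has(col)) {
        add('breaking', `Detail sheet "${sheet.name}" is missing core column "${col}"`, col, sheet.name);
      }
    }
    for (const col of cols) {
      if (!core.has(col)) {
        add('cosmetic', `New column "${col}" in sheet "${sheet.name}"; will be captured in extra_json`, col, sheet.name);
      }
    }
    if (!categorySet.has(sheet.name)) {
      add('silent_miss', `Unknown sheet "${sheet.name}"; will be parsed with category "Other"`, sheet.name, sheet.name);
    }
  }

  // Config-side checks against the sheet list and the Summary metric values.
  for (const key of categories) {
    if (!skip.has(key) && !present.has(key)) {
      add('breaking', `Detail sheet "${key}" listed in metric_categories is missing from the xlsx`, key);
    }
    if (!metricValues.has(key)) {
      add('cosmetic', `Metric "${key}" in metric_categories does not appear in Summary`, key);
    }
  }
  for (const metric of metricValues) {
    if (!categorySet.has(metric)) {
      add('silent_miss', `Unknown metric "${metric}" in Summary; not in metric_categories`, metric);
    }
  }
  return report;
}
```

Each rule is a plain set difference, so the output does not depend on how often a value repeats in the xlsx. The same property makes the set-based correctness properties checkable.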
### 4. Preview Endpoint Changes (`POST /api/compliance/preview`)
The existing `/preview` handler is modified to:
1. After receiving the uploaded file, spawn `extract_xlsx_schema.py` to get the xlsx schema.
2. Read `compliance_config.json` from disk.
3. Call `compareSchemaToDrift(schema, config)` to produce the drift report.
4. Proceed with the existing `parseXlsx()` call and `computeDiff()`.
5. Include `drift` (the DriftReport object) and optionally `drift_error` (string) in the response.
If the schema extraction or drift check throws, set `drift: null` and `drift_error: <message>`, then continue with the normal flow.
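Below is a minimal sketch of that degradation path. It assumes two injected functions: `extractSchema`, which spawns the extractor and resolves to its parsed JSON, and `compare`. Both parameter names and `runDriftCheck` itself are hypothetical. Config loading happens before this call, so a missing config still surfaces as a 500 rather than being swallowed here.

```javascript
// Hypothetical drift step for the /preview handler. Dependencies are injected
// so the failure path can be exercised without spawning Python.
async function runDriftCheck(filePath, config, { extractSchema, compare }) {
  try {
    const schema = await extractSchema(filePath);
    return { drift: compare(schema, config), drift_error: null };
  } catch (err) {
    // Drift is additive: record the failure and let the normal parse/diff continue.
    return { drift: null, drift_error: err instanceof Error ? err.message : String(err) };
  }
}
```

The handler would spread the result into the response object alongside `diff`, `tempFile`, and the other existing fields.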
**Updated response shape**:
```json
{
  "drift": {
    "breaking": [],
    "silent_miss": [],
    "cosmetic": []
  },
  "drift_error": null,
  "diff": { "new_count": 5, "recurring_count": 120, "resolved_count": 3 },
  "tempFile": "/path/to/temp.json",
  "filename": "NTS_AEO_2026_03_25.xlsx",
  "report_date": "2026-03-25",
  "total_items": 125
}
```
### 5. Upload Modal Changes (`ComplianceUploadModal.js`)
**New phase**: `drift-review` inserted between `uploading` and `preview`.
**Phase flow**:
```
idle → uploading → drift-review (if findings) → preview → committing → done
                 → preview (if no findings)
```
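The branch out of `uploading` comes down to a single decision. Here is a hypothetical helper for it, assuming the response shape shown above. A `null` drift is treated the same as an empty report.

```javascript
// True when the drift report exists and has at least one finding.
function hasFindings(drift) {
  if (!drift) return false;
  return drift.breaking.length + drift.silent_miss.length + drift.cosmetic.length > 0;
}

// Hypothetical: picks the phase that follows a successful /preview call.
function phaseAfterUpload(previewResponse) {
  // drift === null means the check failed or was skipped: go straight to preview.
  return hasFindings(previewResponse.drift) ? 'drift-review' : 'preview';
}
```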
**Drift review UI**:
- Findings grouped by severity: breaking first, then silent-miss, then cosmetic.
- Each group has a header with severity label and count badge.
- Groups with more than 5 findings collapse with a "Show N more" toggle.
- Each finding shows the message text and the triggering value.
- Breaking findings: red text (`#EF4444`), red left-border accent.
- Silent-miss findings: amber text (`#F59E0B`), amber left-border accent.
- Cosmetic findings: muted text (`#94A3B8`), subtle left-border accent.
- "Cancel" button returns to idle. "Continue to Preview" button advances to diff preview.
- "Continue to Preview" is disabled when breaking findings exist, with a message explaining the block.
- When `drift` is `null` (drift check failed), skip drift-review and go straight to preview.
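One way to keep the ordering, collapse and button rules testable outside React is to compute them in plain functions. This is a sketch only. The five-item limit and the gating rules come from Requirements 7 and 8. The following are assumptions:

- the names `groupFindings` and `continueState`;
- the label strings;
- hiding empty groups.

```javascript
const SEVERITY_ORDER = ['breaking', 'silent_miss', 'cosmetic'];
const LABELS = { breaking: 'Breaking', silent_miss: 'Silent miss', cosmetic: 'Cosmetic' }; // assumed text
const COLLAPSE_LIMIT = 5;

// Orders groups by severity and collapses any group with more than five findings.
// `expanded` maps a severity to true once the user has clicked "Show N more".
function groupFindings(drift, expanded = {}) {
  return SEVERITY_ORDER
    .map((severity) => {
      const findings = drift[severity] || [];
      const collapsed = findings.length > COLLAPSE_LIMIT && !expanded[severity];
      return {
        severity,
        label: LABELS[severity],
        count: findings.length, // value for the count badge
        visible: collapsed ? findings.slice(0, COLLAPSE_LIMIT) : findings,
        hiddenCount: collapsed ? findings.length - COLLAPSE_LIMIT : 0, // N in "Show N more"
      };
    })
    .filter((group) => group.count > 0);
}

// Gating for the "Continue to Preview" button (Requirements 7.5 to 7.7).
function continueState(drift) {
  if (drift.breaking.length > 0) {
    return { enabled: false, message: 'Upload blocked until the parser configuration is updated.' };
  }
  if (drift.silent_miss.length > 0) {
    return { enabled: true, message: 'Some data will be dropped or miscategorised; review the findings before continuing.' };
  }
  return { enabled: true, message: null };
}
```

The component would keep `expanded` in state and flip one severity key when the user clicks "Show N more".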
## Data Models
### DriftFinding
```javascript
{
  severity: 'breaking' | 'silent_miss' | 'cosmetic',
  message: string,     // Human-readable description
  value: string,       // The specific column/sheet/metric that triggered the finding
  sheet: string|null   // Sheet name context (when applicable)
}
```
### DriftReport
```javascript
{
  breaking: DriftFinding[],
  silent_miss: DriftFinding[],
  cosmetic: DriftFinding[]
}
```
### ParserConfig
```javascript
{
  metric_categories: { [metricId: string]: string }, // metric ID → category name
  core_cols: string[],                               // column names for main item fields
  skip_sheets: string[]                              // sheet names excluded from parsing
}
```
### XlsxSchema (output of extract_xlsx_schema.py)
```javascript
{
  sheets: [
    {
      name: string,
      columns: string[],
      metric_values?: string[] // only present on Summary sheet
    }
  ]
}
```
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Breaking drift completeness
*For any* xlsx schema and parser config, the drift checker SHALL produce a breaking finding for every (detail sheet, core column) pair in which the column is missing from that sheet, and for every detail sheet (present in `metric_categories` but not in `skip_sheets`) absent from the xlsx, and no other breaking findings. The set of breaking findings is exactly the union of missing-core-column findings and missing-detail-sheet findings.
**Validates: Requirements 3.1, 3.2, 3.3**
### Property 2: Silent-miss drift completeness
*For any* xlsx schema and parser config, the drift checker SHALL produce a silent-miss finding for every metric value in the Summary sheet not present in `metric_categories`, and for every xlsx sheet not in `skip_sheets` and not in `metric_categories` — and no other silent-miss findings. The set of silent-miss findings is exactly the union of unknown-metric findings and unknown-sheet findings.
**Validates: Requirements 4.1, 4.2, 4.3**
### Property 3: Cosmetic drift completeness
*For any* xlsx schema and parser config, the drift checker SHALL produce a cosmetic finding for every column in a detail sheet not present in `core_cols`, and for every key in `metric_categories` not present in the Summary sheet's metric values — and no other cosmetic findings. The set of cosmetic findings is exactly the union of new-column findings and stale-metric findings.
**Validates: Requirements 5.1, 5.2, 5.3**
### Property 4: Drift severity ordering
*For any* drift report containing a mix of breaking, silent-miss, and cosmetic findings, the grouping function SHALL always return findings ordered by severity: all breaking findings first, then all silent-miss findings, then all cosmetic findings.
**Validates: Requirements 8.1**
## Error Handling
### Python Script Failures
| Failure | Handling |
|---|---|
| `extract_xlsx_schema.py` exits non-zero | Preview endpoint sets `drift: null`, `drift_error: <stderr message>`, continues with normal parse flow |
| `extract_xlsx_schema.py` returns invalid JSON | Same as above — caught in JSON.parse, treated as drift check failure |
| `compliance_config.json` missing or invalid (Node.js read) | Preview endpoint returns 500 with message "Configuration file could not be loaded" |
| `compliance_config.json` missing or invalid (Python parser read) | Parser exits non-zero, stderr describes the error, preview endpoint returns 500 with parse error |
| xlsx file cannot be opened by schema extractor | Schema extractor returns `{ "error": "..." }` on stdout, exits non-zero; drift check skipped gracefully |
### Frontend Error States
| Condition | Behavior |
|---|---|
| `drift` is `null` in preview response | Skip drift-review phase, proceed directly to diff preview |
| `drift_error` is present | Optionally display a subtle warning in the diff preview that drift check was skipped |
| Network error during upload | Existing error phase handling (unchanged) |
### Config File Validation
The Node.js config loader validates that:
- The file exists and is readable.
- The content parses as valid JSON.
- The parsed object contains `metric_categories` (object), `core_cols` (array), and `skip_sheets` (array).
If any check fails, the loader throws with a descriptive message. The preview handler catches this and returns a 500 response.
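Here is a sketch of such a loader. It assumes CommonJS and synchronous reads, matching the `fs.readFileSync` call in the sequence diagram. Every error message starts with the same prefix so the 500 response can reuse it directly; the exact wording beyond that prefix is an assumption.

```javascript
const fs = require('fs');

const PREFIX = 'Configuration file could not be loaded';

// Reads compliance_config.json and checks the three required keys and their types.
function loadConfig(configPath) {
  let raw;
  try {
    raw = fs.readFileSync(configPath, 'utf8');
  } catch (err) {
    throw new Error(`${PREFIX}: cannot read ${configPath} (${err.code || err.message})`);
  }

  let config;
  try {
    config = JSON.parse(raw);
  } catch (err) {
    throw new Error(`${PREFIX}: invalid JSON in ${configPath} (${err.message})`);
  }

  const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  if (!isPlainObject(config)) {
    throw new Error(`${PREFIX}: top-level value must be an object`);
  }
  if (!isPlainObject(config.metric_categories)) {
    throw new Error(`${PREFIX}: "metric_categories" must be an object`);
  }
  for (const key of ['core_cols', 'skip_sheets']) {
    if (!Array.isArray(config[key])) {
      throw new Error(`${PREFIX}: "${key}" must be an array`);
    }
  }
  return config;
}
```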
## Testing Strategy
### Unit Tests
**Drift checker (`driftChecker.js`)**:
- Breaking: missing core column produces finding with correct severity, message, value, and sheet.
- Breaking: missing detail sheet produces finding.
- Silent-miss: unknown metric value produces finding.
- Silent-miss: unknown sheet produces finding.
- Cosmetic: new column in detail sheet produces finding.
- Cosmetic: stale metric category produces finding.
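- Empty schema (no sheets) produces a missing-detail-sheet breaking finding for every `metric_categories` key not in `skip_sheets`, and a stale-metric cosmetic finding for every `metric_categories` key.
- Config with empty `metric_categories`, `core_cols`, or `skip_sheets` is handled without throwing.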
- Schema and config that are perfectly aligned produce zero findings.
**Config loader**:
- Valid config file loads correctly.
- Missing file throws descriptive error.
- Invalid JSON throws descriptive error.
- Config missing required keys throws descriptive error.
**Frontend drift review component**:
- Drift review phase renders when findings exist.
- "Continue to Preview" button disabled when breaking findings present.
- "Continue to Preview" button enabled when no breaking findings.
- Groups with more than 5 findings collapse to the first 5, with a "Show N more" toggle whose N equals the number of hidden findings.
- Cancel returns to idle phase.
- Skips drift review when drift is null or has no findings.
### Property-Based Tests
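Property-based tests use `fast-check` (JavaScript) to verify the four correctness properties defined above. For Properties 1 to 3, each test generates random schema and config objects and checks the drift checker output against the expected set-theoretic result. Property 4 instead generates random drift reports and checks the order returned by the grouping function.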
**Configuration**:
- Minimum 100 iterations per property test.
- Each test tagged with: **Feature: compliance-schema-drift-check, Property {N}: {title}**
**Generators**:
- `arbitraryParserConfig`: generates random `metric_categories` (an object with 0 to 20 string keys mapped to category strings), `core_cols` (an array of 0 to 15 unique column name strings), and `skip_sheets` (an array of 0 to 5 unique sheet name strings).
- `arbitraryXlsxSchema`: generates random sheets array, each with a name, columns array, and optionally metric_values (for the Summary sheet). Sheet names, column names, and metric values drawn from a shared pool to ensure meaningful overlap with the config.
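The plan specifies fast-check. The sketch below swaps in a small seeded PRNG (mulberry32) so that the shape of a Property 1 check runs with no dependencies. The pools, the generator functions and the loop-based `breakingFindings` stand-in are hypothetical. Like the generators described above, they draw names from a shared pool so that the config and the schema overlap.

```javascript
// Hypothetical shared pools so generated configs and schemas overlap.
const SHEET_POOL = ['Summary', 'CMDB_9box', '2.3.4i', '2.3.6i', '5.2.4', '9.1.2'];
const COL_POOL = ['Team', 'Compliant', 'Vertical', 'GRANITE - Type', 'Extra_Field'];

// mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Random subset of a pool (unique by construction), capped at maxLen.
const pick = (rand, pool, maxLen) => pool.filter(() => rand() < 0.5).slice(0, maxLen);

const genConfig = (rand) => ({
  metric_categories: Object.fromEntries(pick(rand, SHEET_POOL, 20).map((k) => [k, 'Category'])),
  core_cols: pick(rand, COL_POOL, 15),
  skip_sheets: pick(rand, SHEET_POOL, 5),
});

const genSchema = (rand) => ({
  sheets: pick(rand, SHEET_POOL, SHEET_POOL.length)
    .map((name) => ({ name, columns: pick(rand, COL_POOL, COL_POOL.length) })),
});

// Loop-based implementation under test (breaking rules only).
function breakingFindings(schema, config) {
  const skip = new Set(config.skip_sheets);
  const present = new Set(schema.sheets.map((s) => s.name));
  const out = [];
  for (const sheet of schema.sheets) {
    if (skip.has(sheet.name)) continue;
    for (const col of config.core_cols) {
      if (!sheet.columns.includes(col)) out.push({ value: col, sheet: sheet.name });
    }
  }
  for (const name of Object.keys(config.metric_categories)) {
    if (!skip.has(name) && !present.has(name)) out.push({ value: name, sheet: null });
  }
  return out;
}

// Set-theoretic oracle: union of missing-core-column and missing-detail-sheet keys.
function expectedBreakingKeys(schema, config) {
  const skip = new Set(config.skip_sheets);
  const names = new Set(schema.sheets.map((s) => s.name));
  const missingCols = schema.sheets
    .filter((s) => !skip.has(s.name))
    .flatMap((s) => config.core_cols.filter((c) => !s.columns.includes(c)).map((c) => `col:${s.name}:${c}`));
  const missingSheets = Object.keys(config.metric_categories)
    .filter((k) => !skip.has(k) && !names.has(k))
    .map((k) => `sheet:${k}`);
  return new Set([...missingCols, ...missingSheets]);
}

// Returns null if the property holds on every run, else the first counterexample.
function checkProperty1(runs = 100, seed = 1) {
  const rand = mulberry32(seed);
  for (let run = 0; run < runs; run++) {
    const schema = genSchema(rand);
    const config = genConfig(rand);
    const actual = breakingFindings(schema, config).map((f) =>
      f.sheet === null ? `sheet:${f.value}` : `col:${f.sheet}:${f.value}`);
    const expected = expectedBreakingKeys(schema, config);
    const actualSet = new Set(actual);
    // Exact match: same size and every expected key present ("no more, no fewer").
    const exact = actual.length === expected.size && [...expected].every((k) => actualSet.has(k));
    if (!exact) return { run, schema, config, actual, expected: [...expected] };
  }
  return null;
}
```

In the fast-check version, `genConfig` and `genSchema` would become arbitraries, and the loop would become `fc.assert(fc.property(...), { numRuns: 100 })`.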
### Integration Tests
- Preview endpoint returns drift report alongside existing diff data.
- Preview endpoint returns 200 with breaking drift (does not error).
- Preview endpoint gracefully degrades when drift check fails (`drift: null`, `drift_error` present).
- Preview endpoint returns 500 when config file is missing.
- Python parser reads from `compliance_config.json` and produces same output as before.
- Commit endpoint is unchanged and does not reference drift.


@@ -0,0 +1,128 @@
# Requirements Document
## Introduction
The compliance upload flow in the STEAM Security Dashboard parses weekly NTS_AEO xlsx reports using a Python parser (`parse_compliance_xlsx.py`) that relies on three hand-maintained configuration dicts: `METRIC_CATEGORIES` (metric ID to category mapping), `CORE_COLS` (column names that become main item fields), and `SKIP_SHEETS` (sheet names excluded from parsing). When the xlsx report structure changes — new metrics appear, sheets are renamed, columns are added or removed — the parser silently miscategorises data, drops fields, or fails outright. Currently, detecting this drift requires a separate manual agent workflow.
This feature builds schema drift detection directly into the upload flow. During the preview step, the backend extracts the xlsx structure and compares it against the parser configuration. The frontend displays categorised drift findings (breaking, silent-miss, cosmetic) in the upload modal before the user sees the diff preview. Breaking findings block the upload; silent-miss findings warn but allow proceeding; cosmetic findings are informational. The parser configuration dicts are extracted into a shared JSON config file that both the Python parser and the Node.js backend can read, establishing a single source of truth.
## Glossary
- **Drift_Checker**: The backend module that compares an xlsx file's structural schema against the Parser_Config and produces a categorised Drift_Report.
- **Parser_Config**: A shared JSON configuration file (`backend/scripts/compliance_config.json`) containing `metric_categories`, `core_cols`, and `skip_sheets`. This file is the single source of truth read by both the Python parser and the Node.js backend.
- **Drift_Report**: A structured object returned by the Drift_Checker containing arrays of findings grouped by severity: `breaking`, `silent_miss`, and `cosmetic`.
- **Drift_Finding**: A single entry in the Drift_Report, containing a severity level, a human-readable message, and the specific value that triggered the finding (e.g., a column name, sheet name, or metric ID).
- **Breaking_Finding**: A Drift_Finding indicating the xlsx structure will cause parse errors or data loss. Examples: a core column missing from a detail sheet, a previously existing sheet removed or renamed.
- **Silent_Miss_Finding**: A Drift_Finding indicating data exists in the xlsx but will be dropped or miscategorised by the parser. Examples: a new metric value in the Summary sheet not present in `metric_categories`, a new sheet not in `skip_sheets` and not in `metric_categories`.
- **Cosmetic_Finding**: A Drift_Finding indicating a minor discrepancy worth noting but not blocking. Examples: new columns in known sheets (automatically captured in `extra_json`), stale entries in `metric_categories` that no longer appear in the xlsx.
- **Upload_Modal**: The `ComplianceUploadModal.js` component that manages the file upload flow through phases: idle, uploading, drift-review, preview, committing, done, and error.
- **Preview_Endpoint**: The `POST /api/compliance/preview` endpoint that parses the uploaded xlsx, runs the drift check, computes the diff, and returns both the Drift_Report and diff counts.
- **Schema_Extractor**: The logic (adapted from `dump_xlsx_schema.py`) that reads an xlsx file using openpyxl and extracts sheet names, column headers per sheet, and metric values from the Summary sheet.
- **Detail_Sheet**: Any sheet in the xlsx that is not in the `skip_sheets` set and is parsed for non-compliant item rows.
## Requirements
### Requirement 1: Shared Parser Configuration File
**User Story:** As a developer, I want the parser configuration dicts extracted into a shared JSON file, so that both the Python parser and the Node.js backend read from a single source of truth.
#### Acceptance Criteria
1. THE Parser_Config SHALL be stored at `backend/scripts/compliance_config.json` as a JSON file containing three keys: `metric_categories` (object mapping metric ID strings to category name strings), `core_cols` (array of column name strings), and `skip_sheets` (array of sheet name strings).
2. THE Parser_Config SHALL contain the same values currently defined inline in `METRIC_CATEGORIES`, `CORE_COLS`, and `SKIP_SHEETS` in `parse_compliance_xlsx.py`.
3. WHEN the Python parser starts, THE Python parser SHALL read `metric_categories`, `core_cols`, and `skip_sheets` from the Parser_Config file instead of using inline dict definitions.
4. IF the Parser_Config file is missing or contains invalid JSON, THEN THE Python parser SHALL exit with a non-zero exit code and print a descriptive error message to stderr.
5. WHEN the Node.js backend handles a preview request, THE Drift_Checker SHALL read the Parser_Config file to obtain the current metric categories, core columns, and skip sheets.
6. IF the Parser_Config file is missing or contains invalid JSON when the Node.js backend reads it, THEN THE Preview_Endpoint SHALL return a 500 error with a message indicating the configuration file could not be loaded.
### Requirement 2: Schema Extraction from Uploaded xlsx
**User Story:** As a developer, I want the backend to extract the structural schema from an uploaded xlsx file, so that the drift checker can compare it against the parser configuration.
#### Acceptance Criteria
1. WHEN an xlsx file is uploaded to the Preview_Endpoint, THE Schema_Extractor SHALL extract the list of sheet names, the column headers from the first row of each sheet, and the unique metric values from the Summary sheet's Metric column (header at row 4, data from row 5 onward).
2. THE Schema_Extractor SHALL use openpyxl in read-only mode to extract the xlsx structure, reusing the approach from `dump_xlsx_schema.py`.
3. THE Schema_Extractor SHALL run as a Python subprocess invoked by the Node.js backend, returning the extracted schema as JSON on stdout.
4. IF the xlsx file cannot be opened or contains no sheets, THEN THE Schema_Extractor SHALL return a JSON error object on stdout and exit with a non-zero exit code.
### Requirement 3: Drift Detection — Breaking Findings
**User Story:** As a compliance analyst, I want the system to detect structural changes that will cause parse failures or data loss, so that I do not upload a report that produces corrupt data.
#### Acceptance Criteria
1. WHEN a Detail_Sheet is missing one or more columns listed in `core_cols` of the Parser_Config, THE Drift_Checker SHALL produce a Breaking_Finding for each missing column, identifying the sheet name and column name.
2. WHEN a sheet name that previously existed as a Detail_Sheet (present in `metric_categories` but not in `skip_sheets`) is absent from the uploaded xlsx, THE Drift_Checker SHALL produce a Breaking_Finding identifying the missing sheet name.
3. THE Drift_Checker SHALL classify all Breaking_Findings with severity `"breaking"`.
### Requirement 4: Drift Detection — Silent-Miss Findings
**User Story:** As a compliance analyst, I want the system to detect when new data in the xlsx will be silently miscategorised or dropped, so that I can update the parser configuration before proceeding.
#### Acceptance Criteria
1. WHEN the Summary sheet contains metric values not present as keys in `metric_categories` of the Parser_Config, THE Drift_Checker SHALL produce a Silent_Miss_Finding for each unknown metric value.
2. WHEN the xlsx contains sheets that are not in `skip_sheets` and whose names do not appear as keys in `metric_categories`, THE Drift_Checker SHALL produce a Silent_Miss_Finding for each unknown sheet, indicating it will be parsed with an 'Other' category.
3. THE Drift_Checker SHALL classify all Silent_Miss_Findings with severity `"silent_miss"`.
### Requirement 5: Drift Detection — Cosmetic Findings
**User Story:** As a compliance analyst, I want to see informational notes about minor schema differences, so that I have full visibility into how the xlsx structure has evolved.
#### Acceptance Criteria
1. WHEN a Detail_Sheet contains columns not present in `core_cols` of the Parser_Config, THE Drift_Checker SHALL produce a Cosmetic_Finding for each new column, noting that the column data will be captured in `extra_json`.
2. WHEN `metric_categories` in the Parser_Config contains metric IDs that do not appear in the Summary sheet's metric values, THE Drift_Checker SHALL produce a Cosmetic_Finding for each stale metric ID.
3. THE Drift_Checker SHALL classify all Cosmetic_Findings with severity `"cosmetic"`.
### Requirement 6: Preview Endpoint Drift Integration
**User Story:** As a developer, I want the preview endpoint to include the drift report in its response, so that the frontend can display drift findings before showing the diff preview.
#### Acceptance Criteria
1. WHEN the Preview_Endpoint processes an uploaded xlsx file, THE Preview_Endpoint SHALL run the Schema_Extractor and Drift_Checker before running the existing parser and diff computation.
2. THE Preview_Endpoint SHALL include a `drift` field in the JSON response containing the Drift_Report with `breaking`, `silent_miss`, and `cosmetic` arrays.
3. WHEN the drift check produces Breaking_Findings, THE Preview_Endpoint SHALL still return a 200 response with the Drift_Report, allowing the frontend to display the findings and block the commit.
4. IF the Schema_Extractor or Drift_Checker fails unexpectedly, THEN THE Preview_Endpoint SHALL proceed with the normal parse and diff flow, returning a `drift` field set to `null` and a `drift_error` field with a descriptive message, so that the upload flow is not blocked by drift check failures.
### Requirement 7: Upload Modal Drift Review Phase
**User Story:** As a compliance analyst, I want to see drift findings in the upload modal after file upload and before the diff preview, so that I can assess schema compatibility before deciding to proceed.
#### Acceptance Criteria
1. WHEN the Preview_Endpoint returns a Drift_Report with one or more findings, THE Upload_Modal SHALL display a drift review phase between the uploading spinner and the diff preview.
2. THE Upload_Modal SHALL display Breaking_Findings with red text and a red left-border accent, using the dashboard danger color (`#EF4444`).
3. THE Upload_Modal SHALL display Silent_Miss_Findings with amber text and an amber left-border accent, using the dashboard warning color (`#F59E0B`).
4. THE Upload_Modal SHALL display Cosmetic_Findings with muted text and a subtle left-border accent, using the dashboard muted text color (`#94A3B8`).
5. WHEN the Drift_Report contains one or more Breaking_Findings, THE Upload_Modal SHALL disable the "Continue to Preview" button and display a message indicating the upload is blocked until the parser configuration is updated.
6. WHEN the Drift_Report contains Silent_Miss_Findings but no Breaking_Findings, THE Upload_Modal SHALL enable the "Continue to Preview" button and display a warning message advising the user to review the findings.
7. WHEN the Drift_Report contains only Cosmetic_Findings, THE Upload_Modal SHALL enable the "Continue to Preview" button without a warning message.
8. WHEN the Drift_Report contains no findings, THE Upload_Modal SHALL skip the drift review phase and proceed directly to the diff preview.
### Requirement 8: Drift Review UI Layout and Interaction
**User Story:** As a compliance analyst, I want the drift findings to be clearly organised and scannable, so that I can quickly understand what changed in the xlsx structure.
#### Acceptance Criteria
1. THE Upload_Modal SHALL group drift findings by severity, displaying Breaking_Findings first, then Silent_Miss_Findings, then Cosmetic_Findings.
2. THE Upload_Modal SHALL display a count badge next to each severity group header showing the number of findings in that group.
3. WHEN a severity group contains more than five findings, THE Upload_Modal SHALL collapse the group to show the first five findings with an expandable "Show N more" toggle.
4. EACH Drift_Finding displayed in the Upload_Modal SHALL include the finding message and the specific value (column name, sheet name, or metric ID) that triggered the finding.
5. THE Upload_Modal SHALL display a "Cancel" button that returns the modal to the idle phase, and a "Continue to Preview" button (when enabled) that advances to the diff preview phase.
6. THE Upload_Modal drift review phase SHALL follow the dashboard's dark theme and monospace typography conventions defined in `DESIGN_SYSTEM.md`.
### Requirement 9: Existing Upload Flow Preservation
**User Story:** As a compliance analyst, I want the existing upload flow to remain intact, so that the drift check is an additive enhancement and does not disrupt the current preview-then-commit workflow.
#### Acceptance Criteria
1. WHEN the user clicks "Continue to Preview" from the drift review phase, THE Upload_Modal SHALL display the same diff preview (recurring, new, resolved counts) and "Confirm Upload" button as the current implementation.
2. THE Preview_Endpoint SHALL continue to return `diff`, `tempFile`, `filename`, `report_date`, and `total_items` fields in the response alongside the new `drift` field.
3. THE commit flow (`POST /api/compliance/commit`) SHALL remain unchanged and SHALL NOT perform any drift checking.
4. WHEN the `drift` field in the preview response is `null` (drift check failed or was skipped), THE Upload_Modal SHALL proceed directly to the diff preview phase as if no drift was detected.


@@ -0,0 +1,154 @@
# Implementation Plan: Compliance Schema Drift Check
## Overview
This plan implements schema drift detection in the compliance upload flow. The work proceeds in layers: first extract the shared config file, then build the Python schema extractor, then the Node.js drift checker, then wire it into the preview endpoint, and finally update the upload modal with the drift-review phase. Property-based tests validate the drift checker's correctness properties using fast-check.
## Tasks
- [x] 1. Create shared parser configuration file and update Python parser
- [x] 1.1 Create `backend/scripts/compliance_config.json` with `metric_categories`, `core_cols`, and `skip_sheets`
- Extract the exact values from the inline dicts `METRIC_CATEGORIES`, `CORE_COLS`, and `SKIP_SHEETS` in `parse_compliance_xlsx.py`
- `metric_categories` is an object mapping metric ID strings to category strings
- `core_cols` is an array of column name strings
- `skip_sheets` is an array of sheet name strings
- _Requirements: 1.1, 1.2_
- [x] 1.2 Modify `backend/scripts/parse_compliance_xlsx.py` to read config from JSON file
- Remove the inline `METRIC_CATEGORIES`, `CORE_COLS`, and `SKIP_SHEETS` definitions
- Load them from `compliance_config.json` (resolved relative to the script's directory)
- If the config file is missing or contains invalid JSON, print a descriptive error to stderr and exit with non-zero code
- Ensure `CORE_COLS` is converted to a set after loading from the JSON array
- _Requirements: 1.3, 1.4_
- [ ]* 1.3 Write unit tests for Python parser config loading
- Test that parser loads config correctly and produces same output as before
- Test that missing config file causes non-zero exit with descriptive stderr
- Test that invalid JSON in config file causes non-zero exit with descriptive stderr
- _Requirements: 1.3, 1.4_
- [x] 2. Create Python schema extractor script
- [x] 2.1 Create `backend/scripts/extract_xlsx_schema.py`
- Accept file path as CLI argument
- Use openpyxl in read-only mode to extract: sheet names, first-row column headers per sheet, and unique metric values from the Summary sheet (header at row 4, data from row 5 onward)
- Output JSON to stdout with shape `{ "sheets": [{ "name", "columns", "metric_values?" }] }`
- On error, return `{ "error": "..." }` on stdout and exit with non-zero code
- Reuse the approach from `dump_xlsx_schema.py` for Summary sheet metric extraction
- _Requirements: 2.1, 2.2, 2.3, 2.4_
- [ ]* 2.2 Write unit tests for schema extractor
- Test that valid xlsx produces correct schema JSON
- Test that missing file returns error JSON and non-zero exit
- Test that file with no sheets returns error JSON
- _Requirements: 2.1, 2.4_
- [x] 3. Implement Node.js drift checker module
- [x] 3.1 Create `backend/helpers/driftChecker.js` with `compareSchemaToDrift(schema, config)` function
- Implement breaking rules: a core column missing from a detail sheet, and a missing detail sheet (a key in `metric_categories` that is not in `skip_sheets` and is absent from the xlsx)
- Implement silent-miss rules: a Summary metric value that is not a key in `metric_categories`, and a sheet that appears in neither `skip_sheets` nor `metric_categories`
- Implement cosmetic rules: a detail-sheet column that is not in `core_cols`, and a stale metric (a key in `metric_categories` that does not appear among the Summary metric values)
- Each finding has shape `{ severity, message, value, sheet }` (sheet is null when not applicable)
- Return `{ breaking: [], silent_miss: [], cosmetic: [] }`
- Export `compareSchemaToDrift` and a `loadConfig(configPath)` function that reads and validates `compliance_config.json`
- Config loader validates: file exists, parses as JSON, contains `metric_categories` (object), `core_cols` (array), `skip_sheets` (array)
- _Requirements: 3.1, 3.2, 3.3, 4.1, 4.2, 4.3, 5.1, 5.2, 5.3, 1.5, 1.6_
- [ ] 3.2 Write property test: Breaking drift completeness (Property 1)
- **Property 1: Breaking drift completeness**
- For any generated schema and config, the set of breaking findings equals exactly the union of missing-core-column findings and missing-detail-sheet findings — no more, no fewer
- Use fast-check with arbitrary generators for schema and config objects
- Minimum 100 iterations
- **Validates: Requirements 3.1, 3.2, 3.3**
- [ ]* 3.3 Write property test: Silent-miss drift completeness (Property 2)
- **Property 2: Silent-miss drift completeness**
- For any generated schema and config, the set of silent-miss findings equals exactly the union of unknown-metric findings and unknown-sheet findings
- Use fast-check with arbitrary generators for schema and config objects
- Minimum 100 iterations
- **Validates: Requirements 4.1, 4.2, 4.3**
- [ ]* 3.4 Write property test: Cosmetic drift completeness (Property 3)
- **Property 3: Cosmetic drift completeness**
- For any generated schema and config, the set of cosmetic findings equals exactly the union of new-column findings and stale-metric findings
- Use fast-check with arbitrary generators for schema and config objects
- Minimum 100 iterations
- **Validates: Requirements 5.1, 5.2, 5.3**
- [ ]* 3.5 Write property test: Drift severity ordering (Property 4)
- **Property 4: Drift severity ordering**
- For any drift report, the grouped output always lists all breaking findings first, then all silent-miss findings, then all cosmetic findings
- Use fast-check to generate mixed drift reports and verify ordering
- Minimum 100 iterations
- **Validates: Requirements 8.1**
- [ ]* 3.6 Write unit tests for drift checker and config loader
- Test each drift rule individually with hand-crafted schema/config pairs
- Test config loader with valid file, missing file, invalid JSON, and missing required keys
- Test that perfectly aligned schema and config produce zero findings
- Test edge cases: empty metric_categories, empty core_cols, empty skip_sheets
- _Requirements: 3.1, 3.2, 3.3, 4.1, 4.2, 4.3, 5.1, 5.2, 5.3, 1.5, 1.6_
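The drift rules in task 3.1 reduce to set comparisons between the extracted schema and the config. The sketch below is one possible `compareSchemaToDrift`, not the shipped implementation. It assumes that a detail sheet is any sheet whose name is a `metric_categories` key not listed in `skip_sheets`, that the Summary sheet is named literally `Summary`, and that the stale-metric check runs only when the Summary sheet reported `metric_values`. Finding messages are illustrative.

```javascript
// Assumption: the Summary sheet is identified by this exact name.
const SUMMARY_SHEET = "Summary";

// Sketch of compareSchemaToDrift(schema, config) from task 3.1.
// schema: { sheets: [{ name, columns, metric_values? }] }
// config: { metric_categories, core_cols, skip_sheets }
function compareSchemaToDrift(schema, config) {
  const report = { breaking: [], silent_miss: [], cosmetic: [] };
  const add = (severity, message, value, sheet = null) =>
    report[severity].push({ severity, message, value, sheet });

  const metricIds = new Set(Object.keys(config.metric_categories));
  const coreCols = new Set(config.core_cols);
  const skipSheets = new Set(config.skip_sheets);
  const sheetsByName = new Map(schema.sheets.map((s) => [s.name, s]));
  const isDetail = (name) => metricIds.has(name) && !skipSheets.has(name);

  // Breaking: expected detail sheets that are absent (sheet is null: it does not exist).
  for (const id of metricIds) {
    if (isDetail(id) && !sheetsByName.has(id)) {
      add("breaking", "Detail sheet missing from workbook", id);
    }
  }

  for (const sheet of schema.sheets) {
    if (isDetail(sheet.name)) {
      const cols = new Set(sheet.columns);
      // Breaking: a core column the parser relies on is gone.
      for (const col of coreCols) {
        if (!cols.has(col)) add("breaking", "Core column missing", col, sheet.name);
      }
      // Cosmetic: extra columns the parser will ignore.
      for (const col of sheet.columns) {
        if (!coreCols.has(col)) add("cosmetic", "New column not in core_cols", col, sheet.name);
      }
    } else if (!skipSheets.has(sheet.name)) {
      // Silent-miss: a sheet that is neither parsed nor explicitly skipped.
      add("silent_miss", "Unknown sheet will be ignored", sheet.name, sheet.name);
    }
  }

  const summary = sheetsByName.get(SUMMARY_SHEET);
  if (summary && Array.isArray(summary.metric_values)) {
    const seen = new Set(summary.metric_values);
    // Silent-miss: Summary metrics the parser cannot categorise.
    for (const value of seen) {
      if (!metricIds.has(value)) add("silent_miss", "Unknown metric in Summary", value, SUMMARY_SHEET);
    }
    // Cosmetic: configured metrics that no longer appear in the report.
    for (const id of metricIds) {
      if (!seen.has(id)) add("cosmetic", "Configured metric not found in Summary", id, SUMMARY_SHEET);
    }
  }
  return report;
}

module.exports = { compareSchemaToDrift };
```

Guarding the stale-metric check on the presence of the Summary sheet avoids flagging every configured metric as stale when Summary itself is missing; whether that case should instead count as breaking is a decision for the requirements.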
- [x] 4. Checkpoint — Verify backend modules
- Ensure all tests pass; ask the user if questions arise.
- [x] 5. Integrate drift check into preview endpoint
- [x] 5.1 Modify `backend/routes/compliance.js` to add drift checking in `POST /preview`
- After receiving the uploaded file, spawn `extract_xlsx_schema.py` as a Python subprocess to get the xlsx schema
- Read `compliance_config.json` using the `loadConfig()` function from `driftChecker.js`
- Call `compareSchemaToDrift(schema, config)` to produce the drift report
- Proceed with the existing `parseXlsx()` call and `computeDiff()`
- Include `drift` (DriftReport object) and `drift_error` (string or null) in the response
- If schema extraction or the drift comparison throws, set `drift: null` and `drift_error: <message>`, then continue with the normal flow
- If the config file is missing or invalid (`loadConfig()` throws), return 500 with a descriptive message instead of degrading gracefully
- Preserve all existing response fields: `diff`, `tempFile`, `filename`, `report_date`, `total_items`
- _Requirements: 6.1, 6.2, 6.3, 6.4, 9.2_
- [ ]* 5.2 Write integration tests for preview endpoint drift behavior
- Test that preview response includes `drift` field alongside existing `diff` data
- Test that breaking drift still returns 200 (not an error)
- Test graceful degradation when drift check fails (`drift: null`, `drift_error` present)
- Test 500 response when config file is missing
- Test that commit endpoint is unchanged and does not reference drift
- _Requirements: 6.1, 6.2, 6.3, 6.4, 9.3_
- [x] 6. Update upload modal with drift-review phase
- [x] 6.1 Modify `frontend/src/components/pages/ComplianceUploadModal.js` to add drift-review phase
- Add `drift-review` phase between `uploading` and `preview` in the phase flow
- After the upload response arrives, check whether `drift` is non-null and has findings; if so, enter `drift-review`, otherwise skip to `preview`
- When `drift` is `null` (drift check failed), skip drift-review and go straight to preview
- Display findings grouped by severity: breaking first, then silent-miss, then cosmetic
- Each severity group has a header with label and count badge
- Groups with more than 5 findings collapse with a "Show N more" toggle
- Each finding shows the message and the triggering value
- Breaking findings: red text (`#EF4444`), red left-border accent
- Silent-miss findings: amber text (`#F59E0B`), amber left-border accent
- Cosmetic findings: muted text (`#94A3B8`), subtle left-border accent
- "Cancel" button returns to idle phase; "Continue to Preview" button advances to diff preview
- "Continue to Preview" disabled when breaking findings exist, with a message explaining the block
- When there are no breaking findings but silent-miss findings exist, show a warning message and enable "Continue to Preview"
- When only cosmetic findings exist, enable "Continue to Preview" without a warning
- Follow dashboard dark theme and monospace typography from `DESIGN_SYSTEM.md`
- Preserve existing diff preview, commit flow, done, and error phases unchanged
- _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 9.1, 9.4_
- [ ]* 6.2 Write unit tests for upload modal drift-review phase
- Test drift-review phase renders when findings exist
- Test "Continue to Preview" button disabled when breaking findings present
- Test "Continue to Preview" button enabled when no breaking findings
- Test that groups with more than 5 findings collapse and show the correct "Show N more" count
- Test that cancel returns to the idle phase
- Test that the modal skips drift-review when `drift` is null or has no findings
- _Requirements: 7.1, 7.5, 7.6, 7.7, 7.8, 8.3_
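The display rules in task 6.1 can live outside the JSX as small pure functions, which also keeps the unit tests in task 6.2 simple. The helpers below are a hypothetical, framework-agnostic sketch; the function names and warning wording are assumptions, and only the phase names `drift-review` and `preview` come from the task.

```javascript
const SEVERITY_ORDER = ["breaking", "silent_miss", "cosmetic"];
const COLLAPSE_LIMIT = 5;

// True when the drift report exists and at least one group is non-empty.
function hasFindings(drift) {
  return drift != null && SEVERITY_ORDER.some((s) => (drift[s] || []).length > 0);
}

// Phase to enter once the /preview response arrives; a null drift
// (drift check failed) or an empty report goes straight to the diff preview.
function phaseAfterUpload(response) {
  return hasFindings(response.drift) ? "drift-review" : "preview";
}

// Non-empty groups in display order: breaking, silent-miss, cosmetic.
function groupFindings(drift) {
  return SEVERITY_ORDER
    .map((severity) => ({ severity, findings: drift[severity] || [] }))
    .filter((group) => group.findings.length > 0);
}

// Groups larger than COLLAPSE_LIMIT show only the first COLLAPSE_LIMIT
// findings until expanded; hiddenCount is the N in "Show N more".
function visibleFindings(findings, expanded) {
  if (expanded || findings.length <= COLLAPSE_LIMIT) {
    return { shown: findings, hiddenCount: 0 };
  }
  return {
    shown: findings.slice(0, COLLAPSE_LIMIT),
    hiddenCount: findings.length - COLLAPSE_LIMIT,
  };
}

// State of the "Continue to Preview" button; message wording is illustrative.
function continueState(drift) {
  if ((drift.breaking || []).length > 0) {
    return { enabled: false, message: "Resolve breaking drift in the parser config before continuing." };
  }
  if ((drift.silent_miss || []).length > 0) {
    return { enabled: true, message: "Some data will be ignored by the parser. Review before continuing." };
  }
  return { enabled: true, message: null };
}

module.exports = { phaseAfterUpload, groupFindings, visibleFindings, continueState };
```

This sketch hides empty severity groups; rendering them with a zero count badge would be an equally valid reading of the requirement.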
- [x] 7. Final checkpoint — Ensure all tests pass
- Ensure all tests pass; ask the user if questions arise.
## Notes
- Tasks marked with `*` are optional and can be skipped for a faster MVP
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- Property tests (3.2 through 3.5) validate the four correctness properties from the design using fast-check
- Unit tests validate specific examples and edge cases
- The Python parser modification (1.2) must produce identical output to the current inline-dict version — this is a refactor, not a behavior change
- The commit endpoint (`POST /api/compliance/commit`) is intentionally unchanged