Add admin page overhaul and compliance schema drift check specs, compliance upload improvements, drift checker helper

2026-04-20 20:12:12 +00:00
parent 6082721452
commit 043c85cc69
20 changed files with 56814 additions and 59 deletions
--- a/.kiro/specs/compliance-schema-drift-check/design.md
+++ b/.kiro/specs/compliance-schema-drift-check/design.md
@@ -0,0 +1,364 @@
+# Design Document: Compliance Schema Drift Check
+
+## Overview
+
+This feature adds schema drift detection to the compliance xlsx upload flow. When a user uploads a weekly NTS_AEO report, the backend extracts the xlsx structural schema (sheet names, column headers, metric values) and compares it against a shared parser configuration file. The comparison produces a categorised drift report with three severity levels: breaking (blocks upload), silent-miss (warns but allows proceeding), and cosmetic (informational). The frontend displays these findings in a new drift review phase inside the upload modal, inserted between the upload spinner and the existing diff preview.
+
+The parser configuration dicts (`METRIC_CATEGORIES`, `CORE_COLS`, `SKIP_SHEETS`) currently defined inline in `parse_compliance_xlsx.py` are extracted into a shared JSON file (`backend/scripts/compliance_config.json`) that both the Python parser and the Node.js drift checker read. This establishes a single source of truth for parser configuration.
+
+### Design Decisions
+
+1. **Shared JSON config over database storage**: The parser config is a developer-maintained mapping, not user data. A JSON file is version-controllable, diffable, and readable by both Python and Node.js without additional dependencies.
+
+2. **Python subprocess for schema extraction**: The existing `dump_xlsx_schema.py` already uses openpyxl to extract xlsx structure. We adapt this into a new `extract_xlsx_schema.py` script that the Node.js backend invokes as a subprocess, consistent with how `parse_compliance_xlsx.py` is already called.
+
+3. **Node.js drift comparison logic**: The drift comparison is pure object comparison (sets of strings) with no xlsx parsing. Implementing it in Node.js avoids a second Python subprocess call and keeps the logic co-located with the route handler.
+
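+4. **Graceful degradation**: If the drift check itself fails (for example, schema extraction errors out), the upload flow proceeds normally with `drift: null` and a `drift_error` message. The drift check is additive, so a failed check must never block the existing workflow; only an explicit breaking finding stops the user from continuing.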
+
+## Architecture
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant Modal as ComplianceUploadModal
+    participant API as POST /api/compliance/preview
+    participant Schema as extract_xlsx_schema.py
+    participant Drift as driftChecker (Node.js)
+    participant Config as compliance_config.json
+    participant Parser as parse_compliance_xlsx.py
+
+    User->>Modal: Drops xlsx file
+    Modal->>API: POST /preview (multipart)
+    API->>Schema: spawn python3 extract_xlsx_schema.py <file>
+    Schema-->>API: JSON { sheets: [...] }
+    API->>Config: fs.readFileSync(compliance_config.json)
+    API->>Drift: compareSchemaToDrift(schema, config)
+    Drift-->>API: { breaking: [...], silent_miss: [...], cosmetic: [...] }
+    API->>Parser: spawn python3 parse_compliance_xlsx.py <file>
+    Parser->>Config: reads compliance_config.json
+    Parser-->>API: JSON { items, summary, ... }
+    API->>API: computeDiff(db, items)
+    API-->>Modal: { drift, diff, tempFile, ... }
+    alt drift has findings
+        Modal->>User: Show drift review phase
+        alt breaking findings exist
+            Modal->>User: Block "Continue to Preview"
+        else no breaking findings
+            User->>Modal: Click "Continue to Preview"
+            Modal->>User: Show diff preview
+        end
+    else no drift findings
+        Modal->>User: Show diff preview directly
+    end
+```
+
+### File Layout
+
+```
+backend/
+  scripts/
+    compliance_config.json          # NEW — shared parser config (single source of truth)
+    extract_xlsx_schema.py          # NEW — extracts xlsx structure as JSON
+    parse_compliance_xlsx.py        # MODIFIED — reads config from JSON file
+    dump_xlsx_schema.py             # UNCHANGED — standalone diagnostic tool
+  routes/
+    compliance.js                   # MODIFIED — drift check in /preview, calls the new driftChecker helper
+  helpers/
+    driftChecker.js                 # NEW — compareSchemaToDrift() function
+
+frontend/
+  src/components/pages/
+    ComplianceUploadModal.js        # MODIFIED — new drift-review phase
+```
+
+## Components and Interfaces
+
+### 1. Shared Parser Configuration (`compliance_config.json`)
+
+```json
+{
+  "metric_categories": {
+    "2.3.4i": "Vulnerability Management",
+    "2.3.6i": "Vulnerability Management",
+    "5.2.4": "Access & MFA"
+  },
+  "core_cols": [
+    "Preferred - Hostname",
+    "GRANITE - IPv4_Address",
+    "GRANITE - Type",
+    "Team",
+    "Compliant",
+    "Source_Network",
+    "Vertical",
+    "GRANITE - Equip_Inst_ID",
+    "GRANITE - RESPONSIBLE_TEAM"
+  ],
+  "skip_sheets": ["Summary", "CMDB_9box", "Vulns", "Aging Dashboard"]
+}
+```
+
+### 2. Schema Extractor (`extract_xlsx_schema.py`)
+
+**Input**: File path as CLI argument.
+
+**Output** (stdout JSON):
+```json
+{
+  "sheets": [
+    {
+      "name": "Summary",
+      "columns": ["Metric", "Non-Compliant", "..."],
+      "metric_values": ["2.3.4i", "5.2.4", "..."]
+    },
+    {
+      "name": "2.3.4i",
+      "columns": ["Preferred - Hostname", "GRANITE - IPv4_Address", "..."]
+    }
+  ]
+}
+```
+
+- Uses openpyxl in read-only mode.
+- Extracts sheet names, first-row column headers per sheet, and unique metric values from the Summary sheet (header at row 4, data from row 5 onward).
+- On error, returns `{ "error": "..." }` on stdout and exits with non-zero code.
+
+### 3. Drift Checker (`backend/helpers/driftChecker.js`)
+
+**Function**: `compareSchemaToDrift(schema, config) => DriftReport`
+
+**Parameters**:
+- `schema` — object returned by `extract_xlsx_schema.py`
+- `config` — object parsed from `compliance_config.json`
+
+**Returns** (`DriftReport`):
+```javascript
+{
+  breaking: [
+    { severity: 'breaking', message: 'Detail sheet "2.3.4i" is missing core column "Team"', value: 'Team', sheet: '2.3.4i' }
+  ],
+  silent_miss: [
+    { severity: 'silent_miss', message: 'Unknown metric "9.1.2" in Summary — not in metric_categories', value: '9.1.2' }
+  ],
+  cosmetic: [
+    { severity: 'cosmetic', message: 'New column "Extra_Field" in sheet "2.3.4i" — will be captured in extra_json', value: 'Extra_Field', sheet: '2.3.4i' }
+  ]
+}
+```
+
+**Drift rules**:
+
+| Rule | Severity | Condition |
+|---|---|---|
+| Missing core column | `breaking` | A detail sheet (not in `skip_sheets`, present in xlsx) is missing a column from `core_cols` |
+| Missing detail sheet | `breaking` | A key in `metric_categories` (and not in `skip_sheets`) has no sheet of the same name in the xlsx |
+| Unknown metric value | `silent_miss` | A metric value in the Summary sheet is not a key in `metric_categories` |
+| Unknown sheet | `silent_miss` | An xlsx sheet is not in `skip_sheets` and not in `metric_categories` |
+| New column in detail sheet | `cosmetic` | A detail sheet has columns not in `core_cols` |
+| Stale metric category | `cosmetic` | A key in `metric_categories` does not appear in the Summary sheet's metric values |
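
The six rules in the table above can be sketched as plain set comparisons. This is a minimal illustration rather than the module's actual code. It assumes that a "detail sheet" is any xlsx sheet whose name is not in `skip_sheets`, and that Summary metric values are read from whichever sheet carries `metric_values`. The `finding` helper and the message wording are hypothetical.

```javascript
'use strict';

// Hypothetical helper: builds an object in the DriftFinding shape.
function finding(severity, message, value, sheet = null) {
  return { severity, message, value, sheet };
}

// Sketch of the drift rules. Assumption: every xlsx sheet that is not in
// skip_sheets counts as a detail sheet.
function compareSchemaToDrift(schema, config) {
  const report = { breaking: [], silent_miss: [], cosmetic: [] };
  const skip = new Set(config.skip_sheets);
  const coreCols = new Set(config.core_cols);
  const knownMetrics = new Set(Object.keys(config.metric_categories));
  const sheetNames = new Set(schema.sheets.map((s) => s.name));

  for (const sheet of schema.sheets) {
    if (skip.has(sheet.name)) continue;
    const cols = new Set(sheet.columns);

    // Missing core column (breaking).
    for (const col of config.core_cols) {
      if (!cols.has(col)) {
        report.breaking.push(finding('breaking',
          `Detail sheet "${sheet.name}" is missing core column "${col}"`, col, sheet.name));
      }
    }
    // New column in detail sheet (cosmetic).
    for (const col of cols) {
      if (!coreCols.has(col)) {
        report.cosmetic.push(finding('cosmetic',
          `New column "${col}" in sheet "${sheet.name}"`, col, sheet.name));
      }
    }
    // Unknown sheet (silent_miss).
    if (!knownMetrics.has(sheet.name)) {
      report.silent_miss.push(finding('silent_miss',
        `Unknown sheet "${sheet.name}"`, sheet.name, sheet.name));
    }
  }

  // Missing detail sheet (breaking).
  for (const metric of knownMetrics) {
    if (!skip.has(metric) && !sheetNames.has(metric)) {
      report.breaking.push(finding('breaking',
        `Detail sheet "${metric}" is missing from the xlsx`, metric, metric));
    }
  }

  // Metric values come from the sheet(s) that carry metric_values (Summary).
  const summaryValues = new Set(schema.sheets.flatMap((s) => s.metric_values || []));

  // Unknown metric value (silent_miss).
  for (const value of summaryValues) {
    if (!knownMetrics.has(value)) {
      report.silent_miss.push(finding('silent_miss',
        `Unknown metric "${value}" in Summary`, value));
    }
  }
  // Stale metric category (cosmetic).
  for (const metric of knownMetrics) {
    if (!summaryValues.has(metric)) {
      report.cosmetic.push(finding('cosmetic',
        `Metric "${metric}" is configured but absent from Summary`, metric));
    }
  }
  return report;
}
```

Under these assumptions, one input can trigger several rules at once: a configured metric with neither a detail sheet nor a Summary row yields one breaking finding and one cosmetic finding.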
+
+### 4. Preview Endpoint Changes (`POST /api/compliance/preview`)
+
+The existing `/preview` handler is modified to:
+
+1. After receiving the uploaded file, spawn `extract_xlsx_schema.py` to get the xlsx schema.
+2. Read `compliance_config.json` from disk.
+3. Call `compareSchemaToDrift(schema, config)` to produce the drift report.
+4. Proceed with the existing `parseXlsx()` call and `computeDiff()`.
+5. Include `drift` (the DriftReport object) and optionally `drift_error` (string) in the response.
+
+If the schema extraction or drift check throws, set `drift: null` and `drift_error: <message>`, then continue with the normal flow.
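
The fallback described above could look roughly like the wrapper below. It is a sketch, not the route handler's real code. `runDriftCheck` is a hypothetical name, and `extractSchema` and `compare` are injected stand-ins for the `extract_xlsx_schema.py` subprocess call and `compareSchemaToDrift()`. The config is passed in already loaded, because a config load failure is handled separately.

```javascript
'use strict';

// Hypothetical wrapper: any failure in extraction or comparison degrades to
// { drift: null, drift_error } instead of propagating to the caller.
async function runDriftCheck(filePath, config, extractSchema, compare) {
  try {
    const schema = await extractSchema(filePath);
    return { drift: compare(schema, config), drift_error: null };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { drift: null, drift_error: message };
  }
}
```

The handler can then spread the result into its response and continue with the parse and diff steps whether or not the check succeeded.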
+
+**Updated response shape**:
+```json
+{
+  "drift": {
+    "breaking": [],
+    "silent_miss": [],
+    "cosmetic": []
+  },
+  "drift_error": null,
+  "diff": { "new_count": 5, "recurring_count": 120, "resolved_count": 3 },
+  "tempFile": "/path/to/temp.json",
+  "filename": "NTS_AEO_2026_03_25.xlsx",
+  "report_date": "2026-03-25",
+  "total_items": 125
+}
+```
+
+### 5. Upload Modal Changes (`ComplianceUploadModal.js`)
+
+**New phase**: `drift-review` inserted between `uploading` and `preview`.
+
+**Phase flow**:
+```
+idle → uploading → drift-review (if findings) → preview → committing → done
+                 → preview (if no findings)
+```
+
+**Drift review UI**:
+- Findings grouped by severity: breaking first, then silent-miss, then cosmetic.
+- Each group has a header with severity label and count badge.
+- Groups with more than 5 findings collapse with a "Show N more" toggle.
+- Each finding shows the message text and the triggering value.
+- Breaking findings: red text (`#EF4444`), red left-border accent.
+- Silent-miss findings: amber text (`#F59E0B`), amber left-border accent.
+- Cosmetic findings: muted text (`#94A3B8`), subtle left-border accent.
+- "Cancel" button returns to idle. "Continue to Preview" button advances to diff preview.
+- "Continue to Preview" is disabled when breaking findings exist, with a message explaining the block.
+- When `drift` is `null` (drift check failed), skip drift-review and go straight to preview.
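
The phase and button rules above reduce to a few pure functions, sketched here outside any UI framework. All function names are hypothetical. The split of a collapsed group into the first five findings plus a hidden remainder is an assumption about how "Show N more" is counted.

```javascript
'use strict';

// Total number of findings across the three severity groups.
function countFindings(drift) {
  return drift.breaking.length + drift.silent_miss.length + drift.cosmetic.length;
}

// Phase to enter once the upload response arrives.
function phaseAfterUpload(drift) {
  if (drift === null || drift === undefined) return 'preview'; // drift check failed
  return countFindings(drift) > 0 ? 'drift-review' : 'preview';
}

// "Continue to Preview" is enabled only when there are no breaking findings.
function canContinueToPreview(drift) {
  return drift.breaking.length === 0;
}

// Assumption: a group with more than `limit` findings shows the first
// `limit` and hides the rest behind a "Show N more" toggle.
function splitGroup(findings, limit = 5) {
  return {
    visible: findings.slice(0, limit),
    hiddenCount: Math.max(0, findings.length - limit),
  };
}
```

Keeping these decisions in plain functions makes them testable without rendering the modal.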
+
+## Data Models
+
+### DriftFinding
+
+```javascript
+{
+  severity: 'breaking' | 'silent_miss' | 'cosmetic',
+  message: string,    // Human-readable description
+  value: string,      // The specific column/sheet/metric that triggered the finding
+  sheet: string|null  // Sheet name context (when applicable)
+}
+```
+
+### DriftReport
+
+```javascript
+{
+  breaking: DriftFinding[],
+  silent_miss: DriftFinding[],
+  cosmetic: DriftFinding[]
+}
+```
+
+### ParserConfig
+
+```javascript
+{
+  metric_categories: { [metricId: string]: string },  // metric ID → category name
+  core_cols: string[],                                  // column names for main item fields
+  skip_sheets: string[]                                 // sheet names excluded from parsing
+}
+```
+
+### XlsxSchema (output of extract_xlsx_schema.py)
+
+```javascript
+{
+  sheets: [
+    {
+      name: string,
+      columns: string[],
+      metric_values?: string[]  // only present on Summary sheet
+    }
+  ]
+}
+```
+
+
+## Correctness Properties
+
+*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
+
+### Property 1: Breaking drift completeness
+
+*For any* xlsx schema and parser config, the drift checker SHALL produce a breaking finding for every core column missing from every detail sheet, and for every detail sheet (present in `metric_categories` but not in `skip_sheets`) absent from the xlsx — and no other breaking findings. The set of breaking findings is exactly the union of missing-core-column findings and missing-detail-sheet findings.
+
+**Validates: Requirements 3.1, 3.2, 3.3**
+
+### Property 2: Silent-miss drift completeness
+
+*For any* xlsx schema and parser config, the drift checker SHALL produce a silent-miss finding for every metric value in the Summary sheet not present in `metric_categories`, and for every xlsx sheet not in `skip_sheets` and not in `metric_categories` — and no other silent-miss findings. The set of silent-miss findings is exactly the union of unknown-metric findings and unknown-sheet findings.
+
+**Validates: Requirements 4.1, 4.2, 4.3**
+
+### Property 3: Cosmetic drift completeness
+
+*For any* xlsx schema and parser config, the drift checker SHALL produce a cosmetic finding for every column in a detail sheet not present in `core_cols`, and for every key in `metric_categories` not present in the Summary sheet's metric values — and no other cosmetic findings. The set of cosmetic findings is exactly the union of new-column findings and stale-metric findings.
+
+**Validates: Requirements 5.1, 5.2, 5.3**
+
+### Property 4: Drift severity ordering
+
+*For any* drift report containing a mix of breaking, silent-miss, and cosmetic findings, the grouping function SHALL always return findings ordered by severity: all breaking findings first, then all silent-miss findings, then all cosmetic findings.
+
+**Validates: Requirements 8.1**
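
Property 4 can be met by flattening the report in a fixed severity order. The sketch below assumes the grouping function takes a `DriftReport`. The names `SEVERITY_ORDER` and `orderFindings` are hypothetical.

```javascript
'use strict';

// Display order required by Property 4.
const SEVERITY_ORDER = ['breaking', 'silent_miss', 'cosmetic'];

// Flattens a DriftReport into one list ordered by severity. Findings within
// a group keep their original relative order.
function orderFindings(drift) {
  return SEVERITY_ORDER.flatMap((severity) => drift[severity] || []);
}
```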
+
+## Error Handling
+
+### Python Script Failures
+
+| Failure | Handling |
+|---|---|
+| `extract_xlsx_schema.py` exits non-zero | Preview endpoint sets `drift: null`, `drift_error: <error message from the extractor>`, continues with normal parse flow |
+| `extract_xlsx_schema.py` returns invalid JSON | Same as above — caught in JSON.parse, treated as drift check failure |
+| `compliance_config.json` missing or invalid (Node.js read) | Preview endpoint returns 500 with message "Configuration file could not be loaded" |
+| `compliance_config.json` missing or invalid (Python parser read) | Parser exits non-zero, stderr describes the error, preview endpoint returns 500 with parse error |
+| xlsx file cannot be opened by schema extractor | Schema extractor returns `{ "error": "..." }` on stdout, exits non-zero; drift check skipped gracefully |
+
+### Frontend Error States
+
+| Condition | Behavior |
+|---|---|
+| `drift` is `null` in preview response | Skip drift-review phase, proceed directly to diff preview |
+| `drift_error` is present | Optionally display a subtle warning in the diff preview that drift check was skipped |
+| Network error during upload | Existing error phase handling (unchanged) |
+
+### Config File Validation
+
+The Node.js config loader validates that:
+- The file exists and is readable.
+- The content parses as valid JSON.
+- The parsed object contains `metric_categories` (object), `core_cols` (array), and `skip_sheets` (array).
+
+If any check fails, the loader throws with a descriptive message. The preview handler catches this and returns a 500 response.
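
A loader that performs the three checks above might look like the following sketch. `loadParserConfig` and `validateParserConfig` are hypothetical names, and the error messages are illustrative only.

```javascript
'use strict';
const fs = require('node:fs');

// Checks the shape of an already parsed config object.
function validateParserConfig(parsed) {
  const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  if (!isPlainObject(parsed)) {
    throw new Error('Config must be a JSON object');
  }
  if (!isPlainObject(parsed.metric_categories)) {
    throw new Error('Config key "metric_categories" must be an object');
  }
  for (const key of ['core_cols', 'skip_sheets']) {
    if (!Array.isArray(parsed[key])) {
      throw new Error(`Config key "${key}" must be an array`);
    }
  }
  return parsed;
}

// Reads, parses, and validates the config file, throwing a descriptive
// error at whichever step fails.
function loadParserConfig(configPath) {
  let raw;
  try {
    raw = fs.readFileSync(configPath, 'utf8');
  } catch (err) {
    throw new Error(`Cannot read config file ${configPath}: ${err.message}`);
  }
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`Config file ${configPath} is not valid JSON: ${err.message}`);
  }
  return validateParserConfig(parsed);
}
```

Separating the shape check from the file read lets the unit tests cover the required-keys case without touching the filesystem.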
+
+## Testing Strategy
+
+### Unit Tests
+
+**Drift checker (`driftChecker.js`)**:
+- Breaking: missing core column produces finding with correct severity, message, value, and sheet.
+- Breaking: missing detail sheet produces finding.
+- Silent-miss: unknown metric value produces finding.
+- Silent-miss: unknown sheet produces finding.
+- Cosmetic: new column in detail sheet produces finding.
+- Cosmetic: stale metric category produces finding.
+- Empty schema (no sheets) produces appropriate findings.
+- Config with empty metric_categories, core_cols, or skip_sheets.
+- Schema and config that are perfectly aligned produce zero findings.
+
+**Config loader**:
+- Valid config file loads correctly.
+- Missing file throws descriptive error.
+- Invalid JSON throws descriptive error.
+- Config missing required keys throws descriptive error.
+
+**Frontend drift review component**:
+- Drift review phase renders when findings exist.
+- "Continue to Preview" button disabled when breaking findings present.
+- "Continue to Preview" button enabled when no breaking findings.
+- Groups with more than 5 findings collapse and show the correct "Show N more" count.
+- Cancel returns to idle phase.
+- Skips drift review when drift is null or has no findings.
+
+### Property-Based Tests
+
+Property-based tests use `fast-check` (JavaScript) to verify the four correctness properties defined above. Each test generates random schema and config objects and verifies the drift checker output against the expected set-theoretic result.
+
+**Configuration**:
+- Minimum 100 iterations per property test.
+- Each test tagged with: **Feature: compliance-schema-drift-check, Property {N}: {title}**
+
+**Generators**:
+- `arbitraryParserConfig`: generates random `metric_categories` (object with 0–20 string keys mapped to category strings), `core_cols` (array of 0–15 unique column name strings), `skip_sheets` (array of 0–5 unique sheet name strings).
+- `arbitraryXlsxSchema`: generates random sheets array, each with a name, columns array, and optionally metric_values (for the Summary sheet). Sheet names, column names, and metric values drawn from a shared pool to ensure meaningful overlap with the config.
+
+### Integration Tests
+
+- Preview endpoint returns drift report alongside existing diff data.
+- Preview endpoint returns 200 (not an error status) when the drift report contains breaking findings.
+- Preview endpoint gracefully degrades when drift check fails (`drift: null`, `drift_error` present).
+- Preview endpoint returns 500 when config file is missing.
+- Python parser reads from `compliance_config.json` and produces the same output as before.
+- Commit endpoint is unchanged and does not reference drift.