Add admin page overhaul and compliance schema drift check specs, compliance upload improvements, drift checker helper
364
.kiro/specs/compliance-schema-drift-check/design.md
Normal file
@@ -0,0 +1,364 @@

# Design Document: Compliance Schema Drift Check

## Overview

This feature adds schema drift detection to the compliance xlsx upload flow. When a user uploads a weekly NTS_AEO report, the backend extracts the xlsx structural schema (sheet names, column headers, metric values) and compares it against a shared parser configuration file. The comparison produces a categorised drift report with three severity levels: breaking (blocks upload), silent-miss (warns but allows proceeding), and cosmetic (informational). The frontend displays these findings in a new drift review phase inside the upload modal, inserted between the upload spinner and the existing diff preview.

The parser configuration dicts (`METRIC_CATEGORIES`, `CORE_COLS`, `SKIP_SHEETS`) currently defined inline in `parse_compliance_xlsx.py` are extracted into a shared JSON file (`backend/scripts/compliance_config.json`) that both the Python parser and the Node.js drift checker read. This establishes a single source of truth for parser configuration.

### Design Decisions

1. **Shared JSON config over database storage**: The parser config is a developer-maintained mapping, not user data. A JSON file is version-controllable, diffable, and readable by both Python and Node.js without additional dependencies.

2. **Python subprocess for schema extraction**: The existing `dump_xlsx_schema.py` already uses openpyxl to extract xlsx structure. We adapt this into a new `extract_xlsx_schema.py` script that the Node.js backend invokes as a subprocess, consistent with how `parse_compliance_xlsx.py` is already called.

3. **Node.js drift comparison logic**: The drift comparison is pure object comparison (sets of strings) with no xlsx parsing. Implementing it in Node.js avoids a second Python subprocess call and keeps the logic co-located with the route handler.

4. **Graceful degradation**: If the drift check fails, the upload flow proceeds normally with `drift: null` and a `drift_error` message. The drift check is additive and must never block the existing workflow.

## Architecture

```mermaid
sequenceDiagram
    participant User
    participant Modal as ComplianceUploadModal
    participant API as POST /api/compliance/preview
    participant Schema as extract_xlsx_schema.py
    participant Drift as driftChecker (Node.js)
    participant Config as compliance_config.json
    participant Parser as parse_compliance_xlsx.py

    User->>Modal: Drops xlsx file
    Modal->>API: POST /preview (multipart)
    API->>Schema: spawn python3 extract_xlsx_schema.py <file>
    Schema-->>API: JSON { sheets: [...] }
    API->>Config: fs.readFileSync(compliance_config.json)
    API->>Drift: compareSchemaToDrift(schema, config)
    Drift-->>API: { breaking: [...], silent_miss: [...], cosmetic: [...] }
    API->>Parser: spawn python3 parse_compliance_xlsx.py <file>
    Parser->>Config: reads compliance_config.json
    Parser-->>API: JSON { items, summary, ... }
    API->>API: computeDiff(db, items)
    API-->>Modal: { drift, diff, tempFile, ... }
    alt drift has findings
        Modal->>User: Show drift review phase
        alt breaking findings exist
            Modal->>User: Block "Continue to Preview"
        else no breaking findings
            User->>Modal: Click "Continue to Preview"
            Modal->>User: Show diff preview
        end
    else no drift findings
        Modal->>User: Show diff preview directly
    end
```

### File Layout

```
backend/
  scripts/
    compliance_config.json        # NEW — shared parser config (single source of truth)
    extract_xlsx_schema.py        # NEW — extracts xlsx structure as JSON
    parse_compliance_xlsx.py      # MODIFIED — reads config from JSON file
    dump_xlsx_schema.py           # UNCHANGED — standalone diagnostic tool
  routes/
    compliance.js                 # MODIFIED — drift check in /preview, calls the new driftChecker helper
  helpers/
    driftChecker.js               # NEW — compareSchemaToDrift() function

frontend/
  src/components/pages/
    ComplianceUploadModal.js      # MODIFIED — new drift-review phase
```

## Components and Interfaces

### 1. Shared Parser Configuration (`compliance_config.json`)

```json
{
  "metric_categories": {
    "2.3.4i": "Vulnerability Management",
    "2.3.6i": "Vulnerability Management",
    "5.2.4": "Access & MFA"
  },
  "core_cols": [
    "Preferred - Hostname",
    "GRANITE - IPv4_Address",
    "GRANITE - Type",
    "Team",
    "Compliant",
    "Source_Network",
    "Vertical",
    "GRANITE - Equip_Inst_ID",
    "GRANITE - RESPONSIBLE_TEAM"
  ],
  "skip_sheets": ["Summary", "CMDB_9box", "Vulns", "Aging Dashboard"]
}
```

### 2. Schema Extractor (`extract_xlsx_schema.py`)

**Input**: File path as CLI argument.

**Output** (stdout JSON):
```json
{
  "sheets": [
    {
      "name": "Summary",
      "columns": ["Metric", "Non-Compliant", "..."],
      "metric_values": ["2.3.4i", "5.2.4", "..."]
    },
    {
      "name": "2.3.4i",
      "columns": ["Preferred - Hostname", "GRANITE - IPv4_Address", "..."]
    }
  ]
}
```

- Uses openpyxl in read-only mode.
- Extracts sheet names, first-row column headers per sheet, and unique metric values from the Summary sheet (header at row 4, data from row 5 onward).
- On error, returns `{ "error": "..." }` on stdout and exits with non-zero code.

### 3. Drift Checker (`backend/helpers/driftChecker.js`)

**Function**: `compareSchemaToDrift(schema, config) => DriftReport`

**Parameters**:
- `schema` — object returned by `extract_xlsx_schema.py`
- `config` — object parsed from `compliance_config.json`

**Returns** (`DriftReport`):
```javascript
{
  breaking: [
    { severity: 'breaking', message: 'Detail sheet "2.3.4i" is missing core column "Team"', value: 'Team', sheet: '2.3.4i' }
  ],
  silent_miss: [
    { severity: 'silent_miss', message: 'Unknown metric "9.1.2" in Summary — not in metric_categories', value: '9.1.2' }
  ],
  cosmetic: [
    { severity: 'cosmetic', message: 'New column "Extra_Field" in sheet "2.3.4i" — will be captured in extra_json', value: 'Extra_Field', sheet: '2.3.4i' }
  ]
}
```

**Drift rules**:

| Rule | Severity | Condition |
|---|---|---|
| Missing core column | `breaking` | A detail sheet (not in `skip_sheets`, present in xlsx) is missing a column from `core_cols` |
| Missing detail sheet | `breaking` | A sheet name in `metric_categories` (and not in `skip_sheets`) is absent from the xlsx |
| Unknown metric value | `silent_miss` | A metric value in the Summary sheet is not a key in `metric_categories` |
| Unknown sheet | `silent_miss` | An xlsx sheet is not in `skip_sheets` and not in `metric_categories` |
| New column in detail sheet | `cosmetic` | A detail sheet has columns not in `core_cols` |
| Stale metric category | `cosmetic` | A key in `metric_categories` does not appear in the Summary sheet's metric values |

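
The six drift rules above can be sketched as plain set comparisons. This is a minimal illustration, not the committed `driftChecker.js`: it assumes a "detail sheet" is any xlsx sheet not listed in `skip_sheets` (the literal reading of the rules table), it reads Summary metrics from whichever sheet carries `metric_values`, and the `finding` helper and message wording are hypothetical.

```javascript
// Hypothetical helper: builds one DriftFinding-shaped object.
function finding(severity, message, value, sheet = null) {
  return { severity, message, value, sheet };
}

// Sketch of compareSchemaToDrift(schema, config) => DriftReport.
function compareSchemaToDrift(schema, config) {
  const skip = new Set(config.skip_sheets);
  const knownMetrics = new Set(Object.keys(config.metric_categories));
  const coreCols = new Set(config.core_cols);
  const sheetNames = new Set(schema.sheets.map((s) => s.name));
  // metric_values is only expected on the Summary sheet.
  const summaryMetrics = new Set(schema.sheets.flatMap((s) => s.metric_values ?? []));

  const report = { breaking: [], silent_miss: [], cosmetic: [] };

  for (const sheet of schema.sheets) {
    if (skip.has(sheet.name)) continue; // skipped sheets are never detail sheets
    const cols = new Set(sheet.columns);

    // Rule: missing core column (breaking).
    for (const col of coreCols) {
      if (!cols.has(col)) {
        report.breaking.push(finding('breaking',
          `Detail sheet "${sheet.name}" is missing core column "${col}"`, col, sheet.name));
      }
    }
    // Rule: new column in detail sheet (cosmetic).
    for (const col of cols) {
      if (!coreCols.has(col)) {
        report.cosmetic.push(finding('cosmetic',
          `New column "${col}" in sheet "${sheet.name}"`, col, sheet.name));
      }
    }
    // Rule: unknown sheet (silent_miss).
    if (!knownMetrics.has(sheet.name)) {
      report.silent_miss.push(finding('silent_miss',
        `Unknown sheet "${sheet.name}" is not in metric_categories`, sheet.name, sheet.name));
    }
  }

  for (const metric of knownMetrics) {
    // Rule: missing detail sheet (breaking).
    if (!skip.has(metric) && !sheetNames.has(metric)) {
      report.breaking.push(finding('breaking',
        `Detail sheet "${metric}" is absent from the xlsx`, metric));
    }
    // Rule: stale metric category (cosmetic).
    if (!summaryMetrics.has(metric)) {
      report.cosmetic.push(finding('cosmetic',
        `Metric "${metric}" does not appear in the Summary sheet`, metric));
    }
  }

  // Rule: unknown metric value (silent_miss).
  for (const metric of summaryMetrics) {
    if (!knownMetrics.has(metric)) {
      report.silent_miss.push(finding('silent_miss',
        `Unknown metric "${metric}" in Summary is not in metric_categories`, metric, 'Summary'));
    }
  }

  return report;
}
```

Building each input as a `Set` up front keeps every rule a single membership test, which also makes the output easy to check against the set-theoretic definitions used in the correctness properties.
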
### 4. Preview Endpoint Changes (`POST /api/compliance/preview`)

The existing `/preview` handler is modified to:

1. After receiving the uploaded file, spawn `extract_xlsx_schema.py` to get the xlsx schema.
2. Read `compliance_config.json` from disk.
3. Call `compareSchemaToDrift(schema, config)` to produce the drift report.
4. Proceed with the existing `parseXlsx()` call and `computeDiff()`.
5. Include `drift` (the DriftReport object) and optionally `drift_error` (string) in the response.

If the schema extraction or drift check throws, set `drift: null` and `drift_error: <message>`, then continue with the normal flow.

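
The graceful-degradation behaviour of steps 1 and 3 might look like the sketch below. It is illustrative only: `runDriftCheck`, its parameter order, the default `scriptPath`, and the use of `spawnSync` (rather than whatever spawning helper the route already uses) are assumptions. The config is passed in already loaded, because a config failure is meant to surface as a 500 rather than degrade.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical wrapper: never throws, always returns { drift, drift_error }.
// `compare` is the drift comparison function (e.g. compareSchemaToDrift).
// `run` is injectable so the wrapper can be exercised without Python installed.
function runDriftCheck(filePath, config, compare, options = {}) {
  const {
    run = spawnSync,
    scriptPath = 'backend/scripts/extract_xlsx_schema.py', // assumed relative to the working directory
  } = options;

  try {
    const result = run('python3', [scriptPath, filePath], { encoding: 'utf8' });
    if (result.error) throw result.error; // e.g. python3 not found
    if (result.status !== 0) {
      // Prefer stderr; fall back to the { "error": ... } payload on stdout.
      const detail = (result.stderr || result.stdout || '').trim();
      throw new Error(detail || `schema extractor exited with code ${result.status}`);
    }
    const schema = JSON.parse(result.stdout); // invalid JSON lands in the catch below
    return { drift: compare(schema, config), drift_error: null };
  } catch (err) {
    return { drift: null, drift_error: err.message };
  }
}
```

The handler can then spread the returned object into the response and carry on with `parseXlsx()` and `computeDiff()` regardless of the outcome.
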
**Updated response shape**:
```json
{
  "drift": {
    "breaking": [],
    "silent_miss": [],
    "cosmetic": []
  },
  "drift_error": null,
  "diff": { "new_count": 5, "recurring_count": 120, "resolved_count": 3 },
  "tempFile": "/path/to/temp.json",
  "filename": "NTS_AEO_2026_03_25.xlsx",
  "report_date": "2026-03-25",
  "total_items": 125
}
```

### 5. Upload Modal Changes (`ComplianceUploadModal.js`)

**New phase**: `drift-review` inserted between `uploading` and `preview`.

**Phase flow**:
```
idle → uploading → drift-review (if findings) → preview → committing → done
                 → preview (if no findings)
```

**Drift review UI**:
- Findings grouped by severity: breaking first, then silent-miss, then cosmetic.
- Each group has a header with severity label and count badge.
- Groups with more than 5 findings collapse with a "Show N more" toggle.
- Each finding shows the message text and the triggering value.
- Breaking findings: red text (`#EF4444`), red left-border accent.
- Silent-miss findings: amber text (`#F59E0B`), amber left-border accent.
- Cosmetic findings: muted text (`#94A3B8`), subtle left-border accent.
- "Cancel" button returns to idle. "Continue to Preview" button advances to diff preview.
- "Continue to Preview" is disabled when breaking findings exist, with a message explaining the block.
- When `drift` is `null` (drift check failed), skip drift-review and go straight to preview.

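
The phase transition and collapse rules above reduce to a few pure functions that the modal could call. The names below (`phaseAfterUpload`, `canContinueToPreview`, `splitGroup`) are hypothetical, and the sketch assumes the first five findings of a group stay visible when it collapses.

```javascript
const COLLAPSE_LIMIT = 5; // groups with more findings than this collapse

// True when a drift report exists and contains at least one finding.
function hasFindings(drift) {
  if (drift == null) return false;
  return drift.breaking.length + drift.silent_miss.length + drift.cosmetic.length > 0;
}

// Phase to enter once the /preview response arrives.
function phaseAfterUpload(response) {
  return hasFindings(response.drift) ? 'drift-review' : 'preview';
}

// "Continue to Preview" is only enabled without breaking findings.
function canContinueToPreview(drift) {
  return drift.breaking.length === 0;
}

// Splits one severity group into the visible rows and the "Show N more" count.
function splitGroup(findings, limit = COLLAPSE_LIMIT) {
  return {
    visible: findings.slice(0, limit),
    hiddenCount: Math.max(0, findings.length - limit),
  };
}
```

Keeping these decisions outside the component makes the unit tests for the drift review phase independent of rendering.
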
## Data Models

### DriftFinding

```javascript
{
  severity: 'breaking' | 'silent_miss' | 'cosmetic',
  message: string,    // Human-readable description
  value: string,      // The specific column/sheet/metric that triggered the finding
  sheet: string|null  // Sheet name context (when applicable)
}
```

### DriftReport

```javascript
{
  breaking: DriftFinding[],
  silent_miss: DriftFinding[],
  cosmetic: DriftFinding[]
}
```

### ParserConfig

```javascript
{
  metric_categories: { [metricId: string]: string },  // metric ID → category name
  core_cols: string[],                                // column names for main item fields
  skip_sheets: string[]                               // sheet names excluded from parsing
}
```

### XlsxSchema (output of extract_xlsx_schema.py)

```javascript
{
  sheets: [
    {
      name: string,
      columns: string[],
      metric_values?: string[]  // only present on Summary sheet
    }
  ]
}
```

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Breaking drift completeness

*For any* xlsx schema and parser config, the drift checker SHALL produce a breaking finding for every core column missing from every detail sheet, and for every detail sheet (present in `metric_categories` but not in `skip_sheets`) absent from the xlsx — and no other breaking findings. The set of breaking findings is exactly the union of missing-core-column findings and missing-detail-sheet findings.

**Validates: Requirements 3.1, 3.2, 3.3**

### Property 2: Silent-miss drift completeness

*For any* xlsx schema and parser config, the drift checker SHALL produce a silent-miss finding for every metric value in the Summary sheet not present in `metric_categories`, and for every xlsx sheet not in `skip_sheets` and not in `metric_categories` — and no other silent-miss findings. The set of silent-miss findings is exactly the union of unknown-metric findings and unknown-sheet findings.

**Validates: Requirements 4.1, 4.2, 4.3**

### Property 3: Cosmetic drift completeness

*For any* xlsx schema and parser config, the drift checker SHALL produce a cosmetic finding for every column in a detail sheet not present in `core_cols`, and for every key in `metric_categories` not present in the Summary sheet's metric values — and no other cosmetic findings. The set of cosmetic findings is exactly the union of new-column findings and stale-metric findings.

**Validates: Requirements 5.1, 5.2, 5.3**

### Property 4: Drift severity ordering

*For any* drift report containing a mix of breaking, silent-miss, and cosmetic findings, the grouping function SHALL always return findings ordered by severity: all breaking findings first, then all silent-miss findings, then all cosmetic findings.

**Validates: Requirements 8.1**

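
Property 4 can be made concrete with a small grouping function and a matching check. Both names (`orderFindings`, `isOrderedBySeverity`) are hypothetical; the document does not name the grouping function.

```javascript
const SEVERITY_ORDER = ['breaking', 'silent_miss', 'cosmetic'];

// Flattens a DriftReport into one list: breaking, then silent_miss, then cosmetic.
function orderFindings(report) {
  return SEVERITY_ORDER.flatMap((severity) => report[severity]);
}

// Checks the property: severity ranks never decrease along the list.
function isOrderedBySeverity(findings) {
  const ranks = findings.map((f) => SEVERITY_ORDER.indexOf(f.severity));
  return ranks.every((rank, i) => i === 0 || ranks[i - 1] <= rank);
}
```

A property test would generate arbitrary reports and assert `isOrderedBySeverity(orderFindings(report))` for each of them.
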
## Error Handling

### Python Script Failures

| Failure | Handling |
|---|---|
| `extract_xlsx_schema.py` exits non-zero | Preview endpoint sets `drift: null`, `drift_error: <stderr message>`, continues with normal parse flow |
| `extract_xlsx_schema.py` returns invalid JSON | Same as above — caught in JSON.parse, treated as drift check failure |
| `compliance_config.json` missing or invalid (Node.js read) | Preview endpoint returns 500 with message "Configuration file could not be loaded" |
| `compliance_config.json` missing or invalid (Python parser read) | Parser exits non-zero, stderr describes the error, preview endpoint returns 500 with parse error |
| xlsx file cannot be opened by schema extractor | Schema extractor returns `{ "error": "..." }` on stdout, exits non-zero; drift check skipped gracefully |

### Frontend Error States

| Condition | Behavior |
|---|---|
| `drift` is `null` in preview response | Skip drift-review phase, proceed directly to diff preview |
| `drift_error` is present | Optionally display a subtle warning in the diff preview that drift check was skipped |
| Network error during upload | Existing error phase handling (unchanged) |

### Config File Validation

The Node.js config loader validates that:
- The file exists and is readable.
- The content parses as valid JSON.
- The parsed object contains `metric_categories` (object), `core_cols` (array), and `skip_sheets` (array).

If any check fails, the loader throws with a descriptive message. The preview handler catches this and returns a 500 response.

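
A loader performing the three checks above could be as small as the following sketch. The function name `loadComplianceConfig` and the exact error wording are assumptions; only `fs.readFileSync` and `JSON.parse` are taken from the design.

```javascript
const fs = require('node:fs');

const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Hypothetical loader: returns the parsed config or throws a descriptive Error.
function loadComplianceConfig(configPath) {
  let raw;
  try {
    raw = fs.readFileSync(configPath, 'utf8'); // check 1: exists and is readable
  } catch (err) {
    throw new Error(`Cannot read config file "${configPath}": ${err.message}`);
  }

  let config;
  try {
    config = JSON.parse(raw); // check 2: valid JSON
  } catch (err) {
    throw new Error(`Config file "${configPath}" is not valid JSON: ${err.message}`);
  }

  // check 3: required keys with the expected types
  if (!isPlainObject(config)) throw new Error('Config root must be a JSON object');
  if (!isPlainObject(config.metric_categories)) throw new Error('"metric_categories" must be an object');
  if (!Array.isArray(config.core_cols)) throw new Error('"core_cols" must be an array');
  if (!Array.isArray(config.skip_sheets)) throw new Error('"skip_sheets" must be an array');

  return config;
}
```

The preview handler would wrap this call in its own `try`/`catch` and map any thrown error to the 500 response.
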
## Testing Strategy

### Unit Tests

**Drift checker (`driftChecker.js`)**:
- Breaking: missing core column produces finding with correct severity, message, value, and sheet.
- Breaking: missing detail sheet produces finding.
- Silent-miss: unknown metric value produces finding.
- Silent-miss: unknown sheet produces finding.
- Cosmetic: new column in detail sheet produces finding.
- Cosmetic: stale metric category produces finding.
- Empty schema (no sheets) produces a missing-detail-sheet finding for each non-skipped `metric_categories` key and a stale-metric finding for each key.
- Config with empty metric_categories, core_cols, or skip_sheets.
- Schema and config that are perfectly aligned produce zero findings.

**Config loader**:
- Valid config file loads correctly.
- Missing file throws descriptive error.
- Invalid JSON throws descriptive error.
- Config missing required keys throws descriptive error.

**Frontend drift review component**:
- Drift review phase renders when findings exist.
- "Continue to Preview" button disabled when breaking findings present.
- "Continue to Preview" button enabled when no breaking findings.
- Groups with more than 5 findings collapse with correct "Show N more" count.
- Cancel returns to idle phase.
- Skips drift review when drift is null or has no findings.

### Property-Based Tests

Property-based tests use `fast-check` (JavaScript) to verify the four correctness properties defined above. Each test generates random schema and config objects and verifies the drift checker output against the expected set-theoretic result.

**Configuration**:
- Minimum 100 iterations per property test.
- Each test tagged with: **Feature: compliance-schema-drift-check, Property {N}: {title}**

**Generators**:
- `arbitraryParserConfig`: generates random `metric_categories` (object with 0–20 string keys mapped to category strings), `core_cols` (array of 0–15 unique column name strings), `skip_sheets` (array of 0–5 unique sheet name strings).
- `arbitraryXlsxSchema`: generates random sheets array, each with a name, columns array, and optionally metric_values (for the Summary sheet). Sheet names, column names, and metric values drawn from a shared pool to ensure meaningful overlap with the config.

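
To show the shape of the two generators without pulling in a dependency, the sketch below swaps `fast-check` for a small seeded PRNG (mulberry32) and hand-written samplers. It is a stand-in for illustration, not the planned `fast-check` arbitraries: the `POOL` contents, the helper names, and the sheet/column count limits for the schema are assumptions.

```javascript
// Seeded PRNG (mulberry32): the same seed always yields the same sequence.
function mulberry32(seed) {
  let a = seed;
  return function next() {
    a |= 0;
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Shared pool so schema and config overlap often enough to hit every rule.
const POOL = ['2.3.4i', '2.3.6i', '5.2.4', '9.1.2', 'Summary', 'Vulns',
  'Team', 'Compliant', 'Vertical', 'Extra_Field'];

// Integer in [min, max], inclusive.
function randomInt(rng, min, max) {
  return min + Math.floor(rng() * (max - min + 1));
}

// Picks 0..maxCount distinct pool entries via a Fisher-Yates shuffle.
function uniqueSample(rng, pool, maxCount) {
  const shuffled = [...pool];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = randomInt(rng, 0, i);
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, randomInt(rng, 0, Math.min(maxCount, pool.length)));
}

function arbitraryParserConfig(rng) {
  const metric_categories = {};
  for (const key of uniqueSample(rng, POOL, 20)) metric_categories[key] = 'Category';
  return {
    metric_categories,
    core_cols: uniqueSample(rng, POOL, 15),
    skip_sheets: uniqueSample(rng, POOL, 5),
  };
}

function arbitraryXlsxSchema(rng) {
  const sheets = uniqueSample(rng, POOL, 8).map((name) => {
    const sheet = { name, columns: uniqueSample(rng, POOL, 10) };
    // metric_values only appears on the Summary sheet.
    if (name === 'Summary') sheet.metric_values = uniqueSample(rng, POOL, 6);
    return sheet;
  });
  return { sheets };
}
```

A property loop would then call both generators with `mulberry32(seed)` for at least 100 seeds and compare the checker output with the expected sets; a failing seed reproduces the exact inputs.
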
### Integration Tests

- Preview endpoint returns drift report alongside existing diff data.
- Preview endpoint returns 200 with breaking drift (does not error).
- Preview endpoint gracefully degrades when drift check fails (`drift: null`, `drift_error` present).
- Preview endpoint returns 500 when config file is missing.
- Python parser reads from `compliance_config.json` and produces same output as before.
- Commit endpoint is unchanged and does not reference drift.