Design Document: Compliance Schema Drift Check
Overview
This feature adds schema drift detection to the compliance xlsx upload flow. When a user uploads a weekly NTS_AEO report, the backend extracts the xlsx structural schema (sheet names, column headers, metric values) and compares it against a shared parser configuration file. The comparison produces a categorised drift report with three severity levels: breaking (blocks upload), silent-miss (warns but allows proceeding), and cosmetic (informational). The frontend displays these findings in a new drift review phase inside the upload modal, inserted between the upload spinner and the existing diff preview.
The parser configuration dicts (METRIC_CATEGORIES, CORE_COLS, SKIP_SHEETS) currently defined inline in parse_compliance_xlsx.py are extracted into a shared JSON file (backend/scripts/compliance_config.json) that both the Python parser and the Node.js drift checker read. This establishes a single source of truth for parser configuration.
Design Decisions
- Shared JSON config over database storage: The parser config is a developer-maintained mapping, not user data. A JSON file is version-controllable, diffable, and readable by both Python and Node.js without additional dependencies.
- Python subprocess for schema extraction: The existing dump_xlsx_schema.py already uses openpyxl to extract xlsx structure. We adapt this into a new extract_xlsx_schema.py script that the Node.js backend invokes as a subprocess, consistent with how parse_compliance_xlsx.py is already called.
- Node.js drift comparison logic: The drift comparison is pure object comparison (sets of strings) with no xlsx parsing. Implementing it in Node.js avoids an additional Python subprocess call and keeps the logic co-located with the route handler.
- Graceful degradation: If the drift check itself fails, the upload flow proceeds normally with drift: null and a drift_error message. The drift check is additive, so a failure inside the check must never block the existing workflow.
Architecture
sequenceDiagram
participant User
participant Modal as ComplianceUploadModal
participant API as POST /api/compliance/preview
participant Schema as extract_xlsx_schema.py
participant Drift as driftChecker (Node.js)
participant Config as compliance_config.json
participant Parser as parse_compliance_xlsx.py
User->>Modal: Drops xlsx file
Modal->>API: POST /preview (multipart)
API->>Schema: spawn python3 extract_xlsx_schema.py <file>
Schema-->>API: JSON { sheets: [...] }
API->>Config: fs.readFileSync(compliance_config.json)
API->>Drift: compareSchemaToDrift(schema, config)
Drift-->>API: { breaking: [...], silent_miss: [...], cosmetic: [...] }
API->>Parser: spawn python3 parse_compliance_xlsx.py <file>
Parser->>Config: reads compliance_config.json
Parser-->>API: JSON { items, summary, ... }
API->>API: computeDiff(db, items)
API-->>Modal: { drift, diff, tempFile, ... }
alt drift has findings
Modal->>User: Show drift review phase
alt breaking findings exist
Modal->>User: Block "Continue to Preview"
else no breaking findings
User->>Modal: Click "Continue to Preview"
Modal->>User: Show diff preview
end
else no drift findings
Modal->>User: Show diff preview directly
end
File Layout
backend/
scripts/
compliance_config.json # NEW — shared parser config (single source of truth)
extract_xlsx_schema.py # NEW — extracts xlsx structure as JSON
parse_compliance_xlsx.py # MODIFIED — reads config from JSON file
dump_xlsx_schema.py # UNCHANGED — standalone diagnostic tool
routes/
compliance.js # MODIFIED — drift check in /preview, new driftChecker module
helpers/
driftChecker.js # NEW — compareSchemaToDrift() function
frontend/
src/components/pages/
ComplianceUploadModal.js # MODIFIED — new drift-review phase
Components and Interfaces
1. Shared Parser Configuration (compliance_config.json)
{
"metric_categories": {
"2.3.4i": "Vulnerability Management",
"2.3.6i": "Vulnerability Management",
"5.2.4": "Access & MFA"
},
"core_cols": [
"Preferred - Hostname",
"GRANITE - IPv4_Address",
"GRANITE - Type",
"Team",
"Compliant",
"Source_Network",
"Vertical",
"GRANITE - Equip_Inst_ID",
"GRANITE - RESPONSIBLE_TEAM"
],
"skip_sheets": ["Summary", "CMDB_9box", "Vulns", "Aging Dashboard"]
}
2. Schema Extractor (extract_xlsx_schema.py)
Input: File path as CLI argument.
Output (stdout JSON):
{
"sheets": [
{
"name": "Summary",
"columns": ["Metric", "Non-Compliant", "..."],
"metric_values": ["2.3.4i", "5.2.4", "..."]
},
{
"name": "2.3.4i",
"columns": ["Preferred - Hostname", "GRANITE - IPv4_Address", "..."]
}
]
}
- Uses openpyxl in read-only mode.
- Extracts sheet names, first-row column headers per sheet, and unique metric values from the Summary sheet (header at row 4, data from row 5 onward).
- On error, returns { "error": "..." } on stdout and exits with a non-zero code.
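The output contract above could be interpreted on the Node.js side roughly as follows. This is a minimal sketch, not the actual handler code: the helper name parseExtractorOutput and the exact error messages are assumptions made for illustration, and the stdout string and exit code are assumed to have been collected from the subprocess already.

```javascript
// Hypothetical helper: turns the extractor's stdout and exit code into a schema
// object, or throws so the caller can treat it as a drift check failure.
function parseExtractorOutput(stdout, exitCode) {
  let payload;
  try {
    payload = JSON.parse(stdout);
  } catch (err) {
    throw new Error(`Schema extractor produced invalid JSON (exit code ${exitCode})`);
  }

  // The extractor reports failures as { "error": "..." } plus a non-zero exit code.
  if (exitCode !== 0 || (payload && payload.error)) {
    const detail = payload && payload.error;
    throw new Error(detail || `Schema extractor exited with code ${exitCode}`);
  }

  // Assumption: a successful run always carries a "sheets" array.
  if (!payload || !Array.isArray(payload.sheets)) {
    throw new Error('Schema extractor output has no "sheets" array');
  }
  return payload;
}
```

Keeping this parsing in one small function means the route handler only has to catch a single kind of failure before falling back to drift: null.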
3. Drift Checker (backend/helpers/driftChecker.js)
Function: compareSchemaToDrift(schema, config) => DriftReport
Parameters:
- schema: object returned by extract_xlsx_schema.py
- config: object parsed from compliance_config.json
Returns (DriftReport):
{
breaking: [
{ severity: 'breaking', message: 'Detail sheet "2.3.4i" is missing core column "Team"', value: 'Team', sheet: '2.3.4i' }
],
silent_miss: [
{ severity: 'silent_miss', message: 'Unknown metric "9.1.2" in Summary — not in metric_categories', value: '9.1.2' }
],
cosmetic: [
{ severity: 'cosmetic', message: 'New column "Extra_Field" in sheet "2.3.4i" — will be captured in extra_json', value: 'Extra_Field', sheet: '2.3.4i' }
]
}
Drift rules:
| Rule | Severity | Condition |
|---|---|---|
| Missing core column | breaking | A detail sheet (not in skip_sheets, present in xlsx) is missing a column from core_cols |
| Missing detail sheet | breaking | A sheet name in metric_categories (and not in skip_sheets) is absent from the xlsx |
| Unknown metric value | silent_miss | A metric value in the Summary sheet is not a key in metric_categories |
| Unknown sheet | silent_miss | An xlsx sheet is not in skip_sheets and not in metric_categories |
| New column in detail sheet | cosmetic | A detail sheet has columns not in core_cols |
| Stale metric category | cosmetic | A key in metric_categories does not appear in the Summary sheet's metric values |
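The six rules above amount to set differences between the schema and the config. The sketch below shows one way compareSchemaToDrift could apply them; it is illustrative rather than the final implementation. Two points are assumptions: the Summary sheet is identified as the sheet carrying metric_values, and the message wording for rules that have no example in this document is made up.

```javascript
// Sketch of compareSchemaToDrift(schema, config) following the drift rules table.
function compareSchemaToDrift(schema, config) {
  const report = { breaking: [], silent_miss: [], cosmetic: [] };
  const add = (severity, message, value, sheet = null) => {
    report[severity].push({ severity, message, value, sheet });
  };

  const skipSheets = new Set(config.skip_sheets);
  const knownMetrics = new Set(Object.keys(config.metric_categories));
  const coreCols = new Set(config.core_cols);
  const presentSheets = new Set(schema.sheets.map((s) => s.name));

  // Missing detail sheet: a configured metric sheet is absent from the xlsx.
  for (const name of knownMetrics) {
    if (!skipSheets.has(name) && !presentSheets.has(name)) {
      add('breaking', `Detail sheet "${name}" is missing from the xlsx`, name, name);
    }
  }

  for (const sheet of schema.sheets) {
    if (skipSheets.has(sheet.name)) continue; // skipped sheets are not detail sheets

    // Unknown sheet: neither skipped nor a known metric.
    if (!knownMetrics.has(sheet.name)) {
      add('silent_miss', `Unknown sheet "${sheet.name}": not in skip_sheets or metric_categories`, sheet.name, sheet.name);
    }

    const columns = new Set(sheet.columns);
    // Missing core column.
    for (const col of coreCols) {
      if (!columns.has(col)) {
        add('breaking', `Detail sheet "${sheet.name}" is missing core column "${col}"`, col, sheet.name);
      }
    }
    // New column in detail sheet.
    for (const col of columns) {
      if (!coreCols.has(col)) {
        add('cosmetic', `New column "${col}" in sheet "${sheet.name}"`, col, sheet.name);
      }
    }
  }

  // Assumption: the Summary sheet is the one that carries metric_values.
  const summary = schema.sheets.find((s) => Array.isArray(s.metric_values));
  const summaryMetrics = new Set(summary ? summary.metric_values : []);
  for (const metric of summaryMetrics) {
    if (!knownMetrics.has(metric)) {
      add('silent_miss', `Unknown metric "${metric}" in Summary`, metric);
    }
  }
  for (const metric of knownMetrics) {
    if (!summaryMetrics.has(metric)) {
      add('cosmetic', `Metric category "${metric}" does not appear in Summary`, metric);
    }
  }
  return report;
}
```

Using Set lookups keeps each rule a single membership test, and because every finding is routed through add, each entry carries the same four fields as the DriftFinding model.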
4. Preview Endpoint Changes (POST /api/compliance/preview)
The existing /preview handler is modified to:
- After receiving the uploaded file, spawn extract_xlsx_schema.py to get the xlsx schema.
- Read compliance_config.json from disk.
- Call compareSchemaToDrift(schema, config) to produce the drift report.
- Proceed with the existing parseXlsx() call and computeDiff().
- Include drift (the DriftReport object) and optionally drift_error (string) in the response.
If the schema extraction or drift check throws, set drift: null and drift_error: <message>, then continue with the normal flow.
Updated response shape:
{
"drift": {
"breaking": [],
"silent_miss": [],
"cosmetic": []
},
"drift_error": null,
"diff": { "new_count": 5, "recurring_count": 120, "resolved_count": 3 },
"tempFile": "/path/to/temp.json",
"filename": "NTS_AEO_2026_03_25.xlsx",
"report_date": "2026-03-25",
"total_items": 125
}
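The fallback to drift: null can be isolated in a small wrapper so the rest of the handler never sees an exception from the drift check. The following is a sketch under assumptions: runDriftCheck is a hypothetical name, and the extractor and comparison functions are injected as parameters so the example stays self-contained. The config is passed in already loaded, because a config failure is handled separately with a 500 response.

```javascript
// Hypothetical wrapper: runs schema extraction plus comparison and never throws.
// extractSchema(filePath) is assumed to resolve to the XlsxSchema object;
// compare is assumed to behave like compareSchemaToDrift(schema, config).
async function runDriftCheck(filePath, { extractSchema, config, compare }) {
  try {
    const schema = await extractSchema(filePath);
    return { drift: compare(schema, config), drift_error: null };
  } catch (err) {
    // Any failure degrades to "no drift information" instead of failing the upload.
    const message = err instanceof Error ? err.message : String(err);
    return { drift: null, drift_error: message };
  }
}
```

The handler could then spread the returned object into the response next to diff and tempFile, which yields the drift and drift_error fields shown in the response shape.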
5. Upload Modal Changes (ComplianceUploadModal.js)
New phase: drift-review inserted between uploading and preview.
Phase flow:
idle → uploading → drift-review (if findings) → preview → committing → done
                 → preview (if no findings)
Drift review UI:
- Findings grouped by severity: breaking first, then silent-miss, then cosmetic.
- Each group has a header with severity label and count badge.
- Groups with more than 5 findings collapse with a "Show N more" toggle.
- Each finding shows the message text and the triggering value.
- Breaking findings: red text (#EF4444), red left-border accent.
- Silent-miss findings: amber text (#F59E0B), amber left-border accent.
- Cosmetic findings: muted text (#94A3B8), subtle left-border accent.
- "Cancel" button returns to idle. "Continue to Preview" button advances to diff preview.
- "Continue to Preview" is disabled when breaking findings exist, with a message explaining the block.
- When drift is null (drift check failed), skip drift-review and go straight to preview.
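The phase and button rules above reduce to a few pure functions that can be tested without rendering the modal. This is a sketch only: the function names are hypothetical, and it assumes that a collapsed group keeps its first five findings visible.

```javascript
// Hypothetical helpers for ComplianceUploadModal; names are illustrative.

// Decides which phase follows "uploading" based on the preview response.
function phaseAfterUpload(drift) {
  if (!drift) return 'preview'; // drift check failed or was skipped
  const total = drift.breaking.length + drift.silent_miss.length + drift.cosmetic.length;
  return total > 0 ? 'drift-review' : 'preview';
}

// "Continue to Preview" is only enabled when nothing is breaking.
function canContinueToPreview(drift) {
  return !drift || drift.breaking.length === 0;
}

// Splits one severity group for the "Show N more" toggle.
// Assumption: the first `limit` findings stay visible when the group is collapsed.
function splitCollapsed(findings, limit = 5) {
  return {
    visible: findings.slice(0, limit),
    hiddenCount: Math.max(0, findings.length - limit)
  };
}
```

With this split, a group of seven findings shows five entries and a "Show 2 more" toggle, while a group of exactly five shows everything and no toggle.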
Data Models
DriftFinding
{
severity: 'breaking' | 'silent_miss' | 'cosmetic',
message: string, // Human-readable description
value: string, // The specific column/sheet/metric that triggered the finding
sheet: string|null // Sheet name context (when applicable)
}
DriftReport
{
breaking: DriftFinding[],
silent_miss: DriftFinding[],
cosmetic: DriftFinding[]
}
ParserConfig
{
metric_categories: { [metricId: string]: string }, // metric ID → category name
core_cols: string[], // column names for main item fields
skip_sheets: string[] // sheet names excluded from parsing
}
XlsxSchema (output of extract_xlsx_schema.py)
{
sheets: [
{
name: string,
columns: string[],
metric_values?: string[] // only present on Summary sheet
}
]
}
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Breaking drift completeness
For any xlsx schema and parser config, the drift checker SHALL produce a breaking finding for every core column missing from every detail sheet, and for every detail sheet (present in metric_categories but not in skip_sheets) absent from the xlsx — and no other breaking findings. The set of breaking findings is exactly the union of missing-core-column findings and missing-detail-sheet findings.
Validates: Requirements 3.1, 3.2, 3.3
Property 2: Silent-miss drift completeness
For any xlsx schema and parser config, the drift checker SHALL produce a silent-miss finding for every metric value in the Summary sheet not present in metric_categories, and for every xlsx sheet not in skip_sheets and not in metric_categories — and no other silent-miss findings. The set of silent-miss findings is exactly the union of unknown-metric findings and unknown-sheet findings.
Validates: Requirements 4.1, 4.2, 4.3
Property 3: Cosmetic drift completeness
For any xlsx schema and parser config, the drift checker SHALL produce a cosmetic finding for every column in a detail sheet not present in core_cols, and for every key in metric_categories not present in the Summary sheet's metric values — and no other cosmetic findings. The set of cosmetic findings is exactly the union of new-column findings and stale-metric findings.
Validates: Requirements 5.1, 5.2, 5.3
Property 4: Drift severity ordering
For any drift report containing a mix of breaking, silent-miss, and cosmetic findings, the grouping function SHALL always return findings ordered by severity: all breaking findings first, then all silent-miss findings, then all cosmetic findings.
Validates: Requirements 8.1
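Property 4 can be stated as a check over a flattened list of findings. The sketch below pairs a possible grouping function with the predicate a property test would assert; both names (flattenBySeverity, isSeverityOrdered) are assumptions, since this document does not name the grouping function.

```javascript
// Display order required by Property 4.
const SEVERITY_ORDER = ['breaking', 'silent_miss', 'cosmetic'];

// Hypothetical grouping function: flattens a DriftReport in severity order.
function flattenBySeverity(report) {
  return SEVERITY_ORDER.flatMap((severity) => report[severity] || []);
}

// Predicate for the property: severity ranks never decrease along the list.
function isSeverityOrdered(findings) {
  let previousRank = 0;
  for (const finding of findings) {
    const rank = SEVERITY_ORDER.indexOf(finding.severity);
    if (rank === -1 || rank < previousRank) return false;
    previousRank = rank;
  }
  return true;
}
```

A property test would generate arbitrary reports and assert isSeverityOrdered(flattenBySeverity(report)) for each of them.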
Error Handling
Python Script Failures
| Failure | Handling |
|---|---|
| extract_xlsx_schema.py exits non-zero | Preview endpoint sets drift: null, drift_error: <stderr message>, continues with normal parse flow |
| extract_xlsx_schema.py returns invalid JSON | Same as above: caught in JSON.parse, treated as drift check failure |
| compliance_config.json missing or invalid (Node.js read) | Preview endpoint returns 500 with message "Configuration file could not be loaded" |
| compliance_config.json missing or invalid (Python parser read) | Parser exits non-zero, stderr describes the error, preview endpoint returns 500 with parse error |
| xlsx file cannot be opened by schema extractor | Schema extractor returns { "error": "..." } on stdout, exits non-zero; drift check skipped gracefully |
Frontend Error States
| Condition | Behavior |
|---|---|
| drift is null in preview response | Skip drift-review phase, proceed directly to diff preview |
| drift_error is present | Optionally display a subtle warning in the diff preview that drift check was skipped |
| Network error during upload | Existing error phase handling (unchanged) |
Config File Validation
The Node.js config loader validates that:
- The file exists and is readable.
- The content parses as valid JSON.
- The parsed object contains metric_categories (object), core_cols (array), and skip_sheets (array).
If any check fails, the loader throws with a descriptive message. The preview handler catches this and returns a 500 response.
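The three checks could be implemented along these lines. This is a sketch under assumptions: loadParserConfig and validateParserConfig are hypothetical names, the error messages are illustrative, and the readFile parameter exists only so the loader can be exercised without touching the disk.

```javascript
const fs = require('fs');

// Validates the shape of an already-parsed config object.
function validateParserConfig(cfg) {
  const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  if (!isPlainObject(cfg)) {
    throw new Error('Config must be a JSON object');
  }
  if (!isPlainObject(cfg.metric_categories)) {
    throw new Error('Config key "metric_categories" must be an object');
  }
  for (const key of ['core_cols', 'skip_sheets']) {
    if (!Array.isArray(cfg[key])) {
      throw new Error(`Config key "${key}" must be an array`);
    }
  }
  return cfg;
}

// Reads, parses, and validates the config; throws a descriptive error on failure.
function loadParserConfig(configPath, readFile = fs.readFileSync) {
  let raw;
  try {
    raw = readFile(configPath, 'utf8');
  } catch (err) {
    throw new Error(`Cannot read config file ${configPath}: ${err.message}`);
  }
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`Config file ${configPath} is not valid JSON: ${err.message}`);
  }
  return validateParserConfig(parsed);
}
```

Separating validateParserConfig from the file read lets the missing-key cases be unit tested with plain objects, while the loader tests cover the missing-file and invalid-JSON cases.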
Testing Strategy
Unit Tests
Drift checker (driftChecker.js):
- Breaking: missing core column produces finding with correct severity, message, value, and sheet.
- Breaking: missing detail sheet produces finding.
- Silent-miss: unknown metric value produces finding.
- Silent-miss: unknown sheet produces finding.
- Cosmetic: new column in detail sheet produces finding.
- Cosmetic: stale metric category produces finding.
- Empty schema (no sheets) produces appropriate findings.
- Config with empty metric_categories, core_cols, or skip_sheets.
- Schema and config that are perfectly aligned produce zero findings.
Config loader:
- Valid config file loads correctly.
- Missing file throws descriptive error.
- Invalid JSON throws descriptive error.
- Config missing required keys throws descriptive error.
Frontend drift review component:
- Drift review phase renders when findings exist.
- "Continue to Preview" button disabled when breaking findings present.
- "Continue to Preview" button enabled when no breaking findings.
- Groups with more than 5 findings collapse with the correct "Show N more" count.
- Cancel returns to idle phase.
- Skips drift review when drift is null or has no findings.
Property-Based Tests
Property-based tests use fast-check (JavaScript) to verify the four correctness properties defined above. Each test generates random schema and config objects and verifies the drift checker output against the expected set-theoretic result.
Configuration:
- Minimum 100 iterations per property test.
- Each test tagged with: Feature: compliance-schema-drift-check, Property {N}: {title}
Generators:
- arbitraryParserConfig: generates random metric_categories (object with 0–20 string keys mapped to category strings), core_cols (array of 0–15 unique column name strings), and skip_sheets (array of 0–5 unique sheet name strings).
- arbitraryXlsxSchema: generates a random sheets array, each entry with a name, a columns array, and optionally metric_values (for the Summary sheet). Sheet names, column names, and metric values are drawn from a shared pool to ensure meaningful overlap with the config.
Integration Tests
- Preview endpoint returns drift report alongside existing diff data.
- Preview endpoint returns 200 with breaking drift (does not error).
- Preview endpoint gracefully degrades when drift check fails (drift: null, drift_error present).
- Preview endpoint returns 500 when config file is missing.
- Python parser reads from compliance_config.json and produces same output as before.
- Commit endpoint is unchanged and does not reference drift.