Clicking a metric now shows a sub-team breakdown page with totals per team (compliant, non-compliant, total, %) instead of jumping directly to a flat device list. Clicking a sub-team then shows the device list filtered to that team only.

Navigation flow: Overview → Vertical → Metric (sub-team totals) → Team (devices)

Backend: added optional ?team= query param to the device list endpoint for filtered queries.

Frontend: added MetricSubTeamView component with metric-level stats bar and clickable sub-team table. Updated navigation state to include selectedTeam.

Also updated design brief to reflect the new drill-down hierarchy.

# VCL Multi-Vertical Upload — Design Brief

## Purpose

This document summarizes the design decisions and architectural choices for the VCL Multi-Vertical Upload feature. It is intended as a reference for presenting the approach to stakeholders and the compliance team.

---

## What We Are Building

A new upload flow on the STEAM Security Dashboard that accepts multiple per-vertical compliance xlsx files (one per organizational vertical), ingests them with vertical-scoped resolution logic, and generates an executive-level VCL compliance report across all organizations — with drill-down by vertical and by metric.

This is a POC. The compliance team currently exports data from CyberMetrics as xlsx files on a 24-hour cycle. This feature lets them upload those files and generate the same reports they currently build manually in PowerPoint/Excel for senior leadership.

---

## The Problem It Solves

Today the compliance team has 14 separate xlsx files — one per vertical (NTS_AEO, SDIT_CISO, TSI, etc.). The existing dashboard upload flow accepts a single consolidated file and treats it as the complete compliance state. If you upload just one vertical's file, the system incorrectly marks every device from the other 13 verticals as "resolved."

There is no automated way to:

- Ingest all 14 files and produce a unified report
- Drill down from the organizational view into specific metrics and devices
- Generate burndown forecasts across verticals

---

## Key Architectural Decisions

### 1. Vertical-Scoped Resolution

**Decision:** When a file for vertical X is committed, only items belonging to vertical X are evaluated for resolution. All other verticals are untouched.

**Why:** This is the fundamental change that makes per-vertical uploads safe. Without it, uploading one file would mark every open item from the other 13 verticals as resolved.

**Implication:** Verticals are independent. You can upload NTS_AEO on Monday and SDIT_CISO on Wednesday without interference. This also supports the daily upload cadence the compliance team wants.
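
The scoping rule fits in a few lines. This is a minimal illustration, not the dashboard's implementation: it assumes open items are available as simple records carrying `vertical`, `hostname`, and `metric_id`, and the names `OpenItem` and `items_to_resolve` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class OpenItem:
    """Hypothetical in-memory view of an unresolved compliance item."""

    vertical: Optional[str]  # None for legacy single-file AEO data
    hostname: str
    metric_id: str


def items_to_resolve(
    open_items: List[OpenItem],
    vertical: str,
    uploaded_keys: Set[Tuple[str, str]],
) -> List[OpenItem]:
    """Return open items of `vertical` whose (hostname, metric_id) is absent from the upload.

    Items from any other vertical (including vertical=None) are never
    returned, so one vertical's file cannot resolve another vertical's items.
    """
    return [
        item
        for item in open_items
        if item.vertical == vertical
        and (item.hostname, item.metric_id) not in uploaded_keys
    ]


# Example: an NTS_AEO upload that no longer lists rtr-02 (hostnames are made up).
open_items = [
    OpenItem("NTS_AEO", "rtr-01", "5.5.4i"),
    OpenItem("NTS_AEO", "rtr-02", "5.5.4i"),
    OpenItem("SDIT_CISO", "fw-09", "7.1.1"),
]
to_resolve = items_to_resolve(open_items, "NTS_AEO", {("rtr-01", "5.5.4i")})
```

Only `rtr-02` ends up in `to_resolve`; the SDIT_CISO item is never a candidate, which is the guarantee that makes per-vertical uploads safe.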

### 2. Vertical Identity Comes From the Filename

**Decision:** The vertical code is extracted from the filename pattern `<VERTICAL>_YYYY_MM_DD.xlsx`, not from data inside the xlsx.

**Why:** The internal xlsx structure is identical across verticals — same Summary sheet, same metric detail sheets, same columns. The only differentiator is the filename. This also means the Python parser requires zero changes.

**Implication:** Filenames must follow the convention. If they don't, the system flags them as "unrecognized" and the user can manually assign a vertical. This is a reasonable tradeoff for a POC.
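
A possible filename check, assuming the `<VERTICAL>_YYYY_MM_DD.xlsx` convention above. Because vertical codes can themselves contain underscores (NTS_AEO, SDIT_CISO), the date is anchored at the end of the name and everything before it is taken as the vertical. Case-insensitive matching, upper-casing the vertical, and the name `parse_vcl_filename` are assumptions; a `None` result corresponds to the "unrecognized" state.

```python
import re
from datetime import date
from typing import Optional, Tuple

# <VERTICAL>_YYYY_MM_DD.xlsx, where VERTICAL may contain underscores.
FILENAME_RE = re.compile(
    r"(?P<vertical>[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)"
    r"_(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})\.xlsx",
    re.IGNORECASE,
)


def parse_vcl_filename(filename: str) -> Optional[Tuple[str, date]]:
    """Return (vertical, report_date), or None if the name is unrecognized."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        return None
    try:
        report_date = date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:  # digits in the right places but not a real date
        return None
    return match["vertical"].upper(), report_date
```

For example, `NTS_AEO_2025_06_03.xlsx` yields `("NTS_AEO", date(2025, 6, 3))`, while `summary.xlsx` or a name with month `13` is flagged as unrecognized.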

### 3. Separate From Existing AEO Upload

**Decision:** This is a new flow with its own endpoints (`/api/compliance/vcl-multi/...`), its own UI page, and its own nav entry. The existing AEO compliance upload is unchanged.

**Why:**

- The existing flow works for the STEAM/ACCESS-ENG team's day-to-day operations
- The compliance team may deploy this on a separate instance to experiment without affecting production
- Different user groups with different needs — engineers vs. compliance analysts vs. senior leadership

**Implication:** There are now two ways to upload compliance data. They coexist via the `vertical` column: existing AEO data has `vertical = NULL`, and multi-vertical data has a vertical code. The VCL report page can aggregate either or both.

### 4. Two-Dimensional Grouping (Vertical + Team)

**Decision:** `vertical` and `team` are separate fields. Vertical is the organizational unit (NTS_AEO, SDIT_CISO). Team is the sub-team within a vertical (STEAM, ACCESS-ENG, ACCESS-OPS).

**Why:** NTS_AEO contains multiple sub-teams. Senior leadership wants to see the vertical-level view. The STEAM team wants to see their team-level view. Both are valid groupings on the same data.

**Implication:** The cross-organizational report groups by vertical. Drilling into NTS_AEO still shows the STEAM/ACCESS-ENG/ACCESS-OPS breakdown because that data exists in the "Team" column inside the xlsx.

### 5. Summary Sheet Data Stored Separately

**Decision:** The parsed Summary sheet (metric-level health data) is stored in a dedicated `vcl_multi_vertical_summary` table, not just as JSON on the upload record.

**Why:** The metric drill-down view needs to query per-metric compliance percentages and targets efficiently. Storing structured rows enables filtering, sorting, and aggregation at the database level rather than parsing JSON blobs in application code.

**Implication:** Slightly more storage, but queries like "show me all metrics below target across all verticals" become a single SQL query over structured, indexable rows instead of deserializing the JSON on every upload record.
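
To make the query-shape argument concrete, here is an in-memory SQLite sketch. The column layout is hypothetical (the brief names only the `vcl_multi_vertical_summary` table), rollup rows are assumed to be identified by a `team` value starting with `ALL:`, and the SDIT_CISO/7.1.1 row is invented sample data; the NTS_AEO 5.5.4i figures come from the drill-down example in this brief.

```python
import sqlite3

# Hypothetical layout; the real migration may differ.
SCHEMA = """
CREATE TABLE vcl_multi_vertical_summary (
    upload_id      INTEGER NOT NULL,
    vertical       TEXT    NOT NULL,
    team           TEXT    NOT NULL,  -- sub-team name, '(Other)', or 'ALL: ...' rollup
    metric_id      TEXT    NOT NULL,
    compliant      INTEGER NOT NULL,
    non_compliant  INTEGER NOT NULL,
    total          INTEGER NOT NULL,
    compliance_pct REAL    NOT NULL,  -- 0-1 decimal, as in "Current Compliance"
    target_pct     REAL    NOT NULL   -- 0-1 decimal
);
CREATE INDEX idx_summary_vertical_metric
    ON vcl_multi_vertical_summary (vertical, metric_id);
"""

# "Show me all metrics below target across all verticals", rollup rows only.
BELOW_TARGET = """
SELECT vertical, metric_id, compliance_pct, target_pct
FROM vcl_multi_vertical_summary
WHERE team LIKE 'ALL:%' AND compliance_pct < target_pct
ORDER BY target_pct - compliance_pct DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO vcl_multi_vertical_summary VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "NTS_AEO", "ALL: NTS-AEO", "5.5.4i", 64414, 1762, 66176, 0.97, 0.80),
        (1, "NTS_AEO", "STEAM", "5.5.4i", 123, 4, 127, 0.97, 0.80),
        # Invented sample row for illustration only.
        (1, "SDIT_CISO", "ALL: SDIT-CISO", "7.1.1", 175, 68, 243, 0.72, 0.95),
    ],
)
below_target = conn.execute(BELOW_TARGET).fetchall()
```

Only the SDIT_CISO rollup is returned, since the NTS_AEO rollup sits above its 80% target.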

### 6. Batch Upload With Atomic Commit

**Decision:** All files in a batch are committed in a single database transaction. If any file fails, the entire batch rolls back.

**Why:** Partial commits would leave the report in an inconsistent state — some verticals updated, others stale. The compliance team uploads all 14 files together as a reporting cycle. It should either all succeed or all fail.

**Implication:** If one file has a parsing error, the user is shown the error in the preview phase (before commit). They can remove that file from the batch and commit the rest. Once they hit "Commit," it's all-or-nothing.
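
The all-or-nothing commit maps directly onto a database transaction. The sketch below uses Python's built-in `sqlite3` so it runs standalone; the actual database, the `compliance_items` columns shown, and the names `commit_batch` and `BatchCommitError` are assumptions. Using the connection as a context manager commits when the block finishes and rolls back if any statement raises.

```python
import sqlite3
from typing import Dict, List, Tuple


class BatchCommitError(Exception):
    """Raised when any file in the batch fails; nothing from the batch is kept."""


def commit_batch(
    conn: sqlite3.Connection,
    batch: Dict[str, List[Tuple[str, str]]],
) -> None:
    """Write every vertical's parsed (hostname, metric_id) rows in one transaction."""
    try:
        with conn:  # commit on success, rollback on any exception
            for vertical, rows in batch.items():
                conn.executemany(
                    "INSERT INTO compliance_items (vertical, hostname, metric_id) "
                    "VALUES (?, ?, ?)",
                    [(vertical, hostname, metric_id) for hostname, metric_id in rows],
                )
    except sqlite3.Error as exc:
        raise BatchCommitError(str(exc)) from exc


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compliance_items ("
    " vertical TEXT NOT NULL, hostname TEXT NOT NULL, metric_id TEXT NOT NULL,"
    " UNIQUE (vertical, hostname, metric_id))"
)
commit_batch(conn, {"NTS_AEO": [("rtr-01", "5.5.4i")]})

# Second batch: TSI is valid, but SDIT_CISO contains a duplicate row.
try:
    commit_batch(
        conn,
        {"TSI": [("sw-01", "7.1.1")], "SDIT_CISO": [("fw-09", "7.1.1"), ("fw-09", "7.1.1")]},
    )
except BatchCommitError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM compliance_items").fetchone()[0]
```

`row_count` ends up as 1: the valid TSI row from the failed batch is rolled back together with the bad SDIT_CISO rows, and only the earlier NTS_AEO batch survives. Since parsing errors surface during preview, the transaction acts as the backstop for failures at write time.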

### 7. Daily Upload Support (Idempotent)

**Decision:** Re-uploading the same vertical on the same day produces the same final state as uploading it once. The system doesn't create duplicate records.

**Why:** CyberMetrics refreshes on a 24-hour cycle. The compliance team may want to upload daily to track movement. They shouldn't have to worry about "did I already upload today?"

**Implication:** The resolution logic uses `vertical + hostname + metric_id` as the identity key. Recurring items get their `seen_count` incremented and metadata updated. New items are inserted. Missing items are resolved. Same logic as today, just scoped to the vertical.
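
One way to combine the identity key, the `seen_count` bump, and same-day idempotency is sketched below. This is an assumption, not confirmed behavior: incrementing `seen_count` on every upload would make a same-day re-upload change state, so this version increments at most once per upload date. The `Item` record, the `reconcile` function, and the choice to start a fresh record when a resolved item reappears are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, str, str]  # (vertical, hostname, metric_id): the identity key


@dataclass
class Item:
    seen_count: int
    last_seen: date
    resolved_on: Optional[date] = None


def reconcile(
    items: Dict[Key, Item],
    vertical: str,
    uploaded: Set[Tuple[str, str]],
    upload_date: date,
) -> None:
    """Apply one vertical's upload (a set of (hostname, metric_id)) to `items` in place."""
    for hostname, metric_id in uploaded:
        key = (vertical, hostname, metric_id)
        item = items.get(key)
        if item is None or item.resolved_on is not None:
            # New item (or a resolved one that reappeared): start a fresh record.
            items[key] = Item(seen_count=1, last_seen=upload_date)
        elif item.last_seen != upload_date:
            # Recurring on a new date: count it once for that date.
            item.seen_count += 1
            item.last_seen = upload_date
        # Same key on the same date: no change, so re-uploads are idempotent.

    # Resolve open items of this vertical that are missing from the upload.
    for (item_vertical, hostname, metric_id), item in items.items():
        if (
            item_vertical == vertical
            and item.resolved_on is None
            and (hostname, metric_id) not in uploaded
        ):
            item.resolved_on = upload_date
```

Under this key, the same hostname can also be tracked independently in two verticals, and an item from another vertical is never resolved by this upload.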

---

## Drill-Down Hierarchy

```
Executive Overview (all verticals aggregated)
│
├── Stats: 2.1M devices, 97% compliant, target 95%
├── Trend: monthly compliance % with forecast
├── Donut: blocked vs in-progress (non-compliant devices)
│
└── Vertical Breakdown Table
    │
    ├── NTS_AEO — 99% — 2,163 non-compliant — click to drill down
    │   │
    │   ├── Team Filter: [All (Rollup)] [ACCESS-ENG] [ACCESS-OPS] [INTELDEV] [STEAM]
    │   │
    │   ├── Metric Breakdown (expandable rows)
    │   │   ├── ▸ 5.5.4i (Vulnerability Mgmt) — 97.0% — 1,762 NC — target 80%
    │   │   │   ├── └ ACCESS-ENG: 7 compliant, 1 NC, 8 total — 88.0%
    │   │   │   ├── └ ACCESS-OPS: 64,051 compliant, 1,746 NC, 65,797 total — 97.0%
    │   │   │   ├── └ INTELDEV: 233 compliant, 11 NC, 244 total — 95.0%
    │   │   │   └── └ STEAM: 123 compliant, 4 NC, 127 total — 97.0%
    │   │   │
    │   │   ├── Click metric ID → Metric Sub-Team View
    │   │   │   ├── Stats: total 66,176 | compliant 64,414 | NC 1,762 | 97% | target 80%
    │   │   │   └── Sub-Team Table:
    │   │   │       ├── ACCESS-ENG — 8 total — 88.0% → click
    │   │   │       │   └── Device list (filtered to ACCESS-ENG)
    │   │   │       ├── ACCESS-OPS — 65,797 total — 97.0% → click
    │   │   │       │   └── Device list (filtered to ACCESS-OPS)
    │   │   │       ├── INTELDEV — 244 total — 95.0% → click
    │   │   │       └── STEAM — 127 total — 97.0% → click
    │   │   └── ...
    │   │
    │   └── Burndown: blockers, with dates, projected clear date
    │
    ├── SDIT_CISO — 72% — 68 non-compliant
    └── ...
```

---

## How Metrics Are Calculated

### Data Sources

Each vertical's xlsx file contains two types of data:

1. **Summary sheet** — one row per metric per sub-team, plus one rollup row per metric, with pre-calculated totals (compliant, non-compliant, total, compliance %, target). This is the source of truth for aggregate numbers.

2. **Detail sheets** — one sheet per metric, listing individual non-compliant devices (hostname, IP, device type, team). These feed the device-level drill-down.

### The Double-Counting Problem (and How We Solve It)

The Summary sheet contains **two levels of rows** for each metric:

| Row Type | Example | Purpose |
|---|---|---|
| Sub-team rows | ACCESS-OPS, STEAM, INTELDEV | Individual team breakdown |
| Rollup row | ALL: NTS-AEO | Sum of all sub-teams for that metric |

The rollup row already includes all sub-team totals. If you sum all rows naively, you count every device twice.

**Solution:** All aggregate calculations (stats bar, vertical breakdown, category totals, snapshots) use **only the ALL: rollup rows**. Sub-team rows are stored for drill-down display but never included in totals.
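
A short sketch of the rule, using the 5.5.4i figures from the drill-down hierarchy. It assumes each parsed Summary row is a dict with `team`, `compliant`, `non_compliant`, and `total`, and that rollup rows are recognizable by a `team` label starting with `ALL:`; both are assumptions about the parser's output, and `rollup_totals` is a hypothetical name.

```python
from typing import Dict, List

ROLLUP_PREFIX = "ALL:"  # e.g. "ALL: NTS-AEO"


def rollup_totals(summary_rows: List[Dict]) -> Dict[str, int]:
    """Sum compliant / non_compliant / total over rollup rows only."""
    totals = {"compliant": 0, "non_compliant": 0, "total": 0}
    for row in summary_rows:
        if not row["team"].startswith(ROLLUP_PREFIX):
            continue  # sub-team rows are kept for drill-down, never summed
        for field in totals:
            totals[field] += row[field]
    return totals


# Metric 5.5.4i for NTS_AEO: one rollup row plus four sub-team rows.
rows = [
    {"team": "ALL: NTS-AEO", "compliant": 64414, "non_compliant": 1762, "total": 66176},
    {"team": "ACCESS-ENG", "compliant": 7, "non_compliant": 1, "total": 8},
    {"team": "ACCESS-OPS", "compliant": 64051, "non_compliant": 1746, "total": 65797},
    {"team": "INTELDEV", "compliant": 233, "non_compliant": 11, "total": 244},
    {"team": "STEAM", "compliant": 123, "non_compliant": 4, "total": 127},
]
totals = rollup_totals(rows)
naive_total = sum(row["total"] for row in rows)
```

`totals["total"]` is 66,176, matching the metric's stats bar, whereas `naive_total` comes out at 132,352, exactly double.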

### What Each Number Means

| Metric | Source | Meaning |
|---|---|---|
| **Total Devices** | Sum of `total` from ALL: rows across all metrics for a vertical | Total device-metric pairs evaluated (a device appears once per metric it's measured against) |
| **Compliant** | Sum of `compliant` from ALL: rows | Device-metric pairs that pass the compliance check |
| **Non-Compliant** | Sum of `non_compliant` from ALL: rows | Device-metric pairs that fail |
| **Compliance %** | `compliant / total * 100` | Percentage of device-metric pairs passing |
| **Target %** | Per-metric value from the spreadsheet (e.g., 95%, 80%, 75%) | The threshold set by the compliance program |
| **Blockers** | Non-compliant devices in `compliance_items` with no `resolution_date` | Devices with no committed remediation timeline |
| **In-Progress** | Non-compliant devices with a `resolution_date` set | Devices with a planned fix date |
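
The two derived figures in the table, compliance % and the blocker/in-progress split, reduce to the arithmetic below. The function names are illustrative, and returning 0.0 for an empty scope is an assumption (the UI might prefer to show "n/a").

```python
from datetime import date
from typing import Dict, List, Optional


def compliance_pct(compliant: int, total: int) -> float:
    """compliant / total * 100; 0.0 when nothing was evaluated (an assumption)."""
    return 0.0 if total == 0 else compliant / total * 100


def blocker_split(resolution_dates: List[Optional[date]]) -> Dict[str, int]:
    """Blockers have no resolution_date; in-progress items have one."""
    blockers = sum(1 for d in resolution_dates if d is None)
    return {"blockers": blockers, "in_progress": len(resolution_dates) - blockers}


# NTS_AEO, metric 5.5.4i: 64,414 / 66,176 is about 97.3%, shown as 97%.
metric_pct = compliance_pct(64414, 66176)
```

Recomputing from raw counts can differ slightly from the Summary sheet's own "Current Compliance" value: ACCESS-ENG's 7 of 8 is exactly 87.5%, while the drill-down hierarchy shows 88.0%, which appears to reflect rounding in the sheet.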

### Important: "Total Devices" Is Not Unique Devices

A single physical device (hostname) can appear in multiple metrics. For example, one router might be measured against metric 5.5.4i (vulnerability scanning), 7.1.1 (logging), and 2.3.6i (patching). The "Total Devices" count is the sum of all device-metric evaluations, not unique hostnames.

This matches how CyberMetrics reports — each metric has its own scope of applicable devices, and the overall compliance percentage reflects performance across all metrics.

### Per-Metric Compliance Percentage

Each metric row shows its own compliance percentage, which comes directly from the Summary sheet's "Current Compliance" column. This is a decimal between 0 and 1 (displayed as 0–100% in the UI). The target is also per-metric — some metrics have a 95% target, others 80% or 75%, depending on the compliance program's priorities.

### Category Aggregation

Metrics are grouped into categories (Logging & Monitoring, Vulnerability Management, Access & MFA, Endpoint Protection, etc.) based on a static mapping in `compliance_config.json`. The category cards in the drill-down view show the aggregate compliance % across all metrics in that category, using only rollup rows.
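
The category roll-up could look like the sketch below. The brief does not specify the structure of `compliance_config.json`, so the `categories` shape here is hypothetical, as is skipping unmapped metrics. Percentages are pooled (summed compliant over summed total), consistent with the Compliance % definition, rather than averaged across metrics; an average would let a metric with 8 devices weigh as much as one with 66,176.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical shape for the static mapping in compliance_config.json.
CONFIG = json.loads(
    """
    {
      "categories": {
        "Vulnerability Management": ["5.5.4i"],
        "Logging & Monitoring": ["7.1.1"]
      }
    }
    """
)


def category_compliance(rollup_rows: List[Dict], config: Dict) -> Dict[str, float]:
    """Pooled compliance % per category, computed from rollup rows only."""
    metric_to_category = {
        metric: category
        for category, metrics in config["categories"].items()
        for metric in metrics
    }
    sums: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [compliant, total]
    for row in rollup_rows:
        category = metric_to_category.get(row["metric_id"])
        if category is None:
            continue  # metric not in the mapping: skipped in this sketch
        sums[category][0] += row["compliant"]
        sums[category][1] += row["total"]
    return {
        category: compliant / total * 100
        for category, (compliant, total) in sums.items()
        if total
    }
```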

---

## Sub-Team Drill-Down

### How It Works

When you click into a vertical (e.g., NTS_AEO), the metrics table shows the **rollup totals** by default — one row per metric with the ALL: numbers. Two mechanisms expose sub-team data:

**1. Expand/Collapse (▸ arrow)**

Click the arrow on any metric row to reveal sub-team rows inline beneath it. Each sub-team row shows that team's compliant/non-compliant/total/% for that specific metric. The sub-team rows are visually indented and teal-highlighted.

This is useful for questions like "Which team is dragging down metric 5.5.4i?"

**2. Team Filter Buttons**

A row of filter buttons, one per team in that vertical (e.g., ACCESS-ENG, ACCESS-OPS, INTELDEV, STEAM), appears above the metrics table. Click one to filter the entire table to show only that team's numbers per metric. The "All (Rollup)" button returns to the aggregated view.

This is useful for requests like "Show me STEAM's compliance across all metrics."

### What "(Other)" Means

Some metrics have a team value of `(Other)` in the Summary sheet. This represents devices that don't map to a named sub-team. These are included in the ALL: rollup total but are not shown as a separate sub-team in the UI — they're noise for the compliance team's purposes.
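
Both sub-team mechanisms draw on the same Summary rows; the sketch below shows one way to derive them while leaving out the rollup and `(Other)` rows. The row shape, the `ALL:` prefix check, alphabetical ordering, and the function names are assumptions for illustration.

```python
from typing import Dict, List

ROLLUP_PREFIX = "ALL:"
OTHER_TEAM = "(Other)"


def is_named_sub_team(team: str) -> bool:
    """True for real sub-teams; False for ALL: rollups and (Other)."""
    return not team.startswith(ROLLUP_PREFIX) and team != OTHER_TEAM


def expanded_rows(summary_rows: List[Dict], metric_id: str) -> List[Dict]:
    """Rows revealed by the expand arrow for one metric, ordered by team name."""
    return sorted(
        (
            row
            for row in summary_rows
            if row["metric_id"] == metric_id and is_named_sub_team(row["team"])
        ),
        key=lambda row: row["team"],
    )


def team_filter_buttons(summary_rows: List[Dict]) -> List[str]:
    """Button labels for the team filter row, rollup view first."""
    teams = {row["team"] for row in summary_rows if is_named_sub_team(row["team"])}
    return ["All (Rollup)"] + sorted(teams)
```

An empty `expanded_rows` result, as for a metric whose only team is `(Other)`, is the case where the device drill-down falls back to "View All Devices".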

### Device-Level Drill-Down

Clicking a sub-team row in the metric sub-team view navigates to the device list: the individual non-compliant hostnames for that vertical + metric + team combination. The backend applies the team filter through an optional `?team=` query parameter on the device list endpoint. This data comes from the detail sheets (not the Summary sheet) and shows:

- Hostname, IP address, device type, team
- Seen count (how many consecutive uploads this device has been non-compliant)
- First seen / last seen dates
- Resolution date (if set)
- Remediation plan (if documented)

If a metric has no sub-team breakdown (e.g., only an "(Other)" team), a "View All Devices" button is shown instead, which loads the full unfiltered device list for that metric.
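
On the backend, the optional `?team=` parameter amounts to one extra predicate. The sketch below filters in memory for clarity, where a real endpoint would presumably push the same conditions into its SQL query. The row shape and function names are hypothetical, and the endpoint path is left out because the brief gives only its prefix. Omitting the team returns the unfiltered list that "View All Devices" loads.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def device_list(
    devices: List[Dict],
    vertical: str,
    metric_id: str,
    team: Optional[str] = None,
) -> List[Dict]:
    """Non-compliant devices for vertical + metric, narrowed to `team` if given."""
    return [
        d
        for d in devices
        if d["vertical"] == vertical
        and d["metric_id"] == metric_id
        and (team is None or d["team"] == team)
    ]


def team_query(team: Optional[str]) -> str:
    """Query string a client might append: '' for all devices, '?team=...' otherwise."""
    return "" if team is None else "?" + urlencode({"team": team})
```

`urlencode` percent-encodes punctuation, so a team value such as `(Other)` travels as `?team=%28Other%29`.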

The full navigation path is:

```
Overview → Vertical → Metric (sub-team totals) → Team (device list)
```

---

## Burndown Forecast

The burndown forecast answers: "When will this vertical reach compliance?"

**How it works:**

1. Each non-compliant device can have a `resolution_date` set (target remediation date)
2. Devices with dates are bucketed by month → "20 devices expected remediated in June, 35 in July"
3. Devices without dates are counted as "blockers" — no committed timeline
4. The trend chart uses linear regression on 3+ months of actual data to project a forecast line

**What feeds it:**

- Resolution dates can be set manually (click device → set date) or via bulk upload (xlsx with Hostname + Resolution Date columns)
- The existing bulk upload flow on the VCL page already supports this

**What the compliance team sees:**

- Per-vertical, for example: "NTS_AEO has 80 non-compliant, 25 are blockers, 55 have dates, projected clear by August 2026"
- Aggregated: trend line showing whether the organization is on track to hit the 95% target

---

## What Does NOT Change

- Existing AEO compliance upload (single file) — unchanged
- Existing VCL report page (STEAM/ACCESS-ENG view) — unchanged
- Existing `compliance_items` table structure — only adds a nullable `vertical` column
- Python parser — reused as-is, no modifications
- Auth model — same groups (Admin, Standard_User) required for upload

---

## Deployment Options

| Option | Description |
|---|---|
| Same instance | Add the feature to the existing dashboard. Multi-vertical data coexists with AEO data via the `vertical` column. |
| Separate instance | Deploy a fresh instance with its own database. Compliance team experiments freely. No risk to dev/production data. |
| Later: API integration | Replace xlsx upload with direct CyberMetrics API calls. Backend endpoints stay the same — just a different client pushing data. |

The architecture supports all three without backend changes. The `vertical` column and scoped resolution logic work regardless of deployment topology.

---

## Open Questions for the Meeting

1. **Vertical list** — Are the 14 verticals in the screenshot the complete set, or do new verticals get added periodically? (Affects whether we hardcode a list or keep it dynamic.)

2. **Target % per vertical** — Is the 95% target uniform across all verticals, or do different verticals have different targets?

3. **Access control** — Should the compliance team have their own user accounts with a specific role, or do they use existing Admin/Standard_User groups?

4. **Naming** — What should this page be called in the nav? "CCP Metrics", "VCL Multi-Vertical", "Compliance Reporting", something else?

5. **Retention** — How long should historical upload data be kept? (Affects trend chart depth and storage.)

---

## Timeline Estimate

| Phase | Scope | Effort |
|---|---|---|
| 1. Migration + backend endpoints | Schema changes, upload flow, scoped resolution, stats/trend/drill-down APIs | 2–3 days |
| 2. Frontend — upload modal | Multi-file drop, filename parsing, batch preview, commit | 1–2 days |
| 3. Frontend — report page | Stats bar, vertical table, trend chart, donut, drill-down views | 2–3 days |
| 4. Frontend — burndown | Per-vertical burndown chart, blocker counts, forecast | 1 day |
| 5. Testing + polish | Property tests, edge cases, error handling, loading states | 1 day |

Total: roughly 7–10 working days for the full POC.