Split Architecture Proposal: Collector + Indexer

Author: Infrastructure Team
Date: 2026-06-08
Status: Draft — Pending Review
Scope: Scale CVE Dashboard from 2 teams / ~15 users to company-wide deployment (100+ users, 15+ teams)

Executive Summary

The STEAM Security Dashboard currently runs as a monolithic single-process Express application on CT107 (dashboard-dev, 71.85.90.9). This single process simultaneously serves the frontend, handles all API requests, and performs background data collection from Ivanti, Jira, CARD, Atlas, and NVD APIs.

At current scale (2 teams, <15 users, daily sync), this architecture works. At company-wide scale (15+ teams, hundreds of users, sub-hourly sync), it will not. This document proposes a phased transition to a Collector + API Server architecture that separates data ingestion from request serving.

Critical constraint: CT107 (71.85.90.9) has the firewall rules granting access to the production Ivanti, Jira, and CARD APIs. The collector component must remain on this machine or firewall rules must be extended.

Table of Contents

Current Architecture
Problem Statement
Proposed Architecture
Phase Plan
Infrastructure Requirements
Risk Assessment
Decision Points
Appendix: Current Data Flow Analysis

Current Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    CT107 (dashboard-dev)                         │
│                    71.85.90.9 — 48 GB RAM, 250 GB Disk          │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │              Express Process (port 3001/3100)             │  │
│  │                                                           │  │
│  │  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │  │
│  │  │  React SPA  │  │   API Routes │  │  Sync Workers  │  │  │
│  │  │  (static)   │  │  (50+ endpts)│  │  (setInterval) │  │  │
│  │  └─────────────┘  └──────────────┘  └────────────────┘  │  │
│  │                          │                    │           │  │
│  │                          │    Shared PG Pool (10 conn)    │  │
│  │                          │          │                     │  │
│  └──────────────────────────┼──────────┼─────────────────────┘  │
│                             │          │                         │
│  ┌──────────────────────────▼──────────▼─────────────────────┐  │
│  │         PostgreSQL 16 (Docker, port 5433)                 │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Firewall Access: Ivanti API, Jira DC, CARD API, Atlas API      │
└─────────────────────────────────────────────────────────────────┘

Key Metrics (Current)

Metric	Current Value	Company-Wide Projection
Concurrent users	5–15	100–300
Teams tracked	2	15+
Ivanti findings (open)	~200–500	2,000–10,000+
Ivanti sync frequency	24h	1–4h desired
PG connection pool	10	Insufficient
Jira API rate limit	1,440/day	Shared across all users
Data sources	5 (Ivanti, NVD, Jira, Atlas, CARD)	8+ (add CrowdStrike, Qualys, Tanium)

Problem Statement

1. Sync Blocks the API Server

syncFindings() runs sequentially through:

1. Fetch all open findings pages (100/page)
2. Upsert findings batch into PostgreSQL
3. Detect archive changes (compare all previous vs current)
4. Fetch all closed findings pages
5. Upsert closed findings
6. Run BU drift checker (makes additional API calls per disappeared finding)
7. Sync FP workflow counts (sweeps all closed pages again)
8. Compute and store anomaly summary
9. Record counts history

At 500 findings, this takes 2–5 minutes. At 10,000 findings across 15 teams, this could take 15–30 minutes. During a sync the Express process is saturated: API responses slow down and user requests contend with the sync for the connection pool.

2. Single Point of Failure

One process handles everything. A memory leak during sync, an unhandled promise rejection in the BU drift checker, or a runaway loop in archive detection crashes the entire dashboard for all users.

3. Connection Pool Exhaustion

10 connections shared between:

User-facing read queries (findings list, compliance items, charts)
Sync bulk upserts (batches of 100 rows × 18 columns)
User writes (notes, overrides, queue operations)

The pool already logs warnings at 8/10 active. At 100+ concurrent users issuing reads while a sync writes thousands of rows, requests will queue for a free connection and time out.

4. Rate Limits Shared Across Functions

Jira's 1,440/day limit is consumed by both background sync and user-initiated operations (lookups, ticket creation). A bulk sync could exhaust the daily budget, blocking users from creating tickets for the rest of the day.

5. No Horizontal Scaling Path

A second API server cannot be added without also duplicating the sync scheduler, which would cause duplicate syncs, double-writes, and race conditions.

6. Firewall Constraint

CT107 has the only firewall access to production Ivanti, Jira, and CARD APIs. The collector (data fetcher) must run on this machine. The API server could potentially move elsewhere, but the collector cannot without firewall changes.

Proposed Architecture

Target State

┌─────────────────────────────────────────────────────────────────┐
│                    CT107 (dashboard-dev)                         │
│                    71.85.90.9 — 48 GB RAM, 250 GB Disk          │
│                    ★ Firewall access to prod APIs ★             │
│                                                                 │
│  ┌───────────────────────────────────┐  ┌─────────────────────┐│
│  │   API Server (Express, port 3001) │  │  Collector Service  ││
│  │                                   │  │  (Node.js worker)   ││
│  │  • React SPA serving             │  │                     ││
│  │  • All /api/* read endpoints     │  │  • Ivanti sync      ││
│  │  • User writes (notes, queue)    │  │  • Jira bulk sync   ││
│  │  • On-demand lookups (proxied)   │  │  • CARD cache sync  ││
│  │  • Triggers collector via        │  │  • Atlas cache sync ││
│  │    pg NOTIFY                     │  │  • NVD bulk sync    ││
│  │                                   │  │  • Archive detect   ││
│  │  Pool: 15 conn (reads + writes)  │  │  • BU drift checker ││
│  │                                   │  │  • Anomaly compute  ││
│  └───────────────┬───────────────────┘  │  • Compliance parse ││
│                  │                       │                     ││
│                  │                       │  Pool: 10 conn      ││
│                  │                       │  (bulk upserts)     ││
│                  │                       │                     ││
│                  │                       │  Listens:           ││
│                  │                       │    pg LISTEN         ││
│                  │                       │    'sync_trigger'    ││
│                  │                       └──────────┬──────────┘│
│                  │                                  │           │
│  ┌───────────────▼──────────────────────────────────▼─────────┐│
│  │              PostgreSQL 16 (Docker, port 5433)              ││
│  │              Pool: 25 total connections allocated           ││
│  └────────────────────────────────────────────────────────────┘│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Component Responsibilities

API Server (`cve-api.service`)

Responsibility	Details
Frontend serving	Static React build via `express.static`
Read endpoints	All GET routes — findings, compliance, charts, exports
User writes	Notes, overrides, queue items, ticket CRUD, KB uploads
On-demand lookups	Single NVD lookup, single Jira issue lookup, CARD real-time queries
Sync trigger	`SELECT pg_notify('sync_trigger', '{"type":"findings","user":"admin"}')`
Health/status	Expose collector status via sync_state table reads

Collector (`cve-collector.service`)

Responsibility	Details
Scheduled syncs	Ivanti findings (configurable interval), workflows (24h)
Bulk API operations	Jira JQL sync-all, Atlas cache refresh, NVD bulk sync
Post-sync processing	Archive detection, BU drift classification, closed-gone detection
Anomaly computation	Open/closed deltas, classification breakdown, significance flagging
Compliance parsing	Spawns Python subprocess for xlsx parsing on upload commit
Event-driven triggers	Runs `LISTEN sync_trigger` and handles on-demand requests
Rate budget management	Owns the Jira daily/burst counters; API server gets a reserved allocation
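The rate budget ownership described above could be sketched as follows. This is a minimal illustration, not the project's implementation: the 1,440/day limit and the 80/20 split come from this proposal, while `RateBudget`, `tryConsume`, and `reset` are hypothetical names, and burst limits are ignored.

```javascript
// Sketch: split a fixed daily request budget between named consumers.
// Class and method names are illustrative assumptions.
class RateBudget {
  constructor(dailyLimit, sharesPercent) {
    this.initial = {};
    for (const [name, percent] of Object.entries(sharesPercent)) {
      // Integer math avoids floating-point rounding on the split.
      this.initial[name] = Math.floor((dailyLimit * percent) / 100);
    }
    this.remaining = { ...this.initial };
  }

  // Spends `count` requests and returns true if the consumer has budget left.
  tryConsume(consumer, count = 1) {
    if (!(consumer in this.remaining)) {
      throw new Error(`unknown consumer: ${consumer}`);
    }
    if (this.remaining[consumer] < count) return false;
    this.remaining[consumer] -= count;
    return true;
  }

  // Call at the daily reset boundary.
  reset() {
    this.remaining = { ...this.initial };
  }
}

const jiraBudget = new RateBudget(1440, { collector: 80, api: 20 });
```

With these numbers the collector may spend 1,152 calls per day and the API server 288, so a bulk sync that drains the collector share still leaves user-initiated lookups and ticket creation untouched.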

Communication Pattern

User clicks "Sync" in UI
         │
         ▼
API Server receives POST /api/ivanti/findings/sync
         │
         ▼
API Server: SELECT pg_notify('sync_trigger', '{"type":"findings"}')
         │
         ▼
API Server responds: { status: 'sync_started', message: 'Check /sync-status' }
         │
         ▼
Collector receives NOTIFY, starts syncFindings()
         │
         ▼
Collector updates ivanti_sync_state (status='syncing')
         │
         ▼
Collector completes, updates ivanti_sync_state (status='success')
         │
         ▼
Frontend polls GET /api/ivanti/findings/sync-status → sees 'success' → refreshes

No Redis. No message broker. Just PostgreSQL LISTEN/NOTIFY — zero new infrastructure.
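The collector side of this flow might look like the sketch below. It assumes the project uses node-postgres, so `client` is a connected `Client` that exposes `query()` and emits `notification` events carrying `channel` and `payload`. `listenForTriggers`, `parseTrigger`, and the `handlers` map are hypothetical names.

```javascript
// Sketch: collector-side handling of sync_trigger notifications.
// `handlers` maps a sync type (e.g. "findings") to a function; it is an assumption.

function parseTrigger(payload) {
  // NOTIFY payloads are plain strings; treat anything unparseable as invalid.
  try {
    const msg = JSON.parse(payload);
    return msg && typeof msg.type === 'string' ? msg : null;
  } catch {
    return null;
  }
}

async function listenForTriggers(client, handlers, log = console) {
  client.on('notification', (n) => {
    if (n.channel !== 'sync_trigger') return;
    const msg = parseTrigger(n.payload);
    const handler = msg && handlers[msg.type];
    if (!handler) {
      log.warn(`ignoring sync_trigger payload: ${n.payload}`);
      return;
    }
    // Run the sync without blocking the notification handler.
    Promise.resolve()
      .then(() => handler(msg))
      .catch((err) => log.error(`sync "${msg.type}" failed: ${err.message}`));
  });
  // LISTEN is tied to this session, so the client should be a dedicated
  // long-lived connection rather than one borrowed from the pool.
  await client.query('LISTEN sync_trigger');
}
```

PostgreSQL delivers a notification only to sessions that are listening when it is sent, so a collector that is restarting at that moment never sees the trigger. The scheduled sync remains the fallback for that case.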

Phase Plan

Phase 0: Immediate Improvements (Weeks 1–2)

Goal: Reduce risk within the current monolith. No architectural changes.

Task	Effort	Impact
Make `POST /sync` non-blocking — return immediately, let sync run in background	2h	Unblocks users during sync
Add `GET /api/ivanti/findings/sync-status` endpoint	1h	Frontend can poll for completion
Increase PG pool from 10 to 20 connections	10min	Headroom for concurrent operations
Add `pg_stat_activity` monitoring query to health endpoint	30min	Visibility into pool pressure
Update frontend to poll sync-status instead of waiting	2h	UX improvement

Deliverables:

Updated ivantiFindings.js with async sync dispatch
New sync-status polling endpoint
Frontend ReportingPage sync UX updated
Pool configuration change in db.js
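The non-blocking dispatch in Phase 0 could take roughly this shape. It is a sketch under stated assumptions: `runSync` stands in for the existing `syncFindings()`, the in-memory `state` object is a hypothetical placeholder for wherever sync status is actually stored, and `createSyncDispatcher` is not an existing function.

```javascript
// Sketch: start a sync in the background and answer the request immediately.
function createSyncDispatcher(runSync) {
  const state = { status: 'idle', startedAt: null, finishedAt: null, error: null };

  function trigger() {
    // Refuse to start a second sync while one is running.
    if (state.status === 'syncing') {
      return { accepted: false, status: 'syncing' };
    }
    Object.assign(state, {
      status: 'syncing', startedAt: new Date(), finishedAt: null, error: null,
    });
    // Deliberately not awaited: the caller responds while the sync keeps running.
    Promise.resolve()
      .then(() => runSync())
      .then(() => { state.status = 'success'; })
      .catch((err) => {
        state.status = 'failed';
        state.error = String(err?.message ?? err);
      })
      .finally(() => { state.finishedAt = new Date(); });
    return { accepted: true, status: 'sync_started' };
  }

  // Returns a copy so callers cannot mutate the dispatcher's state.
  return { trigger, getStatus: () => ({ ...state }) };
}

// Possible Express wiring (route paths from this proposal, handler bodies assumed):
//   router.post('/api/ivanti/findings/sync', (req, res) => {
//     const result = dispatcher.trigger();
//     res.status(result.accepted ? 202 : 409).json(result);
//   });
//   router.get('/api/ivanti/findings/sync-status', (req, res) => {
//     res.json(dispatcher.getStatus());
//   });
```

Because the state lives in process memory, this only works while a single process handles both routes, which is the Phase 0 situation. Phase 1 moves the same information into the database.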

Phase 1: Extract Collector (Weeks 3–4)

Goal: Separate data collection into its own process on CT107.

Task	Effort	Impact
Create `backend/collector.js` — standalone Node process	4h	Fault isolation
Move sync functions from route files into shared `lib/sync/` modules	3h	Code reuse between collector and API
Implement pg LISTEN/NOTIFY trigger mechanism	2h	API → Collector communication
Create `cve-collector.service` systemd unit	30min	Process management
Add collector health check and status reporting	1h	Observability
Update `POST /sync` routes to use pg_notify instead of inline sync	1h	Complete decoupling
Add `sync_jobs` table for job tracking (queued, running, complete, failed)	1h	Multi-user sync coordination
Update CI/CD pipeline to deploy collector service	2h	Automated deployment

Deliverables:

backend/collector.js — entry point for collector process
backend/lib/sync/ — shared sync logic (extracted from routes)
systemd/cve-collector.service — systemd unit
Updated .gitlab-ci.yml with collector deploy stage
sync_jobs table for job state tracking
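One way to keep `sync_jobs` rows consistent is to restrict status changes to a fixed set of transitions. The four states come from the task list above. The transition rules, the `updatedAt` field, and the function names are assumptions made for illustration.

```javascript
// Sketch: allowed status transitions for a row in the proposed sync_jobs table.
const TRANSITIONS = {
  queued: ['running', 'failed'],
  running: ['complete', 'failed'],
  complete: [],
  failed: [],
};

function canTransition(from, to) {
  return (TRANSITIONS[from] || []).includes(to);
}

// Returns a new job object with the updated status, or throws on an illegal move.
function advance(job, to, now = new Date()) {
  if (!canTransition(job.status, to)) {
    throw new Error(`illegal sync_jobs transition: ${job.status} -> ${to}`);
  }
  return { ...job, status: to, updatedAt: now };
}

// Multi-user coordination: reuse a job that is already queued or running
// for the same sync type instead of enqueueing a duplicate.
function findActiveJob(jobs, type) {
  return jobs.find(
    (j) => j.type === type && (j.status === 'queued' || j.status === 'running'),
  ) || null;
}
```

When two users click "Sync" within the same minute, the second request can be answered with the job returned by `findActiveJob` rather than starting a second run.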

File structure after Phase 1:

backend/
├── server.js                # API server (unchanged entry point)
├── collector.js             # NEW — collector entry point
├── db.js                    # Shared pool config
├── lib/
│   └── sync/
│       ├── ivantiFindings.js    # Extracted from routes/ivantiFindings.js
│       ├── ivantiWorkflows.js   # Extracted from routes/ivantiWorkflows.js
│       ├── jiraBulkSync.js      # Extracted from routes/jiraTickets.js
│       ├── atlasCache.js        # Extracted from routes/atlas.js
│       ├── nvdBulkSync.js       # New — bulk NVD operations
│       ├── archiveDetection.js  # Extracted from routes/ivantiFindings.js
│       └── anomalyCompute.js    # Extracted from routes/ivantiFindings.js
├── routes/                  # API routes — now thin, read-heavy
└── helpers/                 # Shared API client helpers (unchanged)

Phase 2: Multi-Tenancy & Scale Hardening (Weeks 5–8)

Goal: Prepare for 15 teams and hundreds of users.

Task	Effort	Impact
Per-team sync scheduling — stagger syncs to avoid API burst	3h	Spreads load
Jira rate budget partitioning (collector gets 80%, API gets 20%)	2h	Prevents sync from starving users
Per-BU finding isolation — team users only see their findings	4h	Data scoping
Add connection pooling metrics endpoint (`/api/admin/pool-stats`)	1h	Operational visibility
Implement sync queue with priority (user-triggered > scheduled)	3h	Better UX
Add retry logic with exponential backoff to collector	2h	Resilience
Partial-progress persistence — don't lose work on mid-sync failure	4h	Data integrity
PG connection pool separation — API pool (15) + Collector pool (10)	1h	Isolation
Add PgBouncer or a similar pooler for connection multiplexing (optional)	4h	Scale past 50 concurrent connections
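Staggering per-team syncs can be as simple as dividing the sync interval evenly among the teams. The sketch below assumes one sync per team per interval, and `staggerOffsets` is a hypothetical helper name.

```javascript
// Sketch: spread team syncs evenly across one sync interval.
// Returns the minute offset within the interval at which each team starts.
function staggerOffsets(teams, intervalMinutes) {
  const step = intervalMinutes / teams.length;
  return teams.map((team, i) => ({ team, offsetMinutes: Math.round(i * step) }));
}

// Example with four placeholder team names and a 60-minute interval.
const schedule = staggerOffsets(['team-a', 'team-b', 'team-c', 'team-d'], 60);
```

With 15 teams and a 60-minute interval, one team starts every 4 minutes. A sync at 500 findings already takes 2–5 minutes, so adjacent slots can overlap and the sync queue is still needed to serialize them.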

Deliverables:

Team-scoped sync scheduler in collector
Rate budget allocation system
Retry/backoff logic
Partial progress tracking
Pool separation
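The retry/backoff deliverable could follow the common exponential-backoff-with-jitter pattern. The sketch below is generic rather than taken from the codebase: `withRetry`, its option names, and the default values are all assumptions.

```javascript
// Sketch: retry an async operation with capped exponential backoff and full jitter.
async function withRetry(operation, options = {}) {
  const {
    retries = 4,            // retries after the first attempt
    baseDelayMs = 500,
    maxDelayMs = 30000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    shouldRetry = () => true,
  } = options;

  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (err) {
      if (attempt >= retries || !shouldRetry(err)) throw err;
      // Full jitter: wait a random time up to the capped exponential delay.
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
}
```

A `shouldRetry` predicate lets the collector give up immediately on errors that will not heal, such as an authentication failure, while still retrying timeouts and rate-limit responses.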

Phase 3: Additional Data Sources (Weeks 9–12)

Goal: Integrate CrowdStrike, Qualys, and Tanium feeds.

Task	Effort	Impact
CrowdStrike Falcon API integration in collector	8h	New vulnerability source
Qualys VMDR API integration in collector	8h	New vulnerability source
Tanium asset inventory sync	6h	Asset correlation
Cross-source finding deduplication logic	6h	Data quality
Unified findings view (merged from all sources)	4h	Single pane of glass
Source-specific sync schedules (configurable per source)	2h	Flexibility
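Cross-source deduplication needs a key that identifies the same vulnerability on the same asset regardless of which platform reported it. The sketch below assumes that key is the CVE ID plus a normalized hostname. The field names `cveId`, `hostname`, and `source` are hypothetical, and real feeds would likely need richer asset matching.

```javascript
// Sketch: merge findings from several sources by (CVE ID, hostname).
function dedupeFindings(findings) {
  const merged = new Map();
  for (const f of findings) {
    const cveId = f.cveId.trim().toUpperCase();
    const hostname = f.hostname.trim().toLowerCase();
    const key = `${cveId}|${hostname}`;
    const existing = merged.get(key);
    if (existing) {
      // Same finding seen before: just record the additional source.
      if (!existing.sources.includes(f.source)) existing.sources.push(f.source);
    } else {
      merged.set(key, { cveId, hostname, sources: [f.source] });
    }
  }
  return [...merged.values()];
}
```

Keeping the list of contributing sources on each merged row lets the unified view show which platforms agree on a finding.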

Note: All new API integrations go into the collector. The API server never makes outbound calls to external vulnerability platforms except for single-item on-demand lookups.

Firewall implications: CrowdStrike, Qualys, and Tanium API access will need firewall rules added to CT107 (71.85.90.9). Submit firewall requests in advance.

Phase 4: Horizontal Scaling (Weeks 13+)

Goal: Support 300+ concurrent users if company-wide adoption materializes.

Task	Effort	Impact
Move API server to a separate LXC container (with more resources)	4h	Dedicated API resources
Run multiple API server instances behind a load balancer	8h	Horizontal scale
Keep collector on CT107 (firewall access)	0h	No change needed
Add Redis for session store (replace PG sessions)	4h	Multi-instance sessions
Add read replicas if PG becomes the bottleneck	8h	Read scale
Evaluate moving PG to CT109 (zbl-indexer, 32GB/500GB)	2h	Larger DB host

Architecture at Phase 4:

                    ┌─────────────────┐
                    │  Load Balancer  │
                    │  (nginx/HAProxy)│
                    └────┬───────┬────┘
                         │       │
           ┌─────────────▼─┐   ┌─▼─────────────┐
           │  API Server 1 │   │  API Server 2 │   (New LXC or CT103)
           │  (Express)    │   │  (Express)    │
           └───────┬───────┘   └───────┬───────┘
                   │                   │
                   └─────────┬─────────┘
                             │
┌────────────────────────────▼──────────────────────────────────────┐
│                    CT107 (71.85.90.9)                              │
│                                                                   │
│  ┌─────────────────────────┐    ┌──────────────────────────────┐ │
│  │  Collector Service      │    │  PostgreSQL 16               │ │
│  │  (sole process with     │    │  (or moved to CT109)         │ │
│  │   firewall API access)  │    │                              │ │
│  └─────────────────────────┘    └──────────────────────────────┘ │
│                                                                   │
│  ★ Firewall: Ivanti, Jira, CARD, Atlas, CrowdStrike, Qualys ★   │
└───────────────────────────────────────────────────────────────────┘

Infrastructure Requirements

CT107 Resource Allocation (Current → Phase 2)

Resource	Current	Phase 2 Target	Notes
RAM	48 GB	48 GB (sufficient)	Node processes use <2GB each
CPU	Shared	May need 4+ dedicated cores	Sync is CPU-intensive during transform
Disk	250 GB	250 GB (sufficient)	PG data + uploads + logs
PG connections	10	25 (15 API + 10 collector)	Pool sizes in `db.js`; `max_connections` in `postgresql.conf`
Systemd services	2 (backend + frontend)	3 (api + collector + postgres)	Frontend served by API

PostgreSQL Tuning (for 15 teams / hundreds of users)

# postgresql.conf changes
max_connections = 50            # Below the PostgreSQL default of 100; covers 25 pooled connections with headroom
shared_buffers = 4GB            # 25% of a 16 GB memory budget for PG (host has 48 GB)
effective_cache_size = 12GB     # 75% of the same 16 GB budget
work_mem = 64MB                 # Per-sort/hash operation
maintenance_work_mem = 512MB    # For VACUUM, CREATE INDEX
wal_level = replica             # If read replicas needed later

Firewall Dependencies

Service	Endpoint	Required By	Current Access
Ivanti/RiskSense	platform4.risksense.com:443	Collector	✅ CT107 only
Jira Data Center	jira.charter.com:443	Collector + API (lookups)	✅ CT107 only
CARD API	card.charter.com:443	API (real-time)	✅ CT107 only
Atlas InfoSec	(internal)	Collector	✅ CT107 only
NVD API	services.nvd.nist.gov:443	Collector + API	✅ Public
CrowdStrike	api.crowdstrike.com:443	Collector	❌ Firewall request needed
Qualys	qualysapi.qualys.com:443	Collector	❌ Firewall request needed
Tanium	(internal)	Collector	❌ Firewall request needed

Key constraint: If the API server moves off CT107 in Phase 4, you'll need firewall rules for the new host to reach Jira (for user lookups) and CARD (for real-time queries). Alternatively, the collector could proxy those on-demand requests — adds latency but avoids firewall changes.

Risk Assessment

Risk	Likelihood	Impact	Mitigation
Collector crash doesn't affect API users	—	—	This is the primary benefit of splitting
Collector and API race on DB writes	Medium	Low	Collector does bulk upserts; API does single-row writes. Different tables mostly. Use advisory locks for sync_state.
Sync trigger lost (pg NOTIFY missed)	Low	Medium	Collector also runs on a schedule. Missed trigger just delays to next interval.
Phase 1 introduces bugs in extraction	Medium	Medium	Comprehensive test suite exists. Run parallel (old monolith + new split) in staging for 1 week.
Firewall change delays block Phase 4	High	Medium	Start firewall requests early. Phase 4 is optional — single-machine split (Phases 1–3) works fine at 15 teams.
PG becomes bottleneck at 300+ users	Low	High	Phase 4 addresses with read replicas. CT109 (500GB, 32GB) available as larger DB host.
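The advisory-lock mitigation mentioned in the table could be sketched as follows. It assumes node-postgres, with `client` being a single checked-out connection whose `query(text, params)` resolves to `{ rows }`. The lock key is an arbitrary constant chosen for this example, and `withSyncStateLock` is a hypothetical name.

```javascript
// Sketch: guard sync_state updates with a PostgreSQL session-level advisory lock.
const SYNC_STATE_LOCK_KEY = 107001; // arbitrary application-chosen key (assumption)

async function withSyncStateLock(client, fn) {
  const { rows } = await client.query(
    'SELECT pg_try_advisory_lock($1) AS locked',
    [SYNC_STATE_LOCK_KEY],
  );
  // Another session holds the lock: skip instead of waiting.
  if (!rows[0].locked) return { ran: false };
  try {
    return { ran: true, result: await fn() };
  } finally {
    // Session-level advisory locks must be released on the same connection.
    await client.query('SELECT pg_advisory_unlock($1)', [SYNC_STATE_LOCK_KEY]);
  }
}
```

`pg_try_advisory_lock` returns immediately instead of blocking, so a caller that loses the race can report that a sync is already in progress rather than tying up a pooled connection.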

Decision Points

These require team/leadership input before proceeding:

Sync frequency target: Is 1-hour sync acceptable, or do teams need near-real-time (15 min)? This affects collector design complexity and API rate budget math.
API server location: Keep everything on CT107, or move the API server to a separate container? Keeping it on CT107 is simpler (no firewall changes for CARD/Jira lookups) but limits scaling options.
Database location: Keep PG on CT107, or move to CT109 (zbl-indexer, 500GB disk, 32GB RAM)? Moving adds network latency but gives more room for growth.
CrowdStrike/Qualys/Tanium priority: Which new data sources are most urgent? This affects Phase 3 ordering and firewall request timing.
Session management: At 300+ users, PG-backed sessions will be high-churn. Acceptable, or invest in Redis? Redis adds infrastructure but is the industry standard for session stores at scale.
Multi-instance API: Is the goal to survive a single API server restart without downtime? If yes, Phase 4 (load balancer + multiple instances) is needed. If brief restarts during deploys are acceptable, single-instance on CT107 works through Phase 3.

Appendix: Current Data Flow Analysis

Data Collection Patterns

Source	Trigger	Frequency	Data Volume	Processing
Ivanti Findings	Schedule + manual	24h	100–500 findings (all pages)	Extract, upsert, archive detect, BU drift, anomaly
Ivanti Workflows	Schedule + manual	24h	50 workflow batches	Store as JSON blob
Ivanti Closed Findings	During findings sync	24h	All closed pages	Upsert + closed archive detection
Jira Bulk Sync	Manual (admin)	On-demand	All tracked tickets via JQL	Status/summary update per ticket
Jira Single Lookup	User action	Real-time	1 issue	Proxy + display
NVD Lookup	User action	Real-time	1 CVE	Proxy + optional save
NVD Bulk Sync	Manual	On-demand	All CVEs in DB	Batch update metadata
Atlas Action Plans	Cache refresh	Background	Per-host plan data	Cache in `atlas_action_plans_cache`
CARD Operations	User action	Real-time	1 asset at a time	Proxy (confirm/decline/redirect)
Compliance xlsx	Manual upload	Weekly	1 file → hundreds of rows	Python parse → PG upsert (transactional)

What Moves to Collector vs Stays in API

Operation	Collector	API Server	Rationale
Ivanti findings sync (all pages)	✅		Heavy, multi-page, post-processing
Ivanti workflows sync	✅		Scheduled background task
Ivanti closed sweep	✅		Part of findings sync pipeline
Archive detection	✅		CPU-intensive comparison
BU drift checker	✅		Makes additional API calls
Anomaly computation	✅		Depends on sync completion
Jira bulk sync-all	✅		Consumes rate budget, multi-issue
NVD bulk sync	✅		Multi-CVE, rate-limited
Atlas cache refresh	✅		Background, per-host API calls
Compliance xlsx parse	✅		Spawns Python, heavy DB writes
Single Jira lookup		✅	User-initiated, real-time, 1 call
Single NVD lookup		✅	User-initiated, real-time, 1 call
CARD operations		✅	User-initiated, real-time
All GET /api/* reads		✅	Pure DB queries, user-facing
Notes/overrides/queue		✅	Small writes, user-facing
File uploads		✅	User-initiated, disk I/O

Sync Pipeline Detail (becomes collector's core loop)

┌──────────────────────────────────────────────────────────────────┐
│                    Collector Sync Pipeline                        │
│                                                                  │
│  ┌────────────────┐                                              │
│  │ 1. Fetch Open  │ ← Ivanti API (paginated, 100/page)          │
│  │    Findings    │                                              │
│  └───────┬────────┘                                              │
│          │                                                       │
│  ┌───────▼────────┐                                              │
│  │ 2. Extract &   │ ← Transform raw API → normalized rows       │
│  │    Transform   │                                              │
│  └───────┬────────┘                                              │
│          │                                                       │
│  ┌───────▼────────┐                                              │
│  │ 3. Upsert to   │ ← Batch INSERT ON CONFLICT (100/batch)      │
│  │    PG          │   Preserves notes + overrides                │
│  └───────┬────────┘                                              │
│          │                                                       │
│  ┌───────▼────────┐                                              │
│  │ 4. Archive     │ ← Compare previous IDs vs current IDs       │
│  │    Detection   │   Detect disappeared + returned findings     │
│  └───────┬────────┘                                              │
│          │                                                       │
│  ┌───────▼────────┐                                              │
│  │ 5. Fetch Closed│ ← Ivanti API (all closed pages)             │
│  │    Findings    │   Upsert as state='closed'                   │
│  └───────┬────────┘                                              │
│          │                                                       │
│  ┌───────▼────────┐                                              │
│  │ 6. BU Drift    │ ← Re-query Ivanti for disappeared IDs       │
│  │    Checker     │   Classify: BU reassign / severity / decom   │
│  └───────┬────────┘                                              │
│          │                                                       │
│  ┌───────▼────────┐                                              │
│  │ 7. FP Workflow │ ← Sweep closed findings for FP# tickets     │
│  │    Counts      │   Aggregate by state                         │
│  └───────┬────────┘                                              │
│          │                                                       │
│  ┌───────▼────────┐                                              │
│  │ 8. Anomaly     │ ← Compute deltas, write to anomaly_log      │
│  │    Summary     │                                              │
│  └───────┬────────┘                                              │
│          │                                                       │
│  ┌───────▼────────┐                                              │
│  │ 9. Update      │ ← sync_state status='success'               │
│  │    Sync State  │   Notify API server: pg_notify('sync_done')  │
│  └────────────────┘                                              │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
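The pipeline above maps naturally onto a small step runner that executes stages in order and records how far it got. This is a sketch, not the existing sync code: `runPipeline`, the `{ name, run }` step shape, and the `onProgress` callback are assumptions, and the step bodies are left to the real sync functions.

```javascript
// Sketch: run named async steps in order and stop at the first failure.
async function runPipeline(steps, context = {}, onProgress = () => {}) {
  const completed = [];
  for (const { name, run } of steps) {
    const started = Date.now();
    try {
      // Each step receives the shared context and may add results to it.
      await run(context);
    } catch (err) {
      const error = String(err?.message ?? err);
      onProgress({ step: name, status: 'failed', error });
      return { status: 'failed', failedStep: name, completed, error };
    }
    completed.push(name);
    onProgress({ step: name, status: 'done', ms: Date.now() - started });
  }
  return { status: 'success', completed };
}
```

Persisting each `onProgress` event, for example into the sync state row, would give the partial-progress tracking listed in Phase 2. A failed run then reports which step broke and which steps had already finished.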

Timeline Summary

Phase	Timeframe	Key Outcome	Required For
0	Weeks 1–2	Non-blocking sync, pool increase	Immediate UX fix
1	Weeks 3–4	Collector extracted, fault isolation	Multi-team onboarding
2	Weeks 5–8	Multi-tenancy, rate budgeting, retries	15 teams / 100+ users
3	Weeks 9–12	New data sources (CS/Qualys/Tanium)	Full vuln coverage
4	Weeks 13+	Horizontal scaling, load balancing	300+ users (if needed)

Phases 0–2 are recommended regardless of company-wide rollout. Phase 3 depends on data source priority decisions. Phase 4 is contingent on actual adoption numbers.

Next Steps

Review this document and provide input on Decision Points
Approve Phase 0 for immediate implementation
Schedule Phase 1 kickoff once Phase 0 is validated in staging
Submit firewall requests for CrowdStrike/Qualys/Tanium access to CT107 (long lead time)
