Homelab Infrastructure Status
Last Updated: 2025-12-18 17:00:00
Export Reference: disaster-recovery/homelab-export-20251211-144345
Current Infrastructure Snapshot
Proxmox Environment
- Node: serviceslab
- Version: Proxmox VE 8.4.0
- Management IP: 192.168.2.200
- Architecture: Single-node cluster
- Total Resources: 9 VMs, 2 Templates, 5 LXC Containers
Virtual Machines (QEMU/KVM) - 9 VMs
| VM ID | Name | IP Address | Status | Purpose |
|---|---|---|---|---|
| 100 | docker-hub | 192.168.2.XXX | Running | Container registry/Docker hub mirror |
| 101 | monitoring-docker | 192.168.2.114 | Running | Monitoring stack (Grafana/Prometheus/PVE Exporter) |
| 105 | dev | - | Stopped | General-purpose development workstation |
| 106 | Ansible-Control | 192.168.2.XXX | Running | IaC orchestration, configuration management |
| 108 | CML | - | Stopped | Cisco Modeling Labs - network simulation |
| 109 | web-server-01 | 192.168.2.XXX | Running | Web application server (clustered) |
| 110 | web-server-02 | 192.168.2.XXX | Running | Load-balanced pair with web-server-01 |
| 111 | db-server-01 | 192.168.2.XXX | Running | Backend database server |
| 114 | haos | 192.168.2.XXX | Running | Home Assistant OS - smart home automation platform |
Recent Changes:
- Added VM 101 (monitoring-docker) for dedicated monitoring infrastructure
- Removed VM 101 (gitlab) - service decommissioned
VM Templates - 2 Templates
| Template ID | Name | Purpose |
|---|---|---|
| 104 | ubuntu-dev | Ubuntu development environment template for cloning |
| 107 | ubuntu-docker | Ubuntu Docker host template for rapid deployment |
Note: Templates are immutable base images used for cloning new VMs, not running workloads. They provide standardized configurations for consistent infrastructure provisioning.
Containers (LXC) - 5 Containers
| CT ID | Name | IP Address | Status | Purpose |
|---|---|---|---|---|
| 102 | nginx | 192.168.2.101 | Running | Reverse proxy/load balancer & NPM |
| 103 | netbox | 192.168.2.XXX | Running | Network documentation/IPAM |
| 112 | twingate-connector | 192.168.2.XXX | Running | Zero-trust network access connector |
| 113 | n8n | 192.168.2.107 | Running | Workflow automation platform |
| 115 | tinyauth | 192.168.2.10 | Running | SSO authentication layer for NetBox |
Recent Changes:
- Added CT 115 (tinyauth) for SSO authentication integration with NetBox
- Added CT 112 (twingate-connector) for zero-trust network security
- Added CT 113 (n8n) for workflow automation
- Removed CT 112 (Anytype) - replaced by n8n
Storage Architecture
| Storage Pool | Type | Total | Used | % Used | Purpose |
|---|---|---|---|---|---|
| local | Directory | - | - | 19.11% | System files, ISOs, templates |
| local-lvm | LVM-Thin | - | - | 0.01% | VM disk images (thin provisioned) |
| Vault | NFS/Directory | - | - | 12.13% | Secure storage for sensitive data |
| PBS-Backups | PBS | - | - | 28.27% | Automated backup repository |
| iso-share | NFS/CIFS | - | - | 1.45% | Installation media library |
| localnetwork | Network Share | - | - | N/A | Shared resources across infrastructure |
Capacity Notes:
- PBS-Backups utilization increased to 28.27% (healthy retention)
- Vault utilization increased to 12.13% (data growth monitored)
- local storage at 19.11% (system overhead within normal range)
Key Services & Stacks
Monitoring & Observability (NEW)
VM 101 - monitoring-docker (192.168.2.114)
- Grafana: Port 3000 - Visualization and dashboards
- Prometheus: Port 9090 - Metrics collection and time-series database
- PVE Exporter: Port 9221 - Proxmox VE metrics exporter
- Documentation: /home/jramos/homelab/monitoring/README.md
- Status: Fully operational
Network Security (NEW)
CT 112 - twingate-connector
- Purpose: Zero-trust network access
- Type: Lightweight connector
- Status: Running
- Integration: Connects homelab to Twingate network
Automation & Integration
CT 113 - n8n (192.168.2.107)
- Purpose: Workflow automation platform
- Technology: n8n.io
- Database: PostgreSQL 15+
- Features: API integration, scheduled workflows, webhook triggers
- Documentation: /home/jramos/homelab/services/README.md#n8n-workflow-automation
- Status: Operational (resolved database locale issues)
Authentication & SSO
CT 115 - tinyauth (192.168.2.10)
- Purpose: Lightweight SSO authentication layer
- Technology: TinyAuth v4 (Docker container)
- Port: 8000
- Domain: tinyauth.apophisnetworking.net
- Integration: Authentication gateway for NetBox via Nginx Proxy Manager
- Security: Bcrypt-hashed credentials, HTTPS enforcement
- Documentation: /home/jramos/homelab/services/tinyauth/README.md
- Status: Operational
Infrastructure Documentation
CT 103 - netbox
- Purpose: Network documentation and IPAM
- Status: Running (activated 2025-12-11)
- Function: Infrastructure source of truth
Reverse Proxy & Load Balancing
CT 102 - nginx (192.168.2.101)
- Purpose: Nginx Proxy Manager
- Ports: 80, 81, 443
- Function: SSL termination, reverse proxy, certificate management
- Upstream Services: All web-facing applications
Three-Tier Application Stack
Web Tier:
- VM 109 (web-server-01) - Primary web server
- VM 110 (web-server-02) - Load-balanced pair
Database Tier:
- VM 111 (db-server-01) - Backend database
Proxy Tier:
- CT 102 (nginx) - Load balancer and SSL termination
Development & Automation
VM 106 - Ansible-Control
- Purpose: Infrastructure as Code orchestration
- Tools: Ansible, Terraform/OpenTofu (potential)
- Status: Running
Container Registry
VM 100 - docker-hub
- Purpose: Local Docker registry and hub mirror
- Function: Caching container images for faster deployments
- Status: Running
Network Simulation
VM 108 - CML
- Purpose: Cisco Modeling Labs
- Function: Network topology testing and simulation
- Status: Stopped (resource-intensive, on-demand use)
Architecture Patterns
Monitoring & Observability (NEW)
The infrastructure now implements a comprehensive monitoring stack following industry best practices:
- Metrics Collection: Prometheus scraping Proxmox metrics via PVE Exporter
- Visualization: Grafana providing real-time dashboards and alerting
- Isolation: Dedicated VM for monitoring services (fault isolation)
- Integration: Ready for AlertManager, additional exporters, and integrations
Design Decision: VM-based deployment provides kernel-level isolation and prevents resource contention with critical infrastructure services.
Zero-Trust Security (NEW)
Implementation of zero-trust network access principles:
- Twingate Connector: Lightweight connector providing secure access without VPNs
- Container Deployment: LXC container for minimal resource overhead
- Network Segmentation: Secure access to homelab from external networks
Design Decision: LXC container chosen for quick provisioning and low resource consumption.
Automation-First Approach
Workflow automation and infrastructure orchestration:
- n8n Platform: Visual workflow builder for API integrations
- Scheduled Tasks: Automated backup checks, monitoring alerts, reports
- Integration Hub: Connects monitoring, documentation, and operational tools
Design Decision: PostgreSQL backend ensures data persistence and supports complex workflows.
Tiered Application Architecture
Classic three-tier design for production-like environments:
- Presentation Tier: Paired web servers (109, 110) behind load balancer
- Business Logic: Application processing on web tier
- Data Tier: Dedicated database server (111) with backup strategy
Design Decision: Separation of concerns, scalability testing, high availability patterns.
Selective Containerization Strategy
Hybrid approach balancing performance and resource efficiency:
- LXC Containers: Lightweight services (nginx, netbox, twingate, n8n, tinyauth)
- Full VMs: Complex applications, kernel dependencies, heavy workloads
- Rationale: LXC for ~10x lower overhead, VMs for isolation and compatibility
Recent Infrastructure Changes
2025-12-20: Comprehensive Security Audit Completed
Activity: Complete infrastructure security assessment and remediation planning
Audit Scope:
- All Docker Compose services (Portainer, NPM, Paperless-ngx, ByteStash, Speedtest Tracker, FileBrowser)
- Proxmox VE infrastructure and API access
- Network security and segmentation
- Credential management and storage
- SSL/TLS configuration
- Container security and runtime configuration
Findings Summary:
- CRITICAL (6): Docker socket exposure, hardcoded credentials, database passwords in git
- HIGH (3): Missing SSL/TLS, weak passwords, containers running as root
- MEDIUM (2): SSL verification disabled, missing authentication
- LOW (20): Documentation gaps, monitoring improvements, backup encryption
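As a quick consistency check, the severity counts above can be tallied in a few lines of Python. This is only an arithmetic sketch. The dictionary restates the numbers from the Findings Summary, and the helper names are hypothetical.

```python
# Severity counts as reported in the Findings Summary.
findings = {"CRITICAL": 6, "HIGH": 3, "MEDIUM": 2, "LOW": 20}


def total_findings(counts: dict[str, int]) -> int:
    """Sum all findings regardless of severity."""
    return sum(counts.values())


def urgent_findings(counts: dict[str, int]) -> int:
    """Findings the roadmap aims to close by the end of Phase 3."""
    return counts["CRITICAL"] + counts["HIGH"]


print(total_findings(findings))   # 31
print(urgent_findings(findings))  # 9
```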
Deliverables:
- Security Policy (SECURITY.md): 864 lines - Comprehensive security best practices
- Audit Report (troubleshooting/SECURITY_AUDIT_2025-12-20.md): 2,350 lines - Detailed findings and remediation plan
- Security Checklist (templates/SECURITY_CHECKLIST.md): 750 lines - Pre-deployment validation template
- Validation Report (scripts/security/VALIDATION_REPORT.md): 2,092 lines - Script safety assessment
- Container Fixes (scripts/security/CONTAINER_NAME_FIXES.md): 621 lines - Container name verification
- Security Scripts (8 total):
  - verify-service-status.sh - Service health checker
  - backup-before-remediation.sh - Comprehensive backup utility
  - rotate-pve-credentials.sh - Proxmox credential rotation
  - rotate-paperless-password.sh - Database password rotation
  - rotate-bytestash-jwt.sh - JWT secret rotation
  - rotate-logward-credentials.sh - Multi-service credential rotation
  - docker-socket-proxy/docker-compose.yml - Security proxy deployment
  - portainer/docker-compose.socket-proxy.yml - Portainer migration config
Script Validation:
- Ready for execution: 5/8 scripts (verify-service-status.sh, rotate-pve-credentials.sh, rotate-bytestash-jwt.sh, backup-before-remediation.sh, docker-socket-proxy)
- Needs container name fixes: 3/8 scripts (see CONTAINER_NAME_FIXES.md)
4-Phase Remediation Roadmap:
- Phase 1 (Week 1): Immediate actions - Backups, secrets migration
- Phase 2 (Weeks 2-3): Low-risk changes - Socket proxy, credential rotation
- Phase 3 (Month 2): High-risk changes - Service migrations, SSL/TLS
- Phase 4 (Quarter 1): Infrastructure - Network segmentation, scanning pipelines
Estimated Timeline:
- Total downtime: 6-13 minutes (sequential script execution)
- Full remediation: 8-16 weeks
Risk Assessment:
- Current risk: HIGH - Multiple CRITICAL vulnerabilities active
- Post-Phase 1 risk: MEDIUM - Credential exposure mitigated
- Post-Phase 3 risk: LOW - All CRITICAL/HIGH findings remediated
- Post-Phase 4 risk: VERY LOW - Defense-in-depth implemented
Status: Documentation complete, awaiting remediation execution approval
2025-12-18: TinyAuth SSO Deployment
Service Deployed: CT 115 - TinyAuth authentication layer
Purpose: Centralized SSO authentication for NetBox and future homelab services
Specifications:
- Container: CT 115 (LXC with Docker)
- IP Address: 192.168.2.10
- Domain: tinyauth.apophisnetworking.net
- Port: 8000 (external), 3000 (internal)
- Docker Image: ghcr.io/steveiliop56/tinyauth:v4
- Resource Usage: ~50-100 MB memory, <1% CPU
Integration Architecture:
- Internet → Nginx Proxy Manager (CT 102) → TinyAuth (CT 115) → NetBox (CT 103)
- NPM uses the auth_request directive to validate credentials via TinyAuth
- Bcrypt-hashed password storage for security
- HTTPS enforcement via NPM SSL termination
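The auth_request flow above can be pictured as a small decision function. The sketch below models the documented behavior of Nginx's auth_request module: a 2xx subrequest response allows the request, 401 or 403 denies it with the same code, and any other status is treated as an error. It does not reproduce the actual NPM advanced config, and the function name is hypothetical.

```python
def auth_request_outcome(subrequest_status: int) -> str:
    """Map the auth subrequest's status code to what Nginx does next."""
    if 200 <= subrequest_status <= 299:
        # Access allowed: the original request goes on to the protected upstream.
        return "allow"
    if subrequest_status in (401, 403):
        # Access denied: the client receives the same status code.
        return f"deny:{subrequest_status}"
    # Any other status from the auth endpoint is treated as an error.
    return "error:500"


for status in (200, 401, 403, 302):
    print(status, "->", auth_request_outcome(status))
```

This is one reason an unexpected response from an auth endpoint tends to surface as a 500 rather than as a login prompt.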
Issues Resolved During Deployment:
- 500 Internal Server Error: Fixed Nginx advanced config syntax
- IP addresses not allowed: Changed APP_URL from IP to domain
- Port mapping: Corrected Docker port mapping from 8000:8000 to 8000:3000
- Invalid password: Implemented bcrypt hash requirement for TinyAuth v4
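The port mapping fix can be illustrated with Docker's short HOST:CONTAINER syntax. The following is a minimal sketch, assuming TinyAuth listens on port 3000 inside the container, as the Specifications above state. The helper name and constant are hypothetical.

```python
def parse_port_mapping(mapping: str) -> tuple[int, int]:
    """Split Docker's short HOST:CONTAINER port syntax into two integers."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


TINYAUTH_LISTEN_PORT = 3000  # internal port, per the Specifications list

for mapping in ("8000:8000", "8000:3000"):
    host_port, container_port = parse_port_mapping(mapping)
    # Traffic only reaches the app if the container side matches its listen port.
    reachable = container_port == TINYAUTH_LISTEN_PORT
    print(f"{mapping}: host {host_port} -> container {container_port}, reachable={reachable}")
```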
Integration Impact:
- NetBox now protected by centralized authentication
- Foundation for extending SSO to other services (Grafana, Proxmox UI future candidates)
- Authentication logs available for security auditing
Documentation: Complete guide at /home/jramos/homelab/services/tinyauth/README.md
Status: ✅ Operational - Successfully authenticating NetBox access
2025-12-11: Loki-Stack Monitoring Fully Operational
Issue Resolved: Centralized logging pipeline now receiving syslog from UniFi router
Root Cause: rsyslog filter in /etc/rsyslog.d/unifi-router.conf was configured for wrong source IP (192.168.1.1 instead of 192.168.2.1)
Fix Applied: Updated rsyslog filter to match VLAN 2 gateway IP (192.168.2.1)
Status: ✅ Complete - Logs flowing UniFi → rsyslog → Promtail → Loki → Grafana
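The root cause is easy to confirm with Python's standard ipaddress module. This sketch assumes the management network is 192.168.2.0/24, as listed under Key Network Segments. The helper name is hypothetical.

```python
import ipaddress

# Management network as documented under Key Network Segments.
MANAGEMENT_NET = ipaddress.ip_network("192.168.2.0/24")


def in_management_net(addr: str) -> bool:
    """Return True if addr belongs to the management network."""
    return ipaddress.ip_address(addr) in MANAGEMENT_NET


print(in_management_net("192.168.1.1"))  # False: the address the filter originally matched
print(in_management_net("192.168.2.1"))  # True: the VLAN 2 gateway actually sending syslog
```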
Services Affected:
- VM 101 (monitoring-docker): rsyslog configuration updated
- Loki-stack: All components operational
- Grafana: Dashboards receiving real-time syslog data
Technical Details: See troubleshooting/loki-stack-bugfix.md for complete 5-phase troubleshooting history
2025-12-11: Infrastructure Expansion & System Updates
Proxmox VE Platform Upgrade
- Upgraded: Proxmox VE 8.3.3 → 8.4.0
- Kernel: 6.8.12-8-pve
- pve-manager: 8.4.14
- Impact: Enhanced performance, security updates, bug fixes
- Status: ✅ Complete - All VMs and containers operating normally
New VM 114: Home Assistant OS Deployment
- Service: haos (Home Assistant Operating System)
- Purpose: Smart home automation and integration platform
- Specifications:
  - Memory: 4 GB (87% utilized)
  - CPU: 2 vCPUs
  - Boot Disk: 50 GB
- Status: Running (~3 days uptime)
- Rationale: Centralized home automation hub for IoT device management
- Integration: Will integrate with monitoring stack for infrastructure metrics
CT 103: NetBox IPAM Activated
- Service: netbox (Network Documentation & IPAM)
- Status Change: Stopped → Running
- Uptime: ~3.1 days
- Resource Usage: 1.28 GB / 2 GB memory (64%)
- Purpose: Active network documentation and IP address management
- Rationale: Required for ongoing infrastructure expansion planning
Storage Utilization Trends
- PBS-Backups: 27.43% → 28.27% (+0.84%) - Normal backup retention growth
- Vault (ZFS): 10.88% → 12.13% (+1.25%) - Data accumulation monitored
- local: 15.13% → 19.11% (+3.98%) - New VM deployment and system updates
- iso-share: 1.4% → 1.45% (+0.05%) - Minimal change
- local-lvm: 0.0% → 0.01% (+0.01%) - Thin provisioned storage baseline
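The deltas above are differences in percentage points, not relative growth. A short sketch that recomputes them from the before and after values in the list (the helper name is hypothetical):

```python
# (before, after) utilization in percent, copied from the list above.
trends = {
    "PBS-Backups": (27.43, 28.27),
    "Vault": (10.88, 12.13),
    "local": (15.13, 19.11),
    "iso-share": (1.40, 1.45),
    "local-lvm": (0.00, 0.01),
}


def delta_points(before: float, after: float) -> float:
    """Change in utilization, in percentage points, rounded to two decimals."""
    return round(after - before, 2)


for pool, (before, after) in trends.items():
    print(f"{pool}: {delta_points(before, after):+.2f} percentage points")
```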
2025-12-07: Infrastructure Documentation & Monitoring Stack
Additions
- VM 101 (monitoring-docker): New dedicated monitoring infrastructure
  - Grafana for visualization
  - Prometheus for metrics collection
  - PVE Exporter for Proxmox integration
  - IP: 192.168.2.114
- CT 112 (twingate-connector): Zero-trust network security
  - Lightweight connector
  - Secure remote access without VPN
- CT 113 (n8n): Workflow automation platform
  - PostgreSQL 15+ backend
  - IP: 192.168.2.107
  - Resolved database locale issues
Modifications
- Storage utilization updated across all pools
- PBS-Backups now at 27.43% (increased retention)
- Vault optimized to 10.88% (reduced usage)
Removals
- VM 101 (gitlab): Decommissioned (previously at this ID)
- CT 112 (Anytype): Replaced by n8n for better integration
Documentation Updates
- Created comprehensive monitoring stack documentation
- Updated all infrastructure tables with current VMs/CTs
- Added architecture patterns for observability and zero-trust
- Updated storage statistics
- Referenced latest export: disaster-recovery/homelab-export-20251207-120040
Repository Structure
homelab/
  monitoring/                          # NEW: Monitoring stack configurations
    README.md                          # Comprehensive monitoring documentation
    grafana/
      docker-compose.yml
    prometheus/
      docker-compose.yml
      prometheus.yml
    pve-exporter/
      docker-compose.yml
      pve.yml
      .env
  services/                            # Docker Compose service configurations
    n8n/                               # n8n workflow automation
    netbox/                            # Network documentation & IPAM
    README.md                          # Services overview (updated)
  disaster-recovery/
    homelab-export-20251207-120040/    # Latest infrastructure export
  scripts/
    crawlers-exporters/                # Infrastructure collection scripts
    fixers/                            # Problem-solving scripts
    qol/                               # Quality of life improvements
  CLAUDE.md                            # AI assistant guidance (updated)
  INDEX.md                             # Navigation index (updated)
  README.md                            # Repository overview (updated)
  CLAUDE_STATUS.md                     # This file - current infrastructure status
Security Status
Latest Audit: 2025-12-20
Total Findings: 31 (6 CRITICAL, 3 HIGH, 2 MEDIUM, 20 LOW)
Remediation Status: Planning Phase - Documentation Complete
Critical and High Vulnerabilities:
- Docker socket exposure (3 containers)
- Proxmox credentials in plaintext
- Database passwords in git repository
- Missing SSL/TLS for internal services
- Weak/default passwords across services
- Containers running as root
Documentation:
- Security Policy: /home/jramos/homelab/SECURITY.md
- Audit Report: /home/jramos/homelab/troubleshooting/SECURITY_AUDIT_2025-12-20.md
- Security Checklist: /home/jramos/homelab/templates/SECURITY_CHECKLIST.md
- Script Validation: /home/jramos/homelab/scripts/security/VALIDATION_REPORT.md
Current Initiative: Security Audit Remediation - Q4 2025
Goal
Remediate 31 security findings identified in comprehensive security audit (2025-12-20), addressing critical vulnerabilities in Docker socket exposure, credential management, and SSL/TLS configuration.
Phase
Planning - Documentation Complete, Remediation Pending
Progress Checklist
Phase 1: Immediate Actions (Week 1) - Est. 30 min downtime
- Complete security audit (31 findings documented)
- Create remediation scripts (8 scripts validated)
- Document security baseline in SECURITY.md
- Backup all service configurations (backup-before-remediation.sh)
- Migrate secrets to .env files (ByteStash, Paperless-ngx, Speedtest Tracker)
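Before migrating secrets to .env files, it can help to list which compose entries still carry literal values. The sketch below is a text-based heuristic, not one of the eight audit scripts. The key-name pattern, the function name, and the sample compose fragment are all hypothetical.

```python
import re

# Hypothetical heuristic: key names that usually hold secrets.
SECRET_KEY = re.compile(r"(PASSWORD|SECRET|TOKEN|API_KEY)", re.IGNORECASE)
# Matches both "KEY: value" (mapping form) and "- KEY=value" (list form).
ASSIGNMENT = re.compile(r"^\s*-?\s*([A-Za-z0-9_]+)\s*[:=]\s*(.+?)\s*$")

# Illustrative compose fragment, not taken from the repository.
SAMPLE_COMPOSE = """\
services:
  db:
    environment:
      POSTGRES_PASSWORD: hunter2
      POSTGRES_USER: paperless
  app:
    environment:
      - JWT_SECRET=${JWT_SECRET}
      - DB_PASSWORD=changeme
"""


def find_hardcoded_secrets(compose_text: str) -> list[str]:
    """Return secret-looking keys whose value is a literal, not a ${VAR} reference."""
    hits = []
    for line in compose_text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        key, value = match.groups()
        value = value.strip("\"'")
        if SECRET_KEY.search(key) and value and not value.startswith("${"):
            hits.append(key)
    return hits


print(find_hardcoded_secrets(SAMPLE_COMPOSE))
```

Values written as `${VAR}` are skipped because Compose substitutes them at runtime from the environment or a .env file.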
Phase 2: Low-Risk Changes (Weeks 2-3) - Est. 2-4 hours downtime
- Deploy docker-socket-proxy
- Rotate Proxmox API credentials (rotate-pve-credentials.sh)
- Rotate database passwords (rotate-paperless-password.sh)
- Rotate JWT secrets (rotate-bytestash-jwt.sh)
Phase 3: High-Risk Changes (Month 2) - Est. 4-8 hours downtime
- Migrate Portainer to socket proxy
- Migrate NPM to socket proxy or remove socket access
- Remove socket mounts from Speedtest Tracker
- Implement SSL/TLS for internal services
- Enable container user namespacing
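To track progress on the socket-related items above, a small check can list which services in a compose file still reference the Docker socket. This is a sketch only. It assumes two-space indentation under `services:` and scans plain text instead of parsing YAML. The function name and sample fragment are hypothetical.

```python
SOCKET_PATH = "/var/run/docker.sock"

# Illustrative compose fragment, not taken from the repository.
SAMPLE_COMPOSE = """\
services:
  portainer:
    image: portainer/portainer-ce
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  filebrowser:
    image: filebrowser/filebrowser
    volumes:
      - ./data:/srv
volumes:
  data:
"""


def services_mounting_socket(compose_text: str) -> list[str]:
    """Return names of services whose definition mentions the Docker socket."""
    exposed = []
    in_services = False
    current = None
    for raw in compose_text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # A new top-level key starts; only "services:" is of interest.
            in_services = line == "services:"
            current = None
        elif in_services and indent == 2 and line.endswith(":"):
            current = line.strip()[:-1]
        elif in_services and current and SOCKET_PATH in line:
            if current not in exposed:
                exposed.append(current)
    return exposed


print(services_mounting_socket(SAMPLE_COMPOSE))  # ['portainer']
```

Read-only mounts (`:ro`) are still reported by this check.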
Phase 4: Infrastructure Improvements (Quarter 1) - Est. 8-16 hours
- Implement network segmentation (VLANs for service tiers)
- Deploy fail2ban for rate limiting
- Enable backup encryption (PBS configuration)
- Container vulnerability scanning pipeline
- Automated credential rotation system
Context
Security audit revealed critical infrastructure vulnerabilities requiring systematic remediation. Priority on CRITICAL findings (CVSS 8.5-9.8) to reduce attack surface and prevent credential compromise.
Risk Management:
- Phase 1: Zero downtime (configuration changes only)
- Phase 2: Minimal downtime (credential rotation, proxy deployment)
- Phase 3: Moderate downtime (service reconfiguration)
- Phase 4: Planned maintenance windows (infrastructure changes)
Success Metrics:
- All CRITICAL findings remediated (6/6)
- All HIGH findings remediated (3/3)
- Secrets removed from git repository
- Docker socket access eliminated or proxied
- SSL/TLS enabled for all external services
Previous Initiative: Claude Code Tool Inheritance Bug Investigation (2025-12-18)
Goal
Investigate and document a critical bug in Claude Code CLI where sub-agents with explicit tools: declarations receive only a subset of their configured tools, with first and last array elements consistently dropped.
Phase
COMPLETED - Bug confirmed, comprehensive report generated for Anthropic
Progress Checklist
- Reproduce bug with scribe agent (confirmed: missing Read and Write)
- Reproduce bug with lab-operator agent (confirmed: missing Bash and Write)
- Test backend-builder agent (working correctly - exception to pattern)
- Test librarian agent (working correctly - no tools: declaration)
- Identify pattern: First and last tools dropped for agents with explicit tools: arrays
- Document impact: Scribe cannot create docs, lab-operator cannot execute commands
- Generate comprehensive bug report for Anthropic with all evidence
- Update CLAUDE_STATUS.md with investigation status
- Submit bug report to Anthropic via GitHub issues
Key Findings
Bug Pattern: Sub-agents with tools: [A, B, C, D, E] receive only [B, C, D] at runtime
Affected: scribe (no Read/Write), lab-operator (no Bash/Write)
Unaffected: backend-builder (exception), librarian (no tools: line)
Workaround: Remove tools: declarations to grant all tools by default
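The reported pattern is equivalent to slicing off both ends of the declared list. The function below models the symptom only, not Claude Code's internals, and its name is hypothetical.

```python
def observed_tools(declared: list[str]) -> list[str]:
    """Model of the reported symptom only: first and last entries are dropped."""
    return declared[1:-1]


print(observed_tools(["A", "B", "C", "D", "E"]))  # ['B', 'C', 'D']
```

Under this model a declaration with two or fewer tools would leave the agent with none, though the investigation does not record whether that case was tested.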
Artifacts:
- Bug report: /home/jramos/homelab/troubleshooting/ANTHROPIC_BUG_REPORT_TOOL_INHERITANCE.md
- Original report: /home/jramos/homelab/troubleshooting/BUG_REPORT.md
- Test agent IDs: scribe=a32bd54, lab-operator=ad681e8, backend-builder=aba15f6, librarian=a4cfeb7
Context
Critical workflow disruption: Documentation and infrastructure operations workflows completely broken due to missing tools. This is a Claude Code CLI internal bug, not a user configuration issue.
Previous Initiative: Sub-Agent Architecture Optimization (2025-12-07)
Goal
Improve the quality and effectiveness of all sub-agent prompt definitions to match best practices identified through comprehensive Opus-powered prompt engineering analysis. Target: bring all sub-agents to the quality standard established by librarian.md (~120-340 lines with comprehensive examples, safety protocols, and decision frameworks).
Phase
COMPLETED - All sub-agent improvements and validations finished
Progress Checklist
- Prompt engineering analysis completed (Opus model)
  - Analyzed CLAUDE.md and all 4 sub-agent files
  - Identified 5 critical issues, 12 high-impact improvements
  - Generated comprehensive improvement recommendations
- scribe.md improved (29 → 340 lines)
  - Added 6 usage examples (4 positive, 2 negative redirects)
  - Implemented comprehensive responsibilities section
  - Added 3 complete ASCII diagram templates
  - Included safety protocols and decision frameworks
  - Quality now matches librarian.md standard
- backend-builder.md improved (40 → 291 lines)
  - Added 6 usage examples with clear boundaries
  - Expanded core responsibilities with Ansible, Terraform, Docker Compose, Python, Shell
  - Added technology stack table and validation rules table
  - Included safety protocols for secrets and destructive operations
  - Added handoff protocol for lab-operator deployment
  - Defined clear boundaries (CREATES code, does NOT deploy)
- lab-operator.md improved (37 → 193 lines)
  - Added 6 usage examples with role clarity
  - Expanded domain expertise with specific commands
  - Added command style guide (5-step pattern)
  - Included safety protocols and decision-making framework
  - Added error handling and escalation guidelines
  - Defined clear boundaries (DEPLOYS/OPERATES, does NOT create IaC)
- CLAUDE.md structural fixes
  - Moved YAML frontmatter to line 1 (was at line 89)
  - Fixed trailing pipe character on line 87
  - Completed incomplete sentence about backup strategy
  - Completed incomplete sentence about storage growth
  - Removed redundant "Key Services" reference
  - Expanded status file template with actual structure and recovery instructions
- Final validation and testing
  - librarian: Git status check successful, clear output format
  - scribe: File reading functional (note: reported encoding issue, likely false positive)
  - backend-builder: YAML validation successful, proper syntax checking
  - lab-operator: Directory listing successful, proper command execution
  - All agents demonstrate improved structure and clarity
Context
Why It Matters: Well-designed sub-agent prompts improve task routing accuracy, execution quality, error reduction, and maintainability. The librarian.md agent (143 lines) sets the quality standard; scribe was severely underdeveloped at 29 lines before improvement.
Next Steps: Improve backend-builder.md and lab-operator.md using scribe.md as quality template.
Previous Phase: Infrastructure Documentation Complete
Goal
Comprehensive documentation of monitoring stack and updated infrastructure inventory.
Phase
Documentation & Maintenance
Completed Tasks
- Created /home/jramos/homelab/monitoring/README.md with comprehensive monitoring documentation
- Updated CLAUDE_STATUS.md with current infrastructure state
- Documented 8 VMs, 2 Templates, and 4 LXC containers
- Updated storage statistics (PBS 27.43%, Vault 10.88%, local 15.13%)
- Added monitoring stack architecture and deployment procedures
- Documented new services: monitoring-docker, twingate-connector, n8n
- Referenced latest export: disaster-recovery/homelab-export-20251207-120040
Remaining Documentation Tasks
- Update INDEX.md with monitoring section and current VM/CT counts
- Update README.md with infrastructure (8 VMs, 2 Templates, 4 LXC)
- Update CLAUDE.md with architecture tables for monitoring and zero-trust
- Update services/README.md with monitoring stack and twingate sections
- Verify all documentation cross-references are accurate
- Test monitoring stack deployment procedures
Access Information
Management Interfaces
- Proxmox UI: https://192.168.2.200:8006
- Grafana: http://192.168.2.114:3000
- Prometheus: http://192.168.2.114:9090
- Nginx Proxy Manager: http://192.168.2.101:81
- n8n: http://192.168.2.107:5678
- TinyAuth: https://tinyauth.apophisnetworking.net (internal: http://192.168.2.10:8000)
Key Network Segments
- Management Network: 192.168.2.0/24
- Proxmox Host: 192.168.2.200
- Reverse Proxy: 192.168.2.101 (CT 102)
- TinyAuth: 192.168.2.10 (CT 115)
- n8n: 192.168.2.107 (CT 113)
- Monitoring: 192.168.2.114 (VM 101)
Maintenance Schedule
Automated Tasks
- Backups: Proxmox Backup Server - Daily incremental, Weekly full
- Monitoring Scrapes: Prometheus - Every 30 seconds
- Certificate Renewal: Nginx Proxy Manager - Automatic via Let's Encrypt
Recommended Manual Tasks
- Weekly: Review Grafana dashboards for anomalies
- Monthly: Update monitoring stack Docker images
- Quarterly: Review backup retention policies
- Semi-Annual: Kernel updates on Proxmox host and VMs
Known Issues & Resolutions
Resolved
- n8n PostgreSQL locale errors (fixed with fix_n8n_db_c_locale.sh)
- n8n database permissions (fixed with fix_n8n_db_permissions.sh)
Active Security Vulnerabilities (2025-12-20 Audit)
CRITICAL Severity:
1. Docker Socket Exposure (CVSS 9.8)
   - Affected: Portainer, Nginx Proxy Manager, Speedtest Tracker
   - Impact: Container escape to root access
   - Remediation: Deploy docker-socket-proxy (Phase 2)
2. Proxmox Credentials in Plaintext (CVSS 9.1)
   - Affected: PVE Exporter .env and pve.yml
   - Impact: Full infrastructure compromise
   - Remediation: Rotate credentials, use API tokens (Phase 2)
3. Database Passwords in Git (CVSS 8.5)
   - Affected: Paperless-ngx, ByteStash, Speedtest Tracker
   - Impact: Credential exposure to all repository users
   - Remediation: Migrate to .env files, scrub git history (Phase 1)

HIGH Severity:
4. Missing SSL/TLS (CVSS 7.5)
   - Affected: Internal service communication
   - Impact: Traffic interception, credential sniffing
   - Remediation: Enable HTTPS via NPM or self-signed certs (Phase 3)
5. Weak/Default Passwords (CVSS 7.2)
   - Affected: Multiple services
   - Impact: Brute-force attacks, unauthorized access
   - Remediation: Generate strong passwords, implement rotation (Phase 2)
6. Containers Running as Root (CVSS 7.0)
   - Affected: Most Docker containers
   - Impact: Privilege escalation if container compromised
   - Remediation: Enable user namespacing, set non-root users (Phase 3)
Remediation Timeline: See "Security Audit Remediation - Q4 2025" initiative above
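For the Proxmox credential finding, the remediation calls for API tokens in place of a plaintext password. The sketch below builds the token Authorization header from environment variables. It relies on Proxmox VE's documented `PVEAPIToken=USER@REALM!TOKENID=SECRET` header format. The variable names `PVE_USER`, `PVE_TOKEN_ID`, and `PVE_TOKEN_SECRET` and the function name are hypothetical, and this is not how the PVE Exporter itself reads its configuration.

```python
import os
from collections.abc import Mapping


def pve_api_token_header(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the Authorization header for a Proxmox VE API token.

    Raises KeyError if a variable is missing, so a misconfigured host
    fails loudly instead of sending an empty credential.
    """
    user = env["PVE_USER"]            # e.g. "monitoring@pve" (hypothetical)
    token_id = env["PVE_TOKEN_ID"]    # token name chosen at creation time
    secret = env["PVE_TOKEN_SECRET"]  # UUID shown once when the token is created
    return {"Authorization": f"PVEAPIToken={user}!{token_id}={secret}"}


example_env = {
    "PVE_USER": "monitoring@pve",
    "PVE_TOKEN_ID": "exporter",
    "PVE_TOKEN_SECRET": "00000000-0000-0000-0000-000000000000",
}
print(pve_api_token_header(example_env))
```

Keeping the secret in the process environment (or a .env file excluded from git) means rotating it does not require touching tracked files.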
Active Monitoring
- PVE Exporter SSL verification (set to false for self-signed certificates) - SECURITY RISK
- Prometheus retention policies (currently 15 days, may need adjustment)
- Security script container names need verification (3/8 scripts)
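When weighing a retention change, a rough sample count per series is a useful starting point. This sketch combines the 30-second scrape interval from the Maintenance Schedule with the current 15-day retention. It gives an upper bound that ignores scrape gaps and restarts, and the function name is hypothetical.

```python
from datetime import timedelta


def samples_per_series(scrape_interval: timedelta, retention: timedelta) -> int:
    """Upper bound on samples kept for one time series (no gaps, no restarts)."""
    return retention // scrape_interval


count = samples_per_series(timedelta(seconds=30), timedelta(days=15))
print(count)  # 43200
```

The count scales linearly, so doubling retention to 30 days doubles the samples held per series.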
Deferred
- NetBox container offline (on-demand service)
- Development VMs stopped (resource conservation)
- Network segmentation implementation (Phase 4)
- Backup encryption (Phase 4)
Version History
- v2.1.0 (2025-12-07): Added monitoring stack, twingate connector, updated infrastructure counts
- v2.0.0 (2025-12-02): Repository reorganization, services migration from GitLab
- v1.0.0 (2025-11-29): Initial infrastructure documentation
Maintained by: jramos
Repository: Homelab Infrastructure Configuration
Platform: Proxmox VE 8.4.0
Infrastructure Scale: 9 VMs, 2 Templates, 5 Containers
Current Status: Operational - Home Automation Integration Deployed