homelab/CLAUDE_STATUS.md
Jordan Ramos d3dc899b30 docs(infrastructure): correct VM/template counts and clarify resource types
Update infrastructure documentation across all files to accurately distinguish
between active VMs (8), templates (2), and LXC containers (4). Previously,
VM templates 104 (ubuntu-dev) and 107 (ubuntu-docker) were incorrectly counted
as active VMs, inflating the total VM count to 10.

Changes:
- CLAUDE.md: Update Quick Reference and Infrastructure Overview sections
- CLAUDE_STATUS.md: Add dedicated VM Templates section with explanatory note
- INDEX.md: Separate templates from active VMs in infrastructure inventory
- README.md: Add VM Templates section distinguishing from active VMs
- Claude_UPDATES.md: Update infrastructure counts in Quick Reference tables
- services/README.md: Correct footer infrastructure counts
- sub-agents/*.md: Update infrastructure context in all agent prompts

This ensures accurate resource tracking and clarifies that templates are
immutable base images for cloning, not running workloads.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-08 13:11:29 -07:00

# Homelab Infrastructure Status
**Last Updated**: 2025-12-07 12:00:40
**Export Reference**: disaster-recovery/homelab-export-20251207-120040
## Current Infrastructure Snapshot
### Proxmox Environment
- **Node**: serviceslab
- **Version**: Proxmox VE 8.3.3
- **Management IP**: 192.168.2.200
- **Architecture**: Single-node cluster
- **Total Resources**: 8 VMs, 2 Templates, 4 LXC Containers
---
## Virtual Machines (QEMU/KVM) - 8 VMs
| VM ID | Name | IP Address | Status | Purpose |
|-------|------|------------|--------|---------|
| 100 | docker-hub | 192.168.2.XXX | Running | Container registry/Docker hub mirror |
| 101 | monitoring-docker | 192.168.2.114 | Running | Monitoring stack (Grafana/Prometheus/PVE Exporter) |
| 105 | dev | - | Stopped | General-purpose development workstation |
| 106 | Ansible-Control | 192.168.2.XXX | Running | IaC orchestration, configuration management |
| 108 | CML | - | Stopped | Cisco Modeling Labs - network simulation |
| 109 | web-server-01 | 192.168.2.XXX | Running | Web application server (clustered) |
| 110 | web-server-02 | 192.168.2.XXX | Running | Load-balanced pair with web-server-01 |
| 111 | db-server-01 | 192.168.2.XXX | Running | Backend database server |
**Recent Changes**:
- Removed the previous VM 101 (gitlab) after the service was decommissioned
- Added a new VM 101 (monitoring-docker) for dedicated monitoring infrastructure, reusing the freed ID
---
## VM Templates - 2 Templates
| Template ID | Name | Purpose |
|-------------|------|---------|
| 104 | ubuntu-dev | Ubuntu development environment template for cloning |
| 107 | ubuntu-docker | Ubuntu Docker host template for rapid deployment |
**Note**: Templates are immutable base images used for cloning new VMs, not running workloads. They provide standardized configurations for consistent infrastructure provisioning.
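
The distinction between active VMs, templates, and containers can be checked mechanically. The following is a minimal sketch, assuming guest records shaped like the entries returned by the Proxmox `/cluster/resources` API (a `type` of `qemu` or `lxc`, plus a `template` flag). The record layout is an assumption; the IDs and names are taken from the tables in this file.

```python
from collections import Counter

# Sample records; field names ("type", "template") are assumed to mirror
# Proxmox /cluster/resources output. IDs and names come from this document.
GUESTS = [
    {"vmid": 100, "name": "docker-hub", "type": "qemu", "template": 0},
    {"vmid": 101, "name": "monitoring-docker", "type": "qemu", "template": 0},
    {"vmid": 102, "name": "nginx", "type": "lxc", "template": 0},
    {"vmid": 103, "name": "netbox", "type": "lxc", "template": 0},
    {"vmid": 104, "name": "ubuntu-dev", "type": "qemu", "template": 1},
    {"vmid": 105, "name": "dev", "type": "qemu", "template": 0},
    {"vmid": 106, "name": "Ansible-Control", "type": "qemu", "template": 0},
    {"vmid": 107, "name": "ubuntu-docker", "type": "qemu", "template": 1},
    {"vmid": 108, "name": "CML", "type": "qemu", "template": 0},
    {"vmid": 109, "name": "web-server-01", "type": "qemu", "template": 0},
    {"vmid": 110, "name": "web-server-02", "type": "qemu", "template": 0},
    {"vmid": 111, "name": "db-server-01", "type": "qemu", "template": 0},
    {"vmid": 112, "name": "twingate-connector", "type": "lxc", "template": 0},
    {"vmid": 113, "name": "n8n", "type": "lxc", "template": 0},
]


def classify(guest):
    """Return 'container', 'template', or 'vm' for one guest record."""
    if guest["type"] == "lxc":
        return "container"
    # A QEMU guest flagged as a template is a base image, not an active VM.
    return "template" if guest.get("template") else "vm"


def count_resources(guests):
    """Tally guests by category so templates never inflate the VM count."""
    return Counter(classify(g) for g in guests)


counts = count_resources(GUESTS)
print(f"{counts['vm']} VMs, {counts['template']} Templates, "
      f"{counts['container']} LXC Containers")
```

Counting this way keeps templates 104 and 107 out of the active VM total.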
---
## Containers (LXC) - 4 Containers
| CT ID | Name | IP Address | Status | Purpose |
|-------|------|------------|--------|---------|
| 102 | nginx | 192.168.2.101 | Running | Reverse proxy/load balancer & NPM |
| 103 | netbox | 192.168.2.XXX | Stopped | Network documentation/IPAM |
| 112 | twingate-connector | 192.168.2.XXX | Running | Zero-trust network access connector |
| 113 | n8n | 192.168.2.107 | Running | Workflow automation platform |
**Recent Changes**:
- Removed the previous CT 112 (Anytype) - replaced by n8n
- Added a new CT 112 (twingate-connector) for zero-trust network security, reusing the freed ID
- Added CT 113 (n8n) for workflow automation
---
## Storage Architecture
| Storage Pool | Type | Total | Used | % Used | Purpose |
|--------------|------|-------|------|--------|---------|
| local | Directory | - | - | 15.13% | System files, ISOs, templates |
| local-lvm | LVM-Thin | - | - | 0.0% | VM disk images (thin provisioned) |
| Vault | NFS/Directory | - | - | 10.88% | Secure storage for sensitive data |
| PBS-Backups | PBS | - | - | 27.43% | Automated backup repository |
| iso-share | NFS/CIFS | - | - | 1.4% | Installation media library |
| localnetwork | Network Share | - | - | N/A | Shared resources across infrastructure |
**Capacity Notes**:
- PBS-Backups utilization increased to 27.43% (healthy retention)
- Vault utilization decreased to 10.88% (space optimization)
- local storage at 15.13% (system overhead normal)
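
A capacity review like the one above can be scripted. This is a minimal sketch using the utilization figures from the storage table; the 80% warning threshold and the function names are hypothetical, not values used by this homelab.

```python
# Utilization percentages copied from the storage table; None marks "N/A".
POOL_USAGE = {
    "local": 15.13,
    "local-lvm": 0.0,
    "Vault": 10.88,
    "PBS-Backups": 27.43,
    "iso-share": 1.4,
    "localnetwork": None,
}

WARN_PERCENT = 80.0  # hypothetical threshold, chosen only for illustration


def pools_over(usage, threshold):
    """Return the sorted names of pools at or above the threshold."""
    return sorted(
        name for name, pct in usage.items()
        if pct is not None and pct >= threshold
    )


def fullest_pool(usage):
    """Return the name of the pool with the highest known utilization."""
    known = {name: pct for name, pct in usage.items() if pct is not None}
    return max(known, key=known.get)


print("Over threshold:", pools_over(POOL_USAGE, WARN_PERCENT))
print("Fullest pool:", fullest_pool(POOL_USAGE))
```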
---
## Key Services & Stacks
### Monitoring & Observability (NEW)
**VM 101** - monitoring-docker (192.168.2.114)
- **Grafana**: Port 3000 - Visualization and dashboards
- **Prometheus**: Port 9090 - Metrics collection and time-series database
- **PVE Exporter**: Port 9221 - Proxmox VE metrics exporter
- **Documentation**: `/home/jramos/homelab/monitoring/README.md`
- **Status**: Fully operational
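
To confirm the "fully operational" status, the health endpoints of the stack can be probed. The sketch below assumes plain HTTP on the ports listed above and uses Grafana's `/api/health` and Prometheus's `/-/healthy` endpoints; the helper name `is_healthy` is hypothetical.

```python
import urllib.error
import urllib.request

MONITORING_HOST = "192.168.2.114"  # VM 101, from this document

# Health endpoints; plain HTTP is an assumption based on the listed ports.
HEALTH_ENDPOINTS = {
    "grafana": f"http://{MONITORING_HOST}:3000/api/health",
    "prometheus": f"http://{MONITORING_HOST}:9090/-/healthy",
}


def is_healthy(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_healthy(HEALTH_ENDPOINTS["grafana"])` from a host on the management network returns `True` only when the service responds with HTTP 200.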
### Network Security (NEW)
**CT 112** - twingate-connector
- **Purpose**: Zero-trust network access
- **Type**: Lightweight connector
- **Status**: Running
- **Integration**: Connects homelab to Twingate network
### Automation & Integration
**CT 113** - n8n (192.168.2.107)
- **Purpose**: Workflow automation platform
- **Technology**: n8n.io
- **Database**: PostgreSQL 15+
- **Features**: API integration, scheduled workflows, webhook triggers
- **Documentation**: `/home/jramos/homelab/services/README.md#n8n-workflow-automation`
- **Status**: Operational (resolved database locale issues)
### Infrastructure Documentation
**CT 103** - netbox
- **Purpose**: Network documentation and IPAM
- **Status**: Stopped (on-demand use)
- **Function**: Infrastructure source of truth
### Reverse Proxy & Load Balancing
**CT 102** - nginx (192.168.2.101)
- **Purpose**: Nginx Proxy Manager
- **Ports**: 80, 81, 443
- **Function**: SSL termination, reverse proxy, certificate management
- **Upstream Services**: All web-facing applications
### Three-Tier Application Stack
**Web Tier**:
- VM 109 (web-server-01) - Primary web server
- VM 110 (web-server-02) - Load-balanced pair
**Database Tier**:
- VM 111 (db-server-01) - Backend database
**Proxy Tier**:
- CT 102 (nginx) - Load balancer and SSL termination
### Development & Automation
**VM 106** - Ansible-Control
- **Purpose**: Infrastructure as Code orchestration
- **Tools**: Ansible, Terraform/OpenTofu (potential)
- **Status**: Running
### Container Registry
**VM 100** - docker-hub
- **Purpose**: Local Docker registry and hub mirror
- **Function**: Caching container images for faster deployments
- **Status**: Running
### Network Simulation
**VM 108** - CML
- **Purpose**: Cisco Modeling Labs
- **Function**: Network topology testing and simulation
- **Status**: Stopped (resource-intensive, on-demand use)
---
## Architecture Patterns
### Monitoring & Observability (NEW)
The infrastructure now implements a comprehensive monitoring stack following industry best practices:
- **Metrics Collection**: Prometheus scraping Proxmox metrics via PVE Exporter
- **Visualization**: Grafana providing real-time dashboards and alerting
- **Isolation**: Dedicated VM for monitoring services (fault isolation)
- **Integration**: Ready for AlertManager, additional exporters, and integrations
**Design Decision**: VM-based deployment provides kernel-level isolation and prevents resource contention with critical infrastructure services.
### Zero-Trust Security (NEW)
Implementation of zero-trust network access principles:
- **Twingate Connector**: Lightweight connector providing secure access without VPNs
- **Container Deployment**: LXC container for minimal resource overhead
- **Network Segmentation**: Secure access to homelab from external networks
**Design Decision**: LXC container chosen for quick provisioning and low resource consumption.
### Automation-First Approach
Workflow automation and infrastructure orchestration:
- **n8n Platform**: Visual workflow builder for API integrations
- **Scheduled Tasks**: Automated backup checks, monitoring alerts, reports
- **Integration Hub**: Connects monitoring, documentation, and operational tools
**Design Decision**: PostgreSQL backend ensures data persistence and supports complex workflows.
### Tiered Application Architecture
Classic three-tier design for production-like environments:
- **Presentation Tier**: Paired web servers (109, 110) behind load balancer
- **Business Logic**: Application processing on web tier
- **Data Tier**: Dedicated database server (111) with backup strategy
**Design Decision**: The tiers are kept separate to enforce separation of concerns and to allow scalability testing and practice with high-availability patterns.
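
As an illustration of how a load balancer can spread requests across the paired web servers, here is a minimal round-robin sketch. The source does not state which balancing method the proxy uses, so round-robin is an assumption, and hostnames stand in for the IP addresses elided above.

```python
from itertools import cycle

# Hostnames from the VM table; the balancing method is assumed.
WEB_BACKENDS = ["web-server-01", "web-server-02"]


def round_robin(backends):
    """Return an endless iterator that alternates over the backends."""
    return cycle(backends)


picker = round_robin(WEB_BACKENDS)
first_four = [next(picker) for _ in range(4)]
print(first_four)
# ['web-server-01', 'web-server-02', 'web-server-01', 'web-server-02']
```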
### Selective Containerization Strategy
Hybrid approach balancing performance and resource efficiency:
- **LXC Containers**: Lightweight services (nginx, netbox, twingate, n8n)
- **Full VMs**: Complex applications, kernel dependencies, heavy workloads
- **Rationale**: LXC for lower overhead (containers share the host kernel), VMs for isolation and compatibility
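
The placement rule above can be written as a small decision function. This is a sketch under stated assumptions: the `Workload` fields and the function name are hypothetical, and real placement decisions would weigh more factors than these two flags.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a service to be placed."""
    name: str
    needs_own_kernel: bool = False  # kernel modules or kernel dependencies
    heavy: bool = False             # resource-intensive or complex workload


def choose_guest_type(workload):
    """Return 'vm' when isolation or compatibility is needed, else 'lxc'."""
    if workload.needs_own_kernel or workload.heavy:
        return "vm"
    return "lxc"


print(choose_guest_type(Workload("twingate-connector")))  # lxc
print(choose_guest_type(Workload("CML", heavy=True)))     # vm
```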
---
## Recent Infrastructure Changes (2025-12-07)
### Additions
1. **VM 101 (monitoring-docker)**: New dedicated monitoring infrastructure
   - Grafana for visualization
   - Prometheus for metrics collection
   - PVE Exporter for Proxmox integration
   - IP: 192.168.2.114
2. **CT 112 (twingate-connector)**: Zero-trust network security
   - Lightweight connector
   - Secure remote access without VPN
3. **CT 113 (n8n)**: Workflow automation platform
   - PostgreSQL 15+ backend
   - IP: 192.168.2.107
   - Resolved database locale issues
### Modifications
- Storage utilization updated across all pools
- PBS-Backups now at 27.43% (increased retention)
- Vault optimized to 10.88% (reduced usage)
### Removals
- **VM 101 (gitlab)**: Decommissioned (previously at this ID)
- **CT 112 (Anytype)**: Replaced by n8n for better integration
### Documentation Updates
- Created comprehensive monitoring stack documentation
- Updated all infrastructure tables with current VMs/CTs
- Added architecture patterns for observability and zero-trust
- Updated storage statistics
- Referenced latest export: disaster-recovery/homelab-export-20251207-120040
---
## Repository Structure
```
homelab/
  monitoring/                         # NEW: Monitoring stack configurations
    README.md                         # Comprehensive monitoring documentation
    grafana/
      docker-compose.yml
    prometheus/
      docker-compose.yml
      prometheus.yml
    pve-exporter/
      docker-compose.yml
      pve.yml
      .env
  services/                           # Docker Compose service configurations
    n8n/                              # n8n workflow automation
    netbox/                           # Network documentation & IPAM
    README.md                         # Services overview (updated)
  disaster-recovery/
    homelab-export-20251207-120040/   # Latest infrastructure export
  scripts/
    crawlers-exporters/               # Infrastructure collection scripts
    fixers/                           # Problem-solving scripts
    qol/                              # Quality of life improvements
  CLAUDE.md                           # AI assistant guidance (updated)
  INDEX.md                            # Navigation index (updated)
  README.md                           # Repository overview (updated)
  CLAUDE_STATUS.md                    # This file - current infrastructure status
```
---
## Current Initiative: Sub-Agent Architecture Optimization (2025-12-07)
### Goal
Improve the quality and effectiveness of all sub-agent prompt definitions to match best practices identified through comprehensive Opus-powered prompt engineering analysis. Target: bring all sub-agents to the quality standard established by librarian.md (~120-340 lines with comprehensive examples, safety protocols, and decision frameworks).
### Phase
✅ COMPLETED - All sub-agent improvements and validations finished
### Progress Checklist
- [x] Prompt engineering analysis completed (Opus model)
  - Analyzed CLAUDE.md and all 4 sub-agent files
  - Identified 5 critical issues, 12 high-impact improvements
  - Generated comprehensive improvement recommendations
- [x] scribe.md improved (29→340 lines)
  - Added 6 usage examples (4 positive, 2 negative redirects)
  - Implemented comprehensive responsibilities section
  - Added 3 complete ASCII diagram templates
  - Included safety protocols and decision frameworks
  - Quality now matches librarian.md standard
- [x] backend-builder.md improved (40→291 lines)
  - Added 6 usage examples with clear boundaries
  - Expanded core responsibilities with Ansible, Terraform, Docker Compose, Python, Shell
  - Added technology stack table and validation rules table
  - Included safety protocols for secrets and destructive operations
  - Added handoff protocol for lab-operator deployment
  - Defined clear boundaries (CREATES code, does NOT deploy)
- [x] lab-operator.md improved (37→193 lines)
  - Added 6 usage examples with role clarity
  - Expanded domain expertise with specific commands
  - Added command style guide (5-step pattern)
  - Included safety protocols and decision-making framework
  - Added error handling and escalation guidelines
  - Defined clear boundaries (DEPLOYS/OPERATES, does NOT create IaC)
- [x] CLAUDE.md structural fixes
  - Moved YAML frontmatter to line 1 (was at line 89)
  - Fixed trailing pipe character on line 87
  - Completed incomplete sentence about backup strategy
  - Completed incomplete sentence about storage growth
  - Removed redundant "Key Services" reference
  - Expanded status file template with actual structure and recovery instructions
- [x] Final validation and testing
  - librarian: ✅ Git status check successful, clear output format
  - scribe: ✅ File reading functional (note: reported encoding issue, likely false positive)
  - backend-builder: ✅ YAML validation successful, proper syntax checking
  - lab-operator: ✅ Directory listing successful, proper command execution
  - All agents demonstrate improved structure and clarity
### Context
**Why It Matters**: Well-designed sub-agent prompts improve task routing accuracy, execution quality, error reduction, and maintainability. The librarian.md agent (143 lines) sets the quality standard; scribe was severely underdeveloped at 29 lines before improvement.
**Next Steps**: None outstanding for this initiative. backend-builder.md and lab-operator.md have been improved using scribe.md as the quality template (see the checklist above).
---
## Previous Phase: Infrastructure Documentation Complete
### Goal
Comprehensive documentation of monitoring stack and updated infrastructure inventory.
### Phase
Documentation & Maintenance
### Completed Tasks
- [x] Created `/home/jramos/homelab/monitoring/README.md` with comprehensive monitoring documentation
- [x] Updated `CLAUDE_STATUS.md` with current infrastructure state
- [x] Documented 8 VMs, 2 templates, and 4 LXC containers
- [x] Updated storage statistics (PBS 27.43%, Vault 10.88%, local 15.13%)
- [x] Added monitoring stack architecture and deployment procedures
- [x] Documented new services: monitoring-docker, twingate-connector, n8n
- [x] Referenced latest export: disaster-recovery/homelab-export-20251207-120040
### Remaining Documentation Tasks
- [ ] Update INDEX.md with monitoring section and current VM/CT counts
- [ ] Update README.md with all 8 VMs, 2 templates, and 4 CTs
- [ ] Update CLAUDE.md with architecture tables for monitoring and zero-trust
- [ ] Update services/README.md with monitoring stack and twingate sections
- [ ] Verify all documentation cross-references are accurate
- [ ] Test monitoring stack deployment procedures
---
## Access Information
### Management Interfaces
- **Proxmox UI**: https://192.168.2.200:8006
- **Grafana**: http://192.168.2.114:3000
- **Prometheus**: http://192.168.2.114:9090
- **Nginx Proxy Manager**: http://192.168.2.101:81
- **n8n**: http://192.168.2.107:5678
### Key Network Segments
- **Management Network**: 192.168.2.0/24
- **Proxmox Host**: 192.168.2.200
- **Reverse Proxy**: 192.168.2.101 (CT 102)
- **n8n**: 192.168.2.107 (CT 113)
- **Monitoring**: 192.168.2.114 (VM 101)
---
## Maintenance Schedule
### Automated Tasks
- **Backups**: Proxmox Backup Server - Daily incremental, Weekly full
- **Monitoring Scrapes**: Prometheus - Every 30 seconds
- **Certificate Renewal**: Nginx Proxy Manager - Automatic via Let's Encrypt
### Recommended Manual Tasks
- **Weekly**: Review Grafana dashboards for anomalies
- **Monthly**: Update monitoring stack Docker images
- **Quarterly**: Review backup retention policies
- **Semi-Annual**: Kernel updates on Proxmox host and VMs
---
## Known Issues & Resolutions
### Resolved
- n8n PostgreSQL locale errors (fixed with `fix_n8n_db_c_locale.sh`)
- n8n database permissions (fixed with `fix_n8n_db_permissions.sh`)
### Active Monitoring
- PVE Exporter SSL verification (set to false for self-signed certificates)
- Prometheus retention policies (currently 15 days, may need adjustment)
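
When weighing a retention change, it helps to know how many samples each time series accumulates. The arithmetic below is a rough sketch using the 30-second scrape interval from the maintenance schedule and the current 15-day retention; it ignores compression and series churn, so it indicates scale rather than disk usage.

```python
SCRAPE_INTERVAL_SECONDS = 30  # from the maintenance schedule in this file
RETENTION_DAYS = 15           # current retention noted above

SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_series(retention_days, scrape_interval_seconds):
    """Return how many samples one series holds over the retention window."""
    return retention_days * SECONDS_PER_DAY // scrape_interval_seconds


print(samples_per_series(RETENTION_DAYS, SCRAPE_INTERVAL_SECONDS))  # 43200
```

Doubling retention to 30 days at the same interval doubles this to 86,400 samples per series.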
### Deferred
- NetBox container offline (on-demand service)
- Development VMs stopped (resource conservation)
---
## Version History
- **v2.1.0** (2025-12-07): Added monitoring stack, twingate connector, updated infrastructure counts
- **v2.0.0** (2025-12-02): Repository reorganization, services migration from GitLab
- **v1.0.0** (2025-11-29): Initial infrastructure documentation
---
**Maintained by**: jramos
**Repository**: Homelab Infrastructure Configuration
**Platform**: Proxmox VE 8.3.3
**Infrastructure Scale**: 8 VMs, 2 Templates, 4 LXC Containers
**Current Status**: Operational - Monitoring & Documentation Phase