homelab/CLAUDE_STATUS.md
Jordan Ramos c4962194e3 feat(auth): integrate TinyAuth SSO for NetBox authentication
Deploy TinyAuth v4 as CT 115 (192.168.2.10) to provide centralized
SSO authentication for NetBox via Nginx Proxy Manager.

**New Infrastructure:**
- CT 115: TinyAuth authentication layer
- Domain: tinyauth.apophisnetworking.net
- Integration: NPM auth_request → TinyAuth → NetBox

**Configuration:**
- Docker Compose with bcrypt-hashed credentials
- NPM advanced config for auth_request integration
- HTTPS enforcement via SSL termination

**Issues Resolved:**
- 500 Internal Server Error (Nginx config syntax)
- "IP addresses not allowed" (APP_URL domain requirement)
- Port mapping (8000:3000 for internal port 3000)
- Invalid password (bcrypt hash requirement for v4)

**Documentation:**
- Complete TinyAuth README at services/tinyauth/README.md
- Updated CLAUDE_STATUS.md with CT 115 infrastructure
- Added bug report for scribe agent tool permissions

**Note:** Container restart required on CT 115 to apply bcrypt hash

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-18 08:15:05 -07:00


Homelab Infrastructure Status

Last Updated: 2025-12-18 17:00:00
Export Reference: disaster-recovery/homelab-export-20251211-144345

Current Infrastructure Snapshot

Proxmox Environment

  • Node: serviceslab
  • Version: Proxmox VE 8.4.0
  • Management IP: 192.168.2.200
  • Architecture: Single-node cluster
  • Total Resources: 9 VMs, 2 Templates, 5 LXC Containers

Virtual Machines (QEMU/KVM) - 9 VMs

| VM ID | Name | IP Address | Status | Purpose |
|-------|------|------------|--------|---------|
| 100 | docker-hub | 192.168.2.XXX | Running | Container registry/Docker hub mirror |
| 101 | monitoring-docker | 192.168.2.114 | Running | Monitoring stack (Grafana/Prometheus/PVE Exporter) |
| 105 | dev | - | Stopped | General-purpose development workstation |
| 106 | Ansible-Control | 192.168.2.XXX | Running | IaC orchestration, configuration management |
| 108 | CML | - | Stopped | Cisco Modeling Labs - network simulation |
| 109 | web-server-01 | 192.168.2.XXX | Running | Web application server (clustered) |
| 110 | web-server-02 | 192.168.2.XXX | Running | Load-balanced pair with web-server-01 |
| 111 | db-server-01 | 192.168.2.XXX | Running | Backend database server |
| 114 | haos | 192.168.2.XXX | Running | Home Assistant OS - smart home automation platform |

Recent Changes:

  • Added VM 101 (monitoring-docker) for dedicated monitoring infrastructure
  • Removed VM 101 (gitlab) - service decommissioned

VM Templates - 2 Templates

| Template ID | Name | Purpose |
|-------------|------|---------|
| 104 | ubuntu-dev | Ubuntu development environment template for cloning |
| 107 | ubuntu-docker | Ubuntu Docker host template for rapid deployment |

Note: Templates are immutable base images used for cloning new VMs, not running workloads. They provide standardized configurations for consistent infrastructure provisioning.


Containers (LXC) - 5 Containers

| CT ID | Name | IP Address | Status | Purpose |
|-------|------|------------|--------|---------|
| 102 | nginx | 192.168.2.101 | Running | Reverse proxy/load balancer & NPM |
| 103 | netbox | 192.168.2.XXX | Running | Network documentation/IPAM |
| 112 | twingate-connector | 192.168.2.XXX | Running | Zero-trust network access connector |
| 113 | n8n | 192.168.2.107 | Running | Workflow automation platform |
| 115 | tinyauth | 192.168.2.10 | Running | SSO authentication layer for NetBox |

Recent Changes:

  • Added CT 115 (tinyauth) for SSO authentication integration with NetBox
  • Added CT 112 (twingate-connector) for zero-trust network security
  • Added CT 113 (n8n) for workflow automation
  • Removed CT 112 (Anytype) - replaced by n8n
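
The resource counts quoted in this file can be cross-checked against the tables above. The following is a minimal Python sketch, assuming the guest IDs are transcribed by hand from this document; `INVENTORY` and `summarize` are illustrative names and do not exist in the repository.

```python
# Guest IDs transcribed from the tables in this document (illustrative only).
INVENTORY = {
    "VMs": [100, 101, 105, 106, 108, 109, 110, 111, 114],
    "Templates": [104, 107],
    "LXC Containers": [102, 103, 112, 113, 115],
}


def summarize(inventory: dict[str, list[int]]) -> str:
    """Return a 'Total Resources' style summary and reject duplicate IDs."""
    all_ids = [guest_id for ids in inventory.values() for guest_id in ids]
    if len(all_ids) != len(set(all_ids)):
        raise ValueError("duplicate guest ID in inventory")
    return ", ".join(f"{len(ids)} {kind}" for kind, ids in inventory.items())


print(summarize(INVENTORY))  # 9 VMs, 2 Templates, 5 LXC Containers
```

A check like this would flag any summary line whose counts drift from the tables after a guest is added or removed.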

Storage Architecture

| Storage Pool | Type | Total | Used | % Used | Purpose |
|--------------|------|-------|------|--------|---------|
| local | Directory | - | - | 19.11% | System files, ISOs, templates |
| local-lvm | LVM-Thin | - | - | 0.01% | VM disk images (thin provisioned) |
| Vault | NFS/Directory | - | - | 12.13% | Secure storage for sensitive data |
| PBS-Backups | PBS | - | - | 28.27% | Automated backup repository |
| iso-share | NFS/CIFS | - | - | 1.45% | Installation media library |
| localnetwork | Network Share | - | - | N/A | Shared resources across infrastructure |

Capacity Notes:

  • PBS-Backups utilization increased to 28.27% (healthy retention)
  • Vault utilization increased to 12.13% (data growth monitored)
  • local storage at 19.11% (system overhead within normal range)

Key Services & Stacks

Monitoring & Observability (NEW)

VM 101 - monitoring-docker (192.168.2.114)

  • Grafana: Port 3000 - Visualization and dashboards
  • Prometheus: Port 9090 - Metrics collection and time-series database
  • PVE Exporter: Port 9221 - Proxmox VE metrics exporter
  • Documentation: /home/jramos/homelab/monitoring/README.md
  • Status: Fully operational
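
For quick reference, the service ports above map to base URLs on VM 101. This is a small Python sketch; the plain-HTTP scheme is an assumption (the document does not state whether these endpoints use TLS), and `SERVICES` and `service_url` are hypothetical names.

```python
MONITORING_HOST = "192.168.2.114"  # VM 101, from this document

# Ports as listed above; the plain-HTTP scheme used below is an assumption.
SERVICES = {"grafana": 3000, "prometheus": 9090, "pve-exporter": 9221}


def service_url(name: str, host: str = MONITORING_HOST) -> str:
    """Build the base URL for one monitoring service."""
    return f"http://{host}:{SERVICES[name]}"


for name in SERVICES:
    print(f"{name:13s} {service_url(name)}")
```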

Network Security (NEW)

CT 112 - twingate-connector

  • Purpose: Zero-trust network access
  • Type: Lightweight connector
  • Status: Running
  • Integration: Connects homelab to Twingate network

Automation & Integration

CT 113 - n8n (192.168.2.107)

  • Purpose: Workflow automation platform
  • Technology: n8n.io
  • Database: PostgreSQL 15+
  • Features: API integration, scheduled workflows, webhook triggers
  • Documentation: /home/jramos/homelab/services/README.md#n8n-workflow-automation
  • Status: Operational (resolved database locale issues)

Authentication & SSO

CT 115 - tinyauth (192.168.2.10)

  • Purpose: Lightweight SSO authentication layer
  • Technology: TinyAuth v4 (Docker container)
  • Port: 8000
  • Domain: tinyauth.apophisnetworking.net
  • Integration: Authentication gateway for NetBox via Nginx Proxy Manager
  • Security: Bcrypt-hashed credentials, HTTPS enforcement
  • Documentation: /home/jramos/homelab/services/tinyauth/README.md
  • Status: Operational
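
Because TinyAuth v4 expects a bcrypt hash rather than a plaintext password, it can help to confirm that a configured value at least has the shape of one. The sketch below is a hedged illustration in Python: it only checks the standard bcrypt string layout (`$2a$`, `$2b$`, or `$2y$` prefix, two-digit cost, 53 characters of salt and digest) and does not verify any password. `looks_like_bcrypt` is a hypothetical helper, not part of TinyAuth.

```python
import re

# bcrypt string layout: $2a$/$2b$/$2y$, two-digit cost, then 22 salt + 31 digest chars.
BCRYPT_RE = re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}")


def looks_like_bcrypt(value: str) -> bool:
    """True if value has the shape of a bcrypt hash (no password check)."""
    return BCRYPT_RE.fullmatch(value) is not None


sample = "$2b$12$" + "a" * 53  # placeholder with the right shape, not a real hash
print(looks_like_bcrypt(sample))     # True
print(looks_like_bcrypt("hunter2"))  # False
```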

Infrastructure Documentation

CT 103 - netbox

  • Purpose: Network documentation and IPAM
  • Status: Running (activated 2025-12-11; previously stopped for on-demand use)
  • Function: Infrastructure source of truth

Reverse Proxy & Load Balancing

CT 102 - nginx (192.168.2.101)

  • Purpose: Nginx Proxy Manager
  • Ports: 80, 81, 443
  • Function: SSL termination, reverse proxy, certificate management
  • Upstream Services: All web-facing applications

Three-Tier Application Stack

Web Tier:

  • VM 109 (web-server-01) - Primary web server
  • VM 110 (web-server-02) - Load-balanced pair

Database Tier:

  • VM 111 (db-server-01) - Backend database

Proxy Tier:

  • CT 102 (nginx) - Load balancer and SSL termination

Development & Automation

VM 106 - Ansible-Control

  • Purpose: Infrastructure as Code orchestration
  • Tools: Ansible, Terraform/OpenTofu (potential)
  • Status: Running

Container Registry

VM 100 - docker-hub

  • Purpose: Local Docker registry and hub mirror
  • Function: Caching container images for faster deployments
  • Status: Running

Network Simulation

VM 108 - CML

  • Purpose: Cisco Modeling Labs
  • Function: Network topology testing and simulation
  • Status: Stopped (resource-intensive, on-demand use)

Architecture Patterns

Monitoring & Observability (NEW)

The infrastructure now implements a comprehensive monitoring stack following industry best practices:

  • Metrics Collection: Prometheus scraping Proxmox metrics via PVE Exporter
  • Visualization: Grafana providing real-time dashboards and alerting
  • Isolation: Dedicated VM for monitoring services (fault isolation)
  • Integration: Ready for AlertManager, additional exporters, and integrations

Design Decision: VM-based deployment provides kernel-level isolation and prevents resource contention with critical infrastructure services.

Zero-Trust Security (NEW)

Implementation of zero-trust network access principles:

  • Twingate Connector: Lightweight connector providing secure access without VPNs
  • Container Deployment: LXC container for minimal resource overhead
  • Network Segmentation: Secure access to homelab from external networks

Design Decision: LXC container chosen for quick provisioning and low resource consumption.

Automation-First Approach

Workflow automation and infrastructure orchestration:

  • n8n Platform: Visual workflow builder for API integrations
  • Scheduled Tasks: Automated backup checks, monitoring alerts, reports
  • Integration Hub: Connects monitoring, documentation, and operational tools

Design Decision: PostgreSQL backend ensures data persistence and supports complex workflows.

Tiered Application Architecture

Classic three-tier design for production-like environments:

  • Presentation Tier: Paired web servers (109, 110) behind load balancer
  • Business Logic: Application processing on web tier
  • Data Tier: Dedicated database server (111) with backup strategy

Design Decision: Separation of concerns, scalability testing, high availability patterns.

Selective Containerization Strategy

Hybrid approach balancing performance and resource efficiency:

  • LXC Containers: Lightweight services (nginx, netbox, twingate, n8n, tinyauth)
  • Full VMs: Complex applications, kernel dependencies, heavy workloads
  • Rationale: LXC for ~10x lower overhead, VMs for isolation and compatibility

Recent Infrastructure Changes

2025-12-18: TinyAuth SSO Deployment

Service Deployed: CT 115 - TinyAuth authentication layer

Purpose: Centralized SSO authentication for NetBox and future homelab services

Specifications:

  • Container: CT 115 (LXC with Docker)
  • IP Address: 192.168.2.10
  • Domain: tinyauth.apophisnetworking.net
  • Port: 8000 (external), 3000 (internal)
  • Docker Image: ghcr.io/steveiliop56/tinyauth:v4
  • Resource Usage: ~50-100 MB memory, <1% CPU

Integration Architecture:

  • Internet → Nginx Proxy Manager (CT 102) → TinyAuth (CT 115) → NetBox (CT 103)
  • NPM uses auth_request directive to validate credentials via TinyAuth
  • Bcrypt-hashed password storage for security
  • HTTPS enforcement via NPM SSL termination
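
The auth_request flow can be summarized as a status-code decision. The Python sketch below models the documented behaviour of nginx's auth_request module (2xx allows the request, 401/403 deny it with the same code, anything else is treated as an error); it is not the NPM advanced configuration used in this deployment, and `auth_request_outcome` is a hypothetical name.

```python
def auth_request_outcome(subrequest_status: int) -> int:
    """Map the auth subrequest status to the status nginx acts on.

    A return value of 200 means "allow and proxy to the upstream".
    """
    if 200 <= subrequest_status < 300:
        return 200  # access allowed, request continues to NetBox
    if subrequest_status in (401, 403):
        return subrequest_status  # access denied with the same code
    return 500  # any other subrequest status is treated as an error


for status in (200, 401, 403, 302):
    print(status, "->", auth_request_outcome(status))
```

This also suggests why a misconfigured auth endpoint tends to surface as a 500 rather than a login prompt: any unexpected subrequest response falls into the error branch.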

Issues Resolved During Deployment:

  1. 500 Internal Server Error: Fixed Nginx advanced config syntax
  2. IP addresses not allowed: Changed APP_URL from IP to domain
  3. Port mapping: Corrected Docker port mapping from 8000:8000 to 8000:3000
  4. Invalid password: Implemented bcrypt hash requirement for TinyAuth v4
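
Issues 2 and 3 above lend themselves to a pre-flight check before the container is started. The following is a minimal Python sketch under stated assumptions: the helper names `app_url_problem` and `container_port` are hypothetical, the port mapping is assumed to be a plain `HOST:CONTAINER` string, and the internal port 3000 comes from this document.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def app_url_problem(app_url: str) -> Optional[str]:
    """Return a problem description for APP_URL, or None if it looks fine."""
    host = urlsplit(app_url).hostname
    if host is None:
        return "APP_URL has no host"
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return None  # not an IP literal, so treated as a domain name
    return "IP addresses not allowed: use a domain name"


def container_port(mapping: str) -> int:
    """Extract the container port from a 'HOST:CONTAINER' port mapping."""
    return int(mapping.rsplit(":", 1)[1])


print(app_url_problem("http://192.168.2.10:8000"))
print(app_url_problem("https://tinyauth.apophisnetworking.net"))
print(container_port("8000:3000") == 3000)
```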

Integration Impact:

  • NetBox now protected by centralized authentication
  • Foundation for extending SSO to other services (Grafana, Proxmox UI future candidates)
  • Authentication logs available for security auditing

Documentation: Complete guide at /home/jramos/homelab/services/tinyauth/README.md

Status: Operational - Successfully authenticating NetBox access


2025-12-11: Loki-Stack Monitoring Fully Operational

Issue Resolved: Centralized logging pipeline now receiving syslog from UniFi router

Root Cause: rsyslog filter in /etc/rsyslog.d/unifi-router.conf was configured for wrong source IP (192.168.1.1 instead of 192.168.2.1)

Fix Applied: Updated rsyslog filter to match VLAN 2 gateway IP (192.168.2.1)
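
A source-IP mismatch of this kind can be caught by checking the filter address against the management network. This is a short Python sketch using the standard `ipaddress` module; the network 192.168.2.0/24 is taken from this document, and `in_management_net` is a hypothetical helper.

```python
import ipaddress

MANAGEMENT_NET = ipaddress.ip_network("192.168.2.0/24")  # from this document


def in_management_net(ip: str) -> bool:
    """True if ip falls inside the management network."""
    return ipaddress.ip_address(ip) in MANAGEMENT_NET


print(in_management_net("192.168.1.1"))  # False: the misconfigured filter IP
print(in_management_net("192.168.2.1"))  # True: the VLAN 2 gateway
```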

Status: Complete - Logs flowing UniFi → rsyslog → Promtail → Loki → Grafana

Services Affected:

  • VM 101 (monitoring-docker): rsyslog configuration updated
  • Loki-stack: All components operational
  • Grafana: Dashboards receiving real-time syslog data

Technical Details: See troubleshooting/loki-stack-bugfix.md for complete 5-phase troubleshooting history


2025-12-11: Infrastructure Expansion & System Updates

Proxmox VE Platform Upgrade

  • Upgraded: Proxmox VE 8.3.3 → 8.4.0
  • Kernel: 6.8.12-8-pve
  • pve-manager: 8.4.14
  • Impact: Enhanced performance, security updates, bug fixes
  • Status: Complete - All VMs and containers operating normally

New VM 114: Home Assistant OS Deployment

  • Service: haos (Home Assistant Operating System)
  • Purpose: Smart home automation and integration platform
  • Specifications:
    • Memory: 4 GB (87% utilized)
    • CPU: 2 vCPUs
    • Boot Disk: 50 GB
    • Status: Running (~3 days uptime)
  • Rationale: Centralized home automation hub for IoT device management
  • Integration: Will integrate with monitoring stack for infrastructure metrics

CT 103: NetBox IPAM Activated

  • Service: netbox (Network Documentation & IPAM)
  • Status Change: Stopped → Running
  • Uptime: ~3.1 days
  • Resource Usage: 1.28 GB / 2 GB memory (64%)
  • Purpose: Active network documentation and IP address management
  • Rationale: Required for ongoing infrastructure expansion planning

Storage Utilization Changes

  • PBS-Backups: 27.43% → 28.27% (+0.84%) - Normal backup retention growth
  • Vault (ZFS): 10.88% → 12.13% (+1.25%) - Data accumulation monitored
  • local: 15.13% → 19.11% (+3.98%) - New VM deployment and system updates
  • iso-share: 1.4% → 1.45% (+0.05%) - Minimal change
  • local-lvm: 0.0% → 0.01% (+0.01%) - Thin provisioned storage baseline
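
The storage deltas listed above are simple differences in percentage points. A minimal Python sketch that recomputes them from the before/after figures in this document (the `USAGE` table and `delta` helper are illustrative):

```python
# (previous %, current %) per pool, copied from the list above.
USAGE = {
    "PBS-Backups": (27.43, 28.27),
    "Vault": (10.88, 12.13),
    "local": (15.13, 19.11),
    "iso-share": (1.40, 1.45),
    "local-lvm": (0.00, 0.01),
}


def delta(previous: float, current: float) -> float:
    """Change in percentage points, rounded to two decimals."""
    return round(current - previous, 2)


for pool, (previous, current) in USAGE.items():
    print(f"{pool}: {previous}% -> {current}% ({delta(previous, current):+.2f})")
```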

2025-12-07: Infrastructure Documentation & Monitoring Stack

Additions

  1. VM 101 (monitoring-docker): New dedicated monitoring infrastructure

    • Grafana for visualization
    • Prometheus for metrics collection
    • PVE Exporter for Proxmox integration
    • IP: 192.168.2.114
  2. CT 112 (twingate-connector): Zero-trust network security

    • Lightweight connector
    • Secure remote access without VPN
  3. CT 113 (n8n): Workflow automation platform

    • PostgreSQL 15+ backend
    • IP: 192.168.2.107
    • Resolved database locale issues

Modifications

  • Storage utilization updated across all pools
  • PBS-Backups now at 27.43% (increased retention)
  • Vault optimized to 10.88% (reduced usage)

Removals

  • VM 101 (gitlab): Decommissioned (previously at this ID)
  • CT 112 (Anytype): Replaced by n8n for better integration

Documentation Updates

  • Created comprehensive monitoring stack documentation
  • Updated all infrastructure tables with current VMs/CTs
  • Added architecture patterns for observability and zero-trust
  • Updated storage statistics
  • Referenced latest export: disaster-recovery/homelab-export-20251207-120040

Repository Structure

homelab/
    monitoring/                      # NEW: Monitoring stack configurations
        README.md                   # Comprehensive monitoring documentation
        grafana/
            docker-compose.yml
        prometheus/
            docker-compose.yml
            prometheus.yml
        pve-exporter/
            docker-compose.yml
            pve.yml
            .env
    services/                        # Docker Compose service configurations
        n8n/                        # n8n workflow automation
        netbox/                     # Network documentation & IPAM
        tinyauth/                   # TinyAuth SSO (README.md)
        README.md                   # Services overview (updated)
    disaster-recovery/
        homelab-export-20251207-120040/  # Latest infrastructure export
    scripts/
        crawlers-exporters/         # Infrastructure collection scripts
        fixers/                     # Problem-solving scripts
        qol/                        # Quality of life improvements
    CLAUDE.md                        # AI assistant guidance (updated)
    INDEX.md                         # Navigation index (updated)
    README.md                        # Repository overview (updated)
    CLAUDE_STATUS.md                # This file - current infrastructure status

Current Initiative: Sub-Agent Architecture Optimization (2025-12-07)

Goal

Improve the quality and effectiveness of all sub-agent prompt definitions to match best practices identified through comprehensive Opus-powered prompt engineering analysis. Target: bring all sub-agents to the quality standard established by librarian.md (~120-340 lines with comprehensive examples, safety protocols, and decision frameworks).

Phase

COMPLETED - All sub-agent improvements and validations finished

Progress Checklist

  • Prompt engineering analysis completed (Opus model)
    • Analyzed CLAUDE.md and all 4 sub-agent files
    • Identified 5 critical issues, 12 high-impact improvements
    • Generated comprehensive improvement recommendations
  • scribe.md improved (29 → 340 lines)
    • Added 6 usage examples (4 positive, 2 negative redirects)
    • Implemented comprehensive responsibilities section
    • Added 3 complete ASCII diagram templates
    • Included safety protocols and decision frameworks
    • Quality now matches librarian.md standard
  • backend-builder.md improved (40 → 291 lines)
    • Added 6 usage examples with clear boundaries
    • Expanded core responsibilities with Ansible, Terraform, Docker Compose, Python, Shell
    • Added technology stack table and validation rules table
    • Included safety protocols for secrets and destructive operations
    • Added handoff protocol for lab-operator deployment
    • Defined clear boundaries (CREATES code, does NOT deploy)
  • lab-operator.md improved (37 → 193 lines)
    • Added 6 usage examples with role clarity
    • Expanded domain expertise with specific commands
    • Added command style guide (5-step pattern)
    • Included safety protocols and decision-making framework
    • Added error handling and escalation guidelines
    • Defined clear boundaries (DEPLOYS/OPERATES, does NOT create IaC)
  • CLAUDE.md structural fixes
    • Moved YAML frontmatter to line 1 (was at line 89)
    • Fixed trailing pipe character on line 87
    • Completed incomplete sentence about backup strategy
    • Completed incomplete sentence about storage growth
    • Removed redundant "Key Services" reference
    • Expanded status file template with actual structure and recovery instructions
  • Final validation and testing
    • librarian: Git status check successful, clear output format
    • scribe: File reading functional (note: reported encoding issue, likely false positive)
    • backend-builder: YAML validation successful, proper syntax checking
    • lab-operator: Directory listing successful, proper command execution
    • All agents demonstrate improved structure and clarity

Context

Why It Matters: Well-designed sub-agent prompts improve task routing accuracy, execution quality, error reduction, and maintainability. The librarian.md agent (143 lines) sets the quality standard; scribe was severely underdeveloped at 29 lines before improvement.

Next Steps: None outstanding. backend-builder.md and lab-operator.md have been improved using scribe.md as the quality template (see the checklist above).


Previous Phase: Infrastructure Documentation Complete

Goal

Comprehensive documentation of monitoring stack and updated infrastructure inventory.

Phase

Documentation & Maintenance

Completed Tasks

  • Created /home/jramos/homelab/monitoring/README.md with comprehensive monitoring documentation
  • Updated CLAUDE_STATUS.md with current infrastructure state
  • Documented 8 VMs, 2 Templates, and 4 LXC containers
  • Updated storage statistics (PBS 27.43%, Vault 10.88%, local 15.13%)
  • Added monitoring stack architecture and deployment procedures
  • Documented new services: monitoring-docker, twingate-connector, n8n
  • Referenced latest export: disaster-recovery/homelab-export-20251207-120040

Remaining Documentation Tasks

  • Update INDEX.md with monitoring section and current VM/CT counts
  • Update README.md with infrastructure (8 VMs, 2 Templates, 4 LXC)
  • Update CLAUDE.md with architecture tables for monitoring and zero-trust
  • Update services/README.md with monitoring stack and twingate sections
  • Verify all documentation cross-references are accurate
  • Test monitoring stack deployment procedures

Access Information

Management Interfaces

Key Network Segments

  • Management Network: 192.168.2.0/24
  • Proxmox Host: 192.168.2.200
  • Reverse Proxy: 192.168.2.101 (CT 102)
  • TinyAuth: 192.168.2.10 (CT 115)
  • n8n: 192.168.2.107 (CT 113)
  • Monitoring: 192.168.2.114 (VM 101)

Maintenance Schedule

Automated Tasks

  • Backups: Proxmox Backup Server - Daily incremental, Weekly full
  • Monitoring Scrapes: Prometheus - Every 30 seconds
  • Certificate Renewal: Nginx Proxy Manager - Automatic via Let's Encrypt

Manual Tasks

  • Weekly: Review Grafana dashboards for anomalies
  • Monthly: Update monitoring stack Docker images
  • Quarterly: Review backup retention policies
  • Semi-Annual: Kernel updates on Proxmox host and VMs

Known Issues & Resolutions

Resolved

  • n8n PostgreSQL locale errors (fixed with fix_n8n_db_c_locale.sh)
  • n8n database permissions (fixed with fix_n8n_db_permissions.sh)

Active Monitoring

  • PVE Exporter SSL verification (set to false for self-signed certificates)
  • Prometheus retention policies (currently 15 days, may need adjustment)
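
When weighing a retention change, it helps to know how many samples each series holds. This is a rough Python sketch using the 30-second scrape interval and 15-day retention stated in this document; `samples_per_series` is a hypothetical helper and ignores gaps from failed scrapes.

```python
SCRAPE_INTERVAL_S = 30  # from the maintenance schedule
RETENTION_DAYS = 15     # current retention noted above


def samples_per_series(interval_s: int, retention_days: int) -> int:
    """Number of samples one time series accumulates over the retention window."""
    return retention_days * 24 * 60 * 60 // interval_s


print(samples_per_series(SCRAPE_INTERVAL_S, RETENTION_DAYS))  # 43200
```

Doubling retention to 30 days at the same interval would double this to 86,400 samples per series.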

Deferred

  • NetBox container offline (on-demand service): no longer applies, CT 103 has been running since its 2025-12-11 activation
  • Development VMs stopped (resource conservation)

Version History

  • v2.1.0 (2025-12-07): Added monitoring stack, twingate connector, updated infrastructure counts
  • v2.0.0 (2025-12-02): Repository reorganization, services migration from GitLab
  • v1.0.0 (2025-11-29): Initial infrastructure documentation

Maintained by: jramos
Repository: Homelab Infrastructure Configuration
Platform: Proxmox VE 8.4.0
Infrastructure Scale: 9 VMs, 2 Templates, 5 Containers
Current Status: Operational - Home Automation Integration Deployed