feat(agents): optimize sub-agent architecture with comprehensive prompt engineering

This commit implements a comprehensive optimization of all sub-agent prompt
definitions based on Opus-powered prompt engineering analysis. All agents now
match the quality standard established by librarian.md.

Agent Improvements:
- scribe.md: 29→340 lines (11.7x expansion)
  * Added 6 usage examples with role clarity
  * Implemented comprehensive responsibilities section
  * Added 3 complete ASCII diagram templates
  * Included safety protocols and decision frameworks

- backend-builder.md: 40→291 lines (7.3x expansion)
  * Added 6 usage examples with clear boundaries
  * Expanded core responsibilities (Ansible, Terraform, Docker, Python, Shell)
  * Added technology stack and validation rules tables
  * Included handoff protocol for lab-operator deployment
  * Defined clear boundaries (CREATES code, does NOT deploy)

- lab-operator.md: 37→193 lines (5.2x expansion)
  * Added 6 usage examples with role clarity
  * Expanded domain expertise with specific commands
  * Added command style guide (5-step pattern)
  * Included safety protocols and decision-making framework
  * Defined clear boundaries (DEPLOYS/OPERATES, does NOT create IaC)

- librarian.md: Minor formatting improvements

CLAUDE.md Fixes:
- Moved YAML frontmatter to line 1 (was incorrectly at line 89)
- Fixed trailing pipe character
- Completed incomplete sentences about backup strategy and storage growth
- Removed redundant information
- Expanded status file template with recovery instructions

Files Added:
- Claude_UPDATES.md: Comprehensive prompt engineering analysis report
- monitoring/pve-exporter/pve.yml: PVE monitoring configuration

Impact:
- Total agent documentation: 249→967 lines (288% increase)
- Usage examples: 6→24 total (300% increase, 4x)
- All agents now have comprehensive safety protocols
- Clear role boundaries prevent agent overlap
- Validation testing confirms all agents functional

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-07 22:39:40 -07:00
parent 52faebb63a
commit 004e3da77c
8 changed files with 2594 additions and 125 deletions

CLAUDE.md

@@ -1,3 +1,16 @@
---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
repository_type: homelab
primary_node: serviceslab
proxmox_version: 8.3.3
vm_count: 10
lxc_count: 4
working_directory: /home/jramos/homelab
git_remote: http://192.168.2.102:3060/jramos/homelab.git
---
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
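The frontmatter block at the top of this file uses only flat `key: value` pairs, so a tool can read it without a YAML library. The sketch below is one possible way to do that; the function name `read_frontmatter` is hypothetical and not part of this repository, and the parser deliberately ignores anything beyond flat pairs.

```python
from typing import Dict


def read_frontmatter(text: str) -> Dict[str, str]:
    """Return the key/value pairs of a leading '---' frontmatter block.

    Hypothetical helper: handles only flat `key: value` lines and is not a
    general YAML parser.
    """
    lines = text.splitlines()
    # Frontmatter only counts if the very first line is the opening fence.
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so URL values keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # No closing fence was found: treat the block as malformed.
    return {}
```

Because the function returns an empty dict when the fence is not on line 1, it also works as a quick check that the frontmatter has not drifted down the file again.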
@@ -6,60 +19,90 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
This is a homelab infrastructure repository managing a Proxmox VE 8.3.3-based services and development laboratory environment. The infrastructure follows a hybrid architecture pattern combining traditional virtualization (KVM/QEMU) with containerization (LXC) for optimal resource utilization and service isolation.
## Quick Reference
| Resource | Value |
|----------|-------|
| **Proxmox Node** | serviceslab (192.168.2.200:8006) |
| **Proxmox Version** | PVE 8.3.3 |
| **Infrastructure** | 10 VMs, 4 LXC containers |
| **Monitoring** | http://192.168.2.114:3000 (Grafana) |
| **Version Control** | Gitea at 192.168.2.102:3060 |
| **Working Directory** | /home/jramos/homelab |
| **Live Status** | See `CLAUDE_STATUS.md` for current inventory |
**Key Services:**
- VM 101 (monitoring-docker): Grafana, Prometheus, PVE Exporter
- CT 102 (nginx): Nginx Proxy Manager (reverse proxy)
- CT 112 (twingate-connector): Zero-trust network access
- CT 113 (n8n): Workflow automation at 192.168.2.107
## Agent Selection Guide
When working with this repository, choose the appropriate agent based on task type:
| Task Type | Primary Agent | Tools Available | Notes |
|-----------|---------------|-----------------|-------|
| **Git Operations** | `librarian` | Bash, Read, Grep, Edit, Write | Commits, branches, merges, .gitignore |
| **Documentation** | `scribe` | Read, Grep, Glob, Edit, Write | READMEs, architecture docs, diagrams |
| **Infrastructure Ops** | `lab-operator` | Bash, Read, Grep, Glob, Edit, Write | Proxmox, Docker, networking, storage |
| **Code/IaC Development** | `backend-builder` | Bash, Read, Grep, Glob, Edit, Write | Ansible, Terraform, Python, Shell |
| **File Creation** | Main Agent | All tools | Use when sub-agents lack specific tools |
| **Complex Multi-Agent Tasks** | Main Agent | All tools | Coordinates between specialized agents |
### Task Routing Decision Tree
```
Is this a git/version control task?
├── Yes → Use librarian
└── No ↓
Is this documentation (README, guides, diagrams)?
├── Yes → Use scribe
└── No ↓
Does this require system commands (docker, ssh, proxmox)?
├── Yes → Use lab-operator
└── No ↓
Is this code/config creation (Ansible, Python, Terraform)?
├── Yes → Use backend-builder
└── No → Use Main Agent
```
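The decision tree above can also be read as a first-match-wins function. The following is a minimal sketch under that reading; the `Task` class, its flag names, and the `"main-agent"` label are assumptions made for illustration and do not exist in this repository.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; each flag mirrors one question in the tree."""
    is_git: bool = False
    is_documentation: bool = False
    needs_system_commands: bool = False
    is_code_or_config: bool = False


def route(task: Task) -> str:
    """Return the agent name for a task, checking questions in tree order."""
    # Order matters: the first question answered "Yes" decides the agent.
    if task.is_git:
        return "librarian"
    if task.is_documentation:
        return "scribe"
    if task.needs_system_commands:
        return "lab-operator"
    if task.is_code_or_config:
        return "backend-builder"
    # Nothing matched, so the task stays with the main agent.
    return "main-agent"
```

Note that a task needing both system commands and code creation routes to `lab-operator` under this ordering, which is one reason the collaboration patterns split such work into separate steps.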
### Agent Collaboration Patterns
**Documentation Workflow:**
1. `backend-builder` or `lab-operator` creates/modifies infrastructure
2. `scribe` updates documentation
3. `librarian` commits all changes
**Infrastructure Deployment:**
1. `backend-builder` writes IaC (Ansible/Terraform/Compose)
2. `lab-operator` deploys to Proxmox/Docker
3. `scribe` documents deployment
4. `librarian` commits configuration
## Infrastructure Overview
### Proxmox Environment
- **Platform**: Proxmox Virtual Environment 8.3.3
- **Architecture Pattern**: Services/Development Laboratory
- **Primary Node**: `serviceslab` (single-node cluster)
- **Deployment Model**: Hybrid VM + LXC container approach
**For detailed, current infrastructure inventory, see:**
- **Live Status**: `CLAUDE_STATUS.md` (most current)
- **Service Details**: `services/README.md`
- **Complete Index**: `INDEX.md`
### Key Services & Virtual Machines (QEMU/KVM)
**Quick Summary:**
- **VMs**: 10 total (IDs: 100, 101, 104-111)
- **LXC Containers**: 4 total (IDs: 102, 103, 112, 113)
- **Storage Pools**: local, local-lvm, Vault (ZFS), PBS-Backups, iso-share
- **Monitoring**: VM 101 at 192.168.2.114 (Grafana/Prometheus/PVE Exporter)
The infrastructure employs full VMs for services requiring kernel-level isolation, complex dependencies, or heavyweight applications:
| VM ID | Name | Purpose | Notes |
|-------|------|---------|-------|
| 100 | docker-hub | Container registry/Docker hub mirror | Local container image caching |
| 101 | monitoring-docker | Monitoring stack | Grafana/Prometheus/PVE Exporter at 192.168.2.114 |
| 104 | ubuntu-dev | Ubuntu development environment | Additional dev workstation |
| 105 | dev | Development environment | General-purpose development workstation |
| 106 | Ansible-Control | Automation control node | IaC orchestration, configuration management |
| 107 | ubuntu-docker | Ubuntu Docker host | Docker-focused environment |
| 108 | CML | Cisco Modeling Labs | Network simulation/testing environment |
| 109 | web-server-01 | Web application server | Production-like web tier (clustered) |
| 110 | web-server-02 | Web application server | Load-balanced pair with web-server-01 |
| 111 | db-server-01 | Database server | Backend data tier |
### Containers (LXC)
Lightweight services leveraging LXC for reduced overhead and faster provisioning:
| CT ID | Name | Purpose | Notes |
|-------|------|---------|-------|
| 102 | nginx | Reverse proxy/load balancer | Front-end traffic management (NPM) |
| 103 | netbox | Network documentation/IPAM | Infrastructure source of truth |
| 112 | twingate-connector | Zero-trust network access | Secure remote access connector |
| 113 | n8n | Workflow automation | n8n.io platform at 192.168.2.107 |
### Storage Architecture
The storage layout demonstrates a well-organized approach to data separation:
| Storage Pool | Type | Usage | Purpose |
|--------------|------|-------|---------|
| local | Directory | 15.13% | System files, ISOs, templates |
| local-lvm | LVM-Thin | 0.0% | VM disk images (thin provisioned) |
| Vault | NFS/Directory | 10.88% | Secure storage for sensitive data |
| PBS-Backups | Proxmox Backup Server | 27.43% | Automated backup repository |
| iso-share | NFS/CIFS | 1.4% | Installation media library |
| localnetwork | Network share | N/A | Shared resources across infrastructure |
**Note**: Infrastructure details change frequently. Always reference `CLAUDE_STATUS.md` for accurate counts, IPs, and status.
### Architecture Patterns & Design Decisions
**Tiered Application Architecture**: The infrastructure implements a classic three-tier design with dedicated web servers (109, 110), database server (111), and reverse proxy (102), suggesting this lab is used for practicing production-like deployments.
**Automation-First Approach**: The presence of Ansible-Control (106), GitLab (101), and NetBox (103) indicates a focus on Infrastructure as Code and proper documentation practices—rather civilized.
**Automation-First Approach**: The presence of Ansible-Control (106), Gitea (100), and NetBox (103) indicates a focus on Infrastructure as Code and proper documentation practices—rather civilized.
**Network Simulation Capability**: CML (108) suggests network engineering activities, possibly testing configurations before production deployment.
@@ -69,6 +112,8 @@ The storage layout demonstrates a well-organized approach to data separation:
**Zero-Trust Security**: Implementation of Twingate connector (CT 112) demonstrates modern security practices, providing secure remote access without traditional VPN complexity.
**Backup Strategy**: PBS-Backups utilization is at 27.43% (see CLAUDE_STATUS.md for current metrics). Automated daily incremental backups with weekly full backups ensure data protection across all VMs and containers.
## Working with This Environment
### Universal Workflow
@@ -78,38 +123,43 @@ For every complex task, every Agent must follow this loop:
3. **Update**: Edit `CLAUDE_STATUS.md` to mark your step as `[x]` and update the "Current Context".
### Status File Template
If `CLAUDE_STATUS.md` is missing, initialize it with:
- **Goal**: [User Goal]
- **Phase**: [Planning / Dev / Deploy]
- **Checklist**: [List of steps]
If `CLAUDE_STATUS.md` is missing or corrupted, recover it from the latest disaster recovery export:
- **Location**: `disaster-recovery/homelab-export-YYYYMMDD-HHMMSS/CLAUDE_STATUS.md`
- **Alternative**: Use the scribe agent to recreate from current infrastructure state
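Locating the latest export can be automated by parsing the timestamp in each directory name. The sketch below assumes export directories follow the `homelab-export-YYYYMMDD-HHMMSS` pattern exactly and each contains its own `CLAUDE_STATUS.md`; the function name `latest_status_backup` is hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple

# Matches the directory naming pattern described above.
EXPORT_NAME = re.compile(r"^homelab-export-(\d{8}-\d{6})$")


def latest_status_backup(root: Path) -> Optional[Path]:
    """Return CLAUDE_STATUS.md from the newest export under root, or None."""
    if not root.is_dir():
        return None
    candidates: List[Tuple[datetime, Path]] = []
    for entry in root.iterdir():
        match = EXPORT_NAME.match(entry.name)
        if not match or not entry.is_dir():
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
        except ValueError:
            continue  # digits that do not form a real date and time
        status = entry / "CLAUDE_STATUS.md"
        if status.is_file():
            candidates.append((stamp, status))
    if not candidates:
        return None
    # Pick the entry with the most recent timestamp.
    return max(candidates, key=lambda item: item[0])[1]
```

A caller would pass the `disaster-recovery` directory as `root` and copy the returned file back to the repository root after reviewing it.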
**Minimum required structure:**
```markdown
# Homelab Infrastructure Status
**Last Updated**: YYYY-MM-DD HH:MM:SS
**Export Reference**: disaster-recovery/homelab-export-YYYYMMDD-HHMMSS
## Current Infrastructure Snapshot
- Proxmox VE 8.3.3 on serviceslab (192.168.2.200)
- 10 VMs, 4 LXC containers
## Current Initiative
**Goal**: [Initiative description]
**Phase**: [Planning / Implementation / Testing]
**Progress Checklist**: [Task list with checkboxes]
## Recent Infrastructure Changes
[Chronological log of changes with dates]
```
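A recovered or recreated status file can be checked against the minimum structure before agents rely on it. This is a small sketch that only looks for the four headings from the template above as whole lines; the names `REQUIRED_HEADINGS` and `missing_headings` are hypothetical, and the check says nothing about whether the content under each heading is accurate.

```python
from typing import List

# Headings taken from the minimum required structure shown above.
REQUIRED_HEADINGS = [
    "# Homelab Infrastructure Status",
    "## Current Infrastructure Snapshot",
    "## Current Initiative",
    "## Recent Infrastructure Changes",
]


def missing_headings(status_text: str) -> List[str]:
    """Return the required headings that do not appear as whole lines."""
    present = {line.strip() for line in status_text.splitlines()}
    return [heading for heading in REQUIRED_HEADINGS if heading not in present]
```

An empty result means the file has the expected skeleton; a non-empty result lists what to restore from the export or regenerate.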
### Best Practices
1. **Backup Strategy**: With PBS-Backups at 21.6% utilization and excellent uptime (27-68 days), ensure regular backup schedules are maintained. Consider implementing the 3-2-1 rule if not already in place.
2. **Resource Management**: Monitor the local-lvm pool (currently 0.0%)—this appears to be reserved capacity. Ensure thin provisioning doesn't lead to overcommitment.
3. **Configuration Management**: Utilize the Ansible-Control node (106) for infrastructure changes. Avoid manual configuration drift.
4. **Documentation**: NetBox (103) should be the single source of truth for IP addressing, VLANs, and service inventory. Keep it updated.
5. **Version Control**: GitLab (101) should house all Infrastructure as Code, scripts, and configuration files from this repository.
6. **Load Balancing**: The paired web servers (109, 110) suggest HA testing—ensure nginx (102) is properly configured for failover.
### Access Patterns
- **Proxmox Web UI**: Primary management interface for VM/CT lifecycle operations
- **Ansible**: Automated configuration deployment and orchestration
- **GitLab**: CI/CD pipelines for infrastructure testing and deployment
- **Gitea**: CI/CD pipelines for infrastructure testing and deployment
- **NetBox**: Network documentation and IP address management
### Maintenance Considerations
- **Uptime**: Services showing 27-68 days uptime—schedule maintenance windows for kernel updates
- **Storage Growth**: PBS-Backups at 21.6% allows healthy retention; review backup policies quarterly
- **Capacity Planning**: Current utilization suggests comfortable headroom; monitor trends via Proxmox metrics
- **Uptime**: Track uptime metrics in disaster recovery exports for trend analysis
- **Storage Growth**: PBS-Backups at 27.43%, Vault at 10.88%, local at 15.13% (see CLAUDE_STATUS.md for current metrics)
- **Capacity Planning**: Current utilization suggests comfortable headroom; monitor trends via Proxmox metrics in monitoring-docker (101)
## Development Setup
@@ -123,7 +173,6 @@ The repository structure will house:
## Notes
- This is a Windows Subsystem for Linux (WSL2) environment
- Working directory: /mnt/c/Users/fam1n/Documents/homelab
- This repository is not yet initialized as a git repository
- Working directory: /home/jramos/homelab
- Proxmox node `serviceslab` is the single point of management
- Infrastructure demonstrates production-like patterns suitable for learning and testing