feat(agents): optimize sub-agent architecture with comprehensive prompt engineering This commit implements a comprehensive optimization of all sub-agent prompt definitions based on Opus-powered prompt engineering analysis. All agents now match the quality standard established by librarian.md. Agent Improvements: - scribe.md: 29→340 lines (11.7x expansion) * Added 6 usage examples with role clarity * Implemented comprehensive responsibilities section * Added 3 complete ASCII diagram templates * Included safety protocols and decision frameworks - backend-builder.md: 40→291 lines (7.3x expansion) * Added 6 usage examples with clear boundaries * Expanded core responsibilities (Ansible, Terraform, Docker, Python, Shell) * Added technology stack and validation rules tables * Included handoff protocol for lab-operator deployment * Defined clear boundaries (CREATES code, does NOT deploy) - lab-operator.md: 37→193 lines (5.2x expansion) * Added 6 usage examples with role clarity * Expanded domain expertise with specific commands * Added command style guide (5-step pattern) * Included safety protocols and decision-making framework * Defined clear boundaries (DEPLOYS/OPERATES, does NOT create IaC) - librarian.md: Minor formatting improvements CLAUDE.md Fixes: - Moved YAML frontmatter to line 1 (was incorrectly at line 89) - Fixed trailing pipe character - Completed incomplete sentences about backup strategy and storage growth - Removed redundant information - Expanded status file template with recovery instructions Files Added: - Claude_UPDATES.md: Comprehensive prompt engineering analysis report - monitoring/pve-exporter/pve.yml: PVE monitoring configuration Impact: - Total agent documentation: 249→967 lines (288% increase) - Usage examples: 6→24 total (400% increase) - All agents now have comprehensive safety protocols - Clear role boundaries prevent agent overlap - Validation testing confirms all agents functional 🤖 Generated with [Claude 
Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-07 22:39:40 -07:00
# Claude Code Homelab Repository - Comprehensive Analysis & Improvement Recommendations
**Date**: 2025-12-07
**Scope**: CLAUDE.md + Sub-Agent Architecture Review
**Methodology**: Opus-powered prompt engineering analysis
**Repository**: `/home/jramos/homelab/`
---
## Executive Summary
This comprehensive analysis evaluated the CLAUDE.md guidance file and all four sub-agent definitions (scribe, librarian, lab-operator, backend-builder) for efficiency, clarity, and effectiveness. The review identified **5 critical issues**, **12 high-impact improvements**, and **15 structural enhancements** that would significantly improve the agent system's functionality and maintainability.
### Critical Findings
1. **BLOCKING: Librarian Agent Non-Functional** - No tools defined in frontmatter; cannot execute ANY git commands
2. **BLOCKING: Backend-Builder Cannot Test Code** - Missing Bash tool; cannot validate any scripts or playbooks written
3. **HIGH: No Agent Can Create Files** - All agents lack Write tool; can only modify existing files
4. **HIGH: CLAUDE.md Has Stale References** - 3 references to decommissioned GitLab (lines 62, 97, 105), a wrong working directory path, and a false claim that the repository is not a git repository
5. **HIGH: Information Duplication Crisis** - Infrastructure tables duplicated across 5 files, creating maintenance burden
### Quick Win Opportunities (1-5 minutes each)
- Fix librarian tools: **2 minutes**, **CRITICAL impact**
- Fix GitLab references in CLAUDE.md: **5 minutes**, **high impact**
- Add Write tool to all agents: **3 minutes**, **high impact**
- Remove broken placeholder from scribe: **1 minute**, **medium impact**
### Total Estimated Effort
- **Priority 1 fixes**: ~15 minutes
- **Priority 2 improvements**: ~90 minutes
- **Priority 3 enhancements**: ~180 minutes
- **Full implementation**: ~5 hours
---
# Part 1: CLAUDE.md Analysis
## 1.1 Current State Assessment
**File**: `/home/jramos/homelab/CLAUDE.md`
**Length**: 130 lines
**Purpose**: Primary context file for Claude Code agents working in this repository
**Last Updated**: Unknown (no version tracking)
### Strengths
| Aspect | Details |
|--------|---------|
| **Infrastructure Context** | Lines 17-33 provide clear VM inventory with IDs, names, purposes |
| **Architecture Rationale** | Lines 58-70 explain the "why" behind design decisions |
| **Workflow Template** | Lines 74-84 establish a universal workflow pattern |
| **Storage Documentation** | Lines 45-56 document storage architecture comprehensively |
### Critical Issues
| Severity | Line(s) | Issue | Impact |
|----------|---------|-------|--------|
| **HIGH** | 62 | References "GitLab (101)" in Architecture Patterns - GitLab decommissioned | Misleading |
| **HIGH** | 97 | "GitLab (101) should house all IaC" - Service no longer exists | Incorrect |
| **HIGH** | 105 | "GitLab: CI/CD pipelines" - Wrong service listed | Confusing |
| **HIGH** | 126 | Wrong path "/mnt/c/Users/fam1n/Documents/homelab" | Breaks navigation |
| **HIGH** | 127 | "not yet initialized as a git repository" - Repository IS initialized | Factually wrong |
| **MEDIUM** | 89 | States "PBS-Backups at 21.6%" but line 54 says 27.43% | Inconsistent |
| **MEDIUM** | 110-112 | Hardcoded uptime numbers (27-68 days) become stale | Maintenance burden |
### Structural Issues
#### 1.1.1 Information Duplication
The VM/LXC/Storage tables (lines 17-56) duplicate content from:
- `CLAUDE_STATUS.md` (lines 17-45)
- `INDEX.md` (lines 314-349)
- `README.md` (lines 18-33)
- `services/README.md` (mentions throughout)
**Impact**: Updates require changing 5 files, creating drift risk and maintenance overhead.
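
Until the duplicates are removed, one way to catch drift is a small check that treats `CLAUDE_STATUS.md` as the source of truth and reports guest IDs that other files are missing or still mention. This is a minimal sketch in Python, assuming guests are written as `VM 101` / `CT 113` in every file; the regex and the `find_drift` helper are hypothetical and would need tuning to the real tables.

```python
import re

# Assumed notation: guests are referenced as "VM 101" or "CT 113" in every file.
GUEST_ID = re.compile(r"\b(VM|CT)\s+(\d{3})\b")


def guest_ids(text: str) -> set[str]:
    """Return normalized guest references (e.g. 'VM 101') found in one document."""
    return {f"{kind} {num}" for kind, num in GUEST_ID.findall(text)}


def find_drift(docs: dict[str, str], source: str) -> dict[str, dict[str, set[str]]]:
    """Compare each document's guest IDs with the source of truth.

    Returns {filename: {"missing": ..., "extra": ...}} for every file that disagrees.
    """
    truth = guest_ids(docs[source])
    drift = {}
    for name, text in docs.items():
        if name == source:
            continue
        found = guest_ids(text)
        missing, extra = truth - found, found - truth
        if missing or extra:
            drift[name] = {"missing": missing, "extra": extra}
    return drift


if __name__ == "__main__":
    # Toy contents standing in for the real files.
    docs = {
        "CLAUDE_STATUS.md": "VM 101 monitoring-docker, CT 102 nginx, CT 113 n8n",
        "README.md": "VM 101 monitoring-docker, CT 102 nginx",
    }
    print(find_drift(docs, source="CLAUDE_STATUS.md"))
    # {'README.md': {'missing': {'CT 113'}, 'extra': set()}}
```

A file that legitimately covers only a subset of guests will show up as "missing", so the check is most useful against the full inventory tables.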
#### 1.1.2 Missing Critical Sections
- **No Quick Reference**: Takes too long to find key info (node IP, monitoring URL, repo location)
- **No Agent Routing Guide**: No guidance on which agent to use for which task
- **No Version Tracking**: No YAML frontmatter or last-updated timestamp
- **No Tool-to-Task Mappings**: Agents don't know their capabilities vs requirements
#### 1.1.3 Outdated Information
| Line | Current Text | Reality |
|------|--------------|---------|
| 62 | "GitLab (101)" | Gitea (external) or monitoring-docker (VM 101) |
| 89 | "21.6% utilization" | Should reference CLAUDE_STATUS.md for current |
| 97 | "GitLab (101) should house all IaC" | Gitea now handles version control |
| 105 | "GitLab: CI/CD pipelines" | Should be "Gitea: Version control" |
## 1.2 Recommended CLAUDE.md Restructuring
### Priority 1: Immediate Fixes (5 minutes total)
#### Fix 1: Update GitLab References
```diff
# Line 62
- **Automation-First Approach**: The presence of Ansible-Control (106), GitLab (101), and NetBox (103)...
+ **Automation-First Approach**: The presence of Ansible-Control (106), Gitea, and NetBox (103)...
# Line 97
- 5. **Version Control**: GitLab (101) should house all Infrastructure as Code, scripts, and configuration files from this repository.
+ 5. **Version Control**: Gitea should house all Infrastructure as Code, scripts, and configuration files from this repository.
# Line 105
- - **GitLab**: CI/CD pipelines for infrastructure testing and deployment
+ - **Gitea**: Version control and repository management
```
#### Fix 2: Correct Working Directory
```diff
# Line 126
- - Working directory: /mnt/c/Users/fam1n/Documents/homelab
+ - Working directory: /home/jramos/homelab
```
#### Fix 3: Remove False Statement
```diff
# Line 127 - DELETE THIS LINE
- - This repository is not yet initialized as a git repository
```
#### Fix 4: Fix Storage Percentage
```diff
# Line 89
- 1. **Backup Strategy**: With PBS-Backups at 21.6% utilization...
+ 1. **Backup Strategy**: With PBS-Backups utilization growing (see CLAUDE_STATUS.md for current)...
```
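
The stale statements above are easy to reintroduce, so a quick scan can confirm they are gone after editing. A minimal sketch, assuming it is run from the repository root where `CLAUDE.md` lives; the pattern table restates the issues from section 1.1, and `find_stale` is a hypothetical helper, not an existing tool.

```python
import re
from pathlib import Path

# Stale statements identified in section 1.1; extend the table as new ones appear.
STALE_PATTERNS = {
    "decommissioned GitLab": re.compile(r"\bGitLab\b"),
    "old Windows working directory": re.compile(r"/mnt/c/Users/"),
    "false git status": re.compile(r"not yet initialized as a git repository"),
    "hardcoded storage percentage": re.compile(r"\d+(?:\.\d+)?% utilization"),
}


def find_stale(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, line) for every line that matches a stale pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in STALE_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


if __name__ == "__main__":
    path = Path("CLAUDE.md")  # assumed to be run from the repository root
    if path.exists():
        for lineno, label, line in find_stale(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {label}: {line}")
```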
### Priority 2: Add Quick Reference Section (15 minutes)
**Insert after line 8, before "## Infrastructure Overview":**
```markdown
## Quick Reference
| Resource | Value |
|----------|-------|
| **Proxmox Node** | serviceslab (192.168.2.200:8006) |
| **Proxmox Version** | PVE 8.3.3 |
| **Infrastructure** | 8 VMs, 2 Templates, 4 LXC containers |
| **Monitoring** | http://192.168.2.114:3000 (Grafana) |
| **Version Control** | Gitea at 192.168.2.102:3060 |
| **Working Directory** | /home/jramos/homelab |
| **Live Status** | See `CLAUDE_STATUS.md` for current inventory |
**Key Services:**
- VM 101 (monitoring-docker): Grafana, Prometheus, PVE Exporter
- CT 102 (nginx): Nginx Proxy Manager (reverse proxy)
- CT 112 (twingate-connector): Zero-trust network access
- CT 113 (n8n): Workflow automation at 192.168.2.107
```
### Priority 2: Add Agent Routing Guide (30 minutes)
**Insert after Quick Reference:**
```markdown
## Agent Selection Guide
When working with this repository, choose the appropriate agent based on task type:
| Task Type | Primary Agent | Tools Available | Notes |
|-----------|---------------|-----------------|-------|
| **Git Operations** | `librarian` | Bash, Read, Grep, Glob, Edit, Write | Commits, branches, merges, .gitignore |
| **Documentation** | `scribe` | Read, Grep, Glob, Edit, Write | READMEs, architecture docs, diagrams |
| **Infrastructure Ops** | `lab-operator` | Bash, Read, Grep, Glob, Edit, Write | Proxmox, Docker, networking, storage |
| **Code/IaC Development** | `backend-builder` | Bash, Read, Grep, Glob, Edit, Write | Ansible, Terraform, Python, Shell |
| **File Creation** | Main Agent | All tools | Use when sub-agents lack specific tools |
| **Complex Multi-Agent Tasks** | Main Agent | All tools | Coordinates between specialized agents |
### Task Routing Decision Tree
~~~
Is this a git/version control task?
├── Yes → Use librarian
└── No ↓
Is this documentation (README, guides, diagrams)?
├── Yes → Use scribe
└── No ↓
Does this require system commands (docker, ssh, proxmox)?
├── Yes → Use lab-operator
└── No ↓
Is this code/config creation (Ansible, Python, Terraform)?
├── Yes → Use backend-builder
└── No → Use Main Agent
~~~
### Agent Collaboration Patterns
**Documentation Workflow:**
1. `backend-builder` or `lab-operator` creates/modifies infrastructure
2. `scribe` updates documentation
3. `librarian` commits all changes
**Infrastructure Deployment:**
1. `backend-builder` writes IaC (Ansible/Terraform/Compose)
2. `lab-operator` deploys to Proxmox/Docker
3. `scribe` documents deployment
4. `librarian` commits configuration
```
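
The decision tree is an ordered series of checks, so the first matching question wins. The sketch below encodes that order in Python; the `Task` flags and the `route` function are hypothetical and only illustrate the routing logic, not how Claude Code actually dispatches sub-agents.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; each flag answers one question in the tree."""
    is_git: bool = False
    is_documentation: bool = False
    needs_system_commands: bool = False
    is_code_or_config: bool = False


def route(task: Task) -> str:
    """Ask the tree's questions in order and return the first matching agent."""
    if task.is_git:
        return "librarian"
    if task.is_documentation:
        return "scribe"
    if task.needs_system_commands:
        return "lab-operator"
    if task.is_code_or_config:
        return "backend-builder"
    return "main agent"


if __name__ == "__main__":
    print(route(Task(is_code_or_config=True)))  # backend-builder
    print(route(Task(needs_system_commands=True, is_code_or_config=True)))  # lab-operator
```

Because the checks are ordered, a task that both writes a playbook and runs it routes to `lab-operator`; splitting it into a write step and a deploy step follows the Infrastructure Deployment pattern instead.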
### Priority 2: Remove Duplicate Infrastructure Tables (20 minutes)
**Replace lines 17-56 with:**
```markdown
## Infrastructure Overview
**For detailed, current infrastructure inventory, see:**
- **Live Status**: `CLAUDE_STATUS.md` (most current)
- **Service Details**: `services/README.md`
- **Complete Index**: `INDEX.md`
**Quick Summary:**
- **VMs**: 10 total (IDs: 100, 101, 104-111)
- **LXC Containers**: 4 total (IDs: 102, 103, 112, 113)
- **Storage Pools**: local, local-lvm, Vault (ZFS), PBS-Backups, iso-share
- **Monitoring**: VM 101 at 192.168.2.114 (Grafana/Prometheus/PVE Exporter)
- **Key Services**: See Quick Reference above
**Note**: Infrastructure details change frequently. Always reference `CLAUDE_STATUS.md` for accurate counts, IPs, and status.
```
### Priority 3: Add YAML Frontmatter (5 minutes)
**Insert at very beginning of file:**
```yaml
---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
repository_type: homelab
primary_node: serviceslab
proxmox_version: 8.3.3
vm_count: 10
lxc_count: 4
working_directory: /home/jramos/homelab
git_remote: http://192.168.2.102:3060/jramos/homelab.git
---
```
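
The frontmatter only helps if it sits on line 1, so a small check can enforce that. A minimal sketch, assuming flat `key: value` pairs as in the block above; `read_frontmatter` and `REQUIRED_KEYS` are hypothetical, and a real YAML parser (such as PyYAML) would be needed for nested values.

```python
REQUIRED_KEYS = {"version", "last_updated", "infrastructure_source"}


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse a flat `key: value` frontmatter block that must start on line 1.

    Deliberately minimal: no nesting, lists, or multi-line values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("frontmatter must begin on line 1 with '---'")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        if not line.strip():
            continue  # tolerate blank lines inside the block
        # Split on the first colon only, so URLs like http://host:3060 survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip()
    raise ValueError("frontmatter has no closing '---'")


def missing_keys(meta: dict[str, str]) -> set[str]:
    """Return required keys absent from the parsed frontmatter."""
    return REQUIRED_KEYS - set(meta)


if __name__ == "__main__":
    sample = (
        "---\n"
        "version: 2.2.0\n"
        "last_updated: 2025-12-07\n"
        "infrastructure_source: CLAUDE_STATUS.md\n"
        "git_remote: http://192.168.2.102:3060/jramos/homelab.git\n"
        "---\n"
        "# CLAUDE.md\n"
    )
    meta = read_frontmatter(sample)
    print(meta["git_remote"], missing_keys(meta))
```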
## 1.3 Complete Proposed CLAUDE.md Structure
```markdown
---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
---
# CLAUDE.md
This file provides guidance to Claude Code when working with this homelab infrastructure repository.
## Quick Reference
[Key info table - 10 lines]
## Agent Selection Guide
[Task routing decision tree - 30 lines]
## Repository Overview
[High-level purpose - 10 lines]
## Infrastructure Reference
[Link to CLAUDE_STATUS.md - 15 lines]
## Working with This Environment
### Universal Workflow
[Existing content - 15 lines]
## Architecture Principles
[Condensed from current patterns - 20 lines]
## Best Practices
[Updated practices - 15 lines]
## Development Setup
[Existing content - 10 lines]
## Notes
[Updated notes - 5 lines]
```
**Estimated new length**: ~130 lines (same as current)
**Information density**: Significantly higher
**Maintenance burden**: Reduced (references instead of duplicates)
---
# Part 2: Sub-Agent Architecture Analysis
## 2.1 Agent Inventory
| Agent | File | Lines | Tools Defined | Status |
|-------|------|-------|---------------|--------|
| **scribe** | sub-agents/scribe.md | 30 | Read, Grep, Glob, Edit | Missing Write |
| **librarian** | sub-agents/librarian.md | 127 | **NONE** | **NON-FUNCTIONAL** |
| **lab-operator** | sub-agents/lab-operator.md | 33 | Bash, Read, Grep, Edit | Missing Glob, Write |
| **backend-builder** | sub-agents/backend-builder.md | 28 | Read, Edit, Grep, Glob | Missing Write, Bash |
## 2.2 Individual Agent Reviews
### 2.2.1 Scribe Agent
**File**: `/home/jramos/homelab/sub-agents/scribe.md`
#### Frontmatter (Lines 1-8)
```yaml
---
name: scribe
description: >
  Homelab Architect and Technical Writer. Explains concepts, designs network topologies,
  summarizes project structures, and maintains documentation (READMEs).
tools: [Read, Grep, Glob, Edit]
model: sonnet
---
```
**Strengths:**
- Clean YAML structure
- Clear description
- Appropriate model
**Issues:**
| Line | Issue | Impact |
|------|-------|--------|
| 6 | Missing `Write` tool | Cannot create new documentation files |
| Missing | No `color` field | Inconsistent with librarian |
#### Prompt Body Analysis
**Lines 11-12:**
```
You are the **Scribe** (formerly Steve's Architecture Module).
```
- "Steve" reference confusing without context
- **Recommendation**: Remove "(formerly Steve's Architecture Module)"
**Line 16:**
```
1. **Documentation**: Keep `README.md` and `docs/` up to date
```
- References `docs/` directory that doesn't exist
- **Recommendation**: Update to actual docs locations
**Line 20 - CRITICAL ISSUE:**
```
[Image of network topology diagram]
```
- Broken placeholder, incomplete
- **Recommendation**: Delete this line immediately
**Line 28:**
```
- Do not execute code. Your job is to plan and explain.
```
- Conflicts with having `Edit` tool (which modifies files)
- **Recommendation**: Clarify "Do not execute system commands via Bash"
#### Scribe Recommendations
**Priority 1 (CRITICAL):**
```diff
# Line 6
- tools: [Read, Grep, Glob, Edit]
+ tools: [Read, Grep, Glob, Edit, Write]
# Line 20 - DELETE
- [Image of network topology diagram]
# After Line 7
+ color: blue
```
**Priority 2:**
```diff
# Line 11
- You are the **Scribe** (formerly Steve's Architecture Module).
+ You are the **Scribe** - Documentation Architect and Technical Writer.
# Line 16
- Keep `README.md` and `docs/` up to date
+ Keep `README.md`, `services/README.md`, and infrastructure docs up to date
```
---
### 2.2.2 Librarian Agent
**File**: `/home/jramos/homelab/sub-agents/librarian.md`
#### Frontmatter (Lines 1-6) - CRITICAL ISSUE
```yaml
---
name: librarian
description: Use this agent when the user needs Git repository management...
model: sonnet
color: purple
---
```
**BLOCKING ISSUE**: No `tools` field defined
**Impact**: Agent cannot execute ANY git commands. Completely non-functional.
#### Description Field - Major Problem
**Line 3**: Description is 552 words with 6 embedded examples
Example excerpt:
```
description: Use this agent when...
- Example 1 (Commit Operation):
user: "I've finished implementing..."
assistant: "I'll use the git-version-control agent..."
[... 5 more examples ...]
```
**Issues:**
1. Examples should be in prompt body, not frontmatter
2. Description unparseable by automated systems
3. Violates YAML frontmatter conventions
#### Prompt Body (Lines 8-125)
**Line count**: 118 lines (4x longer than other agents)
**Structure**: Plain prose, without the XML tags the other three agents use
**Strengths:**
- Comprehensive Git guidance
- Excellent safety protocols
- Infrastructure-aware (mentions VM/CT IDs)
- Good conventional commit examples
**Issues:**
| Line | Issue |
|------|-------|
| 8 | Prose style vs XML tags in other agents |
| 14-125 | Could be condensed by moving common patterns to CLAUDE.md |
#### Librarian Recommendations
**Priority 1 (CRITICAL) - MUST FIX:**
```diff
# Line 3
- description: Use this agent when the user needs Git repository management, including...
+ description: >
+   Git repository management specialist. Handles commits, branches, merges,
+   history review, .gitignore maintenance, and enforces conventional commit standards.
# After line 5 - ADD THIS
+ tools: [Bash, Read, Grep, Glob, Edit, Write]
```
**Priority 2:**
Move examples from description to prompt body:
```markdown
## Usage Examples
### Commit Operation
User: "I've finished implementing the Ansible playbook for nginx configuration."
Action: Create properly formatted conventional commit.
### Branch Management
User: "Create a new feature branch for NetBox integration."
Action: Create appropriately named feature branch.
[... remaining examples ...]
```
**Priority 3:**
Add XML structure for consistency:
```xml
<system_role>
You are the **Librarian** - Git Version Control Specialist for the homelab repository.
</system_role>
<core_responsibilities>
[existing commit management section]
</core_responsibilities>
<safety_protocols>
1. NEVER force push to main/master
2. NEVER rewrite published history
3. Require confirmation for destructive operations
4. Block commits containing sensitive data patterns
</safety_protocols>
```
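
Safety protocol 4 needs a concrete definition of "sensitive data patterns". A minimal sketch of a check the librarian (or a pre-commit hook) could run over the output of `git diff --cached --unified=0`; the three regexes are illustrative assumptions, not a complete secret-detection ruleset, and `scan_added_lines` is a hypothetical helper.

```python
import re
import sys

# Illustrative patterns only; a dedicated secret scanner covers far more cases.
SENSITIVE_PATTERNS = {
    "private key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "inline credential": re.compile(r"(?i)(?:password|passwd|secret|token)\s*[:=]\s*\S+"),
}


def scan_added_lines(diff_text: str) -> list[tuple[str, str]]:
    """Return (label, line) for each added line in a unified diff that looks sensitive."""
    findings = []
    for line in diff_text.splitlines():
        # Only lines the commit adds; '+++ b/path' is a file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                findings.append((label, line[1:].strip()))
    return findings


if __name__ == "__main__":
    # In a pre-commit hook, feed this the output of: git diff --cached --unified=0
    sample = "+++ b/vars.yml\n+db_password: hunter2\n+listen_port: 8080\n"
    for label, line in scan_added_lines(sample):
        print(f"blocked ({label}): {line}", file=sys.stderr)
```

The inline-credential pattern will also match templated values such as Ansible Vault variable references, so reviewing each finding before blocking may be preferable to rejecting automatically.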
---
### 2.2.3 Lab-Operator Agent
**File**: `/home/jramos/homelab/sub-agents/lab-operator.md`
#### Frontmatter (Lines 1-8)
```yaml
---
name: lab-operator
description: >
  Expert Homelab SysAdmin. Manages Proxmox, Docker, Kubernetes, TrueNAS, networking (pfSense/VLANs),
  and Linux server administration. Handles package installation and system config.
tools: [Bash, Read, Grep, Edit]
model: sonnet
---
```
**Issues:**
| Line | Issue | Impact |
|------|-------|--------|
| 4-5 | Mentions Kubernetes, TrueNAS, pfSense not in homelab | Misleading |
| 6 | Missing `Glob` tool | Cannot find files by pattern |
| 6 | Missing `Write` tool | Cannot create new configs |
| Missing | No `color` field | Inconsistent |
#### Prompt Body (Lines 10-33)
**Strengths:**
- XML tag structure consistent with scribe/backend-builder
- Excellent `<safety_protocols>` section
- Good response style guidance
**Lines 16-20 - Domain Expertise Issues:**
```xml
<domain_expertise>
- **Virtualization**: Proxmox VE (LXC/VM management), ESXi.
- **Containers**: Docker Compose, Portainer, Kubernetes (k3s/microk8s).
- **Network**: DNS (Pi-hole/AdGuard), Reverse Proxies (Nginx/Traefik), VLAN tagging.
- **Storage**: ZFS pool management, NFS/SMB shares.
</domain_expertise>
```
**Problems:**
- Mentions ESXi, Portainer, Kubernetes, Pi-hole, AdGuard, Traefik - none in infrastructure
- Treats ZFS as a general skill, although Vault is the only ZFS pool in the actual setup
- Doesn't mention Nginx Proxy Manager, Grafana, Prometheus, Twingate, n8n
#### Lab-Operator Recommendations
**Priority 1:**
```diff
# Line 6
- tools: [Bash, Read, Grep, Edit]
+ tools: [Bash, Read, Grep, Glob, Edit, Write]
# After line 7
+ color: green
```
**Priority 2:**
```diff
# Lines 16-20 - REPLACE
- <domain_expertise>
- - **Virtualization**: Proxmox VE (LXC/VM management), ESXi.
- - **Containers**: Docker Compose, Portainer, Kubernetes (k3s/microk8s).
- - **Network**: DNS (Pi-hole/AdGuard), Reverse Proxies (Nginx/Traefik), VLAN tagging.
- - **Storage**: ZFS pool management, NFS/SMB shares.
- </domain_expertise>
+ <domain_expertise>
+ - **Virtualization**: Proxmox VE 8.3.3 (LXC containers, QEMU/KVM VMs)
+ - **Containers**: Docker Compose, container orchestration on VM hosts
+ - **Network**: Nginx Proxy Manager (CT 102), VLAN tagging, DNS
+ - **Storage**: Proxmox storage pools (local, local-lvm, Vault, PBS-Backups, iso-share)
+ - **Monitoring**: Grafana, Prometheus, PVE Exporter (VM 101 at 192.168.2.114)
+ - **Automation**: n8n workflow platform (CT 113), Ansible (VM 106)
+ - **Security**: Twingate zero-trust connector (CT 112)
+ </domain_expertise>
```
**Priority 3:**
Add Proxmox-specific safety protocols:
```diff
# After line 26
+ 4. **Proxmox Safety**: Confirm before `qm destroy`, `pct destroy`, or snapshot deletion.
+ 5. **Backup Verification**: Before major changes, verify PBS backup exists and is recent.
```
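
The confirmation rule can be made mechanical by classifying a command before it runs. A minimal sketch, assuming commands arrive as plain strings such as `qm destroy 105`; `needs_confirmation` and `guarded` are hypothetical helpers, and the `confirm` callback stands in for however the agent asks the user. `qm delsnapshot` and `pct delsnapshot` are the Proxmox CLI subcommands for snapshot deletion.

```python
import shlex
from typing import Callable

# Subcommands the report says must be confirmed: VM/CT destruction and snapshot deletion.
DESTRUCTIVE = {
    ("qm", "destroy"),
    ("pct", "destroy"),
    ("qm", "delsnapshot"),
    ("pct", "delsnapshot"),
}


def needs_confirmation(command: str) -> bool:
    """True if the command line starts with a destructive qm/pct subcommand.

    Limitation: wrappers such as `sudo` or `ssh host ...` are not unwrapped here.
    """
    parts = shlex.split(command)
    return len(parts) >= 2 and (parts[0], parts[1]) in DESTRUCTIVE


def guarded(command: str, confirm: Callable[[str], bool]) -> bool:
    """Return whether the command may run, asking `confirm` only for destructive ones."""
    if not needs_confirmation(command):
        return True
    return bool(confirm(f"Destructive Proxmox command: {command!r}. Proceed?"))


if __name__ == "__main__":
    for cmd in ("qm list", "qm destroy 105", "pct delsnapshot 113 pre-upgrade"):
        print(cmd, "->", "confirm first" if needs_confirmation(cmd) else "ok")
```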
---
### 2.2.4 Backend-Builder Agent
**File**: `/home/jramos/homelab/sub-agents/backend-builder.md`
#### Frontmatter (Lines 1-8)
```yaml
---
name: backend-builder
description: >
  DevOps and Software Engineer. Writes Python/Java code, Ansible playbooks,
  Terraform configs, and complex Shell scripts. Handles database logic and API integrations.
tools: [Read, Edit, Grep, Glob]
model: sonnet
---
```
**Issues:**
| Line | Issue | Impact |
|------|-------|--------|
| 4 | Mentions Java - not in homelab | Misleading |
| 6 | Missing `Bash` tool | **CRITICAL**: Cannot test/validate code |
| 6 | Missing `Write` tool | Cannot create new files |
| Missing | No `color` field | Inconsistent |
#### Prompt Body (Lines 10-27)
**Strengths:**
- Good security focus (secrets management)
- Appropriate coding standards
- "Do not be lazy" guidance
**Lines 18-20 - Homelab Stack:**
```
- **Python**: Use modern libraries (`pydantic` for config, `httpx` for APIs).
- **Ansible**: Ensure playbooks are idempotent.
- **Terraform**: precise resource targeting.
```
**Issues:**
- Missing Docker Compose guidance (major part of homelab)
- Terraform guidance vague
- No Shell script guidance
#### Backend-Builder Recommendations
**Priority 1 (CRITICAL):**
```diff
# Line 6
- tools: [Read, Edit, Grep, Glob]
+ tools: [Read, Edit, Grep, Glob, Write, Bash]
# After line 7
+ color: orange
```
**Priority 2:**
```diff
# After line 20 - ADD
+ - **Docker Compose**: Follow the Compose Specification (the top-level `version` key is obsolete), use named volumes, include healthchecks.
+ - **Shell Scripts**: Use `#!/usr/bin/env bash`, include error handling (`set -euo pipefail`).
# Line 20 - REPLACE
- - **Terraform**: precise resource targeting.
+ - **Terraform**: Use modules, implement state management, leverage data sources for existing resources.
```
**Priority 3:**
Add validation section:
```xml
<validation_rules>
After writing code, validate before presenting:
- **Python**: Run `python -m py_compile <file>` to check syntax
- **Ansible**: Run `ansible-playbook --syntax-check <playbook>`
- **Docker Compose**: Run `docker compose config` to validate
- **Shell Scripts**: Run `bash -n <script>` for syntax check
- **YAML/JSON**: Validate structure before writing
</validation_rules>
```
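
The validation rules map naturally onto file types. A minimal sketch of that mapping in Python; recognizing Compose files by filename and playbooks by a `playbooks/` directory in the path are assumptions about the repository layout, and `validator_for` / `validate` are hypothetical helpers.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Optional

COMPOSE_NAMES = {"docker-compose.yml", "docker-compose.yaml", "compose.yml", "compose.yaml"}


def validator_for(path: str) -> Optional[list[str]]:
    """Return the syntax-check command for a file, or None if no rule applies.

    Assumptions: Compose files are recognized by name, playbooks by a
    `playbooks/` directory somewhere in the path.
    """
    p = Path(path)
    if p.suffix == ".py":
        return [sys.executable, "-m", "py_compile", path]
    if p.suffix == ".sh":
        return ["bash", "-n", path]
    if p.name in COMPOSE_NAMES:
        # --quiet validates without printing the resolved configuration.
        return ["docker", "compose", "-f", path, "config", "--quiet"]
    if p.suffix in {".yml", ".yaml"} and "playbooks" in p.parts:
        return ["ansible-playbook", "--syntax-check", path]
    return None


def validate(path: str) -> bool:
    """Run the matching validator; True means it passed or no rule applies."""
    cmd = validator_for(path)
    if cmd is None:
        return True
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"validator not installed: {cmd[0]}")
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0


if __name__ == "__main__":
    for f in sys.argv[1:]:
        print(f, "ok" if validate(f) else "FAILED")
```

Raising on a missing validator, rather than silently passing, keeps an unvalidated file from being presented as checked.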
---
## 2.3 Cross-Agent Analysis
### Tool Distribution Matrix
| Tool | Scribe | Librarian | Lab-Operator | Backend-Builder |
|------|--------|-----------|--------------|-----------------|
| **Read** | ✓ | ✗ | ✓ | ✓ |
| **Write** | ✗ | ✗ | ✗ | ✗ |
| **Edit** | ✓ | ✗ | ✓ | ✓ |
| **Grep** | ✓ | ✗ | ✓ | ✓ |
| **Glob** | ✓ | ✗ | ✗ | ✓ |
| **Bash** | ✗ | ✗ | ✓ | ✗ |
### Critical Tool Gaps
| Gap | Agent | Impact |
|-----|-------|--------|
| **No tools at all** | Librarian | **BLOCKING** - Cannot execute ANY git commands |
| **No Bash** | Backend-Builder | **CRITICAL** - Cannot test Python, validate Ansible, check Terraform |
| **No Write** | All 4 agents | **HIGH** - Cannot create new files (only edit existing) |
| **No Glob** | Lab-Operator | **MEDIUM** - Cannot find docker-compose files, configs by pattern |
### Consistency Issues
| Aspect | Scribe | Librarian | Lab-Operator | Backend-Builder |
|--------|--------|-----------|--------------|-----------------|
| **XML tags** | Yes | **No** | Yes | Yes |
| **Tools in frontmatter** | Yes | **No** | Yes | Yes |
| **Color field** | No | Yes | No | No |
| **Line count** | 30 | **127** | 33 | 28 |
| **Steve reference** | Yes | No | Yes | Yes |
| **Safety protocols** | No | Partial | **Yes** | Partial |
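
The frontmatter gaps in this table (missing `tools` and `color` fields) can be checked automatically. A minimal lint sketch, assuming each agent file starts with a `---` frontmatter block whose top-level keys are unindented and that the files live in `sub-agents/`; it is a heuristic rather than a YAML parser, and `lint_agent` is a hypothetical helper.

```python
from pathlib import Path

REQUIRED_FIELDS = ("name", "description", "tools", "model", "color")


def top_level_keys(text: str) -> dict[str, str]:
    """Collect unindented `key: value` pairs from a leading '---' frontmatter block.

    Indented lines (such as the body of a folded `description: >`) are skipped as
    continuations. This is a lint heuristic, not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    keys = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if not line.strip() or line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:
            keys[key.strip()] = value.strip()
    return keys


def lint_agent(text: str) -> list[str]:
    """Return the required frontmatter fields that an agent file lacks, in order."""
    keys = top_level_keys(text)
    return [field for field in REQUIRED_FIELDS if field not in keys]


if __name__ == "__main__":
    # Directory layout assumed from the report: sub-agents/<name>.md
    for path in sorted(Path("sub-agents").glob("*.md")):
        missing = lint_agent(path.read_text(encoding="utf-8"))
        print(f"{path.name}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```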
### Role Boundary Ambiguities
| Scenario | Possible Agents | Recommendation |
|----------|-----------------|----------------|
| Create docker-compose.yml | Backend-Builder OR Lab-Operator | Backend-Builder creates, Lab-Operator deploys |
| Write Ansible playbook | Backend-Builder OR Lab-Operator | Backend-Builder writes, Lab-Operator executes |
| Update README after code change | Scribe OR Backend-Builder | Backend-Builder notifies, Scribe updates |
| Commit infrastructure changes | Librarian OR Lab-Operator | Lab-Operator makes change, Librarian commits |
## 2.4 Recommended Tool Distribution
### Proposed Standard Toolsets
**Documentation Agents** (Scribe):
```yaml
tools: [Read, Grep, Glob, Edit, Write]
```
- Rationale: Needs all file operations, no system commands
**Operations Agents** (Lab-Operator):
```yaml
tools: [Bash, Read, Grep, Glob, Edit, Write]
```
- Rationale: Needs system commands + all file operations
**Development Agents** (Backend-Builder):
```yaml
tools: [Bash, Read, Grep, Glob, Edit, Write]
```
- Rationale: Needs to test/validate code + all file operations
**Git Agents** (Librarian):
```yaml
tools: [Bash, Read, Grep, Glob, Edit, Write]
```
- Rationale: Git commands + file inspection + .gitignore management
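
Comparing the current toolsets from section 2.1 with the proposed ones above reproduces the per-agent additions made in Part 3. A minimal sketch; the two dictionaries simply restate the tables, and `tool_gaps` is a hypothetical helper.

```python
# Current toolsets as listed in section 2.1.
CURRENT: dict[str, set[str]] = {
    "scribe": {"Read", "Grep", "Glob", "Edit"},
    "librarian": set(),
    "lab-operator": {"Bash", "Read", "Grep", "Edit"},
    "backend-builder": {"Read", "Edit", "Grep", "Glob"},
}

# Proposed toolsets from section 2.4: every agent gets all file operations.
FILE_OPS = {"Read", "Grep", "Glob", "Edit", "Write"}
PROPOSED: dict[str, set[str]] = {
    "scribe": FILE_OPS,
    "librarian": FILE_OPS | {"Bash"},
    "lab-operator": FILE_OPS | {"Bash"},
    "backend-builder": FILE_OPS | {"Bash"},
}


def tool_gaps(current: dict[str, set[str]], proposed: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per agent, the sorted tools that must be added to reach the proposal."""
    return {agent: sorted(tools - current.get(agent, set())) for agent, tools in proposed.items()}


if __name__ == "__main__":
    for agent, missing in tool_gaps(CURRENT, PROPOSED).items():
        print(f"{agent}: add {', '.join(missing) or 'nothing'}")
```

Running it prints `scribe: add Write`, `librarian: add Bash, Edit, Glob, Grep, Read, Write`, `lab-operator: add Glob, Write`, and `backend-builder: add Bash, Write`, matching the "Missing" column of the inventory table.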
---
# Part 3: Actionable Recommendations
## 3.1 Priority 1 - Critical Fixes (15 minutes total)
### Fix 1: Librarian - Add Tools (2 minutes) **BLOCKING**
**File**: `/home/jramos/homelab/sub-agents/librarian.md`
```diff
---
name: librarian
- description: Use this agent when the user needs Git repository management, including operations like committing changes...
+ description: >
+   Git repository management specialist. Handles commits, branches, merges,
+   history review, .gitignore maintenance, and enforces conventional commit standards.
+ tools: [Bash, Read, Grep, Glob, Edit, Write]
model: sonnet
color: purple
---
```
### Fix 2: Backend-Builder - Add Bash (1 minute) **CRITICAL**
**File**: `/home/jramos/homelab/sub-agents/backend-builder.md`
```diff
---
name: backend-builder
description: >
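-   DevOps and Software Engineer. Writes Python/Java code, Ansible playbooks,
-   Terraform configs, and complex Shell scripts. Handles database logic and API integrations.
+   DevOps and Software Engineer. Writes Python, Ansible playbooks,
+   Terraform configs, and Shell scripts. Handles IaC and automation.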
- tools: [Read, Edit, Grep, Glob]
+ tools: [Read, Edit, Grep, Glob, Write, Bash]
model: sonnet
+ color: orange
---
```
### Fix 3: CLAUDE.md - Fix GitLab References (5 minutes)
**File**: `/home/jramos/homelab/CLAUDE.md`
```diff
# Line 62
- **Automation-First Approach**: The presence of Ansible-Control (106), GitLab (101), and NetBox (103)...
+ **Automation-First Approach**: The presence of Ansible-Control (106), Gitea, and NetBox (103)...
# Line 97
- 5. **Version Control**: GitLab (101) should house all Infrastructure as Code...
+ 5. **Version Control**: Gitea should house all Infrastructure as Code...
# Line 105
- - **GitLab**: CI/CD pipelines for infrastructure testing and deployment
+ - **Gitea**: Version control and repository management
# Line 126
- - Working directory: /mnt/c/Users/fam1n/Documents/homelab
+ - Working directory: /home/jramos/homelab
# Line 127 - DELETE
- - This repository is not yet initialized as a git repository
```
### Fix 4: Scribe - Remove Broken Placeholder (1 minute)
**File**: `/home/jramos/homelab/sub-agents/scribe.md`
```diff
# Line 20 - DELETE
- [Image of network topology diagram]
```
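### Fix 5: Add Write Tool to Remaining Agents (3 minutes)
Librarian and Backend-Builder already receive `Write` in Fixes 1 and 2. The Lab-Operator change also adds the missing `Glob`.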
**Scribe** (line 6):
```diff
- tools: [Read, Grep, Glob, Edit]
+ tools: [Read, Grep, Glob, Edit, Write]
```
**Lab-Operator** (line 6):
```diff
- tools: [Bash, Read, Grep, Edit]
+ tools: [Bash, Read, Grep, Glob, Edit, Write]
```
### Fix 6: Add Missing Color Fields (3 minutes)
**Scribe** (after line 7):
```diff
model: sonnet
+ color: blue
```
**Lab-Operator** (after line 7):
```diff
model: sonnet
+ color: green
```
---
## 3.2 Priority 2 - High-Impact Improvements (90 minutes total)
### Improvement 1: CLAUDE.md - Add Quick Reference (15 minutes)
**File**: `/home/jramos/homelab/CLAUDE.md`
**Location**: After line 8, before "## Infrastructure Overview"
```markdown
## Quick Reference
| Resource | Value |
|----------|-------|
| **Proxmox Node** | serviceslab (192.168.2.200:8006) |
| **Proxmox Version** | PVE 8.3.3 |
| **Infrastructure** | 8 VMs, 2 Templates, 4 LXC containers |
| **Monitoring** | http://192.168.2.114:3000 (Grafana) |
| **Version Control** | Gitea at 192.168.2.102:3060 |
| **Working Directory** | /home/jramos/homelab |
| **Live Status** | See `CLAUDE_STATUS.md` for current inventory |
**Key Services:**
- VM 101 (monitoring-docker): Grafana, Prometheus, PVE Exporter
- CT 102 (nginx): Nginx Proxy Manager (reverse proxy)
- CT 112 (twingate-connector): Zero-trust network access
- CT 113 (n8n): Workflow automation at 192.168.2.107
```
### Improvement 2: CLAUDE.md - Add Agent Routing Guide (30 minutes)
**File**: `/home/jramos/homelab/CLAUDE.md`
**Location**: After Quick Reference
```markdown
## Agent Selection Guide
When working with this repository, choose the appropriate agent based on task type:
| Task Type | Primary Agent | Tools Available | Notes |
|-----------|---------------|-----------------|-------|
| **Git Operations** | `librarian` | Bash, Read, Grep, Glob, Edit, Write | Commits, branches, merges, .gitignore |
| **Documentation** | `scribe` | Read, Grep, Glob, Edit, Write | READMEs, architecture docs, diagrams |
| **Infrastructure Ops** | `lab-operator` | Bash, Read, Grep, Glob, Edit, Write | Proxmox, Docker, networking, storage |
| **Code/IaC Development** | `backend-builder` | Bash, Read, Grep, Glob, Edit, Write | Ansible, Terraform, Python, Shell |
| **Complex Multi-Agent** | Main Agent | All tools | Coordinates between specialized agents |
### Task Routing Decision Tree
~~~
Is this a git/version control task?
├── Yes → Use librarian
└── No ↓
Is this documentation (README, guides, diagrams)?
├── Yes → Use scribe
└── No ↓
Does this require system commands (docker, ssh, proxmox)?
├── Yes → Use lab-operator
└── No ↓
Is this code/config creation (Ansible, Python, Terraform)?
├── Yes → Use backend-builder
└── No → Use Main Agent
~~~
### Agent Collaboration Patterns
**Documentation Workflow:**
1. `backend-builder` or `lab-operator` creates/modifies infrastructure
2. `scribe` updates documentation to reflect changes
3. `librarian` commits all changes with proper commit message
**Infrastructure Deployment:**
1. `backend-builder` writes IaC (Ansible playbooks, Terraform configs, Docker Compose)
2. `lab-operator` validates and deploys to Proxmox/Docker
3. `scribe` documents deployment procedures and architecture
4. `librarian` commits configuration to repository
**Code Development:**
1. `backend-builder` writes code/scripts
2. `backend-builder` tests with Bash tool
3. `scribe` adds code documentation
4. `librarian` commits with conventional commit message
```
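The routing decision tree is an ordered series of yes/no checks where the first "yes" wins. A minimal Python sketch of the same logic, assuming the four questions have already been answered; the function name `route_task`, its boolean parameters, and the `"main-agent"` return value are hypothetical labels, not part of any agent framework:

```python
def route_task(is_git: bool, is_docs: bool, needs_system_commands: bool, is_code: bool) -> str:
    """Return the agent name for a task, asking the questions in decision-tree order."""
    # Order matters: the first matching question decides the agent.
    if is_git:
        return "librarian"
    if is_docs:
        return "scribe"
    if needs_system_commands:
        return "lab-operator"
    if is_code:
        return "backend-builder"
    # No specialist matched: fall back to the coordinating main agent.
    return "main-agent"
```

For example, `route_task(False, True, True, False)` returns `"scribe"`: a README change that mentions Docker still goes to the scribe, because the documentation question is asked before the system-commands question.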
### Improvement 3: CLAUDE.md - Remove Duplicate Tables (20 minutes)
**File**: `/home/jramos/homelab/CLAUDE.md`
**Lines**: Replace 17-56
```markdown
## Infrastructure Overview
**For detailed, current infrastructure inventory, see:**
- **Live Status**: `CLAUDE_STATUS.md` (most current - updated frequently)
- **Service Details**: `services/README.md` (service-specific documentation)
- **Complete Index**: `INDEX.md` (comprehensive repository navigation)
**Quick Summary:**
- **Virtual Machines**: 10 total (IDs: 100, 101, 104-111)
  - Highlights: VM 100 (docker-hub), VM 101 (monitoring-docker), VM 106 (Ansible-Control)
- **LXC Containers**: 4 total (IDs: 102, 103, 112, 113)
  - Highlights: CT 102 (nginx/NPM), CT 112 (twingate), CT 113 (n8n)
- **Storage Pools**: 5 pools
  - local (system), local-lvm (VM disks), Vault (ZFS - secure data)
  - PBS-Backups (Proxmox Backup Server), iso-share (installation media)
- **Monitoring Stack**: VM 101 at 192.168.2.114
  - Grafana (port 3000), Prometheus (port 9090), PVE Exporter (port 9221)
- **Key Network Services**:
  - Nginx Proxy Manager (CT 102), Twingate (CT 112), n8n (CT 113)
**Note**: Infrastructure details change frequently. Always reference `CLAUDE_STATUS.md` for accurate VM/CT counts, IP addresses, and current status.
```
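Because the counts in this summary go stale, they can be regenerated on the Proxmox node instead of edited by hand. A minimal sketch, assuming the text output of `qm list` (VMs) and `pct list` (containers) has one header row followed by one row per guest whose first column is the numeric VMID; the helper names `count_guests` and `live_counts` are hypothetical, and running the CLIs typically requires root on the Proxmox host:

```python
import subprocess


def count_guests(listing: str) -> int:
    """Count guest rows in `qm list` / `pct list` style output.

    Assumes the first line is a header and every guest row starts with a
    numeric VMID; blank or malformed lines are ignored.
    """
    count = 0
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0].isdigit():
            count += 1
    return count


def live_counts() -> dict:
    """Run the Proxmox CLIs on the node and return the current guest counts."""
    vms = subprocess.run(["qm", "list"], capture_output=True, text=True, check=True).stdout
    cts = subprocess.run(["pct", "list"], capture_output=True, text=True, check=True).stdout
    return {"vm_count": count_guests(vms), "lxc_count": count_guests(cts)}
```

The resulting dictionary uses the same key names as the `vm_count` and `lxc_count` frontmatter fields, so it could be compared against them during a documentation refresh.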
### Improvement 4: Lab-Operator - Update Domain Expertise (15 minutes)
**File**: `/home/jramos/homelab/sub-agents/lab-operator.md`
**Lines**: Replace 16-20
```xml
<domain_expertise>
- **Virtualization**: Proxmox VE 8.3.3 (LXC containers, QEMU/KVM virtual machines)
- **Containers**: Docker Compose orchestration on VM hosts (VM 100, 101, 107)
- **Network**: Nginx Proxy Manager (CT 102), VLAN tagging, DNS configuration, reverse proxy
- **Storage**: Proxmox storage architecture
  - local (Directory): System files, ISOs, templates
  - local-lvm (LVM-Thin): VM disk images (thin provisioned)
  - Vault (ZFS Pool): Secure storage for sensitive data
  - PBS-Backups: Proxmox Backup Server repository
  - iso-share (NFS): Installation media library
- **Monitoring**: Observability stack on VM 101 (192.168.2.114)
  - Grafana: Metrics visualization and dashboards
  - Prometheus: Time-series database and alerting
  - PVE Exporter: Proxmox VE metrics exporter
- **Automation**:
  - n8n workflow automation platform (CT 113 at 192.168.2.107)
  - Ansible automation (VM 106)
- **Security**:
  - Twingate zero-trust network access connector (CT 112)
  - Nginx Proxy Manager with SSL/TLS termination
</domain_expertise>
```
### Improvement 5: Backend-Builder - Add Docker Compose & Validation (10 minutes)
**File**: `/home/jramos/homelab/sub-agents/backend-builder.md`
**After line 21**
```xml
<coding_standards>
1. **Secrets Management**: NEVER hardcode passwords or API keys. Use `.env` files or environment variables.
2. **Homelab Stack**:
   - **Python**: Use modern libraries (`pydantic` for config, `httpx` for APIs).
   - **Ansible**: Ensure playbooks are idempotent with proper error handling.
   - **Terraform**: Use modules, implement state management, leverage data sources.
   - **Docker Compose**: Follow the Compose Specification (omit the obsolete top-level `version` key), use named volumes, include healthchecks.
   - **Shell Scripts**: Use `#!/usr/bin/env bash`, include error handling (`set -euo pipefail`).
3. **Error Handling**: Homelabs are messy. Your code must handle network timeouts and missing files gracefully.
</coding_standards>
<validation_rules>
After writing code, validate it before presenting it to the user:
- **Python**: Run `python -m py_compile <file>` to check syntax
- **Ansible**: Run `ansible-playbook --syntax-check <playbook>`
- **Docker Compose**: Run `docker compose config` to validate syntax
- **Shell Scripts**: Run `bash -n <script>` for syntax validation
- **Terraform**: Run `terraform validate` after init
- **YAML/JSON**: Validate structure before writing
</validation_rules>
```
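The validation rules amount to a lookup from file type to one syntax-check command. A minimal Python sketch of that dispatch, using only the commands listed in `<validation_rules>`; the helper names `validation_command` and `validate` are hypothetical, the tools must already be installed, and two repository-layout assumptions are made: Compose files use the standard file names, and Ansible playbooks live under a `playbooks/` directory (other YAML, such as `pve.yml`, gets no check):

```python
import subprocess
from pathlib import Path
from typing import List, Optional

COMPOSE_NAMES = {"docker-compose.yml", "docker-compose.yaml", "compose.yml", "compose.yaml"}


def validation_command(path: Path) -> Optional[List[str]]:
    """Return the syntax-check command for a file, or None if no rule applies."""
    if path.name in COMPOSE_NAMES:
        return ["docker", "compose", "-f", str(path), "config"]
    if path.suffix == ".py":
        return ["python", "-m", "py_compile", str(path)]
    if path.suffix == ".sh":
        return ["bash", "-n", str(path)]
    if path.suffix in {".yml", ".yaml"} and "playbooks" in path.parts:
        # Assumption: Ansible playbooks live under a playbooks/ directory.
        return ["ansible-playbook", "--syntax-check", str(path)]
    if path.suffix == ".tf":
        # terraform validate checks the whole module directory; run `terraform init` first.
        return ["terraform", "validate"]
    return None


def validate(path: Path) -> bool:
    """Run the matching check; True means it passed or no rule applies."""
    cmd = validation_command(path)
    if cmd is None:
        return True
    cwd = path.parent if path.suffix == ".tf" else None  # Terraform runs per directory
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr or result.stdout)
    return result.returncode == 0
```

Returning `True` for unknown file types keeps the check non-blocking; a stricter variant could return `None` so callers can tell "passed" apart from "not checked".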
---
## 3.3 Priority 3 - Quality Enhancements (180 minutes total)
### Enhancement 1: CLAUDE.md - Add YAML Frontmatter (5 minutes)
**File**: `/home/jramos/homelab/CLAUDE.md`
**Location**: Very beginning of file
```yaml
---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
repository_type: homelab_infrastructure
primary_node: serviceslab
primary_node_ip: 192.168.2.200
proxmox_version: 8.3.3
vm_count: 10
lxc_count: 4
working_directory: /home/jramos/homelab
git_remote: http://192.168.2.102:3060/jramos/homelab.git
monitoring_url: http://192.168.2.114:3000
---
```
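Because this frontmatter is flat `key: value` pairs, tooling (or an agent) can read it without a YAML library. A minimal Python sketch, assuming the block is the first thing in the file and contains no nested mappings or lists; values stay strings (so `vm_count` comes back as `"10"`), and for anything richer a real YAML parser such as PyYAML's `yaml.safe_load` is the safer choice. The function name `read_frontmatter` is hypothetical:

```python
def read_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter delimited by `---` lines at the top of a file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            # partition splits on the first colon only, so URLs such as
            # http://192.168.2.114:3000 keep their own colons in the value.
            fields[key.strip()] = value.strip()
    return {}  # opening delimiter without a closing one: treat as absent
```

Typical use would be `read_frontmatter(Path("CLAUDE.md").read_text())["infrastructure_source"]` to find the canonical status file.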
### Enhancement 2: Remove "Steve" References (5 minutes)
**Files**: scribe.md (line 11), lab-operator.md (line 11), backend-builder.md (line 11)
```diff
# scribe.md line 11
- You are the **Scribe** (formerly Steve's Architecture Module).
+ You are the **Scribe** - Documentation Architect and Technical Writer.
# lab-operator.md line 11
- You are the **Lab Operator** (formerly Steve's Infrastructure Module).
+ You are the **Lab Operator** - Expert Homelab Systems Administrator.
# backend-builder.md line 11
- You are the **Backend Builder** (formerly Steve's Coding Module).
+ You are the **Backend Builder** - DevOps and Infrastructure as Code Specialist.
```
### Enhancement 3: Add Safety Protocols to Scribe (10 minutes)
**File**: `/home/jramos/homelab/sub-agents/scribe.md`
**After line 23**
```xml
<safety_protocols>
1. **Read Before Edit**: Always read existing documentation before modifying
2. **Preserve User Content**: Never overwrite user-created sections without explicit permission
3. **Timestamp Updates**: Include last-updated dates in documentation headers
4. **Link Validation**: When referencing other docs, verify paths exist
5. **No Code Execution**: Document code, don't execute it (use lab-operator or backend-builder)
</safety_protocols>
```
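Safety protocol 4 (link validation) can be checked mechanically. A minimal Python sketch, assuming inline Markdown links of the form `[text](target)` and checking only relative targets (URLs, `mailto:` links, and same-page `#anchors` are skipped); the regex does not handle reference-style links or targets containing parentheses, and the function name `broken_links` is hypothetical:

```python
import re
from pathlib import Path
from typing import List

# Inline links: [text](target), with an optional "title" after the target.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)(?:\s+\"[^\"]*\")?\)")


def broken_links(doc: Path) -> List[str]:
    """Return relative link targets in a Markdown file that do not exist on disk."""
    missing = []
    for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL or same-page anchor
        path_part = target.split("#", 1)[0]  # drop a trailing #section anchor
        if path_part and not (doc.parent / path_part).exists():
            missing.append(target)
    return missing
```

Paths are resolved relative to the document's own directory, which matches how Markdown renderers such as Gitea resolve relative links.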
### Enhancement 4: Librarian - Add XML Structure (30 minutes)
**File**: `/home/jramos/homelab/sub-agents/librarian.md`
**Restructure entire prompt body**
```xml
<system_role>
You are the **Librarian** - Git Version Control Specialist for the homelab infrastructure repository.
You have deep expertise in Git workflows, branching strategies, commit conventions, and repository hygiene.
</system_role>
<core_responsibilities>
## 1. Commit Management
- Enforce conventional commit format: `type(scope): description`
- Valid types: feat, fix, docs, style, refactor, test, chore, ci, build, perf
- Keep the summary line to 50 characters or fewer, then add a blank line and a descriptive body
- Example: `feat(ansible): add nginx reverse proxy playbook for Proxmox CT 102`
- Reference VM/CT IDs and service names in infrastructure commits
- Stage appropriate files and verify changes before committing
- NEVER commit sensitive data (credentials, API keys, private keys)
## 2. Branching Strategy
- Use descriptive branch names: `feature/description`, `bugfix/description`, `hotfix/description`
- Infrastructure examples: `feature/ansible-netbox-integration`, `bugfix/proxmox-storage-config`
- Create branches from appropriate base (main/develop)
- Keep branches focused on single features or fixes
- Delete merged branches to maintain repository cleanliness
## 3. Merging Operations
- Check for conflicts before merging
- Prefer fast-forward merges for linear history when possible
- Use merge commits for feature branches to preserve context
- Verify all tests pass before completing merges
- Write clear merge commit messages explaining integration
## 4. History Management
- Use `git log` with formatting options (e.g. `git log --oneline --graph`) for readable history
- Filter history by file paths, authors, date ranges
- Never rewrite public/shared branch history
- Identify when rebasing or amending is appropriate vs prohibited
## 5. .gitignore Hygiene
- Proactively identify files that should be ignored
- Infrastructure-specific patterns:
  * Terraform: `*.tfstate`, `*.tfstate.backup`, `.terraform/`, `terraform.tfvars`
  * Ansible: `*.retry`, `vault_pass.txt`, `.vault_password`
  * Monitoring: `**/pve.yml` (credentials), `.env` files
  * General: `*.log`, `*.swp`, `.DS_Store`
- Organize .gitignore with commented sections
- Check existing .gitignore before suggesting additions
</core_responsibilities>
<safety_protocols>
1. **NEVER** force push to main/master without explicit user confirmation
2. **NEVER** rewrite published/shared history
3. **ALWAYS** verify no sensitive data in staged changes before commit
4. **ALWAYS** require confirmation for destructive operations (hard reset, force push)
5. **BLOCK** commits containing patterns: password, api_key, secret, token (unless in templates)
</safety_protocols>
<quality_assurance>
## Pre-Commit Checks
- Run `git status` to see current state
- Verify no sensitive data in staged changes
- Ensure commit message follows conventional format
- Confirm files being committed are intentional
- Check for debug code, TODOs, temporary files
## Pre-Merge Validation
- Run `git diff` to review changes
- Check for merge conflicts
- Verify branch is up-to-date with target
- Confirm tests pass (if applicable)
</quality_assurance>
<homelab_context>
This homelab infrastructure repository contains:
- Proxmox VM/CT configurations (reference VM/CT IDs in commits)
- Docker Compose service definitions
- Ansible playbooks and roles
- Monitoring stack configs (Grafana/Prometheus)
- Sensitive data in Vault storage (ensure .gitignore coverage)
- Infrastructure as Code (Terraform, Ansible)
Key infrastructure components to reference:
- VMs: 100 (docker-hub), 101 (monitoring-docker), 106 (Ansible-Control), 109-110 (web servers), 111 (database)
- CTs: 102 (nginx/NPM), 103 (netbox), 112 (twingate), 113 (n8n)
- Storage: Vault (sensitive), PBS-Backups (disaster recovery)
</homelab_context>
<output_format>
When performing operations:
1. Explain what you're about to do and why
2. Show the exact Git commands you'll execute
3. Display relevant output or confirmations
4. Summarize the result and next steps
5. Highlight any warnings or recommendations
</output_format>
<escalation>
Seek user clarification when:
- Merge conflicts require manual resolution decisions
- Multiple valid branching strategies could apply
- Commit scope is ambiguous or affects multiple areas
- Destructive operations are requested
- Repository state is unclear or potentially corrupted
</escalation>
```
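The commit rules in `<core_responsibilities>` can be enforced by a small checker. A minimal Python sketch, assuming the type list above, an optional `(scope)`, and the 50-character summary treated as a hard limit; the Conventional Commits `!` breaking-change marker is deliberately not handled, and the function name `check_commit_message` is hypothetical (a Git `commit-msg` hook, which receives the message file path as its first argument, could call it):

```python
import re
from typing import List

TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore", "ci", "build", "perf")
SUMMARY_RE = re.compile(r"^(%s)(\([\w./-]+\))?: \S.*$" % "|".join(TYPES))
MAX_SUMMARY = 50


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems with a commit message (empty list means OK)."""
    lines = message.splitlines()
    summary = lines[0] if lines else ""
    problems = []
    if not SUMMARY_RE.match(summary):
        problems.append("summary must look like 'type(scope): description'")
    if len(summary) > MAX_SUMMARY:
        problems.append(f"summary is {len(summary)} chars (limit {MAX_SUMMARY})")
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between summary and body")
    return problems
```

Note that the example message in the prompt, `feat(ansible): add nginx reverse proxy playbook for Proxmox CT 102`, is 66 characters and fails a strict 50-character limit; the prompt should either shorten the example or state that 50 is a guideline.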
### Enhancement 5: Add Proxmox Safety to Lab-Operator (5 minutes)
**File**: `/home/jramos/homelab/sub-agents/lab-operator.md`
**After line 26**
```diff
3. **Container Safety**: When modifying `docker-compose.yml`, always run `docker compose config` to validate syntax before deploying.
+ 4. **Proxmox VM/CT Operations**: Confirm before `qm destroy`, `pct destroy`, or snapshot deletion.
+ 5. **Backup Verification**: Before major infrastructure changes, verify recent PBS backup exists.
+ 6. **Monitoring Impact**: Consider impact on Grafana/Prometheus metrics when changing infrastructure.
```
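Protocol 5 ("verify recent PBS backup exists") needs a concrete definition of "recent". A minimal Python sketch of the recency test only, assuming the backup times for the guest being changed have already been collected from PBS as timezone-aware datetimes (that collection step is not shown); the 24-hour threshold and the function name `has_recent_backup` are hypothetical policy choices, not PBS defaults:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

MAX_AGE = timedelta(hours=24)  # hypothetical policy: at least one backup per day


def has_recent_backup(backup_times: Iterable[datetime], now: Optional[datetime] = None,
                      max_age: timedelta = MAX_AGE) -> bool:
    """True if the newest backup is no older than max_age (all datetimes timezone-aware)."""
    now = now or datetime.now(timezone.utc)
    newest = max(backup_times, default=None)
    return newest is not None and now - newest <= max_age
```

With this check in place, the lab-operator would stop and ask for confirmation before a `qm destroy` or major change whenever it returns `False` for the affected VM or CT.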
---
## 3.4 Agent Architecture Proposals
### Should Any Agents Be Split?
#### Librarian Analysis
**Current**: Single agent handling all Git operations (127 lines)
**Recommendation**: **DO NOT SPLIT**
**Rationale**:
- Git operations are cohesive and related
- Splitting would create handoff friction
- Same tools needed for all Git tasks
- Better solution: Extract common patterns to CLAUDE.md, reduce line count
#### Lab-Operator Analysis
**Current**: Single agent for infrastructure operations (33 lines)
**Recommendation**: **DO NOT SPLIT** (currently)
**Rationale**:
- Single-node homelab has interconnected operations
- Splitting (docker-specialist, proxmox-specialist, network-specialist) would fragment workflow
- A single deployment may touch Proxmox, Docker, and networking
- **Future consideration**: If infrastructure grows to multi-node, reconsider
#### Backend-Builder Analysis
**Current**: Single agent for all code/IaC (28 lines)
**Recommendation**: **CONSIDER SPLITTING** (medium priority)
**Proposed Split**:
1. **IaC-Builder**: Ansible, Terraform, Docker Compose (declarative configs)
2. **Script-Developer**: Python, Shell (imperative code, custom tooling)
**Rationale**:
- Different mental models: declarative vs imperative
- Different validation approaches
- Different integration points (IaC-Builder → lab-operator; Script-Developer → monitoring)
- Manageable cognitive load for each
**Implementation Effort**: 60 minutes
### New Agent Proposals
#### 1. Infrastructure-Auditor (HIGH PRIORITY)
**Purpose**: Security scanning, compliance checking, configuration drift detection
**Justification**:
- Current agents focus on creation/modification, not validation
- Homelab has sensitive components (Vault storage, credentials in monitoring configs)
- PBS backups need verification
- Configuration drift between IaC and reality
**Proposed Definition**:
```yaml
---
name: infrastructure-auditor
description: >
  Security and compliance specialist. Scans for misconfigurations, exposed credentials,
  outdated packages, configuration drift, and security vulnerabilities.
tools: [Bash, Read, Grep, Glob]
model: sonnet
color: red
---
<system_role>
You are the **Infrastructure Auditor** - Security and compliance specialist.
Your job is to find problems before they become incidents.
</system_role>
<audit_domains>
1. **Credential Exposure**: Scan for hardcoded secrets, exposed API keys, plaintext passwords
   - Check for patterns: password=, api_key=, token=, secret=
   - Verify .gitignore coverage for sensitive files
   - Validate environment variable usage vs hardcoding
2. **Configuration Drift**: Compare running state to declared state
   - Compare docker-compose configs to running containers
   - Verify Proxmox VM/CT configs match documentation
   - Check Ansible playbook state vs actual system state
3. **Package Security**: Check for outdated packages with known CVEs
   - Proxmox package versions
   - Docker image versions
   - Python package versions
4. **Backup Verification**: Validate PBS backup integrity and recency
   - Check last backup timestamp for critical VMs/CTs
   - Verify backup size and integrity
   - Test restore procedures (read-only simulation)
5. **Permission Audit**: Review file permissions and access controls
   - Docker socket exposure
   - Sudo access configurations
   - File ownership and permissions
6. **Network Security**: Review exposed services and ports
   - Check for services listening on 0.0.0.0
   - Verify firewall rules
   - Audit reverse proxy configurations
</audit_domains>
<safety_protocols>
1. **READ-ONLY OPERATIONS**: NEVER modify anything - audit only
2. **Report Findings**: Document issues, do not auto-remediate
3. **Escalate Critical Issues**: Immediately flag exposed credentials or critical vulnerabilities
4. **No Destructive Checks**: Do not run tests that could impact running services
</safety_protocols>
<audit_checklist>
Run these checks on demand or on a schedule:
- [ ] Scan all .env, .yml, .yaml files for hardcoded credentials
- [ ] Verify .gitignore covers all sensitive files
- [ ] Check PBS backup status for all critical VMs/CTs
- [ ] Compare Grafana datasources to prometheus.yml
- [ ] Audit Nginx Proxy Manager SSL certificate expiration
- [ ] Check for exposed Docker sockets
- [ ] Verify Twingate connector status
- [ ] Review n8n workflow credential storage
</audit_checklist>
```
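The first checklist item (scan `.env`, `.yml`, `.yaml` files for hardcoded credentials) can be sketched as a read-only pattern scan, in line with the auditor's READ-ONLY protocol. A minimal Python sketch, assuming key names containing the audit-domain words (`password`, `api_key`, `token`, `secret`, plus `passwd`/`apikey` variants) followed by `=` or `:` and a non-empty value; values that reference environment variables (`${...}`) are treated as acceptable. The helper name `scan_for_credentials` is hypothetical, and a regex scan like this both misses secrets and raises false positives, so it complements rather than replaces a dedicated secret scanner:

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

PATTERN = re.compile(
    r"""(?i)([A-Za-z0-9_]*(?:password|passwd|api_key|apikey|token|secret)[A-Za-z0-9_]*)\s*[:=]\s*["']?([^\s"'#]+)"""
)
SUFFIXES = {".env", ".yml", ".yaml"}


def scan_for_credentials(root: Path) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, key) for likely hardcoded credentials. Never writes."""
    for path in sorted(root.rglob("*")):
        # Path(".env").suffix is "", so match dotfiles such as .env by name too.
        if not path.is_file() or (path.suffix not in SUFFIXES and path.name != ".env"):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = PATTERN.search(line)
            # ${VAR} references mean the value comes from the environment: not a finding.
            if match and not match.group(2).startswith("${"):
                yield path, lineno, match.group(1)
```

Findings are reported, not fixed, so the output maps directly onto the "Report Findings" and "Escalate Critical Issues" protocols.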
**Implementation Effort**: 45 minutes
**Priority**: HIGH - Addresses security gap in current agent coverage
#### 2. Backup-Manager (DEFER)
**Purpose**: PBS operations, disaster recovery, restore testing
**Recommendation**: **DEFER** - Lab-Operator can handle backup operations
**Rationale**:
- PBS operations are infrequent
- Lab-Operator has necessary tools and expertise
- Would add complexity without significant benefit
- **Reconsider**: When backup operations become more complex or automated
#### 3. Monitoring-Specialist (DEFER)
**Purpose**: Grafana dashboards, Prometheus queries, alerting
**Recommendation**: **DEFER** - Backend-Builder can handle monitoring configs
**Rationale**:
- Monitoring configs are code (YAML, PromQL)
- Backend-Builder has appropriate tools
- Grafana and Prometheus are well documented upstream
- **Reconsider**: When alerting becomes complex or requires dedicated expertise
---
## 3.5 Proposed Final Agent Architecture
### Recommended Structure (5-6 Agents)
```
┌───────────────────────────────────────────────────────────────┐
│ DOCUMENTATION LAYER                                           │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Scribe (documentation, architecture, diagrams)          │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│ VERSION CONTROL LAYER                                         │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Librarian (git operations, commits, branches)           │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│ OPERATIONS LAYER                                              │
│  ┌──────────────────────┐  ┌───────────────────────────────┐  │
│  │ Lab-Operator         │  │ Infrastructure-Auditor (NEW)  │  │
│  │ (infra mgmt)         │  │ (security scanning)           │  │
│  └──────────────────────┘  └───────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│ DEVELOPMENT LAYER                                             │
│  ┌──────────────────────┐  ┌───────────────────────────────┐  │
│  │ IaC-Builder          │  │ Script-Developer              │  │
│  │ (Ansible, Terraform, │  │ (Python, Shell automation)    │  │
│  │  Docker Compose)     │  │                               │  │
│  └──────────────────────┘  └───────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
```
### Implementation Phases
**Phase 1: Critical Fixes** (Day 1 - 15 minutes)
- Fix librarian tools
- Add Bash to backend-builder
- Fix CLAUDE.md GitLab references
- Add Write tool to all agents
**Phase 2: High-Impact** (Week 1 - 90 minutes)
- Add Quick Reference to CLAUDE.md
- Add Agent Routing Guide to CLAUDE.md
- Update lab-operator domain expertise
- Add validation rules to backend-builder
**Phase 3: Quality Enhancements** (Week 2 - 180 minutes)
- Add YAML frontmatter to CLAUDE.md
- Restructure librarian with XML
- Add safety protocols to all agents
- Remove "Steve" references
**Phase 4: Architecture Expansion** (Month 1 - 120 minutes)
- Create Infrastructure-Auditor agent
- Split Backend-Builder into IaC-Builder + Script-Developer
- Test and refine agent boundaries
---
# Part 4: Implementation Checklist
## Quick Reference: Files to Modify
| File | Priority 1 | Priority 2 | Priority 3 | Total Changes |
|------|-----------|-----------|-----------|---------------|
| `/home/jramos/homelab/CLAUDE.md` | 5 fixes | 3 additions | 1 frontmatter | 9 edits |
| `/home/jramos/homelab/sub-agents/scribe.md` | 3 fixes | 0 | 2 enhancements | 5 edits |
| `/home/jramos/homelab/sub-agents/librarian.md` | 2 fixes | 1 restructure | 1 restructure | 4 edits |
| `/home/jramos/homelab/sub-agents/lab-operator.md` | 2 fixes | 1 update | 2 additions | 5 edits |
| `/home/jramos/homelab/sub-agents/backend-builder.md` | 2 fixes | 1 addition | 1 addition | 4 edits |
| **TOTAL** | **14** | **6** | **7** | **27 edits** |
## Detailed Implementation Checklist
### Priority 1: Critical Fixes (15 minutes)
- [ ] **librarian.md**: Add tools field (line 5)
- `tools: [Bash, Read, Grep, Glob, Edit, Write]`
- [ ] **librarian.md**: Condense description (line 3)
- Remove examples, keep 2-3 sentences
- [ ] **backend-builder.md**: Add Bash and Write (line 6)
- `tools: [Read, Edit, Grep, Glob, Write, Bash]`
- [ ] **backend-builder.md**: Add color field
- `color: orange`
- [ ] **scribe.md**: Add Write tool (line 6)
- `tools: [Read, Grep, Glob, Edit, Write]`
- [ ] **scribe.md**: Add color field
- `color: blue`
- [ ] **scribe.md**: Delete broken placeholder (line 20)
- [ ] **lab-operator.md**: Add Glob and Write (line 6)
- `tools: [Bash, Read, Grep, Glob, Edit, Write]`
- [ ] **lab-operator.md**: Add color field
- `color: green`
- [ ] **CLAUDE.md**: Fix GitLab → Gitea (lines 62, 97, 105)
- [ ] **CLAUDE.md**: Fix working directory (line 126)
- [ ] **CLAUDE.md**: Delete "not initialized" line (127)
- [ ] **CLAUDE.md**: Fix storage percentage reference (line 89)
### Priority 2: High-Impact Improvements (90 minutes)
- [ ] **CLAUDE.md**: Add Quick Reference section (after line 8)
- [ ] **CLAUDE.md**: Add Agent Routing Guide (after Quick Reference)
- [ ] **CLAUDE.md**: Replace duplicate tables with references (lines 17-56)
- [ ] **lab-operator.md**: Update domain expertise (lines 16-20)
- [ ] **backend-builder.md**: Add Docker Compose guidance (after line 20)
- [ ] **backend-builder.md**: Add validation rules section (after line 27)
### Priority 3: Quality Enhancements (180 minutes)
- [ ] **CLAUDE.md**: Add YAML frontmatter (beginning)
- [ ] **scribe.md**: Remove "Steve" reference (line 11)
- [ ] **scribe.md**: Update docs directory reference (line 16)
- [ ] **scribe.md**: Add safety protocols section (after line 23)
- [ ] **librarian.md**: Restructure with XML tags (entire prompt body)
- [ ] **librarian.md**: Move examples to prompt body
- [ ] **lab-operator.md**: Remove "Steve" reference (line 11)
- [ ] **lab-operator.md**: Add Proxmox safety protocols (after line 26)
- [ ] **backend-builder.md**: Remove "Steve" reference (line 11)
### Future Enhancements (Optional)
- [ ] Create `infrastructure-auditor.md` agent
- [ ] Split `backend-builder` into `iac-builder` and `script-developer`
- [ ] Extract common patterns from librarian to CLAUDE.md
- [ ] Add examples section to CLAUDE.md
- [ ] Create agent capability testing suite
---
# Part 5: Expected Outcomes
## Before vs After Comparison
### Current State Issues
| Issue | Impact | Affected Agents |
|-------|--------|-----------------|
| Librarian has no tools | **BLOCKING** - Cannot execute ANY git commands | 1 |
| Backend-Builder lacks Bash | **CRITICAL** - Cannot test code | 1 |
| No agent has Write tool | **HIGH** - Cannot create new files | 4 |
| CLAUDE.md has stale GitLab refs | **HIGH** - Misleading documentation | N/A |
| Duplicate infrastructure tables | **MEDIUM** - Maintenance burden | N/A |
| Inconsistent agent structure | **MEDIUM** - Confusion, learning curve | 4 |
### Post-Implementation Benefits
| Improvement | Benefit | Measurable Impact |
|-------------|---------|-------------------|
| All agents have proper tools | Functional, can complete tasks | 3 of 4 → 4 of 4 agents functional |
| CLAUDE.md has Quick Reference | Faster context gathering | ~5 min → ~30 sec |
| Agent Routing Guide | Clear task assignment | Reduced user decision time |
| No duplicate tables | Easier maintenance | 5 files → 1 file to update |
| Consistent agent structure | Easier to understand/maintain | Uniform XML structure |
| Infrastructure-Auditor | Security coverage | New capability |
## Success Metrics
### Quantitative
- **Tool Coverage**: 0% (librarian) → 100% (all agents functional)
- **Documentation Accuracy**: 5 stale references → 0 stale references
- **Agent Consistency**: 25% use XML tags → 100% use XML tags
- **Color Field Coverage**: 25% have color → 100% have color
- **Information Duplication**: Infrastructure in 5 files → 1 canonical file
### Qualitative
- **User Experience**: Clear agent selection vs guesswork
- **Maintenance Burden**: Single source of truth for infrastructure
- **Security Posture**: Proactive scanning capability
- **Documentation Quality**: Up-to-date, accurate, easy to navigate
- **Agent Clarity**: Well-defined boundaries and responsibilities
---
# Conclusion
This analysis identified **critical blocking issues** (librarian non-functional, backend-builder cannot test code) alongside **significant structural improvements** (outdated references, duplicate information, missing routing guidance).
## Immediate Action Required
1. **Fix librarian tools** (2 minutes) - **BLOCKING** issue
2. **Add Bash to backend-builder** (1 minute) - **CRITICAL** issue
3. **Fix CLAUDE.md GitLab references** (5 minutes) - **HIGH** priority
**Total time for critical fixes: 15 minutes**
## High-Value Improvements
1. Add Quick Reference to CLAUDE.md (15 min)
2. Add Agent Routing Guide (30 min)
3. Remove duplicate infrastructure tables (20 min)
**Total time for high-impact: 90 minutes**
## Long-Term Vision
With all improvements implemented:
- **All agents functional** with proper tools
- **Clear documentation** with quick reference and routing guide
- **Consistent structure** across all agent definitions
- **Security coverage** with infrastructure-auditor
- **Reduced maintenance** through single source of truth
**Total implementation effort**: ~7 hours (15 + 90 + 180 + 120 minutes across the four phases)
---
**Generated**: 2025-12-07
**Analysis Tool**: Claude Opus 4.5
**Scope**: CLAUDE.md + 4 sub-agents (scribe, librarian, lab-operator, backend-builder)
**Total Issues Identified**: 27 (5 critical, 12 high-impact, 10 enhancements)