This commit implements a comprehensive optimization of all sub-agent prompt definitions based on Opus-powered prompt engineering analysis. All agents now match the quality standard established by librarian.md.

Agent Improvements:
- scribe.md: 29→340 lines (11.7x expansion)
  * Added 6 usage examples with role clarity
  * Implemented comprehensive responsibilities section
  * Added 3 complete ASCII diagram templates
  * Included safety protocols and decision frameworks
- backend-builder.md: 40→291 lines (7.3x expansion)
  * Added 6 usage examples with clear boundaries
  * Expanded core responsibilities (Ansible, Terraform, Docker, Python, Shell)
  * Added technology stack and validation rules tables
  * Included handoff protocol for lab-operator deployment
  * Defined clear boundaries (CREATES code, does NOT deploy)
- lab-operator.md: 37→193 lines (5.2x expansion)
  * Added 6 usage examples with role clarity
  * Expanded domain expertise with specific commands
  * Added command style guide (5-step pattern)
  * Included safety protocols and decision-making framework
  * Defined clear boundaries (DEPLOYS/OPERATES, does NOT create IaC)
- librarian.md: Minor formatting improvements

CLAUDE.md Fixes:
- Moved YAML frontmatter to line 1 (was incorrectly at line 89)
- Fixed trailing pipe character
- Completed incomplete sentences about backup strategy and storage growth
- Removed redundant information
- Expanded status file template with recovery instructions

Files Added:
- Claude_UPDATES.md: Comprehensive prompt engineering analysis report
- monitoring/pve-exporter/pve.yml: PVE monitoring configuration

Impact:
- Total agent documentation: 249→967 lines (288% increase)
- Usage examples: 6→24 total (4x, a 300% increase)
- All agents now have comprehensive safety protocols
- Clear role boundaries prevent agent overlap
- Validation testing confirms all agents functional

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1613 lines
55 KiB
Markdown
# Claude Code Homelab Repository - Comprehensive Analysis & Improvement Recommendations

**Date**: 2025-12-07
**Scope**: CLAUDE.md + Sub-Agent Architecture Review
**Methodology**: Opus-powered prompt engineering analysis
**Repository**: `/home/jramos/homelab/`

---

## Executive Summary

This analysis evaluated the CLAUDE.md guidance file and all four sub-agent definitions (scribe, librarian, lab-operator, backend-builder) for efficiency, clarity, and effectiveness. The review identified **5 critical issues**, **12 high-impact improvements**, and **15 structural enhancements** that would significantly improve the agent system's functionality and maintainability.

### Critical Findings

1. **BLOCKING: Librarian Agent Non-Functional** - No tools defined in frontmatter; cannot execute ANY git commands
2. **BLOCKING: Backend-Builder Cannot Test Code** - Missing Bash tool; cannot validate any scripts or playbooks it writes
3. **HIGH: No Agent Can Create Files** - All agents lack Write tool; can only modify existing files
4. **HIGH: CLAUDE.md Has Stale References** - 3 references to decommissioned GitLab, a wrong working directory path, and a false claim that the repository is not yet a git repository
5. **HIGH: Information Duplication Crisis** - Infrastructure tables duplicated across 5 files, creating maintenance burden

### Quick Win Opportunities (1-5 minutes each)

- Fix librarian tools: **2 minutes**, **CRITICAL impact**
- Fix GitLab references in CLAUDE.md: **5 minutes**, **high impact**
- Add Write tool to all agents: **3 minutes**, **high impact**
- Remove broken placeholder from scribe: **1 minute**, **medium impact**

### Total Estimated Effort

- **Priority 1 fixes**: ~15 minutes
- **Priority 2 improvements**: ~90 minutes
- **Priority 3 enhancements**: ~180 minutes
- **Full implementation**: ~5 hours

---

# Part 1: CLAUDE.md Analysis

## 1.1 Current State Assessment

**File**: `/home/jramos/homelab/CLAUDE.md`
**Length**: 130 lines
**Purpose**: Primary context file for Claude Code agents working in this repository
**Last Updated**: Unknown (no version tracking)

### Strengths

| Aspect | Details |
|--------|---------|
| **Infrastructure Context** | Lines 17-33 provide clear VM inventory with IDs, names, purposes |
| **Architecture Rationale** | Lines 58-70 explain the "why" behind design decisions |
| **Workflow Template** | Lines 74-84 establish a universal workflow pattern |
| **Storage Documentation** | Lines 45-56 document storage architecture comprehensively |

### Critical Issues

| Severity | Line(s) | Issue | Impact |
|----------|---------|-------|--------|
| **HIGH** | 62 | References "GitLab (101)" in Architecture Patterns - GitLab decommissioned | Misleading |
| **HIGH** | 97 | "GitLab (101) should house all IaC" - Service no longer exists | Incorrect |
| **HIGH** | 105 | "GitLab: CI/CD pipelines" - Wrong service listed | Confusing |
| **HIGH** | 126 | Wrong path "/mnt/c/Users/fam1n/Documents/homelab" | Breaks navigation |
| **HIGH** | 127 | "not yet initialized as a git repository" - Repository IS initialized | Factually wrong |
| **MEDIUM** | 89 | States "PBS-Backups at 21.6%" but line 54 says 27.43% | Inconsistent |
| **MEDIUM** | 110-112 | Hardcoded uptime numbers (27-68 days) become stale | Maintenance burden |

### Structural Issues

#### 1.1.1 Information Duplication

The VM/LXC/Storage tables (lines 17-56) duplicate content from:
- `CLAUDE_STATUS.md` (lines 17-45)
- `INDEX.md` (lines 314-349)
- `README.md` (lines 18-33)
- `services/README.md` (mentions throughout)

**Impact**: Updates require changing 5 files, creating drift risk and maintenance overhead.

#### 1.1.2 Missing Critical Sections

- **No Quick Reference**: Takes too long to find key info (node IP, monitoring URL, repo location)
- **No Agent Routing Guide**: No guidance on which agent to use for which task
- **No Version Tracking**: No YAML frontmatter or last-updated timestamp
- **No Tool-to-Task Mappings**: Agents don't know their capabilities vs requirements

#### 1.1.3 Outdated Information

| Line | Current Text | Reality |
|------|--------------|---------|
| 62 | "GitLab (101)" | Gitea (external) or monitoring-docker (VM 101) |
| 89 | "21.6% utilization" | Should reference CLAUDE_STATUS.md for current |
| 97 | "GitLab (101) should house all IaC" | Gitea now handles version control |
| 105 | "GitLab: CI/CD pipelines" | Should be "Gitea: Version control" |

## 1.2 Recommended CLAUDE.md Restructuring

### Priority 1: Immediate Fixes (5 minutes total)

#### Fix 1: Update GitLab References
```diff
# Line 62
- **Automation-First Approach**: The presence of Ansible-Control (106), GitLab (101), and NetBox (103)...
+ **Automation-First Approach**: The presence of Ansible-Control (106), Gitea, and NetBox (103)...

# Line 97
- 5. **Version Control**: GitLab (101) should house all Infrastructure as Code, scripts, and configuration files from this repository.
+ 5. **Version Control**: Gitea should house all Infrastructure as Code, scripts, and configuration files from this repository.

# Line 105
- - **GitLab**: CI/CD pipelines for infrastructure testing and deployment
+ - **Gitea**: Version control and repository management
```

#### Fix 2: Correct Working Directory
```diff
# Line 126
- - Working directory: /mnt/c/Users/fam1n/Documents/homelab
+ - Working directory: /home/jramos/homelab
```

#### Fix 3: Remove False Statement
```diff
# Line 127 - DELETE THIS LINE
- - This repository is not yet initialized as a git repository
```

#### Fix 4: Fix Storage Percentage
```diff
# Line 89
- 1. **Backup Strategy**: With PBS-Backups at 21.6% utilization...
+ 1. **Backup Strategy**: With PBS-Backups utilization growing (see CLAUDE_STATUS.md for current)...
```

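The stale references listed in these fixes could also be caught automatically before they drift back in. The following is a minimal sketch, assuming a plain regex scan is enough; `STALE_PATTERNS`, `find_stale_references`, and `scan_file` are hypothetical names that do not exist in the repository, and the patterns are taken only from the issues described above.

```python
import re
from pathlib import Path

# Hypothetical patterns, drawn from the issues reported for CLAUDE.md above.
STALE_PATTERNS = {
    "decommissioned GitLab": re.compile(r"\bGitLab\b"),
    "old working directory": re.compile(r"/mnt/c/Users/fam1n/Documents/homelab"),
    "false git status": re.compile(r"not yet initialized as a git repository"),
}


def find_stale_references(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for every stale pattern found in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in STALE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits


def scan_file(path: str) -> list[tuple[int, str]]:
    """Convenience wrapper: scan a file on disk (path is supplied by the caller)."""
    return find_stale_references(Path(path).read_text(encoding="utf-8"))
```

Running `scan_file("CLAUDE.md")` from the repository root would list the line numbers that still need attention; an empty list suggests the Priority 1 fixes have been applied.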
### Priority 2: Add Quick Reference Section (15 minutes)

**Insert after line 8, before "## Infrastructure Overview":**

```markdown
## Quick Reference

| Resource | Value |
|----------|-------|
| **Proxmox Node** | serviceslab (192.168.2.200:8006) |
| **Proxmox Version** | PVE 8.3.3 |
| **Infrastructure** | 10 VMs, 4 LXC containers |
| **Monitoring** | http://192.168.2.114:3000 (Grafana) |
| **Version Control** | Gitea at 192.168.2.102:3060 |
| **Working Directory** | /home/jramos/homelab |
| **Live Status** | See `CLAUDE_STATUS.md` for current inventory |

**Key Services:**
- VM 101 (monitoring-docker): Grafana, Prometheus, PVE Exporter
- CT 102 (nginx): Nginx Proxy Manager (reverse proxy)
- CT 112 (twingate-connector): Zero-trust network access
- CT 113 (n8n): Workflow automation at 192.168.2.107
```

### Priority 2: Add Agent Routing Guide (30 minutes)

**Insert after Quick Reference:**

````markdown
## Agent Selection Guide

When working with this repository, choose the appropriate agent based on task type:

| Task Type | Primary Agent | Tools Available | Notes |
|-----------|---------------|-----------------|-------|
| **Git Operations** | `librarian` | Bash, Read, Grep, Glob, Edit, Write | Commits, branches, merges, .gitignore |
| **Documentation** | `scribe` | Read, Grep, Glob, Edit, Write | READMEs, architecture docs, diagrams |
| **Infrastructure Ops** | `lab-operator` | Bash, Read, Grep, Glob, Edit, Write | Proxmox, Docker, networking, storage |
| **Code/IaC Development** | `backend-builder` | Bash, Read, Grep, Glob, Edit, Write | Ansible, Terraform, Python, Shell |
| **File Creation** | Main Agent | All tools | Use when sub-agents lack specific tools |
| **Complex Multi-Agent Tasks** | Main Agent | All tools | Coordinates between specialized agents |

### Task Routing Decision Tree

```
Is this a git/version control task?
├── Yes → Use librarian
└── No ↓

Is this documentation (README, guides, diagrams)?
├── Yes → Use scribe
└── No ↓

Does this require system commands (docker, ssh, proxmox)?
├── Yes → Use lab-operator
└── No ↓

Is this code/config creation (Ansible, Python, Terraform)?
├── Yes → Use backend-builder
└── No → Use Main Agent
```

### Agent Collaboration Patterns

**Documentation Workflow:**
1. `backend-builder` or `lab-operator` creates/modifies infrastructure
2. `scribe` updates documentation
3. `librarian` commits all changes

**Infrastructure Deployment:**
1. `backend-builder` writes IaC (Ansible/Terraform/Compose)
2. `lab-operator` deploys to Proxmox/Docker
3. `scribe` documents deployment
4. `librarian` commits configuration
````

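The routing decision tree above is an ordered series of yes/no checks, so it can be expressed as a small function. This is a sketch only: the `Task` dataclass, its boolean fields, and the `"main-agent"` label are hypothetical stand-ins for however a task is actually classified, and the order of the checks mirrors the tree (git first, then documentation, system commands, and code/config).

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; each flag answers one question in the tree."""
    is_git: bool = False
    is_documentation: bool = False
    needs_system_commands: bool = False
    is_code_or_config: bool = False


def route_task(task: Task) -> str:
    """Walk the decision tree top to bottom and return the first matching agent."""
    if task.is_git:
        return "librarian"
    if task.is_documentation:
        return "scribe"
    if task.needs_system_commands:
        return "lab-operator"
    if task.is_code_or_config:
        return "backend-builder"
    return "main-agent"
```

Because the checks are ordered, a task that matches several questions goes to the earliest agent in the tree; for example, a task flagged as both git and documentation is routed to `librarian`.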
### Priority 2: Remove Duplicate Infrastructure Tables (20 minutes)

**Replace lines 17-56 with:**

```markdown
## Infrastructure Overview

**For detailed, current infrastructure inventory, see:**
- **Live Status**: `CLAUDE_STATUS.md` (most current)
- **Service Details**: `services/README.md`
- **Complete Index**: `INDEX.md`

**Quick Summary:**
- **VMs**: 10 total (IDs: 100, 101, 104-111)
- **LXC Containers**: 4 total (IDs: 102, 103, 112, 113)
- **Storage Pools**: local, local-lvm, Vault (ZFS), PBS-Backups, iso-share
- **Monitoring**: VM 101 at 192.168.2.114 (Grafana/Prometheus/PVE Exporter)
- **Key Services**: See Quick Reference above

**Note**: Infrastructure details change frequently. Always reference `CLAUDE_STATUS.md` for accurate counts, IPs, and status.
```

### Priority 3: Add YAML Frontmatter (5 minutes)

**Insert at very beginning of file:**

```yaml
---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
repository_type: homelab
primary_node: serviceslab
proxmox_version: 8.3.3
vm_count: 10
lxc_count: 4
working_directory: /home/jramos/homelab
git_remote: http://192.168.2.102:3060/jramos/homelab.git
---
```

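Once the frontmatter carries `last_updated`, the "no version tracking" problem can be checked mechanically. The sketch below assumes the frontmatter stays a flat list of `key: value` lines as proposed above, so it avoids a YAML library; `parse_frontmatter`, `is_stale`, and the 30-day threshold are illustrative assumptions, not part of the repository.

```python
from datetime import date


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by `---` lines.

    Returns an empty dict when the file has no (or unterminated) frontmatter.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so URL values keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}


def is_stale(fields: dict[str, str], today: date, max_age_days: int = 30) -> bool:
    """True when `last_updated` is older than max_age_days (assumed threshold)."""
    updated = date.fromisoformat(fields["last_updated"])
    return (today - updated).days > max_age_days
```

A check such as `is_stale(parse_frontmatter(text), date.today())` could run before an agent trusts the counts in the frontmatter and fall back to `CLAUDE_STATUS.md` when the file is out of date.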
## 1.3 Complete Proposed CLAUDE.md Structure

```markdown
---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
---

# CLAUDE.md

This file provides guidance to Claude Code when working with this homelab infrastructure repository.

## Quick Reference
[Key info table - 10 lines]

## Agent Selection Guide
[Task routing decision tree - 30 lines]

## Repository Overview
[High-level purpose - 10 lines]

## Infrastructure Reference
[Link to CLAUDE_STATUS.md - 15 lines]

## Working with This Environment
### Universal Workflow
[Existing content - 15 lines]

## Architecture Principles
[Condensed from current patterns - 20 lines]

## Best Practices
[Updated practices - 15 lines]

## Development Setup
[Existing content - 10 lines]

## Notes
[Updated notes - 5 lines]
```

**Estimated new length**: ~130 lines (same as current)
**Information density**: Significantly higher
**Maintenance burden**: Reduced (references instead of duplicates)

---

# Part 2: Sub-Agent Architecture Analysis

## 2.1 Agent Inventory

| Agent | File | Lines | Tools Defined | Status |
|-------|------|-------|---------------|--------|
| **scribe** | sub-agents/scribe.md | 30 | Read, Grep, Glob, Edit | Missing Write |
| **librarian** | sub-agents/librarian.md | 127 | **NONE** | **NON-FUNCTIONAL** |
| **lab-operator** | sub-agents/lab-operator.md | 33 | Bash, Read, Grep, Edit | Missing Glob, Write |
| **backend-builder** | sub-agents/backend-builder.md | 28 | Read, Edit, Grep, Glob | Missing Write, Bash |

## 2.2 Individual Agent Reviews

### 2.2.1 Scribe Agent

**File**: `/home/jramos/homelab/sub-agents/scribe.md`

#### Frontmatter (Lines 1-8)

```yaml
---
name: scribe
description: >
  Homelab Architect and Technical Writer. Explains concepts, designs network topologies,
  summarizes project structures, and maintains documentation (READMEs).
tools: [Read, Grep, Glob, Edit]
model: sonnet
---
```

**Strengths:**
- Clean YAML structure
- Clear description
- Appropriate model

**Issues:**
| Line | Issue | Impact |
|------|-------|--------|
| 6 | Missing `Write` tool | Cannot create new documentation files |
| Missing | No `color` field | Inconsistent with librarian |

#### Prompt Body Analysis

**Lines 11-12:**
```
You are the **Scribe** (formerly Steve's Architecture Module).
```
- "Steve" reference confusing without context
- **Recommendation**: Remove "(formerly Steve's Architecture Module)"

**Line 16:**
```
1. **Documentation**: Keep `README.md` and `docs/` up to date
```
- References `docs/` directory that doesn't exist
- **Recommendation**: Update to actual docs locations

**Line 20 - CRITICAL ISSUE:**
```
[Image of network topology diagram]
```
- Broken placeholder, incomplete
- **Recommendation**: Delete this line immediately

**Line 28:**
```
- Do not execute code. Your job is to plan and explain.
```
- Conflicts with having `Edit` tool (which modifies files)
- **Recommendation**: Clarify "Do not execute system commands via Bash"

#### Scribe Recommendations

**Priority 1 (CRITICAL):**
```diff
# Line 6
- tools: [Read, Grep, Glob, Edit]
+ tools: [Read, Grep, Glob, Edit, Write]

# Line 20 - DELETE
- [Image of network topology diagram]

# After Line 7
+ color: blue
```

**Priority 2:**
```diff
# Line 11
- You are the **Scribe** (formerly Steve's Architecture Module).
+ You are the **Scribe** - Documentation Architect and Technical Writer.

# Line 16
- Keep `README.md` and `docs/` up to date
+ Keep `README.md`, `services/README.md`, and infrastructure docs up to date
```

---

### 2.2.2 Librarian Agent

**File**: `/home/jramos/homelab/sub-agents/librarian.md`

#### Frontmatter (Lines 1-6) - CRITICAL ISSUE

```yaml
---
name: librarian
description: Use this agent when the user needs Git repository management...
model: sonnet
color: purple
---
```

**BLOCKING ISSUE**: No `tools` field defined

**Impact**: Agent cannot execute ANY git commands. Completely non-functional.

#### Description Field - Major Problem

**Line 3**: Description is 552 words with 6 embedded examples

Example excerpt:
```
description: Use this agent when...

- Example 1 (Commit Operation):
user: "I've finished implementing..."
assistant: "I'll use the git-version-control agent..."
[... 5 more examples ...]
```

**Issues:**
1. Examples should be in prompt body, not frontmatter
2. Description unparseable by automated systems
3. Violates YAML frontmatter conventions

#### Prompt Body (Lines 8-125)

**Line count**: 118 lines (4x longer than other agents)

**Structure**: Professional prose (no XML tags like other agents)

**Strengths:**
- Comprehensive Git guidance
- Excellent safety protocols
- Infrastructure-aware (mentions VM/CT IDs)
- Good conventional commit examples

**Issues:**
| Line | Issue |
|------|-------|
| 8 | Prose style vs XML tags in other agents |
| 14-125 | Could be condensed by moving common patterns to CLAUDE.md |

#### Librarian Recommendations

**Priority 1 (CRITICAL) - MUST FIX:**

```diff
# Line 3
- description: Use this agent when the user needs Git repository management, including...
+ description: >
+   Git repository management specialist. Handles commits, branches, merges,
+   history review, .gitignore maintenance, and enforces conventional commit standards.

# After line 5 - ADD THIS
+ tools: [Bash, Read, Grep, Glob, Edit, Write]
```

**Priority 2:**

Move examples from description to prompt body:
```markdown
## Usage Examples

### Commit Operation
User: "I've finished implementing the Ansible playbook for nginx configuration."
Action: Create properly formatted conventional commit.

### Branch Management
User: "Create a new feature branch for NetBox integration."
Action: Create appropriately named feature branch.

[... remaining examples ...]
```

**Priority 3:**

Add XML structure for consistency:
```xml
<system_role>
You are the **Librarian** - Git Version Control Specialist for the homelab repository.
</system_role>

<core_responsibilities>
[existing commit management section]
</core_responsibilities>

<safety_protocols>
1. NEVER force push to main/master
2. NEVER rewrite published history
3. Require confirmation for destructive operations
4. Block commits containing sensitive data patterns
</safety_protocols>
```

---

### 2.2.3 Lab-Operator Agent

**File**: `/home/jramos/homelab/sub-agents/lab-operator.md`

#### Frontmatter (Lines 1-8)

```yaml
---
name: lab-operator
description: >
  Expert Homelab SysAdmin. Manages Proxmox, Docker, Kubernetes, TrueNAS, networking (pfSense/VLANs),
  and Linux server administration. Handles package installation and system config.
tools: [Bash, Read, Grep, Edit]
model: sonnet
---
```

**Issues:**
| Line | Issue | Impact |
|------|-------|--------|
| 4-5 | Mentions Kubernetes, TrueNAS, pfSense not in homelab | Misleading |
| 6 | Missing `Glob` tool | Cannot find files by pattern |
| 6 | Missing `Write` tool | Cannot create new configs |
| Missing | No `color` field | Inconsistent |

#### Prompt Body (Lines 10-33)

**Strengths:**
- XML tag structure consistent with scribe/backend-builder
- Excellent `<safety_protocols>` section
- Good response style guidance

**Lines 16-20 - Domain Expertise Issues:**
```xml
<domain_expertise>
- **Virtualization**: Proxmox VE (LXC/VM management), ESXi.
- **Containers**: Docker Compose, Portainer, Kubernetes (k3s/microk8s).
- **Network**: DNS (Pi-hole/AdGuard), Reverse Proxies (Nginx/Traefik), VLAN tagging.
- **Storage**: ZFS pool management, NFS/SMB shares.
</domain_expertise>
```

**Problems:**
- Mentions ESXi, Portainer, Kubernetes, Pi-hole, AdGuard, Traefik - none in infrastructure
- Emphasizes ZFS pool management, although ZFS appears only once in the actual setup (Vault storage)
- Doesn't mention Nginx Proxy Manager, Grafana, Prometheus, Twingate, n8n

#### Lab-Operator Recommendations

**Priority 1:**
```diff
# Line 6
- tools: [Bash, Read, Grep, Edit]
+ tools: [Bash, Read, Grep, Glob, Edit, Write]

# After line 7
+ color: green
```

**Priority 2:**
```diff
# Lines 16-20 - REPLACE
- <domain_expertise>
- - **Virtualization**: Proxmox VE (LXC/VM management), ESXi.
- - **Containers**: Docker Compose, Portainer, Kubernetes (k3s/microk8s).
- - **Network**: DNS (Pi-hole/AdGuard), Reverse Proxies (Nginx/Traefik), VLAN tagging.
- - **Storage**: ZFS pool management, NFS/SMB shares.
- </domain_expertise>
+ <domain_expertise>
+ - **Virtualization**: Proxmox VE 8.3.3 (LXC containers, QEMU/KVM VMs)
+ - **Containers**: Docker Compose, container orchestration on VM hosts
+ - **Network**: Nginx Proxy Manager (CT 102), VLAN tagging, DNS
+ - **Storage**: Proxmox storage pools (local, local-lvm, Vault, PBS-Backups, iso-share)
+ - **Monitoring**: Grafana, Prometheus, PVE Exporter (VM 101 at 192.168.2.114)
+ - **Automation**: n8n workflow platform (CT 113), Ansible (VM 106)
+ - **Security**: Twingate zero-trust connector (CT 112)
+ </domain_expertise>
```

**Priority 3:**

Add Proxmox-specific safety protocols:
```diff
# After line 26
+ 4. **Proxmox Safety**: Confirm before `qm destroy`, `pct destroy`, or snapshot deletion.
+ 5. **Backup Verification**: Before major changes, verify PBS backup exists and is recent.
```

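The proposed Proxmox safety rule amounts to a check that runs before a command is executed. The sketch below shows one way such a guard might look; `needs_confirmation` and the `DESTRUCTIVE` table are hypothetical, `qm destroy` and `pct destroy` come from the rule above, and treating `delsnapshot` as the snapshot-deletion subcommand is an assumption about which commands the rule intends to cover.

```python
import shlex

# Subcommands that should trigger a confirmation prompt.
# `destroy` is named in the proposed rule; `delsnapshot` is an assumed
# reading of "snapshot deletion".
DESTRUCTIVE = {
    "qm": {"destroy", "delsnapshot"},
    "pct": {"destroy", "delsnapshot"},
}


def needs_confirmation(command: str) -> bool:
    """Return True when the command line matches a destructive Proxmox operation."""
    tokens = shlex.split(command)
    # Ignore a leading sudo so `sudo qm destroy 104` is still caught.
    if tokens and tokens[0] == "sudo":
        tokens = tokens[1:]
    if len(tokens) < 2:
        return False
    return tokens[1] in DESTRUCTIVE.get(tokens[0], set())
```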
---

### 2.2.4 Backend-Builder Agent

**File**: `/home/jramos/homelab/sub-agents/backend-builder.md`

#### Frontmatter (Lines 1-8)

```yaml
---
name: backend-builder
description: >
  DevOps and Software Engineer. Writes Python/Java code, Ansible playbooks,
  Terraform configs, and complex Shell scripts. Handles database logic and API integrations.
tools: [Read, Edit, Grep, Glob]
model: sonnet
---
```

**Issues:**
| Line | Issue | Impact |
|------|-------|--------|
| 4 | Mentions Java - not in homelab | Misleading |
| 6 | Missing `Bash` tool | **CRITICAL**: Cannot test/validate code |
| 6 | Missing `Write` tool | Cannot create new files |
| Missing | No `color` field | Inconsistent |

#### Prompt Body (Lines 10-27)

**Strengths:**
- Good security focus (secrets management)
- Appropriate coding standards
- "Do not be lazy" guidance

**Line 18-20 - Homelab Stack:**
```
- **Python**: Use modern libraries (`pydantic` for config, `httpx` for APIs).
- **Ansible**: Ensure playbooks are idempotent.
- **Terraform**: precise resource targeting.
```

**Issues:**
- Missing Docker Compose guidance (major part of homelab)
- Terraform guidance vague
- No Shell script guidance

#### Backend-Builder Recommendations

**Priority 1 (CRITICAL):**
```diff
# Line 6
- tools: [Read, Edit, Grep, Glob]
+ tools: [Read, Edit, Grep, Glob, Write, Bash]

# After line 7
+ color: orange
```

**Priority 2:**
```diff
# After line 20 - ADD
+ - **Docker Compose**: Follow compose spec v3.8+, use named volumes, include healthchecks.
+ - **Shell Scripts**: Use `#!/usr/bin/env bash`, include error handling (`set -euo pipefail`).

# Line 20 - REPLACE
- - **Terraform**: precise resource targeting.
+ - **Terraform**: Use modules, implement state management, leverage data sources for existing resources.
```

**Priority 3:**

Add validation section:
```xml
<validation_rules>
After writing code, validate before presenting:
- **Python**: Run `python -m py_compile <file>` to check syntax
- **Ansible**: Run `ansible-playbook --syntax-check <playbook>`
- **Docker Compose**: Run `docker compose config` to validate
- **Shell Scripts**: Run `bash -n <script>` for syntax check
- **YAML/JSON**: Validate structure before writing
</validation_rules>
```

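The validation rules above map each file type to one syntax-check command, which can be captured in a small lookup. This is a sketch under stated assumptions: `validation_command`, `COMPOSE_NAMES`, and the `is_playbook` flag are hypothetical (a playbook cannot be told apart from other YAML by extension alone, so the caller has to say so), and the function only builds the argument list without running it.

```python
from pathlib import Path
from typing import Optional

# Assumed set of file names that identify a Compose file.
COMPOSE_NAMES = {"docker-compose.yml", "docker-compose.yaml", "compose.yml", "compose.yaml"}


def validation_command(path: str, is_playbook: bool = False) -> Optional[list[str]]:
    """Return the syntax-check command for a file, or None if no rule applies."""
    file = Path(path)
    if file.suffix == ".py":
        return ["python", "-m", "py_compile", path]
    if file.suffix == ".sh":
        return ["bash", "-n", path]
    if file.name in COMPOSE_NAMES:
        return ["docker", "compose", "-f", path, "config"]
    if is_playbook and file.suffix in {".yml", ".yaml"}:
        return ["ansible-playbook", "--syntax-check", path]
    return None
```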
---

## 2.3 Cross-Agent Analysis

### Tool Distribution Matrix

| Tool | Scribe | Librarian | Lab-Operator | Backend-Builder |
|------|--------|-----------|--------------|-----------------|
| **Read** | ✓ | ✗ | ✓ | ✓ |
| **Write** | ✗ | ✗ | ✗ | ✗ |
| **Edit** | ✓ | ✗ | ✓ | ✓ |
| **Grep** | ✓ | ✗ | ✓ | ✓ |
| **Glob** | ✓ | ✗ | ✗ | ✓ |
| **Bash** | ✗ | ✗ | ✓ | ✗ |

### Critical Tool Gaps

| Gap | Agent | Impact |
|-----|-------|--------|
| **No tools at all** | Librarian | **BLOCKING** - Cannot execute ANY git commands |
| **No Bash** | Backend-Builder | **CRITICAL** - Cannot test Python, validate Ansible, check Terraform |
| **No Write** | All 4 agents | **HIGH** - Cannot create new files (only edit existing) |
| **No Glob** | Lab-Operator | **MEDIUM** - Cannot find docker-compose files, configs by pattern |

### Consistency Issues

| Aspect | Scribe | Librarian | Lab-Operator | Backend-Builder |
|--------|--------|-----------|--------------|-----------------|
| **XML tags** | Yes | **No** | Yes | Yes |
| **Tools in frontmatter** | Yes | **No** | Yes | Yes |
| **Color field** | No | Yes | No | No |
| **Line count** | 30 | **127** | 33 | 28 |
| **Steve reference** | Yes | No | Yes | Yes |
| **Safety protocols** | No | Partial | **Yes** | Partial |

### Role Boundary Ambiguities

| Scenario | Possible Agents | Recommendation |
|----------|-----------------|----------------|
| Create docker-compose.yml | Backend-Builder OR Lab-Operator | Backend-Builder creates, Lab-Operator deploys |
| Write Ansible playbook | Backend-Builder OR Lab-Operator | Backend-Builder writes, Lab-Operator executes |
| Update README after code change | Scribe OR Backend-Builder | Backend-Builder notifies, Scribe updates |
| Commit infrastructure changes | Librarian OR Lab-Operator | Lab-Operator makes change, Librarian commits |

## 2.4 Recommended Tool Distribution

### Proposed Standard Toolsets

**Documentation Agents** (Scribe):
```yaml
tools: [Read, Grep, Glob, Edit, Write]
```
- Rationale: Needs all file operations, no system commands

**Operations Agents** (Lab-Operator):
```yaml
tools: [Bash, Read, Grep, Glob, Edit, Write]
```
- Rationale: Needs system commands + all file operations

**Development Agents** (Backend-Builder):
```yaml
tools: [Bash, Read, Grep, Glob, Edit, Write]
```
- Rationale: Needs to test/validate code + all file operations

**Git Agents** (Librarian):
```yaml
tools: [Bash, Read, Grep, Glob, Edit, Write]
```
- Rationale: Git commands + file inspection + .gitignore management

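The tool gaps identified in this part could be detected by comparing each agent's `tools:` line against the proposed standard toolsets. The following is a rough sketch: `REQUIRED_TOOLS`, `declared_tools`, and `missing_tools` are hypothetical names, the required sets are copied from the proposal above, and the regex assumes the inline list form (`tools: [A, B]`) used in the agent files.

```python
import re

# Required tools per agent, copied from the proposed standard toolsets.
REQUIRED_TOOLS = {
    "scribe": {"Read", "Grep", "Glob", "Edit", "Write"},
    "librarian": {"Bash", "Read", "Grep", "Glob", "Edit", "Write"},
    "lab-operator": {"Bash", "Read", "Grep", "Glob", "Edit", "Write"},
    "backend-builder": {"Bash", "Read", "Grep", "Glob", "Edit", "Write"},
}

# Matches a line such as `tools: [Read, Grep, Glob, Edit]`.
TOOLS_LINE = re.compile(r"^tools:\s*\[(.*?)\]\s*$", re.MULTILINE)


def declared_tools(frontmatter: str) -> set[str]:
    """Extract the tool names declared in frontmatter; empty set if none."""
    match = TOOLS_LINE.search(frontmatter)
    if match is None:
        return set()
    return {name.strip() for name in match.group(1).split(",") if name.strip()}


def missing_tools(agent: str, frontmatter: str) -> set[str]:
    """Return the required tools that the agent's frontmatter does not declare."""
    return REQUIRED_TOOLS[agent] - declared_tools(frontmatter)
```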
---

# Part 3: Actionable Recommendations

## 3.1 Priority 1 - Critical Fixes (15 minutes total)

### Fix 1: Librarian - Add Tools (2 minutes) **BLOCKING**

**File**: `/home/jramos/homelab/sub-agents/librarian.md`

```diff
---
name: librarian
- description: Use this agent when the user needs Git repository management, including operations like committing changes...
+ description: >
+   Git repository management specialist. Handles commits, branches, merges,
+   history review, .gitignore maintenance, and enforces conventional commit standards.
+ tools: [Bash, Read, Grep, Glob, Edit, Write]
model: sonnet
color: purple
---
```

### Fix 2: Backend-Builder - Add Bash (1 minute) **CRITICAL**

**File**: `/home/jramos/homelab/sub-agents/backend-builder.md`

```diff
---
name: backend-builder
description: >
-   DevOps and Software Engineer. Writes Python/Java code, Ansible playbooks,
-   Terraform configs, and complex Shell scripts. Handles database logic and API integrations.
+   DevOps and Software Engineer. Writes Python, Ansible playbooks,
+   Terraform configs, and Shell scripts. Handles IaC and automation.
- tools: [Read, Edit, Grep, Glob]
+ tools: [Read, Edit, Grep, Glob, Write, Bash]
model: sonnet
+ color: orange
---
```

### Fix 3: CLAUDE.md - Fix GitLab References (5 minutes)

**File**: `/home/jramos/homelab/CLAUDE.md`

```diff
# Line 62
- **Automation-First Approach**: The presence of Ansible-Control (106), GitLab (101), and NetBox (103)...
+ **Automation-First Approach**: The presence of Ansible-Control (106), Gitea, and NetBox (103)...

# Line 97
- 5. **Version Control**: GitLab (101) should house all Infrastructure as Code...
+ 5. **Version Control**: Gitea should house all Infrastructure as Code...

# Line 105
- - **GitLab**: CI/CD pipelines for infrastructure testing and deployment
+ - **Gitea**: Version control and repository management

# Line 126
- - Working directory: /mnt/c/Users/fam1n/Documents/homelab
+ - Working directory: /home/jramos/homelab

# Line 127 - DELETE
- - This repository is not yet initialized as a git repository
```

### Fix 4: Scribe - Remove Broken Placeholder (1 minute)

**File**: `/home/jramos/homelab/sub-agents/scribe.md`

```diff
# Line 20 - DELETE
- [Image of network topology diagram]
```

### Fix 5: Add Write Tool to All Agents (3 minutes)

**Scribe** (line 6):
```diff
- tools: [Read, Grep, Glob, Edit]
+ tools: [Read, Grep, Glob, Edit, Write]
```

**Lab-Operator** (line 6):
```diff
- tools: [Bash, Read, Grep, Edit]
+ tools: [Bash, Read, Grep, Glob, Edit, Write]
```

Librarian and Backend-Builder receive `Write` as part of Fix 1 and Fix 2.

### Fix 6: Add Missing Color Fields (3 minutes)

**Scribe** (after line 7):
```diff
model: sonnet
+ color: blue
```

**Lab-Operator** (after line 7):
```diff
model: sonnet
+ color: green
```

Backend-Builder receives its `color` field in Fix 2; Librarian already defines one.

---

## 3.2 Priority 2 - High-Impact Improvements (90 minutes total)
|
|
|
|
### Improvement 1: CLAUDE.md - Add Quick Reference (15 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/CLAUDE.md`
|
|
**Location**: After line 8, before "## Infrastructure Overview"
|
|
|
|
```markdown
|
|
## Quick Reference
|
|
|
|
| Resource | Value |
|
|
|----------|-------|
|
|
| **Proxmox Node** | serviceslab (192.168.2.200:8006) |
|
|
| **Proxmox Version** | PVE 8.3.3 |
|
|
| **Infrastructure** | 10 VMs, 4 LXC containers |
|
|
| **Monitoring** | http://192.168.2.114:3000 (Grafana) |
|
|
| **Version Control** | Gitea at 192.168.2.102:3060 |
|
|
| **Working Directory** | /home/jramos/homelab |
|
|
| **Live Status** | See `CLAUDE_STATUS.md` for current inventory |
|
|
|
|
**Key Services:**
|
|
- VM 101 (monitoring-docker): Grafana, Prometheus, PVE Exporter
|
|
- CT 102 (nginx): Nginx Proxy Manager (reverse proxy)
|
|
- CT 112 (twingate-connector): Zero-trust network access
|
|
- CT 113 (n8n): Workflow automation at 192.168.2.107
|
|
```
|
|
|
|
### Improvement 2: CLAUDE.md - Add Agent Routing Guide (30 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/CLAUDE.md`
|
|
**Location**: After Quick Reference
|
|
|
|
```markdown
|
|
## Agent Selection Guide
|
|
|
|
When working with this repository, choose the appropriate agent based on task type:
|
|
|
|
| Task Type | Primary Agent | Tools Available | Notes |
|
|
|-----------|---------------|-----------------|-------|
|
|
| **Git Operations** | `librarian` | Bash, Read, Grep, Glob, Edit, Write | Commits, branches, merges, .gitignore |
|
|
| **Documentation** | `scribe` | Read, Grep, Glob, Edit, Write | READMEs, architecture docs, diagrams |
|
|
| **Infrastructure Ops** | `lab-operator` | Bash, Read, Grep, Glob, Edit, Write | Proxmox, Docker, networking, storage |
|
|
| **Code/IaC Development** | `backend-builder` | Bash, Read, Grep, Glob, Edit, Write | Ansible, Terraform, Python, Shell |
|
|
| **Complex Multi-Agent** | Main Agent | All tools | Coordinates between specialized agents |
|
|
|
|
### Task Routing Decision Tree
|
|
|
|
~~~
Is this a git/version control task?
├── Yes → Use librarian
└── No ↓

Is this documentation (README, guides, diagrams)?
├── Yes → Use scribe
└── No ↓

Does this require system commands (docker, ssh, proxmox)?
├── Yes → Use lab-operator
└── No ↓

Is this code/config creation (Ansible, Python, Terraform)?
├── Yes → Use backend-builder
└── No → Use Main Agent
~~~
|
|
|
|
### Agent Collaboration Patterns
|
|
|
|
**Documentation Workflow:**
|
|
1. `backend-builder` or `lab-operator` creates/modifies infrastructure
|
|
2. `scribe` updates documentation to reflect changes
|
|
3. `librarian` commits all changes with proper commit message
|
|
|
|
**Infrastructure Deployment:**
|
|
1. `backend-builder` writes IaC (Ansible playbooks, Terraform configs, Docker Compose)
|
|
2. `lab-operator` validates and deploys to Proxmox/Docker
|
|
3. `scribe` documents deployment procedures and architecture
|
|
4. `librarian` commits configuration to repository
|
|
|
|
**Code Development:**
|
|
1. `backend-builder` writes code/scripts
|
|
2. `backend-builder` tests with Bash tool
|
|
3. `scribe` adds code documentation
|
|
4. `librarian` commits with conventional commit message
|
|
```
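
The routing decision tree in the guide above is an ordered sequence of yes/no questions, so it can be sketched as a rule list where the first match wins. This is only an illustration: the keyword sets and the `main-agent` label are assumptions made for the example, and real routing is done by the main agent reading the task, not by keyword matching.

```python
"""Sketch: the task routing decision tree as an ordered rule list."""
import re

# Order mirrors the decision tree: the first matching rule wins.
# The keyword sets are illustrative assumptions, not taken from CLAUDE.md.
ROUTING_RULES = [
    ("librarian", {"git", "commit", "branch", "merge", "gitignore"}),
    ("scribe", {"readme", "documentation", "docs", "diagram", "guide"}),
    ("lab-operator", {"docker", "ssh", "proxmox", "deploy", "restart"}),
    ("backend-builder", {"ansible", "python", "terraform", "script", "playbook"}),
]
FALLBACK_AGENT = "main-agent"  # hypothetical label for the coordinating agent


def route_task(description: str) -> str:
    """Return the first agent whose keywords appear in the task description."""
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    for agent, keywords in ROUTING_RULES:
        if words & keywords:
            return agent
    return FALLBACK_AGENT
```

Because the rules are ordered, "Commit the updated playbook" goes to `librarian` even though "playbook" is a `backend-builder` keyword, which matches the tree asking the git question first. Keyword matching is crude, so a task such as "write a docker compose file" would land on `lab-operator` here although creating it belongs to `backend-builder`.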
|
|
|
|
### Improvement 3: CLAUDE.md - Remove Duplicate Tables (20 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/CLAUDE.md`
|
|
**Lines**: Replace 17-56
|
|
|
|
```markdown
|
|
## Infrastructure Overview
|
|
|
|
**For detailed, current infrastructure inventory, see:**
|
|
- **Live Status**: `CLAUDE_STATUS.md` (most current - updated frequently)
|
|
- **Service Details**: `services/README.md` (service-specific documentation)
|
|
- **Complete Index**: `INDEX.md` (comprehensive repository navigation)
|
|
|
|
**Quick Summary:**
|
|
- **Virtual Machines**: 10 total (IDs: 100, 101, 104-111)
|
|
- Highlights: VM 100 (docker-hub), VM 101 (monitoring-docker), VM 106 (Ansible-Control)
|
|
- **LXC Containers**: 4 total (IDs: 102, 103, 112, 113)
|
|
- Highlights: CT 102 (nginx/NPM), CT 112 (twingate), CT 113 (n8n)
|
|
- **Storage Pools**: 5 pools
|
|
- local (system), local-lvm (VM disks), Vault (ZFS - secure data)
|
|
- PBS-Backups (Proxmox Backup Server), iso-share (installation media)
|
|
- **Monitoring Stack**: VM 101 at 192.168.2.114
|
|
- Grafana (port 3000), Prometheus (port 9090), PVE Exporter (port 9221)
|
|
- **Key Network Services**:
|
|
- Nginx Proxy Manager (CT 102), Twingate (CT 112), n8n (CT 113)
|
|
|
|
**Note**: Infrastructure details change frequently. Always reference `CLAUDE_STATUS.md` for accurate VM/CT counts, IP addresses, and current status.
|
|
```
|
|
|
|
### Improvement 4: Lab-Operator - Update Domain Expertise (15 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/sub-agents/lab-operator.md`
|
|
**Lines**: Replace 16-20
|
|
|
|
```xml
|
|
<domain_expertise>
|
|
- **Virtualization**: Proxmox VE 8.3.3 (LXC containers, QEMU/KVM virtual machines)
|
|
- **Containers**: Docker Compose orchestration on VM hosts (VM 100, 101, 107)
|
|
- **Network**: Nginx Proxy Manager (CT 102), VLAN tagging, DNS configuration, reverse proxy
|
|
- **Storage**: Proxmox storage architecture
|
|
- local (Directory): System files, ISOs, templates
|
|
- local-lvm (LVM-Thin): VM disk images (thin provisioned)
|
|
- Vault (ZFS Pool): Secure storage for sensitive data
|
|
- PBS-Backups: Proxmox Backup Server repository
|
|
- iso-share (NFS): Installation media library
|
|
- **Monitoring**: Observability stack on VM 101 (192.168.2.114)
|
|
- Grafana: Metrics visualization and dashboards
|
|
- Prometheus: Time-series database and alerting
|
|
- PVE Exporter: Proxmox VE metrics exporter
|
|
- **Automation**:
|
|
- n8n workflow automation platform (CT 113 at 192.168.2.107)
|
|
- Ansible automation (VM 106)
|
|
- **Security**:
|
|
- Twingate zero-trust network access connector (CT 112)
|
|
- Nginx Proxy Manager with SSL/TLS termination
|
|
</domain_expertise>
|
|
```
|
|
|
|
### Improvement 5: Backend-Builder - Add Docker Compose & Validation (10 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/sub-agents/backend-builder.md`
|
|
**After line 21**
|
|
|
|
```xml
|
|
<coding_standards>
|
|
1. **Secrets Management**: NEVER hardcode passwords or API keys. Use `.env` files or environment variables.
|
|
2. **Homelab Stack**:
|
|
- **Python**: Use modern libraries (`pydantic` for config, `httpx` for APIs).
|
|
- **Ansible**: Ensure playbooks are idempotent with proper error handling.
|
|
- **Terraform**: Use modules, implement state management, leverage data sources.
|
|
- **Docker Compose**: Follow the Compose Specification (the top-level `version` key is obsolete in Compose V2), use named volumes, include healthchecks.
|
|
- **Shell Scripts**: Use `#!/usr/bin/env bash`, include error handling (`set -euo pipefail`).
|
|
3. **Error Handling**: Homelabs are messy. Your code must handle network timeouts and missing files gracefully.
|
|
</coding_standards>
|
|
|
|
<validation_rules>
|
|
After writing code, validate before presenting to user:
|
|
- **Python**: Run `python -m py_compile <file>` to check syntax
|
|
- **Ansible**: Run `ansible-playbook --syntax-check <playbook>`
|
|
- **Docker Compose**: Run `docker compose config` to validate syntax
|
|
- **Shell Scripts**: Run `bash -n <script>` for syntax validation
|
|
- **Terraform**: Run `terraform validate` after init
|
|
- **YAML/JSON**: Validate structure before writing
|
|
</validation_rules>
|
|
```
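
The validation rules above map each file kind to one syntax-check command. A minimal sketch of that mapping follows, assuming the caller already knows the kind of file it is checking. The `CHECKS` table and `run_check` are hypothetical names; the `-f`, `--quiet`, and `-chdir` options are additions in this sketch so that a specific file or directory can be targeted.

```python
"""Sketch: run the syntax check that matches a file kind (hypothetical helper)."""
import subprocess
import sys

# Each entry turns a target path into an argv list. The commands follow the
# validation rules above; sys.executable stands in for `python` so the check
# uses the interpreter that is running this script.
CHECKS = {
    "python": lambda target: [sys.executable, "-m", "py_compile", target],
    "ansible": lambda target: ["ansible-playbook", "--syntax-check", target],
    "compose": lambda target: ["docker", "compose", "-f", target, "config", "--quiet"],
    "shell": lambda target: ["bash", "-n", target],
    "terraform": lambda target: ["terraform", f"-chdir={target}", "validate"],
}


def run_check(kind: str, target: str, timeout: float = 60.0) -> tuple[bool, str]:
    """Run one check and return (passed, detail) instead of raising."""
    argv = CHECKS[kind](target)
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, f"{argv[0]} is not installed"
    except subprocess.TimeoutExpired:
        return False, f"{argv[0]} timed out after {timeout} seconds"
    return result.returncode == 0, (result.stderr or result.stdout).strip()
```

Returning a `(passed, detail)` pair rather than raising follows the error-handling standard in the same prompt: a missing tool or a timeout is reported as a failed check, not a crash.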
|
|
|
|
---
|
|
|
|
## 3.3 Priority 3 - Quality Enhancements (180 minutes total)
|
|
|
|
### Enhancement 1: CLAUDE.md - Add YAML Frontmatter (5 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/CLAUDE.md`
|
|
**Location**: Very beginning of file
|
|
|
|
```yaml
|
|
---
|
|
version: 2.2.0
|
|
last_updated: 2025-12-07
|
|
infrastructure_source: CLAUDE_STATUS.md
|
|
repository_type: homelab_infrastructure
|
|
primary_node: serviceslab
|
|
primary_node_ip: 192.168.2.200
|
|
proxmox_version: 8.3.3
|
|
vm_count: 10
|
|
lxc_count: 4
|
|
working_directory: /home/jramos/homelab
|
|
git_remote: http://192.168.2.102:3060/jramos/homelab.git
|
|
monitoring_url: http://192.168.2.114:3000
|
|
---
|
|
```
|
|
|
|
### Enhancement 2: Remove "Steve" References (5 minutes)
|
|
|
|
**Files**: scribe.md (line 11), lab-operator.md (line 11), backend-builder.md (line 11)
|
|
|
|
```diff
|
|
# scribe.md line 11
|
|
- You are the **Scribe** (formerly Steve's Architecture Module).
|
|
+ You are the **Scribe** - Documentation Architect and Technical Writer.
|
|
|
|
# lab-operator.md line 11
|
|
- You are the **Lab Operator** (formerly Steve's Infrastructure Module).
|
|
+ You are the **Lab Operator** - Expert Homelab Systems Administrator.
|
|
|
|
# backend-builder.md line 11
|
|
- You are the **Backend Builder** (formerly Steve's Coding Module).
|
|
+ You are the **Backend Builder** - DevOps and Infrastructure as Code Specialist.
|
|
```
|
|
|
|
### Enhancement 3: Add Safety Protocols to Scribe (10 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/sub-agents/scribe.md`
|
|
**After line 23**
|
|
|
|
```xml
|
|
<safety_protocols>
|
|
1. **Read Before Edit**: Always read existing documentation before modifying
|
|
2. **Preserve User Content**: Never overwrite user-created sections without explicit permission
|
|
3. **Timestamp Updates**: Include last-updated dates in documentation headers
|
|
4. **Link Validation**: When referencing other docs, verify paths exist
|
|
5. **No Code Execution**: Document code, don't execute it (use lab-operator or backend-builder)
|
|
</safety_protocols>
|
|
```
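
The link validation rule above ("verify paths exist") can be checked mechanically. Below is a minimal sketch, assuming links use the inline Markdown form `[text](target)` and that relative targets resolve against the directory of the document that contains them. `broken_links` is a hypothetical helper name.

```python
"""Sketch: find relative Markdown links whose target file is missing."""
import re
from pathlib import Path

# Inline links only: [text](target). Reference-style links are not handled.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(doc_path: Path) -> list[str]:
    """Return relative link targets in a Markdown file that do not exist on disk."""
    text = doc_path.read_text(encoding="utf-8")
    missing = []
    for target in LINK_PATTERN.findall(text):
        if SCHEME_PATTERN.match(target) or target.startswith("#"):
            continue  # skip URLs (http:, mailto:) and in-page anchors
        relative = target.split("#", 1)[0]  # drop any fragment before checking
        if not (doc_path.parent / relative).exists():
            missing.append(target)
    return missing
```

The check only reads files, which keeps it within the scribe's "No Code Execution" boundary as long as the script itself is run by an agent that has Bash.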
|
|
|
|
### Enhancement 4: Librarian - Add XML Structure (30 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/sub-agents/librarian.md`
|
|
**Restructure entire prompt body**
|
|
|
|
```xml
|
|
<system_role>
|
|
You are the **Librarian** - Git Version Control Specialist for the homelab infrastructure repository.
|
|
You have deep expertise in Git workflows, branching strategies, commit conventions, and repository hygiene.
|
|
</system_role>
|
|
|
|
<core_responsibilities>
|
|
## 1. Commit Management
|
|
- Enforce conventional commit format: `type(scope): description`
|
|
- Valid types: feat, fix, docs, style, refactor, test, chore, ci, build, perf
|
|
- Keep commit messages clear: a concise summary line (50-character target) followed by a descriptive body
|
|
- Example: `feat(ansible): add nginx reverse proxy playbook for Proxmox CT 102`
|
|
- Reference VM/CT IDs and service names in infrastructure commits
|
|
- Stage appropriate files and verify changes before committing
|
|
- NEVER commit sensitive data (credentials, API keys, private keys)
|
|
|
|
## 2. Branching Strategy
|
|
- Use descriptive branch names: `feature/description`, `bugfix/description`, `hotfix/description`
|
|
- Infrastructure examples: `feature/ansible-netbox-integration`, `fix/proxmox-storage-config`
|
|
- Create branches from appropriate base (main/develop)
|
|
- Keep branches focused on single features or fixes
|
|
- Delete merged branches to maintain repository cleanliness
|
|
|
|
## 3. Merging Operations
|
|
- Check for conflicts before merging
|
|
- Prefer fast-forward merges for linear history when possible
|
|
- Use merge commits for feature branches to preserve context
|
|
- Verify all tests pass before completing merges
|
|
- Write clear merge commit messages explaining integration
|
|
|
|
## 4. History Management
|
|
- Use `git log` with formatting for readability
|
|
- Filter history by file paths, authors, date ranges
|
|
- Never rewrite public/shared branch history
|
|
- Identify when rebasing or amending is appropriate vs prohibited
|
|
|
|
## 5. .gitignore Hygiene
|
|
- Proactively identify files that should be ignored
|
|
- Infrastructure-specific patterns:
|
|
* Terraform: `*.tfstate`, `*.tfstate.backup`, `.terraform/`, `terraform.tfvars`
|
|
* Ansible: `*.retry`, `vault_pass.txt`, `.vault_password`
|
|
* Monitoring: `**/pve.yml` (credentials), `.env` files
|
|
* General: `*.log`, `*.swp`, `.DS_Store`
|
|
- Organize .gitignore with commented sections
|
|
- Check existing .gitignore before suggesting additions
|
|
</core_responsibilities>
|
|
|
|
<safety_protocols>
|
|
1. **NEVER** force push to main/master without explicit user confirmation
|
|
2. **NEVER** rewrite published/shared history
|
|
3. **ALWAYS** verify no sensitive data in staged changes before commit
|
|
4. **ALWAYS** require confirmation for destructive operations (hard reset, force push)
|
|
5. **BLOCK** commits containing patterns: password, api_key, secret, token (unless in templates)
|
|
</safety_protocols>
|
|
|
|
<quality_assurance>
|
|
## Pre-Commit Checks
|
|
- Run `git status` to see current state
|
|
- Verify no sensitive data in staged changes
|
|
- Ensure commit message follows conventional format
|
|
- Confirm files being committed are intentional
|
|
- Check for debug code, TODOs, temporary files
|
|
|
|
## Pre-Merge Validation
|
|
- Run `git diff` to review changes
|
|
- Check for merge conflicts
|
|
- Verify branch is up-to-date with target
|
|
- Confirm tests pass (if applicable)
|
|
</quality_assurance>
|
|
|
|
<homelab_context>
|
|
This homelab infrastructure repository contains:
|
|
- Proxmox VM/CT configurations (reference VM/CT IDs in commits)
|
|
- Docker Compose service definitions
|
|
- Ansible playbooks and roles
|
|
- Monitoring stack configs (Grafana/Prometheus)
|
|
- Sensitive data in Vault storage (ensure .gitignore coverage)
|
|
- Infrastructure as Code (Terraform, Ansible)
|
|
|
|
Key infrastructure components to reference:
|
|
- VMs: 100 (docker-hub), 101 (monitoring-docker), 106 (Ansible-Control), 109-110 (web servers), 111 (database)
|
|
- CTs: 102 (nginx/NPM), 103 (netbox), 112 (twingate), 113 (n8n)
|
|
- Storage: Vault (sensitive), PBS-Backups (disaster recovery)
|
|
</homelab_context>
|
|
|
|
<output_format>
|
|
When performing operations:
|
|
1. Explain what you're about to do and why
|
|
2. Show the exact Git commands you'll execute
|
|
3. Display relevant output or confirmations
|
|
4. Summarize the result and next steps
|
|
5. Highlight any warnings or recommendations
|
|
</output_format>
|
|
|
|
<escalation>
|
|
Seek user clarification when:
|
|
- Merge conflicts require manual resolution decisions
|
|
- Multiple valid branching strategies could apply
|
|
- Commit scope is ambiguous or affects multiple areas
|
|
- Destructive operations are requested
|
|
- Repository state is unclear or potentially corrupted
|
|
</escalation>
|
|
```
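
The commit rules in the prompt above (a `type(scope): description` header, a fixed list of types, a short summary) are mechanical enough to check in code. The sketch below is one possible reading of those rules: treating the scope as optional and applying the 50-character figure to the whole header line are assumptions of this example, and `check_commit_header` is a hypothetical name.

```python
"""Sketch: check a commit header against the conventions listed above."""
import re

VALID_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore", "ci", "build", "perf")
SUMMARY_LIMIT = 50  # assumption: the limit applies to the full header line

# type, optional (scope), colon, space, then a non-empty description.
HEADER = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?: (?P<description>\S.*)$")


def check_commit_header(header: str) -> list[str]:
    """Return a list of problems; an empty list means the header passes."""
    match = HEADER.match(header)
    if match is None:
        return ["header does not match type(scope): description"]
    problems = []
    if match["type"] not in VALID_TYPES:
        problems.append(f"unknown type: {match['type']}")
    if len(header) > SUMMARY_LIMIT:
        problems.append(f"summary is {len(header)} characters, limit is {SUMMARY_LIMIT}")
    return problems
```

Note that the example header in the prompt, `feat(ansible): add nginx reverse proxy playbook for Proxmox CT 102`, is longer than 50 characters, so this check would report it. Whether the figure is a hard limit or a guideline is a decision for the repository owner.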
|
|
|
|
### Enhancement 5: Add Proxmox Safety to Lab-Operator (5 minutes)
|
|
|
|
**File**: `/home/jramos/homelab/sub-agents/lab-operator.md`
|
|
**After line 26**
|
|
|
|
```diff
|
|
3. **Container Safety**: When modifying `docker-compose.yml`, always run `docker compose config` to validate syntax before deploying.
|
|
+ 4. **Proxmox VM/CT Operations**: Confirm before `qm destroy`, `pct destroy`, or snapshot deletion.
|
|
+ 5. **Backup Verification**: Before major infrastructure changes, verify recent PBS backup exists.
|
|
+ 6. **Monitoring Impact**: Consider impact on Grafana/Prometheus metrics when changing infrastructure.
|
|
```
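
Rule 4 above asks for confirmation before destructive Proxmox commands. A minimal sketch of how a wrapper could recognise them follows, assuming commands arrive as a single shell string. Reading "snapshot deletion" as the `delsnapshot` subcommand of `qm` and `pct` is this sketch's interpretation, and `needs_confirmation` is a hypothetical name.

```python
"""Sketch: flag Proxmox commands that should be confirmed before running."""
import shlex

# Subcommands treated as destructive. `destroy` comes from the protocol above;
# `delsnapshot` is this sketch's reading of "snapshot deletion".
DESTRUCTIVE = {
    "qm": {"destroy", "delsnapshot"},
    "pct": {"destroy", "delsnapshot"},
}


def needs_confirmation(command: str) -> bool:
    """Return True when the command is a destructive qm/pct invocation."""
    tokens = shlex.split(command)
    if tokens and tokens[0] == "sudo":
        tokens = tokens[1:]
    if len(tokens) < 2:
        return False
    tool = tokens[0].rsplit("/", 1)[-1]  # accept /usr/sbin/qm as well as qm
    return tokens[1] in DESTRUCTIVE.get(tool, set())
```

For example, `needs_confirmation("qm destroy 105")` is true while `needs_confirmation("qm list")` is false. The check looks only at the first command in the string, so pipelines and `ssh host "..."` wrappers would need extra handling.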
|
|
|
|
---
|
|
|
|
## 3.4 Agent Architecture Proposals
|
|
|
|
### Should Any Agents Be Split?
|
|
|
|
#### Librarian Analysis
|
|
|
|
**Current**: Single agent handling all Git operations (127 lines)
|
|
|
|
**Recommendation**: **DO NOT SPLIT**
|
|
|
|
**Rationale**:
|
|
- Git operations are cohesive and related
|
|
- Splitting would create handoff friction
|
|
- Same tools needed for all Git tasks
|
|
- Better solution: Extract common patterns to CLAUDE.md, reduce line count
|
|
|
|
#### Lab-Operator Analysis
|
|
|
|
**Current**: Single agent for infrastructure operations (33 lines)
|
|
|
|
**Recommendation**: **DO NOT SPLIT** (currently)
|
|
|
|
**Rationale**:
|
|
- Single-node homelab has interconnected operations
|
|
- Splitting (docker-specialist, proxmox-specialist, network-specialist) would fragment workflow
|
|
- A single deployment may touch Proxmox, Docker, and networking
|
|
- **Future consideration**: If infrastructure grows to multi-node, reconsider
|
|
|
|
#### Backend-Builder Analysis
|
|
|
|
**Current**: Single agent for all code/IaC (28 lines)
|
|
|
|
**Recommendation**: **CONSIDER SPLITTING** (medium priority)
|
|
|
|
**Proposed Split**:
|
|
1. **IaC-Builder**: Ansible, Terraform, Docker Compose (declarative configs)
|
|
2. **Script-Developer**: Python, Shell (imperative code, custom tooling)
|
|
|
|
**Rationale**:
|
|
- Different mental models: declarative vs imperative
|
|
- Different validation approaches
|
|
- Different integration points (IaC-Builder → lab-operator; Script-Developer → monitoring)
|
|
- Manageable cognitive load for each
|
|
|
|
**Implementation Effort**: 60 minutes
|
|
|
|
### New Agent Proposals
|
|
|
|
#### 1. Infrastructure-Auditor (HIGH PRIORITY)
|
|
|
|
**Purpose**: Security scanning, compliance checking, configuration drift detection
|
|
|
|
**Justification**:
|
|
- Current agents focus on creation/modification, not validation
|
|
- Homelab has sensitive components (Vault storage, credentials in monitoring configs)
|
|
- PBS backups need verification
|
|
- Configuration can drift between the IaC definitions and the running state
|
|
|
|
**Proposed Definition**:
|
|
|
|
```markdown
|
|
---
|
|
name: infrastructure-auditor
|
|
description: >
|
|
Security and compliance specialist. Scans for misconfigurations, exposed credentials,
|
|
outdated packages, configuration drift, and security vulnerabilities.
|
|
tools: [Bash, Read, Grep, Glob]
|
|
model: sonnet
|
|
color: red
|
|
---
|
|
|
|
<system_role>
|
|
You are the **Infrastructure Auditor** - Security and compliance specialist.
|
|
Your job is to find problems before they become incidents.
|
|
</system_role>
|
|
|
|
<audit_domains>
|
|
1. **Credential Exposure**: Scan for hardcoded secrets, exposed API keys, plaintext passwords
|
|
- Check for patterns: password=, api_key=, token=, secret=
|
|
- Verify .gitignore coverage for sensitive files
|
|
- Validate environment variable usage vs hardcoding
|
|
|
|
2. **Configuration Drift**: Compare running state to declared state
|
|
- Compare docker-compose configs to running containers
|
|
- Verify Proxmox VM/CT configs match documentation
|
|
- Check Ansible playbook state vs actual system state
|
|
|
|
3. **Package Security**: Check for outdated packages with known CVEs
|
|
- Proxmox package versions
|
|
- Docker image versions
|
|
- Python package versions
|
|
|
|
4. **Backup Verification**: Validate PBS backup integrity and recency
|
|
- Check last backup timestamp for critical VMs/CTs
|
|
- Verify backup size and integrity
|
|
- Test restore procedures (read-only simulation)
|
|
|
|
5. **Permission Audit**: Review file permissions and access controls
|
|
- Docker socket exposure
|
|
- Sudo access configurations
|
|
- File ownership and permissions
|
|
|
|
6. **Network Security**: Review exposed services and ports
|
|
- Check for services listening on 0.0.0.0
|
|
- Verify firewall rules
|
|
- Audit reverse proxy configurations
|
|
</audit_domains>
|
|
|
|
<safety_protocols>
|
|
1. **READ-ONLY OPERATIONS**: NEVER modify anything - audit only
|
|
2. **Report Findings**: Document issues, do not auto-remediate
|
|
3. **Escalate Critical Issues**: Immediately flag exposed credentials or critical vulnerabilities
|
|
4. **No Destructive Checks**: Do not run tests that could impact running services
|
|
</safety_protocols>
|
|
|
|
<audit_checklist>
|
|
Run these checks on demand or on a schedule:
|
|
- [ ] Scan all .env, .yml, .yaml files for hardcoded credentials
|
|
- [ ] Verify .gitignore covers all sensitive files
|
|
- [ ] Check PBS backup status for all critical VMs/CTs
|
|
- [ ] Compare Grafana datasources to prometheus.yml
|
|
- [ ] Audit Nginx Proxy Manager SSL certificate expiration
|
|
- [ ] Check for exposed Docker sockets
|
|
- [ ] Verify Twingate connector status
|
|
- [ ] Review n8n workflow credential storage
|
|
</audit_checklist>
|
|
```
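
The first audit domain and the first checklist item describe a read-only scan for hardcoded credentials. A minimal sketch is shown below, assuming the patterns named in the definition (`password`, `api_key`, `token`, `secret`) followed by `=` or `:`, and treating values that start with `$` as environment-variable references. The function names are hypothetical, and a pattern scan of this kind will produce false positives and negatives.

```python
"""Sketch: read-only scan of .env and YAML files for hardcoded credentials."""
import re
from pathlib import Path

SCAN_SUFFIXES = {".yml", ".yaml"}
# Key names from the audit definition, followed by = or : and a value.
SECRET_PATTERN = re.compile(r"(?i)(password|api_key|token|secret)\w*\s*[=:]\s*(\S+)")


def is_candidate(path: Path) -> bool:
    """Select .yml/.yaml files plus .env and *.env files."""
    return path.suffix in SCAN_SUFFIXES or path.name.endswith(".env")


def find_hardcoded_secrets(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, key name) for each suspicious line.

    The matched value is never returned, so the report itself stays safe to share.
    """
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or not is_candidate(path):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), 1):
            if line.lstrip().startswith("#"):
                continue  # ignore comment lines
            match = SECRET_PATTERN.search(line)
            if match is None:
                continue
            value = match.group(2).strip("\"'")
            if not value or value.startswith("$"):
                continue  # empty, or an environment-variable reference
            findings.append((path.relative_to(root).as_posix(), number, match.group(1).lower()))
    return findings
```

The scan only opens files for reading, in line with the auditor's read-only safety protocol, and it reports locations for a human to review instead of attempting any fix.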
|
|
|
|
**Implementation Effort**: 45 minutes
|
|
|
|
**Priority**: HIGH - Addresses security gap in current agent coverage
|
|
|
|
#### 2. Backup-Manager (DEFER)
|
|
|
|
**Purpose**: PBS operations, disaster recovery, restore testing
|
|
|
|
**Recommendation**: **DEFER** - Lab-Operator can handle backup operations
|
|
|
|
**Rationale**:
|
|
- PBS operations are infrequent
|
|
- Lab-Operator has necessary tools and expertise
|
|
- Would add complexity without significant benefit
|
|
- **Reconsider**: When backup operations become more complex or automated
|
|
|
|
#### 3. Monitoring-Specialist (DEFER)
|
|
|
|
**Purpose**: Grafana dashboards, Prometheus queries, alerting
|
|
|
|
**Recommendation**: **DEFER** - Backend-Builder can handle monitoring configs
|
|
|
|
**Rationale**:
|
|
- Monitoring configs are code (YAML, PromQL)
|
|
- Backend-Builder has appropriate tools
|
|
- Grafana/Prometheus documentation is good
|
|
- **Reconsider**: When alerting becomes complex or requires dedicated expertise
|
|
|
|
---
|
|
|
|
## 3.5 Proposed Final Agent Architecture
|
|
|
|
### Recommended Structure (5-6 Agents)
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
│                       DOCUMENTATION LAYER                       │
│  ┌────────────────────────────────────────────────────────┐     │
│  │  Scribe (documentation, architecture, diagrams)        │     │
│  └────────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                      VERSION CONTROL LAYER                      │
│  ┌────────────────────────────────────────────────────────┐     │
│  │  Librarian (git operations, commits, branches)         │     │
│  └────────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                        OPERATIONS LAYER                         │
│  ┌────────────────────┐  ┌────────────────────────────────┐     │
│  │  Lab-Operator      │  │  Infrastructure-Auditor (NEW)  │     │
│  │  (infra mgmt)      │  │  (security scanning)           │     │
│  └────────────────────┘  └────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                        DEVELOPMENT LAYER                        │
│  ┌────────────────────┐  ┌────────────────────────────────┐     │
│  │  IaC-Builder       │  │  Script-Developer              │     │
│  │  (Ansible,         │  │  (Python, Shell automation)    │     │
│  │   Terraform,       │  │                                │     │
│  │   Docker Compose)  │  │                                │     │
│  └────────────────────┘  └────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Implementation Phases
|
|
|
|
**Phase 1: Critical Fixes** (Day 1 - 15 minutes)
|
|
- Fix librarian tools
|
|
- Add Bash to backend-builder
|
|
- Fix CLAUDE.md GitLab references
|
|
- Add Write tool to all agents
|
|
|
|
**Phase 2: High-Impact** (Week 1 - 90 minutes)
|
|
- Add Quick Reference to CLAUDE.md
|
|
- Add Agent Routing Guide to CLAUDE.md
|
|
- Update lab-operator domain expertise
|
|
- Add validation rules to backend-builder
|
|
|
|
**Phase 3: Quality Enhancements** (Week 2 - 180 minutes)
|
|
- Add YAML frontmatter to CLAUDE.md
|
|
- Restructure librarian with XML
|
|
- Add safety protocols to all agents
|
|
- Remove "Steve" references
|
|
|
|
**Phase 4: Architecture Expansion** (Month 1 - 120 minutes)
|
|
- Create Infrastructure-Auditor agent
|
|
- Split Backend-Builder into IaC-Builder + Script-Developer
|
|
- Test and refine agent boundaries
|
|
|
|
---
|
|
|
|
# Part 4: Implementation Checklist
|
|
|
|
## Quick Reference: Files to Modify
|
|
|
|
| File | Priority 1 | Priority 2 | Priority 3 | Total Changes |
|
|
|------|-----------|-----------|-----------|---------------|
|
|
| `/home/jramos/homelab/CLAUDE.md` | 5 fixes | 3 additions | 1 frontmatter | 9 edits |
|
|
| `/home/jramos/homelab/sub-agents/scribe.md` | 3 fixes | 0 | 2 enhancements | 5 edits |
|
|
| `/home/jramos/homelab/sub-agents/librarian.md` | 2 fixes | 1 restructure | 1 restructure | 4 edits |
|
|
| `/home/jramos/homelab/sub-agents/lab-operator.md` | 2 fixes | 1 update | 2 additions | 5 edits |
|
|
| `/home/jramos/homelab/sub-agents/backend-builder.md` | 2 fixes | 1 addition | 1 addition | 4 edits |
|
|
| **TOTAL** | **14** | **6** | **7** | **27 edits** |
|
|
|
|
## Detailed Implementation Checklist
|
|
|
|
### Priority 1: Critical Fixes (15 minutes)
|
|
|
|
- [ ] **librarian.md**: Add tools field (line 5)
|
|
- `tools: [Bash, Read, Grep, Glob, Edit, Write]`
|
|
|
|
- [ ] **librarian.md**: Condense description (line 3)
|
|
- Remove examples, keep 2-3 sentences
|
|
|
|
- [ ] **backend-builder.md**: Add Bash and Write (line 6)
|
|
- `tools: [Read, Edit, Grep, Glob, Write, Bash]`
|
|
|
|
- [ ] **backend-builder.md**: Add color field
|
|
- `color: orange`
|
|
|
|
- [ ] **scribe.md**: Add Write tool (line 6)
|
|
- `tools: [Read, Grep, Glob, Edit, Write]`
|
|
|
|
- [ ] **scribe.md**: Add color field
|
|
- `color: blue`
|
|
|
|
- [ ] **scribe.md**: Delete broken placeholder (line 20)
|
|
|
|
- [ ] **lab-operator.md**: Add Glob and Write (line 6)
|
|
- `tools: [Bash, Read, Grep, Glob, Edit, Write]`
|
|
|
|
- [ ] **lab-operator.md**: Add color field
|
|
- `color: green`
|
|
|
|
- [ ] **CLAUDE.md**: Fix GitLab → Gitea (lines 62, 97, 105)
|
|
|
|
- [ ] **CLAUDE.md**: Fix working directory (line 126)
|
|
|
|
- [ ] **CLAUDE.md**: Delete "not initialized" line (127)
|
|
|
|
- [ ] **CLAUDE.md**: Fix storage percentage reference (line 89)
|
|
|
|
### Priority 2: High-Impact Improvements (90 minutes)
|
|
|
|
- [ ] **CLAUDE.md**: Add YAML frontmatter (beginning of file; listed under Priority 3 as Enhancement 1 in section 3.3)
|
|
|
|
- [ ] **CLAUDE.md**: Add Quick Reference section (after line 8)
|
|
|
|
- [ ] **CLAUDE.md**: Add Agent Routing Guide (after Quick Reference)
|
|
|
|
- [ ] **CLAUDE.md**: Replace duplicate tables with references (lines 17-56)
|
|
|
|
- [ ] **lab-operator.md**: Update domain expertise (lines 16-20)
|
|
|
|
- [ ] **backend-builder.md**: Add Docker Compose guidance (after line 20)
|
|
|
|
- [ ] **backend-builder.md**: Add validation rules section (after line 27)
|
|
|
|
### Priority 3: Quality Enhancements (180 minutes)
|
|
|
|
- [ ] **scribe.md**: Remove "Steve" reference (line 11)
|
|
|
|
- [ ] **scribe.md**: Update docs directory reference (line 16)
|
|
|
|
- [ ] **scribe.md**: Add safety protocols section (after line 23)
|
|
|
|
- [ ] **librarian.md**: Restructure with XML tags (entire prompt body)
|
|
|
|
- [ ] **librarian.md**: Move examples to prompt body
|
|
|
|
- [ ] **lab-operator.md**: Remove "Steve" reference (line 11)
|
|
|
|
- [ ] **lab-operator.md**: Add Proxmox safety protocols (after line 26)
|
|
|
|
- [ ] **backend-builder.md**: Remove "Steve" reference (line 11)
|
|
|
|
### Future Enhancements (Optional)
|
|
|
|
- [ ] Create `infrastructure-auditor.md` agent
|
|
|
|
- [ ] Split `backend-builder` into `iac-builder` and `script-developer`
|
|
|
|
- [ ] Extract common patterns from librarian to CLAUDE.md
|
|
|
|
- [ ] Add examples section to CLAUDE.md
|
|
|
|
- [ ] Create agent capability testing suite
|
|
|
|
---
|
|
|
|
# Part 5: Expected Outcomes
|
|
|
|
## Before vs After Comparison
|
|
|
|
### Current State Issues
|
|
|
|
| Issue | Impact | Affected Agents |
|
|
|-------|--------|-----------------|
|
|
| Librarian has no tools | **BLOCKING** - Cannot execute ANY git commands | 1 |
|
|
| Backend-Builder lacks Bash | **CRITICAL** - Cannot test code | 1 |
|
|
| No agent has Write tool | **HIGH** - Cannot create new files | 4 |
|
|
| CLAUDE.md has stale GitLab refs | **HIGH** - Misleading documentation | N/A |
|
|
| Duplicate infrastructure tables | **MEDIUM** - Maintenance burden | N/A |
|
|
| Inconsistent agent structure | **MEDIUM** - Confusion, learning curve | 4 |
|
|
|
|
### Post-Implementation Benefits
|
|
|
|
| Improvement | Benefit | Measurable Impact |
|
|
|-------------|---------|-------------------|
|
|
| All agents have proper tools | Functional, can complete tasks | Librarian tool coverage 0% → 100% |
|
|
| CLAUDE.md has Quick Reference | Faster context gathering | ~5 min → ~30 sec |
|
|
| Agent Routing Guide | Clear task assignment | Reduced user decision time |
|
|
| No duplicate tables | Easier maintenance | 5 files → 1 file to update |
|
|
| Consistent agent structure | Easier to understand/maintain | Uniform XML structure |
|
|
| Infrastructure-Auditor | Security coverage | New capability |
|
|
|
|
## Success Metrics
|
|
|
|
### Quantitative
|
|
|
|
- **Tool Coverage**: 0% (librarian) → 100% (all agents functional)
|
|
- **Documentation Accuracy**: 5 stale references → 0 stale references
|
|
- **Agent Consistency**: 25% use XML tags → 100% use XML tags
|
|
- **Color Field Coverage**: 25% have color → 100% have color
|
|
- **Information Duplication**: Infrastructure in 5 files → 1 canonical file
|
|
|
|
### Qualitative
|
|
|
|
- **User Experience**: Clear agent selection vs guesswork
|
|
- **Maintenance Burden**: Single source of truth for infrastructure
|
|
- **Security Posture**: Proactive scanning capability
|
|
- **Documentation Quality**: Up-to-date, accurate, easy to navigate
|
|
- **Agent Clarity**: Well-defined boundaries and responsibilities
|
|
|
|
---
|
|
|
|
# Conclusion
|
|
|
|
This analysis identified **critical blocking issues** (librarian non-functional, backend-builder cannot test code) alongside **significant structural issues** (outdated references, duplicate information, missing routing guidance).
|
|
|
|
## Immediate Action Required
|
|
|
|
1. **Fix librarian tools** (2 minutes) - **BLOCKING** issue
|
|
2. **Add Bash to backend-builder** (1 minute) - **CRITICAL** issue
|
|
3. **Fix CLAUDE.md GitLab references** (5 minutes) - **HIGH** priority
|
|
|
|
**Total time for critical fixes: 15 minutes**
|
|
|
|
## High-Value Improvements
|
|
|
|
1. Add Quick Reference to CLAUDE.md (15 min)
|
|
2. Add Agent Routing Guide (30 min)
|
|
3. Remove duplicate infrastructure tables (20 min)
|
|
|
|
**Total time for high-impact: 90 minutes**
|
|
|
|
## Long-Term Vision
|
|
|
|
With all improvements implemented:
|
|
- **All agents functional** with proper tools
|
|
- **Clear documentation** with quick reference and routing guide
|
|
- **Consistent structure** across all agent definitions
|
|
- **Security coverage** with infrastructure-auditor
|
|
- **Reduced maintenance** through single source of truth
|
|
|
|
**Total implementation effort**: ~5 hours for complete transformation
|
|
|
|
---
|
|
|
|
**Generated**: 2025-12-07
|
|
**Analysis Tool**: Claude Opus 4.5
|
|
**Scope**: CLAUDE.md + 4 sub-agents (scribe, librarian, lab-operator, backend-builder)
|
|
**Total Issues Identified**: 27 (5 critical, 12 high-impact, 10 enhancements)
|