
Jordan Ramos 004e3da77c feat(agents): optimize sub-agent architecture with comprehensive prompt engineering

This commit implements a comprehensive optimization of all sub-agent prompt
definitions based on Opus-powered prompt engineering analysis. All agents now
match the quality standard established by librarian.md.

Agent Improvements:
- scribe.md: 29→340 lines (11.7x expansion)
  * Added 6 usage examples with role clarity
  * Implemented comprehensive responsibilities section
  * Added 3 complete ASCII diagram templates
  * Included safety protocols and decision frameworks

- backend-builder.md: 40→291 lines (7.3x expansion)
  * Added 6 usage examples with clear boundaries
  * Expanded core responsibilities (Ansible, Terraform, Docker, Python, Shell)
  * Added technology stack and validation rules tables
  * Included handoff protocol for lab-operator deployment
  * Defined clear boundaries (CREATES code, does NOT deploy)

- lab-operator.md: 37→193 lines (5.2x expansion)
  * Added 6 usage examples with role clarity
  * Expanded domain expertise with specific commands
  * Added command style guide (5-step pattern)
  * Included safety protocols and decision-making framework
  * Defined clear boundaries (DEPLOYS/OPERATES, does NOT create IaC)

- librarian.md: Minor formatting improvements

CLAUDE.md Fixes:
- Moved YAML frontmatter to line 1 (was incorrectly at line 89)
- Fixed trailing pipe character
- Completed incomplete sentences about backup strategy and storage growth
- Removed redundant information
- Expanded status file template with recovery instructions

Files Added:
- Claude_UPDATES.md: Comprehensive prompt engineering analysis report
- monitoring/pve-exporter/pve.yml: PVE monitoring configuration

Impact:
- Total agent documentation: 249→967 lines (288% increase)
- Usage examples: 6→24 total (300% increase)
- All agents now have comprehensive safety protocols
- Clear role boundaries prevent agent overlap
- Validation testing confirms all agents functional

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-07 22:39:40 -07:00

55 KiB


Claude Code Homelab Repository - Comprehensive Analysis & Improvement Recommendations

Date: 2025-12-07
Scope: CLAUDE.md + Sub-Agent Architecture Review
Methodology: Opus-powered prompt engineering analysis
Repository: /home/jramos/homelab/

Executive Summary

This comprehensive analysis evaluated the CLAUDE.md guidance file and all four sub-agent definitions (scribe, librarian, lab-operator, backend-builder) for efficiency, clarity, and effectiveness. The review identified 5 critical issues, 12 high-impact improvements, and 15 structural enhancements that would significantly improve the agent system's functionality and maintainability.

Critical Findings

BLOCKING: Librarian Agent Non-Functional - No tools defined in frontmatter; cannot execute ANY git commands
BLOCKING: Backend-Builder Cannot Test Code - Missing Bash tool; cannot validate any scripts or playbooks written
HIGH: No Agent Can Create Files - All agents lack Write tool; can only modify existing files
HIGH: CLAUDE.md Has Stale References - 5 references to decommissioned GitLab, wrong working directory path
HIGH: Information Duplication Crisis - Infrastructure tables duplicated across 5 files, creating maintenance burden

Quick Win Opportunities (1-5 minutes each)

Fix librarian tools: 2 minutes, CRITICAL impact
Fix GitLab references in CLAUDE.md: 5 minutes, high impact
Add Write tool to all agents: 3 minutes, high impact
Remove broken placeholder from scribe: 1 minute, medium impact

Total Estimated Effort

Priority 1 fixes: ~15 minutes
Priority 2 improvements: ~90 minutes
Priority 3 enhancements: ~180 minutes
Full implementation: ~5 hours

Part 1: CLAUDE.md Analysis

1.1 Current State Assessment

File: /home/jramos/homelab/CLAUDE.md
Length: 130 lines
Purpose: Primary context file for Claude Code agents working in this repository
Last Updated: Unknown (no version tracking)

Strengths

| Aspect | Details |
|--------|---------|
| Infrastructure Context | Lines 17-33 provide clear VM inventory with IDs, names, purposes |
| Architecture Rationale | Lines 58-70 explain the "why" behind design decisions |
| Workflow Template | Lines 74-84 establish a universal workflow pattern |
| Storage Documentation | Lines 45-56 document storage architecture comprehensively |

Critical Issues

| Severity | Line(s) | Issue | Impact |
|----------|---------|-------|--------|
| HIGH | 62 | References "GitLab (101)" in Architecture Patterns - GitLab decommissioned | Misleading |
| HIGH | 97 | "GitLab (101) should house all IaC" - Service no longer exists | Incorrect |
| HIGH | 105 | "GitLab: CI/CD pipelines" - Wrong service listed | Confusing |
| HIGH | 126 | Wrong path "/mnt/c/Users/fam1n/Documents/homelab" | Breaks navigation |
| HIGH | 127 | "not yet initialized as a git repository" - Repository IS initialized | Factually wrong |
| MEDIUM | 89 | States "PBS-Backups at 21.6%" but line 54 says 27.43% | Inconsistent |
| MEDIUM | 110-112 | Hardcoded uptime numbers (27-68 days) become stale | Maintenance burden |

Structural Issues

1.1.1 Information Duplication

The VM/LXC/Storage tables (lines 17-56) duplicate content from:

CLAUDE_STATUS.md (lines 17-45)
INDEX.md (lines 314-349)
README.md (lines 18-33)
services/README.md (mentions throughout)

Impact: Updates require changing 5 files, creating drift risk and maintenance overhead.
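
The drift risk described above can be measured by looking for non-trivial lines that appear in more than one file. The following is a minimal sketch, not part of the repository: `find_shared_lines` and its `min_length` threshold are hypothetical names, and the sample content is illustrative only.

```python
from collections import defaultdict


def find_shared_lines(docs, min_length=20):
    """Map each non-trivial line to the sorted list of files containing it.

    `docs` maps a file name to its text. Whitespace is normalized so that
    differently padded table rows still compare equal. Only lines found in
    two or more files are returned.
    """
    seen = defaultdict(set)
    for name, text in docs.items():
        for line in text.splitlines():
            normalized = " ".join(line.split())
            if len(normalized) >= min_length:
                seen[normalized].add(name)
    return {line: sorted(names) for line, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    # Illustrative content only; real use would read the files from disk.
    sample = {
        "CLAUDE.md": "| 101 | monitoring-docker | Grafana, Prometheus |\nIntro text unique to CLAUDE.md",
        "README.md": "| 101 | monitoring-docker | Grafana, Prometheus |\nA sentence unique to the README",
    }
    for line, names in find_shared_lines(sample).items():
        print(f"{names}: {line}")
```

Running a check like this before and after the restructuring would show whether the duplicated rows were actually removed.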

1.1.2 Missing Critical Sections

No Quick Reference: Takes too long to find key info (node IP, monitoring URL, repo location)
No Agent Routing Guide: No guidance on which agent to use for which task
No Version Tracking: No YAML frontmatter or last-updated timestamp
No Tool-to-Task Mappings: Agents don't know their capabilities vs requirements

1.1.3 Outdated Information

| Line | Current Text | Reality |
|------|--------------|---------|
| 62 | "GitLab (101)" | Gitea (external) or monitoring-docker (VM 101) |
| 89 | "21.6% utilization" | Should reference CLAUDE_STATUS.md for current |
| 97 | "GitLab (101) should house all IaC" | Gitea now handles version control |
| 105 | "GitLab: CI/CD pipelines" | Should be "Gitea: Version control" |
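
Stale references like these are easy to reintroduce, so a small scan can help catch them. This is a sketch under assumptions: `find_stale_references` is a hypothetical helper, and the list of stale terms is supplied by the caller rather than taken from any existing configuration.

```python
def find_stale_references(text, stale_terms):
    """Return (line_number, term) for every stale term found in the text.

    Matching is a plain case-sensitive substring test, so "GitLab" does not
    match "Gitea". Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for term in stale_terms:
            if term in line:
                hits.append((number, term))
    return hits


if __name__ == "__main__":
    sample = (
        "Gitea hosts the repository\n"
        "GitLab (101) should house all IaC\n"
        "- Working directory: /mnt/c/Users/fam1n/Documents/homelab\n"
    )
    for number, term in find_stale_references(sample, ["GitLab", "/mnt/c/Users"]):
        print(f"line {number}: stale reference to {term!r}")
```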

1.2 Recommended CLAUDE.md Restructuring

Priority 1: Immediate Fixes (5 minutes total)

Fix 1: Update GitLab References

# Line 62
- **Automation-First Approach**: The presence of Ansible-Control (106), GitLab (101), and NetBox (103)...
+ **Automation-First Approach**: The presence of Ansible-Control (106), Gitea, and NetBox (103)...

# Line 97
- 5. **Version Control**: GitLab (101) should house all Infrastructure as Code, scripts, and configuration files from this repository.
+ 5. **Version Control**: Gitea should house all Infrastructure as Code, scripts, and configuration files from this repository.

# Line 105
- - **GitLab**: CI/CD pipelines for infrastructure testing and deployment
+ - **Gitea**: Version control and repository management

Fix 2: Correct Working Directory

# Line 126
- - Working directory: /mnt/c/Users/fam1n/Documents/homelab
+ - Working directory: /home/jramos/homelab

Fix 3: Remove False Statement

# Line 127 - DELETE THIS LINE
- - This repository is not yet initialized as a git repository

Fix 4: Fix Storage Percentage

# Line 89
- 1. **Backup Strategy**: With PBS-Backups at 21.6% utilization...
+ 1. **Backup Strategy**: With PBS-Backups utilization growing (see CLAUDE_STATUS.md for current)...

Priority 2: Add Quick Reference Section (15 minutes)

Insert after line 8, before "## Infrastructure Overview":

## Quick Reference

| Resource | Value |
|----------|-------|
| **Proxmox Node** | serviceslab (192.168.2.200:8006) |
| **Proxmox Version** | PVE 8.3.3 |
| **Infrastructure** | 10 VMs, 4 LXC containers |
| **Monitoring** | http://192.168.2.114:3000 (Grafana) |
| **Version Control** | Gitea at 192.168.2.102:3060 |
| **Working Directory** | /home/jramos/homelab |
| **Live Status** | See `CLAUDE_STATUS.md` for current inventory |

**Key Services:**
- VM 101 (monitoring-docker): Grafana, Prometheus, PVE Exporter
- CT 102 (nginx): Nginx Proxy Manager (reverse proxy)
- CT 112 (twingate-connector): Zero-trust network access
- CT 113 (n8n): Workflow automation at 192.168.2.107

Priority 2: Add Agent Routing Guide (30 minutes)

Insert after Quick Reference:

## Agent Selection Guide

When working with this repository, choose the appropriate agent based on task type:

| Task Type | Primary Agent | Tools Available | Notes |
|-----------|---------------|-----------------|-------|
| **Git Operations** | `librarian` | Bash, Read, Grep, Glob, Edit, Write | Commits, branches, merges, .gitignore |
| **Documentation** | `scribe` | Read, Grep, Glob, Edit, Write | READMEs, architecture docs, diagrams |
| **Infrastructure Ops** | `lab-operator` | Bash, Read, Grep, Glob, Edit, Write | Proxmox, Docker, networking, storage |
| **Code/IaC Development** | `backend-builder` | Bash, Read, Grep, Glob, Edit, Write | Ansible, Terraform, Python, Shell |
| **File Creation** | Main Agent | All tools | Use when sub-agents lack specific tools |
| **Complex Multi-Agent Tasks** | Main Agent | All tools | Coordinates between specialized agents |

### Task Routing Decision Tree

Is this a git/version control task?
├── Yes → Use librarian
└── No ↓

Is this documentation (README, guides, diagrams)?
├── Yes → Use scribe
└── No ↓

Does this require system commands (docker, ssh, proxmox)?
├── Yes → Use lab-operator
└── No ↓

Is this code/config creation (Ansible, Python, Terraform)?
├── Yes → Use backend-builder
└── No → Use Main Agent


### Agent Collaboration Patterns

**Documentation Workflow:**
1. `backend-builder` or `lab-operator` creates/modifies infrastructure
2. `scribe` updates documentation
3. `librarian` commits all changes

**Infrastructure Deployment:**
1. `backend-builder` writes IaC (Ansible/Terraform/Compose)
2. `lab-operator` deploys to Proxmox/Docker
3. `scribe` documents deployment
4. `librarian` commits configuration
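
The routing decision tree in the proposed guide is a first-match chain: the questions are asked in order, and the first "yes" wins. A minimal sketch of that logic follows. `route_task` and the `"main-agent"` label are hypothetical names used for illustration, not an existing API.

```python
def route_task(*, is_git=False, is_documentation=False,
               needs_system_commands=False, is_code_or_config=False):
    """Return the agent for a task, asking the questions in tree order.

    Order matters: a task that is both git-related and documentation-related
    goes to the librarian because that question is asked first.
    """
    if is_git:
        return "librarian"
    if is_documentation:
        return "scribe"
    if needs_system_commands:
        return "lab-operator"
    if is_code_or_config:
        return "backend-builder"
    return "main-agent"  # hypothetical label for the coordinating Main Agent


if __name__ == "__main__":
    print(route_task(is_documentation=True))
    print(route_task(needs_system_commands=True, is_code_or_config=True))
    print(route_task())
```

Because the chain is ordered, a task that both needs system commands and creates config lands on lab-operator. The role boundaries in the analysis (builder creates, operator deploys) suggest splitting such work into two tasks instead.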

Priority 2: Remove Duplicate Infrastructure Tables (20 minutes)

Replace lines 17-56 with:

## Infrastructure Overview

**For detailed, current infrastructure inventory, see:**
- **Live Status**: `CLAUDE_STATUS.md` (most current)
- **Service Details**: `services/README.md`
- **Complete Index**: `INDEX.md`

**Quick Summary:**
- **VMs**: 10 total (IDs: 100, 101, 104-111)
- **LXC Containers**: 4 total (IDs: 102, 103, 112, 113)
- **Storage Pools**: local, local-lvm, Vault (ZFS), PBS-Backups, iso-share
- **Monitoring**: VM 101 at 192.168.2.114 (Grafana/Prometheus/PVE Exporter)
- **Key Services**: See Quick Reference above

**Note**: Infrastructure details change frequently. Always reference `CLAUDE_STATUS.md` for accurate counts, IPs, and status.
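
The counts in the quick summary can be cross-checked against the ID lists it quotes. The sketch below expands a list such as "100, 101, 104-111" and counts the entries. `expand_ids` is a hypothetical helper, and the ID strings are copied from the summary above.

```python
def expand_ids(spec):
    """Expand a comma-separated ID list with inclusive ranges into integers."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-"))
            ids.extend(range(start, end + 1))
        else:
            ids.append(int(part))
    return ids


if __name__ == "__main__":
    vm_ids = expand_ids("100, 101, 104-111")
    ct_ids = expand_ids("102, 103, 112, 113")
    print(f"VMs: {len(vm_ids)}, LXC containers: {len(ct_ids)}")
    # An ID used for both a VM and a container would indicate an inventory error.
    print(f"Overlap: {sorted(set(vm_ids) & set(ct_ids))}")
```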

Priority 3: Add YAML Frontmatter (5 minutes)

Insert at very beginning of file:

---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
repository_type: homelab
primary_node: serviceslab
proxmox_version: 8.3.3
vm_count: 10
lxc_count: 4
working_directory: /home/jramos/homelab
git_remote: http://192.168.2.102:3060/jramos/homelab.git
---

1.3 Complete Proposed CLAUDE.md Structure

---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
---

# CLAUDE.md

This file provides guidance to Claude Code when working with this homelab infrastructure repository.

## Quick Reference
[Key info table - 10 lines]

## Agent Selection Guide
[Task routing decision tree - 30 lines]

## Repository Overview
[High-level purpose - 10 lines]

## Infrastructure Reference
[Link to CLAUDE_STATUS.md - 15 lines]

## Working with This Environment
### Universal Workflow
[Existing content - 15 lines]

## Architecture Principles
[Condensed from current patterns - 20 lines]

## Best Practices
[Updated practices - 15 lines]

## Development Setup
[Existing content - 10 lines]

## Notes
[Updated notes - 5 lines]

Estimated new length: ~130 lines (same as current)
Information density: Significantly higher
Maintenance burden: Reduced (references instead of duplicates)

Part 2: Sub-Agent Architecture Analysis

2.1 Agent Inventory

| Agent | File | Lines | Tools Defined | Status |
|-------|------|-------|---------------|--------|
| scribe | sub-agents/scribe.md | 30 | Read, Grep, Glob, Edit | Missing Write |
| librarian | sub-agents/librarian.md | 127 | NONE | NON-FUNCTIONAL |
| lab-operator | sub-agents/lab-operator.md | 33 | Bash, Read, Grep, Edit | Missing Glob, Write |
| backend-builder | sub-agents/backend-builder.md | 28 | Read, Edit, Grep, Glob | Missing Write, Bash |

2.2 Individual Agent Reviews

2.2.1 Scribe Agent

File: /home/jramos/homelab/sub-agents/scribe.md

Frontmatter (Lines 1-8)

---
name: scribe
description: >
  Homelab Architect and Technical Writer. Explains concepts, designs network topologies,
  summarizes project structures, and maintains documentation (READMEs).
tools: [Read, Grep, Glob, Edit]
model: sonnet
---

Strengths:

Clean YAML structure
Clear description
Appropriate model

Issues:

| Line | Issue | Impact |
|------|-------|--------|
| 6 | Missing `Write` tool | Cannot create new documentation files |
| Missing | No `color` field | Inconsistent with librarian |

Prompt Body Analysis

Lines 11-12:

You are the **Scribe** (formerly Steve's Architecture Module).

"Steve" reference confusing without context
Recommendation: Remove "(formerly Steve's Architecture Module)"

Line 16:

1.  **Documentation**: Keep `README.md` and `docs/` up to date

References docs/ directory that doesn't exist
Recommendation: Update to actual docs locations

Line 20 - CRITICAL ISSUE:

[Image of network topology diagram]

Broken placeholder, incomplete
Recommendation: Delete this line immediately

Line 28:

- Do not execute code. Your job is to plan and explain.

Conflicts with having Edit tool (which modifies files)
Recommendation: Clarify "Do not execute system commands via Bash"

Scribe Recommendations

Priority 1 (CRITICAL):

# Line 6
- tools: [Read, Grep, Glob, Edit]
+ tools: [Read, Grep, Glob, Edit, Write]

# Line 20 - DELETE
- [Image of network topology diagram]

# After Line 7
+ color: blue

Priority 2:

# Line 11
- You are the **Scribe** (formerly Steve's Architecture Module).
+ You are the **Scribe** - Documentation Architect and Technical Writer.

# Line 16
- Keep `README.md` and `docs/` up to date
+ Keep `README.md`, `services/README.md`, and infrastructure docs up to date

2.2.2 Librarian Agent

File: /home/jramos/homelab/sub-agents/librarian.md

Frontmatter (Lines 1-6) - CRITICAL ISSUE

---
name: librarian
description: Use this agent when the user needs Git repository management...
model: sonnet
color: purple
---

BLOCKING ISSUE: No tools field defined

Impact: Agent cannot execute ANY git commands. Completely non-functional.

Description Field - Major Problem

Line 3: Description is 552 words with 6 embedded examples

Example excerpt:

description: Use this agent when...

- Example 1 (Commit Operation):
user: "I've finished implementing..."
assistant: "I'll use the git-version-control agent..."
[... 5 more examples ...]

Issues:

Examples should be in prompt body, not frontmatter
Description unparseable by automated systems
Violates YAML frontmatter conventions
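
A lightweight lint could flag descriptions of this kind before they are committed. The sketch below assumes the description has already been extracted from the frontmatter as a string. `check_description` and the 60-word limit are hypothetical choices for illustration, not an established convention.

```python
def check_description(description, max_words=60):
    """Return a list of problems found in an agent description string."""
    problems = []
    word_count = len(description.split())
    if word_count > max_words:
        problems.append(f"description has {word_count} words (limit {max_words})")
    # Dialogue markers suggest usage examples were embedded in the description.
    if "user:" in description or "assistant:" in description:
        problems.append("description embeds dialogue examples")
    return problems


if __name__ == "__main__":
    concise = (
        "Git repository management specialist. Handles commits, branches, merges, "
        "history review, .gitignore maintenance, and enforces conventional commit standards."
    )
    print(check_description(concise))
```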

Prompt Body (Lines 8-125)

Line count: 118 lines (4x longer than other agents)

Structure: Professional prose (no XML tags like other agents)

Strengths:

Comprehensive Git guidance
Excellent safety protocols
Infrastructure-aware (mentions VM/CT IDs)
Good conventional commit examples

Issues:

| Line | Issue |
|------|-------|
| 8 | Prose style vs XML tags in other agents |
| 14-125 | Could be condensed by moving common patterns to CLAUDE.md |

Librarian Recommendations

Priority 1 (CRITICAL) - MUST FIX:

# Line 3
- description: Use this agent when the user needs Git repository management, including...
+ description: >
+   Git repository management specialist. Handles commits, branches, merges,
+   history review, .gitignore maintenance, and enforces conventional commit standards.

# After line 5 - ADD THIS
+ tools: [Bash, Read, Grep, Glob, Edit, Write]

Priority 2:

Move examples from description to prompt body:

## Usage Examples

### Commit Operation
User: "I've finished implementing the Ansible playbook for nginx configuration."
Action: Create properly formatted conventional commit.

### Branch Management
User: "Create a new feature branch for NetBox integration."
Action: Create appropriately named feature branch.

[... remaining examples ...]

Priority 3:

Add XML structure for consistency:

<system_role>
You are the **Librarian** - Git Version Control Specialist for the homelab repository.
</system_role>

<core_responsibilities>
[existing commit management section]
</core_responsibilities>

<safety_protocols>
1. NEVER force push to main/master
2. NEVER rewrite published history
3. Require confirmation for destructive operations
4. Block commits containing sensitive data patterns
</safety_protocols>
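
The fourth protocol above (blocking commits that contain sensitive data patterns) could be backed by a simple scan of staged content. The following is a rough sketch under assumptions: `SENSITIVE_PATTERNS` and `scan_for_secrets` are hypothetical, and the three patterns are illustrative rather than a complete secret-detection rule set.

```python
import re

# Illustrative patterns only; a real rule set would be broader and tuned
# to avoid false positives.
SENSITIVE_PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password assignment": re.compile(r"\bpassword\s*[:=]\s*\S+", re.IGNORECASE),
    "api key assignment": re.compile(r"\bapi[_-]?key\s*[:=]\s*\S+", re.IGNORECASE),
}


def scan_for_secrets(text):
    """Return (line_number, label) for each line matching a sensitive pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


if __name__ == "__main__":
    sample = "user: admin\npassword = hunter2\nAPI_KEY: abc123\n"
    for number, label in scan_for_secrets(sample):
        print(f"line {number}: possible {label}")
```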

2.2.3 Lab-Operator Agent

File: /home/jramos/homelab/sub-agents/lab-operator.md

Frontmatter (Lines 1-8)

---
name: lab-operator
description: >
  Expert Homelab SysAdmin. Manages Proxmox, Docker, Kubernetes, TrueNAS, networking (pfSense/VLANs),
  and Linux server administration. Handles package installation and system config.
tools: [Bash, Read, Grep, Edit]
model: sonnet
---

Issues:

| Line | Issue | Impact |
|------|-------|--------|
| 4-5 | Mentions Kubernetes, TrueNAS, pfSense not in homelab | Misleading |
| 6 | Missing `Glob` tool | Cannot find files by pattern |
| 6 | Missing `Write` tool | Cannot create new configs |
| Missing | No `color` field | Inconsistent |

Prompt Body (Lines 10-33)

Strengths:

XML tag structure consistent with scribe/backend-builder
Excellent <safety_protocols> section
Good response style guidance

Lines 16-20 - Domain Expertise Issues:

<domain_expertise>
- **Virtualization**: Proxmox VE (LXC/VM management), ESXi.
- **Containers**: Docker Compose, Portainer, Kubernetes (k3s/microk8s).
- **Network**: DNS (Pi-hole/AdGuard), Reverse Proxies (Nginx/Traefik), VLAN tagging.
- **Storage**: ZFS pool management, NFS/SMB shares.
</domain_expertise>

Problems:

Mentions ESXi, Portainer, Kubernetes, Pi-hole, AdGuard, Traefik - none in infrastructure
Emphasizes ZFS pool management, though ZFS backs only one pool in the actual setup (Vault storage)
Doesn't mention Nginx Proxy Manager, Grafana, Prometheus, Twingate, n8n

Lab-Operator Recommendations

Priority 1:

# Line 6
- tools: [Bash, Read, Grep, Edit]
+ tools: [Bash, Read, Grep, Glob, Edit, Write]

# After line 7
+ color: green

Priority 2:

# Lines 16-20 - REPLACE
- <domain_expertise>
- - **Virtualization**: Proxmox VE (LXC/VM management), ESXi.
- - **Containers**: Docker Compose, Portainer, Kubernetes (k3s/microk8s).
- - **Network**: DNS (Pi-hole/AdGuard), Reverse Proxies (Nginx/Traefik), VLAN tagging.
- - **Storage**: ZFS pool management, NFS/SMB shares.
- </domain_expertise>
+ <domain_expertise>
+ - **Virtualization**: Proxmox VE 8.3.3 (LXC containers, QEMU/KVM VMs)
+ - **Containers**: Docker Compose, container orchestration on VM hosts
+ - **Network**: Nginx Proxy Manager (CT 102), VLAN tagging, DNS
+ - **Storage**: Proxmox storage pools (local, local-lvm, Vault, PBS-Backups, iso-share)
+ - **Monitoring**: Grafana, Prometheus, PVE Exporter (VM 101 at 192.168.2.114)
+ - **Automation**: n8n workflow platform (CT 113), Ansible (VM 106)
+ - **Security**: Twingate zero-trust connector (CT 112)
+ </domain_expertise>

Priority 3:

Add Proxmox-specific safety protocols:

# After line 26
+ 4.  **Proxmox Safety**: Confirm before `qm destroy`, `pct destroy`, or snapshot deletion.
+ 5.  **Backup Verification**: Before major changes, verify PBS backup exists and is recent.
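
A confirmation gate for the Proxmox safety rule could look like the sketch below. It only classifies a command string and executes nothing. `DESTRUCTIVE_SUBCOMMANDS` and `needs_confirmation` are hypothetical names, and the set covers only the `destroy` and `delsnapshot` subcommands of `qm` and `pct`.

```python
import shlex

# (tool, subcommand) pairs that should never run without explicit confirmation.
DESTRUCTIVE_SUBCOMMANDS = {
    ("qm", "destroy"),
    ("pct", "destroy"),
    ("qm", "delsnapshot"),
    ("pct", "delsnapshot"),
}


def needs_confirmation(command):
    """Return True if the command is a destructive qm/pct operation."""
    tokens = shlex.split(command)
    if tokens and tokens[0] == "sudo":
        tokens = tokens[1:]  # look past a leading sudo
    return len(tokens) >= 2 and (tokens[0], tokens[1]) in DESTRUCTIVE_SUBCOMMANDS


if __name__ == "__main__":
    for cmd in ("qm list", "qm destroy 105", "sudo pct delsnapshot 113 pre-upgrade"):
        print(f"{cmd!r}: confirm first = {needs_confirmation(cmd)}")
```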

2.2.4 Backend-Builder Agent

File: /home/jramos/homelab/sub-agents/backend-builder.md

Frontmatter (Lines 1-8)

---
name: backend-builder
description: >
  DevOps and Software Engineer. Writes Python/Java code, Ansible playbooks,
  Terraform configs, and complex Shell scripts. Handles database logic and API integrations.
tools: [Read, Edit, Grep, Glob]
model: sonnet
---

Issues:

| Line | Issue | Impact |
|------|-------|--------|
| 4 | Mentions Java - not in homelab | Misleading |
| 6 | Missing `Bash` tool | CRITICAL: Cannot test/validate code |
| 6 | Missing `Write` tool | Cannot create new files |
| Missing | No `color` field | Inconsistent |

Prompt Body (Lines 10-27)

Strengths:

Good security focus (secrets management)
Appropriate coding standards
"Do not be lazy" guidance

Lines 18-20 - Homelab Stack:

- **Python**: Use modern libraries (`pydantic` for config, `httpx` for APIs).
- **Ansible**: Ensure playbooks are idempotent.
- **Terraform**: precise resource targeting.

Issues:

Missing Docker Compose guidance (major part of homelab)
Terraform guidance vague
No Shell script guidance

Backend-Builder Recommendations

Priority 1 (CRITICAL):

# Line 6
- tools: [Read, Edit, Grep, Glob]
+ tools: [Read, Edit, Grep, Glob, Write, Bash]

# After line 7
+ color: orange

Priority 2:

# After line 20 - ADD
+     - **Docker Compose**: Follow the Compose Specification (the top-level `version` key is obsolete in current Docker Compose), use named volumes, include healthchecks.
+     - **Shell Scripts**: Use `#!/usr/bin/env bash`, include error handling (`set -euo pipefail`).

# Line 20 - REPLACE
-     - **Terraform**: precise resource targeting.
+     - **Terraform**: Use modules, implement state management, leverage data sources for existing resources.

Priority 3:

Add validation section:

<validation_rules>
After writing code, validate before presenting:
- **Python**: Run `python -m py_compile <file>` to check syntax
- **Ansible**: Run `ansible-playbook --syntax-check <playbook>`
- **Docker Compose**: Run `docker compose config` to validate
- **Shell Scripts**: Run `bash -n <script>` for syntax check
- **YAML/JSON**: Validate structure before writing
</validation_rules>
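
The validation rules above map naturally onto a lookup from file type to check command. The sketch below builds the command as an argument list and runs nothing. `validation_command` is a hypothetical helper. Treating every other `.yml`/`.yaml` file as an Ansible playbook is a simplifying assumption that a real repository layout would need to refine.

```python
from pathlib import Path

COMPOSE_FILENAMES = {"docker-compose.yml", "docker-compose.yaml", "compose.yml", "compose.yaml"}


def validation_command(path):
    """Return the syntax-check command for a file as an argv list, or None."""
    file_path = Path(path)
    if file_path.suffix == ".py":
        return ["python", "-m", "py_compile", path]
    if file_path.suffix == ".sh":
        return ["bash", "-n", path]
    if file_path.name in COMPOSE_FILENAMES:
        return ["docker", "compose", "-f", path, "config"]
    if file_path.suffix in (".yml", ".yaml"):
        # Assumption: remaining YAML files are Ansible playbooks.
        return ["ansible-playbook", "--syntax-check", path]
    return None


if __name__ == "__main__":
    for name in ("scripts/backup.py", "deploy.sh", "monitoring/docker-compose.yml", "site.yml", "notes.txt"):
        print(name, "->", validation_command(name))
```

Returning an argument list rather than a shell string lets the caller pass it to `subprocess.run` without shell quoting concerns.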

2.3 Cross-Agent Analysis

Tool Distribution Matrix

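| Tool | Scribe | Librarian | Lab-Operator | Backend-Builder |
|------|--------|-----------|--------------|-----------------|
| Read | ✓ | ✗ | ✓ | ✓ |
| Write | ✗ | ✗ | ✗ | ✗ |
| Edit | ✓ | ✗ | ✓ | ✓ |
| Grep | ✓ | ✗ | ✓ | ✓ |
| Glob | ✓ | ✗ | ✗ | ✓ |
| Bash | ✗ | ✗ | ✓ | ✗ |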

Critical Tool Gaps

| Gap | Agent | Impact |
|-----|-------|--------|
| No tools at all | Librarian | BLOCKING - Cannot execute ANY git commands |
| No Bash | Backend-Builder | CRITICAL - Cannot test Python, validate Ansible, check Terraform |
| No Write | All 4 agents | HIGH - Cannot create new files (only edit existing) |
| No Glob | Lab-Operator | MEDIUM - Cannot find docker-compose files, configs by pattern |
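
Gaps like these can be detected mechanically by comparing each agent's `tools:` line against a required set. The sketch below is an illustration under assumptions: `REQUIRED_TOOLS` encodes the toolsets recommended in this report, `parse_tools` handles only the inline `[A, B]` list form seen in these files, and both function names are hypothetical.

```python
import re

# Required toolsets, taken from the recommended tool distribution in this report.
FILE_TOOLS = {"Read", "Grep", "Glob", "Edit", "Write"}
REQUIRED_TOOLS = {
    "scribe": FILE_TOOLS,
    "librarian": FILE_TOOLS | {"Bash"},
    "lab-operator": FILE_TOOLS | {"Bash"},
    "backend-builder": FILE_TOOLS | {"Bash"},
}


def parse_tools(frontmatter):
    """Extract tool names from an inline `tools: [A, B]` line, if present."""
    match = re.search(r"^tools:\s*\[(.*?)\]\s*$", frontmatter, flags=re.MULTILINE)
    if match is None:
        return set()
    return {tool.strip() for tool in match.group(1).split(",") if tool.strip()}


def missing_tools(agent, frontmatter):
    """Return the sorted list of required tools the agent does not declare."""
    return sorted(REQUIRED_TOOLS[agent] - parse_tools(frontmatter))


if __name__ == "__main__":
    scribe = "name: scribe\ntools: [Read, Grep, Glob, Edit]\nmodel: sonnet\n"
    librarian = "name: librarian\nmodel: sonnet\ncolor: purple\n"
    print("scribe missing:", missing_tools("scribe", scribe))
    print("librarian missing:", missing_tools("librarian", librarian))
```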

Consistency Issues

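| Aspect | Scribe | Librarian | Lab-Operator | Backend-Builder |
|--------|--------|-----------|--------------|-----------------|
| XML tags | Yes | No | Yes | Yes |
| Tools in frontmatter | Yes | No | Yes | Yes |
| Color field | No | Yes | No | No |
| Line count | 30 | 127 | 33 | 28 |
| Steve reference | Yes | No | Yes | Yes |
| Safety protocols | No | Partial | Yes | Partial |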

Role Boundary Ambiguities

| Scenario | Possible Agents | Recommendation |
|----------|-----------------|----------------|
| Create docker-compose.yml | Backend-Builder OR Lab-Operator | Backend-Builder creates, Lab-Operator deploys |
| Write Ansible playbook | Backend-Builder OR Lab-Operator | Backend-Builder writes, Lab-Operator executes |
| Update README after code change | Scribe OR Backend-Builder | Backend-Builder notifies, Scribe updates |
| Commit infrastructure changes | Librarian OR Lab-Operator | Lab-Operator makes change, Librarian commits |

2.4 Recommended Tool Distribution

Proposed Standard Toolsets

Documentation Agents (Scribe):

tools: [Read, Grep, Glob, Edit, Write]

Rationale: Needs all file operations, no system commands

Operations Agents (Lab-Operator):

tools: [Bash, Read, Grep, Glob, Edit, Write]

Rationale: Needs system commands + all file operations

Development Agents (Backend-Builder):

tools: [Bash, Read, Grep, Glob, Edit, Write]

Rationale: Needs to test/validate code + all file operations

Git Agents (Librarian):

tools: [Bash, Read, Grep, Glob, Edit, Write]

Rationale: Git commands + file inspection + .gitignore management

Part 3: Actionable Recommendations

3.1 Priority 1 - Critical Fixes (15 minutes total)

Fix 1: Librarian - Add Tools (2 minutes) BLOCKING

File: /home/jramos/homelab/sub-agents/librarian.md

---
name: librarian
- description: Use this agent when the user needs Git repository management, including operations like committing changes...
+ description: >
+   Git repository management specialist. Handles commits, branches, merges,
+   history review, .gitignore maintenance, and enforces conventional commit standards.
+ tools: [Bash, Read, Grep, Glob, Edit, Write]
model: sonnet
color: purple
---

Fix 2: Backend-Builder - Add Bash (1 minute) CRITICAL

File: /home/jramos/homelab/sub-agents/backend-builder.md

---
name: backend-builder
description: >
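-   DevOps and Software Engineer. Writes Python/Java code, Ansible playbooks,
-   Terraform configs, and complex Shell scripts. Handles database logic and API integrations.
+   DevOps and Software Engineer. Writes Python, Ansible playbooks,
+   Terraform configs, and Shell scripts. Handles IaC and automation.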
- tools: [Read, Edit, Grep, Glob]
+ tools: [Read, Edit, Grep, Glob, Write, Bash]
model: sonnet
+ color: orange
---

Fix 3: CLAUDE.md - Fix GitLab References (5 minutes)

File: /home/jramos/homelab/CLAUDE.md

# Line 62
- **Automation-First Approach**: The presence of Ansible-Control (106), GitLab (101), and NetBox (103)...
+ **Automation-First Approach**: The presence of Ansible-Control (106), Gitea, and NetBox (103)...

# Line 97
- 5. **Version Control**: GitLab (101) should house all Infrastructure as Code...
+ 5. **Version Control**: Gitea should house all Infrastructure as Code...

# Line 105
- - **GitLab**: CI/CD pipelines for infrastructure testing and deployment
+ - **Gitea**: Version control and repository management

# Line 126
- - Working directory: /mnt/c/Users/fam1n/Documents/homelab
+ - Working directory: /home/jramos/homelab

# Line 127 - DELETE
- - This repository is not yet initialized as a git repository

Fix 4: Scribe - Remove Broken Placeholder (1 minute)

File: /home/jramos/homelab/sub-agents/scribe.md

# Line 20 - DELETE
- [Image of network topology diagram]

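Fix 5: Add Write Tool to All Agents (3 minutes)

Librarian and Backend-Builder gain Write in Fixes 1 and 2 above; the remaining two agents are updated here.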

Scribe (line 6):

- tools: [Read, Grep, Glob, Edit]
+ tools: [Read, Grep, Glob, Edit, Write]

Lab-Operator (line 6):

- tools: [Bash, Read, Grep, Edit]
+ tools: [Bash, Read, Grep, Glob, Edit, Write]

Fix 6: Add Missing Color Fields (3 minutes)

Scribe (after line 7):

model: sonnet
+ color: blue

Lab-Operator (after line 7):

model: sonnet
+ color: green

3.2 Priority 2 - High-Impact Improvements (90 minutes total)

Improvement 1: CLAUDE.md - Add Quick Reference (15 minutes)

File: /home/jramos/homelab/CLAUDE.md
Location: After line 8, before "## Infrastructure Overview"

## Quick Reference

| Resource | Value |
|----------|-------|
| **Proxmox Node** | serviceslab (192.168.2.200:8006) |
| **Proxmox Version** | PVE 8.3.3 |
| **Infrastructure** | 10 VMs, 4 LXC containers |
| **Monitoring** | http://192.168.2.114:3000 (Grafana) |
| **Version Control** | Gitea at 192.168.2.102:3060 |
| **Working Directory** | /home/jramos/homelab |
| **Live Status** | See `CLAUDE_STATUS.md` for current inventory |

**Key Services:**
- VM 101 (monitoring-docker): Grafana, Prometheus, PVE Exporter
- CT 102 (nginx): Nginx Proxy Manager (reverse proxy)
- CT 112 (twingate-connector): Zero-trust network access
- CT 113 (n8n): Workflow automation at 192.168.2.107

Improvement 2: CLAUDE.md - Add Agent Routing Guide (30 minutes)

File: /home/jramos/homelab/CLAUDE.md
Location: After Quick Reference

## Agent Selection Guide

When working with this repository, choose the appropriate agent based on task type:

| Task Type | Primary Agent | Tools Available | Notes |
|-----------|---------------|-----------------|-------|
| **Git Operations** | `librarian` | Bash, Read, Grep, Glob, Edit, Write | Commits, branches, merges, .gitignore |
| **Documentation** | `scribe` | Read, Grep, Glob, Edit, Write | READMEs, architecture docs, diagrams |
| **Infrastructure Ops** | `lab-operator` | Bash, Read, Grep, Glob, Edit, Write | Proxmox, Docker, networking, storage |
| **Code/IaC Development** | `backend-builder` | Bash, Read, Grep, Glob, Edit, Write | Ansible, Terraform, Python, Shell |
| **Complex Multi-Agent** | Main Agent | All tools | Coordinates between specialized agents |

### Task Routing Decision Tree

Is this a git/version control task?
├── Yes → Use librarian
└── No ↓

Is this documentation (README, guides, diagrams)?
├── Yes → Use scribe
└── No ↓

Does this require system commands (docker, ssh, proxmox)?
├── Yes → Use lab-operator
└── No ↓

Is this code/config creation (Ansible, Python, Terraform)?
├── Yes → Use backend-builder
└── No → Use Main Agent


### Agent Collaboration Patterns

**Documentation Workflow:**
1. `backend-builder` or `lab-operator` creates/modifies infrastructure
2. `scribe` updates documentation to reflect changes
3. `librarian` commits all changes with proper commit message

**Infrastructure Deployment:**
1. `backend-builder` writes IaC (Ansible playbooks, Terraform configs, Docker Compose)
2. `lab-operator` validates and deploys to Proxmox/Docker
3. `scribe` documents deployment procedures and architecture
4. `librarian` commits configuration to repository

**Code Development:**
1. `backend-builder` writes code/scripts
2. `backend-builder` tests with Bash tool
3. `scribe` adds code documentation
4. `librarian` commits with conventional commit message

Improvement 3: CLAUDE.md - Remove Duplicate Tables (20 minutes)

File: /home/jramos/homelab/CLAUDE.md
Lines: Replace 17-56

## Infrastructure Overview

**For detailed, current infrastructure inventory, see:**
- **Live Status**: `CLAUDE_STATUS.md` (most current - updated frequently)
- **Service Details**: `services/README.md` (service-specific documentation)
- **Complete Index**: `INDEX.md` (comprehensive repository navigation)

**Quick Summary:**
- **Virtual Machines**: 10 total (IDs: 100, 101, 104-111)
  - Highlights: VM 100 (docker-hub), VM 101 (monitoring-docker), VM 106 (Ansible-Control)
- **LXC Containers**: 4 total (IDs: 102, 103, 112, 113)
  - Highlights: CT 102 (nginx/NPM), CT 112 (twingate), CT 113 (n8n)
- **Storage Pools**: 5 pools
  - local (system), local-lvm (VM disks), Vault (ZFS - secure data)
  - PBS-Backups (Proxmox Backup Server), iso-share (installation media)
- **Monitoring Stack**: VM 101 at 192.168.2.114
  - Grafana (port 3000), Prometheus (port 9090), PVE Exporter (port 9221)
- **Key Network Services**:
  - Nginx Proxy Manager (CT 102), Twingate (CT 112), n8n (CT 113)

**Note**: Infrastructure details change frequently. Always reference `CLAUDE_STATUS.md` for accurate VM/CT counts, IP addresses, and current status.

Improvement 4: Lab-Operator - Update Domain Expertise (15 minutes)

File: /home/jramos/homelab/sub-agents/lab-operator.md
Lines: Replace 16-20

<domain_expertise>
- **Virtualization**: Proxmox VE 8.3.3 (LXC containers, QEMU/KVM virtual machines)
- **Containers**: Docker Compose orchestration on VM hosts (VM 100, 101, 107)
- **Network**: Nginx Proxy Manager (CT 102), VLAN tagging, DNS configuration, reverse proxy
- **Storage**: Proxmox storage architecture
  - local (Directory): System files, ISOs, templates
  - local-lvm (LVM-Thin): VM disk images (thin provisioned)
  - Vault (ZFS Pool): Secure storage for sensitive data
  - PBS-Backups: Proxmox Backup Server repository
  - iso-share (NFS): Installation media library
- **Monitoring**: Observability stack on VM 101 (192.168.2.114)
  - Grafana: Metrics visualization and dashboards
  - Prometheus: Time-series database and alerting
  - PVE Exporter: Proxmox VE metrics exporter
- **Automation**:
  - n8n workflow automation platform (CT 113 at 192.168.2.107)
  - Ansible automation (VM 106)
- **Security**:
  - Twingate zero-trust network access connector (CT 112)
  - Nginx Proxy Manager with SSL/TLS termination
</domain_expertise>

Improvement 5: Backend-Builder - Add Docker Compose & Validation (10 minutes)

File: /home/jramos/homelab/sub-agents/backend-builder.md
Location: After line 21

<coding_standards>
1.  **Secrets Management**: NEVER hardcode passwords or API keys. Use `.env` files or environment variables.
2.  **Homelab Stack**:
    - **Python**: Use modern libraries (`pydantic` for config, `httpx` for APIs).
    - **Ansible**: Ensure playbooks are idempotent with proper error handling.
    - **Terraform**: Use modules, implement state management, leverage data sources.
    - **Docker Compose**: Follow the Compose Specification (the top-level `version` key is obsolete in current Docker Compose), use named volumes, include healthchecks.
    - **Shell Scripts**: Use `#!/usr/bin/env bash`, include error handling (`set -euo pipefail`).
3.  **Error Handling**: Homelabs are messy. Your code must handle network timeouts and missing files gracefully.
</coding_standards>

<validation_rules>
After writing code, validate before presenting to user:
- **Python**: Run `python -m py_compile <file>` to check syntax
- **Ansible**: Run `ansible-playbook --syntax-check <playbook>`
- **Docker Compose**: Run `docker compose config` to validate syntax
- **Shell Scripts**: Run `bash -n <script>` for syntax validation
- **Terraform**: Run `terraform validate` after init
- **YAML/JSON**: Validate structure before writing
</validation_rules>
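
The validation rules above amount to a lookup from file type to syntax-check command. The following is a minimal sketch of that lookup, not part of the proposed agent definition. The function name `validation_command`, the set of Compose file names, and the convention that Ansible playbooks live under a `playbooks/` directory are assumptions made for illustration; the commands themselves are the ones listed in the rules.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Assumed file names for Compose projects (illustrative, not from the report).
COMPOSE_NAMES = {"docker-compose.yml", "docker-compose.yaml", "compose.yml", "compose.yaml"}


def validation_command(path: str) -> Optional[List[str]]:
    """Return the syntax-check command for a file, or None if no rule applies."""
    p = PurePosixPath(path)
    if p.name in COMPOSE_NAMES:
        return ["docker", "compose", "-f", path, "config"]
    if p.suffix == ".py":
        return ["python", "-m", "py_compile", path]
    if p.suffix == ".sh":
        return ["bash", "-n", path]
    if p.suffix == ".tf":
        # terraform validate checks a whole directory and requires a prior `terraform init`.
        return ["terraform", f"-chdir={p.parent}", "validate"]
    if p.suffix in {".yml", ".yaml"} and "playbooks" in p.parts:
        # Assumption: playbooks are kept under a playbooks/ directory.
        return ["ansible-playbook", "--syntax-check", path]
    return None


if __name__ == "__main__":
    for candidate in ["scripts/backup.sh", "monitoring/docker-compose.yml", "README.md"]:
        print(candidate, "->", validation_command(candidate))
```

Returning the command as an argument list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting.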

3.3 Priority 3 - Quality Enhancements (180 minutes total)

Enhancement 1: CLAUDE.md - Add YAML Frontmatter (5 minutes)

File: /home/jramos/homelab/CLAUDE.md Location: Very beginning of file

---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
repository_type: homelab_infrastructure
primary_node: serviceslab
primary_node_ip: 192.168.2.200
proxmox_version: 8.3.3
vm_count: 10
lxc_count: 4
working_directory: /home/jramos/homelab
git_remote: http://192.168.2.102:3060/jramos/homelab.git
monitoring_url: http://192.168.2.114:3000
---
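
Frontmatter is only recognized when the opening `---` is the first line of the file, which is why the location matters. The sketch below illustrates that behavior with a deliberately small reader for flat `key: value` pairs. It is an illustration under stated assumptions, not how Claude Code parses the file: the function name `read_frontmatter` is hypothetical, and nested YAML, lists, and quoting are not handled.

```python
from typing import Dict


def read_frontmatter(text: str) -> Dict[str, str]:
    """Read flat key: value frontmatter; return {} if it is not at the top of the file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # No closing delimiter was found.


if __name__ == "__main__":
    sample = (
        "---\n"
        "version: 2.2.0\n"
        "primary_node: serviceslab\n"
        "monitoring_url: http://192.168.2.114:3000\n"
        "---\n"
        "Project notes start here.\n"
    )
    print(read_frontmatter(sample))
```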

Enhancement 2: Remove "Steve" References (5 minutes)

Files: scribe.md (line 11), lab-operator.md (line 11), backend-builder.md (line 11)

# scribe.md line 11
- You are the **Scribe** (formerly Steve's Architecture Module).
+ You are the **Scribe** - Documentation Architect and Technical Writer.

# lab-operator.md line 11
- You are the **Lab Operator** (formerly Steve's Infrastructure Module).
+ You are the **Lab Operator** - Expert Homelab Systems Administrator.

# backend-builder.md line 11
- You are the **Backend Builder** (formerly Steve's Coding Module).
+ You are the **Backend Builder** - DevOps and Infrastructure as Code Specialist.

Enhancement 3: Add Safety Protocols to Scribe (10 minutes)

File: /home/jramos/homelab/sub-agents/scribe.md Location: After line 23

<safety_protocols>
1. **Read Before Edit**: Always read existing documentation before modifying
2. **Preserve User Content**: Never overwrite user-created sections without explicit permission
3. **Timestamp Updates**: Include last-updated dates in documentation headers
4. **Link Validation**: When referencing other docs, verify paths exist
5. **No Code Execution**: Document code, don't execute it (use lab-operator or backend-builder)
</safety_protocols>

Enhancement 4: Librarian - Add XML Structure (30 minutes)

File: /home/jramos/homelab/sub-agents/librarian.md Location: Entire prompt body (restructure)

<system_role>
You are the **Librarian** - Git Version Control Specialist for the homelab infrastructure repository.
You have deep expertise in Git workflows, branching strategies, commit conventions, and repository hygiene.
</system_role>

<core_responsibilities>
## 1. Commit Management
- Enforce conventional commit format: `type(scope): description`
- Valid types: feat, fix, docs, style, refactor, test, chore, ci, build, perf
- Ensure commit messages are clear, concise (50 char summary), descriptive body
- Example: `feat(ansible): add nginx reverse proxy playbook for Proxmox CT 102`
- Reference VM/CT IDs and service names in infrastructure commits
- Stage appropriate files and verify changes before committing
- NEVER commit sensitive data (credentials, API keys, private keys)

## 2. Branching Strategy
- Use descriptive branch names: `feature/description`, `bugfix/description`, `hotfix/description`
- Infrastructure examples: `feature/ansible-netbox-integration`, `fix/proxmox-storage-config`
- Create branches from appropriate base (main/develop)
- Keep branches focused on single features or fixes
- Delete merged branches to maintain repository cleanliness

## 3. Merging Operations
- Check for conflicts before merging
- Prefer fast-forward merges for linear history when possible
- Use merge commits for feature branches to preserve context
- Verify all tests pass before completing merges
- Write clear merge commit messages explaining integration

## 4. History Management
- Use `git log` with formatting for readability
- Filter history by file paths, authors, date ranges
- Never rewrite public/shared branch history
- Identify when rebasing or amending is appropriate vs prohibited

## 5. .gitignore Hygiene
- Proactively identify files that should be ignored
- Infrastructure-specific patterns:
  * Terraform: `*.tfstate`, `*.tfstate.backup`, `.terraform/`, `terraform.tfvars`
  * Ansible: `*.retry`, `vault_pass.txt`, `.vault_password`
  * Monitoring: `**/pve.yml` (credentials), `.env` files
  * General: `*.log`, `*.swp`, `.DS_Store`
- Organize .gitignore with commented sections
- Check existing .gitignore before suggesting additions
</core_responsibilities>

<safety_protocols>
1. **NEVER** force push to main/master without explicit user confirmation
2. **NEVER** rewrite published/shared history
3. **ALWAYS** verify no sensitive data in staged changes before commit
4. **ALWAYS** require confirmation for destructive operations (hard reset, force push)
5. **BLOCK** commits containing patterns: password, api_key, secret, token (unless in templates)
</safety_protocols>

<quality_assurance>
## Pre-Commit Checks
- Run `git status` to see current state
- Verify no sensitive data in staged changes
- Ensure commit message follows conventional format
- Confirm files being committed are intentional
- Check for debug code, TODOs, temporary files

## Pre-Merge Validation
- Run `git diff` to review changes
- Check for merge conflicts
- Verify branch is up-to-date with target
- Confirm tests pass (if applicable)
</quality_assurance>

<homelab_context>
This homelab infrastructure repository contains:
- Proxmox VM/CT configurations (reference VM/CT IDs in commits)
- Docker Compose service definitions
- Ansible playbooks and roles
- Monitoring stack configs (Grafana/Prometheus)
- Sensitive data in Vault storage (ensure .gitignore coverage)
- Infrastructure as Code (Terraform, Ansible)

Key infrastructure components to reference:
- VMs: 100 (docker-hub), 101 (monitoring-docker), 106 (Ansible-Control), 109-110 (web servers), 111 (database)
- CTs: 102 (nginx/NPM), 103 (netbox), 112 (twingate), 113 (n8n)
- Storage: Vault (sensitive), PBS-Backups (disaster recovery)
</homelab_context>

<output_format>
When performing operations:
1. Explain what you're about to do and why
2. Show the exact Git commands you'll execute
3. Display relevant output or confirmations
4. Summarize the result and next steps
5. Highlight any warnings or recommendations
</output_format>

<escalation>
Seek user clarification when:
- Merge conflicts require manual resolution decisions
- Multiple valid branching strategies could apply
- Commit scope is ambiguous or affects multiple areas
- Destructive operations are requested
- Repository state is unclear or potentially corrupted
</escalation>
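
Two of the rules in the prompt above are mechanical enough to sketch in code: the `type(scope): description` summary format with its list of valid types, and the blocked patterns from the safety protocols. The following is a minimal sketch under stated assumptions, not the Librarian's actual behavior. The names `check_summary` and `blocked_lines` are hypothetical, the regular expressions are one possible reading of the rules, and the "unless in templates" exception is not implemented.

```python
import re
from typing import List

VALID_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore", "ci", "build", "perf")
SUMMARY_RE = re.compile(r"^(?P<type>[a-z]+)\((?P<scope>[^)]+)\): (?P<description>.+)$")
BLOCKED_RE = re.compile(r"password|api_key|secret|token", re.IGNORECASE)


def check_summary(summary: str) -> List[str]:
    """Return a list of problems with a commit summary line (empty if it passes)."""
    match = SUMMARY_RE.match(summary)
    if match is None:
        return ["summary does not match type(scope): description"]
    if match.group("type") not in VALID_TYPES:
        return [f"unknown commit type: {match.group('type')}"]
    return []


def blocked_lines(diff_text: str) -> List[str]:
    """Return added lines of a unified diff that contain a blocked pattern."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" marks the file header, not content.
        if line.startswith("+") and not line.startswith("+++") and BLOCKED_RE.search(line):
            hits.append(line[1:].strip())
    return hits


if __name__ == "__main__":
    print(check_summary("feat(ansible): add nginx reverse proxy playbook for Proxmox CT 102"))
    print(blocked_lines("+++ b/.env\n+API_KEY=abc123\n+PORT=3000\n"))
```

A pattern match of this kind produces false positives (for example, a variable named `token_count`), so it is better suited to prompting for review than to rejecting a commit outright.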

Enhancement 5: Add Proxmox Safety to Lab-Operator (5 minutes)

File: /home/jramos/homelab/sub-agents/lab-operator.md Location: After line 26

3.  **Container Safety**: When modifying `docker-compose.yml`, always run `docker compose config` to validate syntax before deploying.
+ 4.  **Proxmox VM/CT Operations**: Confirm before `qm destroy`, `pct destroy`, or snapshot deletion.
+ 5.  **Backup Verification**: Before major infrastructure changes, verify recent PBS backup exists.
+ 6.  **Monitoring Impact**: Consider impact on Grafana/Prometheus metrics when changing infrastructure.
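
The confirmation rule for destructive Proxmox commands can be pictured as a small guard in front of command execution. The sketch below is illustrative only and is not part of the lab-operator prompt. The names `needs_confirmation` and `guarded` are hypothetical, and treating `qm delsnapshot` and `pct delsnapshot` as the snapshot-deletion commands is this sketch's reading of the rule. Commands wrapped in `sudo` or a shell pipeline are not detected.

```python
import shlex
from typing import Callable

# (binary, subcommand) pairs that require explicit confirmation.
DESTRUCTIVE = {
    ("qm", "destroy"),
    ("pct", "destroy"),
    ("qm", "delsnapshot"),
    ("pct", "delsnapshot"),
}


def needs_confirmation(command: str) -> bool:
    """Return True if the command is a destructive qm/pct operation."""
    words = shlex.split(command)
    return len(words) >= 2 and (words[0], words[1]) in DESTRUCTIVE


def guarded(command: str, confirm: Callable[[str], bool]) -> bool:
    """Return True if the command may run: it is safe, or the user confirmed it."""
    if not needs_confirmation(command):
        return True
    return confirm(command)


if __name__ == "__main__":
    deny_all = lambda cmd: False
    print(guarded("qm status 101", deny_all))    # safe command, no prompt needed
    print(guarded("pct destroy 113", deny_all))  # destructive, confirmation denied
```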

3.4 Agent Architecture Proposals

Should Any Agents Be Split?

Librarian Analysis

Current: Single agent handling all Git operations (127 lines)

Recommendation: DO NOT SPLIT

Rationale:

- Git operations are cohesive and related
- Splitting would create handoff friction
- The same tools are needed for all Git tasks
- Better solution: extract common patterns to CLAUDE.md to reduce line count

Lab-Operator Analysis

Current: Single agent for infrastructure operations (33 lines)

Recommendation: DO NOT SPLIT (currently)

Rationale:

- A single-node homelab has interconnected operations
- Splitting (docker-specialist, proxmox-specialist, network-specialist) would fragment the workflow
- A single deployment may touch Proxmox, Docker, and networking
- Future consideration: reconsider if the infrastructure grows to multiple nodes

Backend-Builder Analysis

Current: Single agent for all code/IaC (28 lines)

Recommendation: CONSIDER SPLITTING (medium priority)

Proposed Split:

- IaC-Builder: Ansible, Terraform, Docker Compose (declarative configs)
- Script-Developer: Python, Shell (imperative code, custom tooling)

Rationale:

- Different mental models: declarative vs imperative
- Different validation approaches
- Different integration points (IaC-Builder → lab-operator; Script-Developer → monitoring)
- Manageable cognitive load for each agent

Implementation Effort: 60 minutes

New Agent Proposals

1. Infrastructure-Auditor (HIGH PRIORITY)

Purpose: Security scanning, compliance checking, configuration drift detection

Justification:

- Current agents focus on creation and modification, not validation
- The homelab has sensitive components (Vault storage, credentials in monitoring configs)
- PBS backups need verification
- Configuration can drift between the IaC definitions and the running systems

Proposed Definition:

---
name: infrastructure-auditor
description: >
  Security and compliance specialist. Scans for misconfigurations, exposed credentials,
  outdated packages, configuration drift, and security vulnerabilities.
tools: [Bash, Read, Grep, Glob]
model: sonnet
color: red
---

<system_role>
You are the **Infrastructure Auditor** - Security and compliance specialist.
Your job is to find problems before they become incidents.
</system_role>

<audit_domains>
1. **Credential Exposure**: Scan for hardcoded secrets, exposed API keys, plaintext passwords
   - Check for patterns: password=, api_key=, token=, secret=
   - Verify .gitignore coverage for sensitive files
   - Validate environment variable usage vs hardcoding

2. **Configuration Drift**: Compare running state to declared state
   - Compare docker-compose configs to running containers
   - Verify Proxmox VM/CT configs match documentation
   - Check Ansible playbook state vs actual system state

3. **Package Security**: Check for outdated packages with known CVEs
   - Proxmox package versions
   - Docker image versions
   - Python package versions

4. **Backup Verification**: Validate PBS backup integrity and recency
   - Check last backup timestamp for critical VMs/CTs
   - Verify backup size and integrity
   - Test restore procedures (read-only simulation)

5. **Permission Audit**: Review file permissions and access controls
   - Docker socket exposure
   - Sudo access configurations
   - File ownership and permissions

6. **Network Security**: Review exposed services and ports
   - Check for services listening on 0.0.0.0
   - Verify firewall rules
   - Audit reverse proxy configurations
</audit_domains>

<safety_protocols>
1. **READ-ONLY OPERATIONS**: NEVER modify anything - audit only
2. **Report Findings**: Document issues, do not auto-remediate
3. **Escalate Critical Issues**: Immediately flag exposed credentials or critical vulnerabilities
4. **No Destructive Checks**: Do not run tests that could impact running services
</safety_protocols>

<audit_checklist>
Run these checks on demand or scheduled:
- [ ] Scan all .env, .yml, .yaml files for hardcoded credentials
- [ ] Verify .gitignore covers all sensitive files
- [ ] Check PBS backup status for all critical VMs/CTs
- [ ] Compare Grafana datasources to prometheus.yml
- [ ] Audit Nginx Proxy Manager SSL certificate expiration
- [ ] Check for exposed Docker sockets
- [ ] Verify Twingate connector status
- [ ] Review n8n workflow credential storage
</audit_checklist>

Implementation Effort: 45 minutes

Priority: HIGH - Addresses security gap in current agent coverage
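
The configuration drift domain in the proposed definition reduces, in its simplest form, to a set comparison between declared and running services. The sketch below shows that comparison. It is a minimal illustration, not the auditor's implementation: the function name `service_drift` is hypothetical, and how the two name lists are collected is left open (one option, assumed here, is `docker compose config --services` for declared services and `docker compose ps --services` for running ones). The service name `node-exporter` in the usage example is invented for illustration.

```python
from typing import Dict, Iterable, List


def service_drift(declared: Iterable[str], running: Iterable[str]) -> Dict[str, List[str]]:
    """Compare declared and running service names and report differences in both directions."""
    declared_set, running_set = set(declared), set(running)
    return {
        # Declared in the compose file but not running.
        "missing": sorted(declared_set - running_set),
        # Running but not declared anywhere.
        "undeclared": sorted(running_set - declared_set),
    }


if __name__ == "__main__":
    declared = ["grafana", "prometheus", "pve-exporter"]
    running = ["grafana", "prometheus", "node-exporter"]
    print(service_drift(declared, running))
```

Because the function only reads two lists and returns a report, it stays within the read-only constraint in the auditor's safety protocols.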

2. Backup-Manager (DEFER)

Purpose: PBS operations, disaster recovery, restore testing

Recommendation: DEFER - Lab-Operator can handle backup operations

Rationale:

- PBS operations are infrequent
- Lab-Operator has the necessary tools and expertise
- A separate agent would add complexity without significant benefit
- Reconsider: when backup operations become more complex or automated

3. Monitoring-Specialist (DEFER)

Purpose: Grafana dashboards, Prometheus queries, alerting

Recommendation: DEFER - Backend-Builder can handle monitoring configs

Rationale:

- Monitoring configs are code (YAML, PromQL)
- Backend-Builder has appropriate tools
- Grafana/Prometheus documentation is good
- Reconsider: when alerting becomes complex or requires dedicated expertise

3.5 Proposed Final Agent Architecture

Recommended Structure (5-6 Agents)

┌─────────────────────────────────────────────────────────────────┐
│                      DOCUMENTATION LAYER                         │
│  ┌────────────────────────────────────────────────────────┐    │
│  │  Scribe (documentation, architecture, diagrams)        │    │
│  └────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    VERSION CONTROL LAYER                         │
│  ┌────────────────────────────────────────────────────────┐    │
│  │  Librarian (git operations, commits, branches)         │    │
│  └────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                     OPERATIONS LAYER                             │
│  ┌────────────────────┐  ┌────────────────────────────────┐    │
│  │  Lab-Operator      │  │  Infrastructure-Auditor (NEW)  │    │
│  │  (infra mgmt)      │  │  (security scanning)           │    │
│  └────────────────────┘  └────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    DEVELOPMENT LAYER                             │
│  ┌────────────────────┐  ┌────────────────────────────────┐    │
│  │  IaC-Builder       │  │  Script-Developer              │    │
│  │  (Ansible,         │  │  (Python, Shell automation)    │    │
│  │   Terraform,       │  │                                │    │
│  │   Docker Compose)  │  │                                │    │
│  └────────────────────┘  └────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

Implementation Phases

Phase 1: Critical Fixes (Day 1 - 15 minutes)

- Fix librarian tools
- Add Bash to backend-builder
- Fix CLAUDE.md GitLab references
- Add Write tool to all agents

Phase 2: High-Impact (Week 1 - 90 minutes)

- Add Quick Reference to CLAUDE.md
- Add Agent Routing Guide to CLAUDE.md
- Update lab-operator domain expertise
- Add validation rules to backend-builder

Phase 3: Quality Enhancements (Week 2 - 180 minutes)

- Add YAML frontmatter to CLAUDE.md
- Restructure librarian with XML
- Add safety protocols to all agents
- Remove "Steve" references

Phase 4: Architecture Expansion (Month 1 - 120 minutes)

- Create Infrastructure-Auditor agent
- Split Backend-Builder into IaC-Builder + Script-Developer
- Test and refine agent boundaries

Part 4: Implementation Checklist

Quick Reference: Files to Modify

| File | Priority 1 | Priority 2 | Priority 3 | Total Changes |
|---|---|---|---|---|
| `/home/jramos/homelab/CLAUDE.md` | 5 fixes | 3 additions | 1 frontmatter | 9 edits |
| `/home/jramos/homelab/sub-agents/scribe.md` | 3 fixes | 0 | 2 enhancements | 5 edits |
| `/home/jramos/homelab/sub-agents/librarian.md` | 2 fixes | 1 restructure | 1 restructure | 4 edits |
| `/home/jramos/homelab/sub-agents/lab-operator.md` | 2 fixes | 1 update | 2 additions | 5 edits |
| `/home/jramos/homelab/sub-agents/backend-builder.md` | 2 fixes | 1 addition | 1 addition | 4 edits |
| TOTAL | 14 | 6 | 7 | 27 edits |

Detailed Implementation Checklist

Priority 1: Critical Fixes (15 minutes)

- [ ] librarian.md: Add tools field (line 5)
  - tools: [Bash, Read, Grep, Glob, Edit, Write]
- [ ] librarian.md: Condense description (line 3)
  - Remove examples, keep 2-3 sentences
- [ ] backend-builder.md: Add Bash and Write (line 6)
  - tools: [Read, Edit, Grep, Glob, Write, Bash]
- [ ] backend-builder.md: Add color field
  - color: orange
- [ ] scribe.md: Add Write tool (line 6)
  - tools: [Read, Grep, Glob, Edit, Write]
- [ ] scribe.md: Add color field
  - color: blue
- [ ] scribe.md: Delete broken placeholder (line 20)
- [ ] lab-operator.md: Add Glob and Write (line 6)
  - tools: [Bash, Read, Grep, Glob, Edit, Write]
- [ ] lab-operator.md: Add color field
  - color: green
- [ ] CLAUDE.md: Fix GitLab → Gitea (lines 62, 97, 105)
- [ ] CLAUDE.md: Fix working directory (line 126)
- [ ] CLAUDE.md: Delete "not initialized" line (127)
- [ ] CLAUDE.md: Fix storage percentage reference (line 89)

Priority 2: High-Impact Improvements (90 minutes)

- [ ] CLAUDE.md: Add YAML frontmatter (beginning)
- [ ] CLAUDE.md: Add Quick Reference section (after line 8)
- [ ] CLAUDE.md: Add Agent Routing Guide (after Quick Reference)
- [ ] CLAUDE.md: Replace duplicate tables with references (lines 17-56)
- [ ] lab-operator.md: Update domain expertise (lines 16-20)
- [ ] backend-builder.md: Add Docker Compose guidance (after line 20)
- [ ] backend-builder.md: Add validation rules section (after line 27)

Priority 3: Quality Enhancements (180 minutes)

- [ ] scribe.md: Remove "Steve" reference (line 11)
- [ ] scribe.md: Update docs directory reference (line 16)
- [ ] scribe.md: Add safety protocols section (after line 23)
- [ ] librarian.md: Restructure with XML tags (entire prompt body)
- [ ] librarian.md: Move examples to prompt body
- [ ] lab-operator.md: Remove "Steve" reference (line 11)
- [ ] lab-operator.md: Add Proxmox safety protocols (after line 26)
- [ ] backend-builder.md: Remove "Steve" reference (line 11)

Future Enhancements (Optional)

- [ ] Create infrastructure-auditor.md agent
- [ ] Split backend-builder into iac-builder and script-developer
- [ ] Extract common patterns from librarian to CLAUDE.md
- [ ] Add examples section to CLAUDE.md
- [ ] Create agent capability testing suite

Part 5: Expected Outcomes

Before vs After Comparison

Current State Issues

| Issue | Impact | Affected Agents |
|---|---|---|
| Librarian has no tools | BLOCKING - Cannot execute ANY git commands | 1 |
| Backend-Builder lacks Bash | CRITICAL - Cannot test code | 1 |
| No agent has Write tool | HIGH - Cannot create new files | 4 |
| CLAUDE.md has stale GitLab refs | HIGH - Misleading documentation | N/A |
| Duplicate infrastructure tables | MEDIUM - Maintenance burden | N/A |
| Inconsistent agent structure | MEDIUM - Confusion, learning curve | 4 |

Post-Implementation Benefits

| Improvement | Benefit | Measurable Impact |
|---|---|---|
| All agents have proper tools | Functional, can complete tasks | 100% → 100% capability |
| CLAUDE.md has Quick Reference | Faster context gathering | ~5 min → ~30 sec |
| Agent Routing Guide | Clear task assignment | Reduced user decision time |
| No duplicate tables | Easier maintenance | 5 files → 1 file to update |
| Consistent agent structure | Easier to understand/maintain | Uniform XML structure |
| Infrastructure-Auditor | Security coverage | New capability |

Success Metrics

Quantitative

- Tool Coverage: 0% (librarian) → 100% (all agents functional)
- Documentation Accuracy: 5 stale references → 0 stale references
- Agent Consistency: 25% use XML tags → 100% use XML tags
- Color Field Coverage: 25% have color → 100% have color
- Information Duplication: Infrastructure in 5 files → 1 canonical file

Qualitative

- User Experience: Clear agent selection vs guesswork
- Maintenance Burden: Single source of truth for infrastructure
- Security Posture: Proactive scanning capability
- Documentation Quality: Up-to-date, accurate, easy to navigate
- Agent Clarity: Well-defined boundaries and responsibilities

Conclusion

This analysis identified critical blocking issues (librarian non-functional, backend-builder cannot test code) alongside significant structural problems (outdated references, duplicate information, missing routing guidance).

Immediate Action Required

- Fix librarian tools (2 minutes) - BLOCKING issue
- Add Bash to backend-builder (1 minute) - CRITICAL issue
- Fix CLAUDE.md GitLab references (5 minutes) - HIGH priority

Total time for all six Priority 1 fixes: 15 minutes

High-Value Improvements

- Add Quick Reference to CLAUDE.md (15 min)
- Add Agent Routing Guide (30 min)
- Remove duplicate infrastructure tables (20 min)

Total time for all five Priority 2 improvements: 90 minutes

Long-Term Vision

With all improvements implemented:

- All agents functional with proper tools
- Clear documentation with quick reference and routing guide
- Consistent structure across all agent definitions
- Security coverage with infrastructure-auditor
- Reduced maintenance through single source of truth

Total implementation effort: ~5 hours for Priorities 1-3 (15 + 90 + 180 minutes); the Phase 4 architecture expansion adds an estimated 120 minutes

Generated: 2025-12-07
Analysis Tool: Claude Opus 4.5
Scope: CLAUDE.md + 4 sub-agents (scribe, librarian, lab-operator, backend-builder)
Total Issues Identified: 27 (5 critical, 12 high-impact, 10 enhancements)
