Compare commits: 11 commits, `52faebb63a...main`

| SHA1 |
|---|
| e08951de21 |
| e481c95da4 |
| 472c5be1f1 |
| fc9a3c6fd6 |
| 7df2b1075e |
| c4962194e3 |
| 07f9638d8b |
| 892684c46e |
| 698a5b531a |
| d3dc899b30 |
| 004e3da77c |

**.gitignore** (vendored, 2 lines changed)
@@ -146,3 +146,5 @@ scripts/fixers/fix_n8n_db_c_locale.sh
# ----------------
# Add any custom patterns specific to your homelab below:
.env
*.nullbyte-backup # Nullbyte corruption recovery backups
*.control-chars-backup # Control character fix backups

**BUG_REPORT.md** (normal file, 102 lines changed)
@@ -0,0 +1,102 @@
# Bug Report: Scribe Agent Tool Permission Mismatch

**Date**: 2025-12-18
**Severity**: High
**Component**: Task Tool / Agent Tooling System

## Issue Summary

The `scribe` sub-agent configuration in `/home/jramos/homelab/sub-agents/scribe.md` explicitly declares access to the `[Read, Grep, Glob, Edit, Write]` tools, but when launched via the Task tool it receives only `[Grep, Glob, Edit]`; the critical `Read` and `Write` tools are missing.

## Expected Behavior

When a sub-agent is launched via the Task tool, it should receive every tool listed under the `tools:` directive in its configuration file.

From `sub-agents/scribe.md` line 9:
```yaml
tools: [Read, Grep, Glob, Edit, Write]
```

## Actual Behavior

When the scribe agent is launched, it reports:
> "I sincerely apologize for this limitation. The documentation content is ready and comprehensive - I just cannot execute the file operations with my current restricted toolset (Grep, Glob, Edit only - no Write, no Read)."

The agent does NOT receive:
- `Read` tool (cannot read files for context)
- `Write` tool (cannot create new documentation files)

## Impact

**Critical workflow disruption**: The scribe agent's primary purpose is documentation creation and maintenance. Without the Write tool, it cannot:
- Create new README.md files
- Create service documentation
- Generate architecture diagrams in new files

Without the Read tool, it cannot:
- Verify current infrastructure state
- Cross-reference existing documentation
- Understand context before updating

## Reproduction Steps

1. Create a sub-agent configuration with `tools: [Read, Grep, Glob, Edit, Write]`
2. Launch the agent via the Task tool with `subagent_type: scribe`
3. Attempt to use the Write tool to create a new file
4. The agent reports that the tool is unavailable

## Configuration Evidence

**File**: `/home/jramos/homelab/sub-agents/scribe.md`

Lines 9-10:
```yaml
tools: [Read, Grep, Glob, Edit, Write]
model: haiku-4.5
```

Lines 27-33 (Scribe's own documentation):
```markdown
**CRITICAL TOOL INSTRUCTIONS:**
You possess a `Write` tool. You must use it correctly:

1. **To Create a New File**: Use the `Write` tool.
   * *Do not* attempt to use `Edit` on a non-existent file.
   * *Do not* say "I cannot create files." You have the `Write` tool specifically for this.
   * If you think you don't have the `Write` tool, **CHECK AGAIN**. It is authorized in your configuration.
```

The scribe configuration explicitly expects Write tool access, but the runtime environment does not provide it.

## Workaround

Use the `lab-operator` agent instead; it declares `[Bash, Read, Grep, Glob, Edit, Write]` and receives all of those tools at runtime.

## Additional Context

- **Other affected agents**: Unknown; `backend-builder` and `librarian` still need to be tested (`lab-operator` is confirmed working, see Workaround)
- **Main agent**: Has access to all tools without restriction
- **Agent launch mechanism**: Task tool with `subagent_type` parameter
- **Agent configs location**: `/home/jramos/homelab/sub-agents/*.md`

## Recommended Fix

Investigate the Task tool's agent initialization logic to ensure it grants every tool listed in the agent's YAML frontmatter. The tool permission system should honor the declarative configuration without filtering.

## Test Case

```bash
# Verify each agent receives its declared tools
for agent in scribe lab-operator backend-builder librarian; do
    echo "Testing $agent..."
    # Print the tools declared in the agent's frontmatter
    grep -m1 '^tools:' "/home/jramos/homelab/sub-agents/${agent}.md"
    # Then launch the agent via the Task tool and compare the tools it reports
done
```

Expected: Each agent receives exactly the tools listed in its `tools: []` configuration.
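
The report gives no programmatic way to read an agent's runtime tool list, so the comparison has to start from the tools the agent itself reports. A minimal sketch of that comparison, assuming the frontmatter uses a one-line list such as `tools: [Read, Grep, Glob, Edit, Write]`; the `declared_tools` and `missing_tools` helper names are hypothetical, and the runtime tool names are passed in by hand.

```bash
# Hypothetical helpers for comparing declared vs. runtime tools.
# Assumption: the frontmatter holds a one-line flow list, e.g.
#   tools: [Read, Grep, Glob, Edit, Write]

# Print the declared tools, one per line, from the first `tools:` line.
declared_tools() {
    sed -n 's/^tools:[[:space:]]*\[\(.*\)]/\1/p' "$1" | head -n 1 | tr ',' '\n' | tr -d ' '
}

# Print each declared tool that is absent from the runtime list
# given as the remaining arguments.
missing_tools() {
    local config="$1"
    shift
    declared_tools "$config" | while read -r tool; do
        [ -n "$tool" ] || continue
        case " $* " in
            *" $tool "*) ;;             # present at runtime
            *) echo "$tool" ;;          # declared but not granted
        esac
    done
}
```

For the behavior described in this report, `missing_tools /home/jramos/homelab/sub-agents/scribe.md Grep Glob Edit` would print `Read` and `Write`.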

---

**Reporter**: Main Agent (Claude Code)
**Priority**: High - Breaks core documentation workflow
**Status**: Open

**CLAUDE.md** (187 lines changed)
@@ -1,3 +1,17 @@
---
version: 2.2.0
last_updated: 2025-12-07
infrastructure_source: CLAUDE_STATUS.md
repository_type: homelab
primary_node: serviceslab
proxmox_version: 8.3.3
vm_count: 8
template_count: 2
lxc_count: 4
working_directory: /home/jramos/homelab
git_remote: http://192.168.2.102:3060/jramos/homelab.git
---

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

@@ -6,60 +20,91 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

This is a homelab infrastructure repository managing a Proxmox VE 8.3.3-based services and development laboratory environment. The infrastructure follows a hybrid architecture pattern combining traditional virtualization (KVM/QEMU) with containerization (LXC) for optimal resource utilization and service isolation.

## Quick Reference

| Resource | Value |
|----------|-------|
| **Proxmox Node** | serviceslab (192.168.2.200:8006) |
| **Proxmox Version** | PVE 8.3.3 |
| **Infrastructure** | 8 VMs, 2 Templates, 4 LXC containers |
| **Monitoring** | http://192.168.2.114:3000 (Grafana) |
| **Version Control** | Gitea at 192.168.2.102:3060 |
| **Working Directory** | /home/jramos/homelab |
| **Live Status** | See `CLAUDE_STATUS.md` for current inventory |

**Key Services:**
- VM 101 (monitoring-docker): Grafana, Prometheus, PVE Exporter
- CT 102 (nginx): Nginx Proxy Manager (reverse proxy)
- CT 112 (twingate-connector): Zero-trust network access
- CT 113 (n8n): Workflow automation at 192.168.2.107

## Agent Selection Guide

When working with this repository, choose the appropriate agent based on task type:

| Task Type | Primary Agent | Tools Available | Notes |
|-----------|---------------|-----------------|-------|
| **Git Operations** | `librarian` | Bash, Read, Grep, Edit, Write | Commits, branches, merges, .gitignore |
| **Documentation** | `scribe` | Read, Grep, Glob, Edit, Write | READMEs, architecture docs, diagrams |
| **Infrastructure Ops** | `lab-operator` | Bash, Read, Grep, Glob, Edit, Write | Proxmox, Docker, networking, storage |
| **Code/IaC Development** | `backend-builder` | Bash, Read, Grep, Glob, Edit, Write | Ansible, Terraform, Python, Shell |
| **File Creation** | Main Agent | All tools | Use when sub-agents lack specific tools |
| **Complex Multi-Agent Tasks** | Main Agent | All tools | Coordinates between specialized agents |

### Task Routing Decision Tree

```
Is this a git/version control task?
├── Yes → Use librarian
└── No ↓

Is this documentation (README, guides, diagrams)?
├── Yes → Use scribe
└── No ↓

Does this require system commands (docker, ssh, proxmox)?
├── Yes → Use lab-operator
└── No ↓

Is this code/config creation (Ansible, Python, Terraform)?
├── Yes → Use backend-builder
└── No → Use Main Agent
```
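
The decision tree is an ordered series of checks in which the first "Yes" wins. A minimal sketch of that ordering, assuming a task has been tagged with one or more hypothetical labels (`git`, `docs`, `ops`, `code`) standing in for the tree's questions; `route_task` is not part of any Claude Code interface.

```bash
# Hypothetical sketch of the routing tree. `case` tests its patterns in
# order, so a task tagged both "docs" and "ops" goes to scribe, exactly
# as the tree asks the documentation question before the system one.
route_task() {
    case " $* " in
        *" git "*)  echo "librarian" ;;        # git/version control
        *" docs "*) echo "scribe" ;;           # README, guides, diagrams
        *" ops "*)  echo "lab-operator" ;;     # docker, ssh, proxmox
        *" code "*) echo "backend-builder" ;;  # Ansible, Python, Terraform
        *)          echo "Main Agent" ;;       # everything else
    esac
}
```

For example, `route_task ops code` prints `lab-operator`, and `route_task` with no tags falls through to `Main Agent`.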

### Agent Collaboration Patterns

**Documentation Workflow:**
1. `backend-builder` or `lab-operator` creates/modifies infrastructure
2. `scribe` updates documentation
3. `librarian` commits all changes

**Infrastructure Deployment:**
1. `backend-builder` writes IaC (Ansible/Terraform/Compose)
2. `lab-operator` deploys to Proxmox/Docker
3. `scribe` documents deployment
4. `librarian` commits configuration

## Infrastructure Overview

### Proxmox Environment
- **Platform**: Proxmox Virtual Environment 8.3.3
- **Architecture Pattern**: Services/Development Laboratory
- **Primary Node**: `serviceslab` (single-node cluster)
- **Deployment Model**: Hybrid VM + LXC container approach

**For detailed, current infrastructure inventory, see:**
- **Live Status**: `CLAUDE_STATUS.md` (most current)
- **Service Details**: `services/README.md`
- **Complete Index**: `INDEX.md`

### Key Services & Virtual Machines (QEMU/KVM)
**Quick Summary:**
- **VMs**: 8 total (IDs: 100, 101, 105, 106, 108-111)
- **Templates**: 2 total (IDs: 104, 107)
- **LXC Containers**: 4 total (IDs: 102, 103, 112, 113)
- **Storage Pools**: local, local-lvm, Vault (ZFS), PBS-Backups, iso-share
- **Monitoring**: VM 101 at 192.168.2.114 (Grafana/Prometheus/PVE Exporter)

The infrastructure employs full VMs for services requiring kernel-level isolation, complex dependencies, or heavyweight applications:

| VM ID | Name | Purpose | Notes |
|-------|------|---------|-------|
| 100 | docker-hub | Container registry/Docker hub mirror | Local container image caching |
| 101 | monitoring-docker | Monitoring stack | Grafana/Prometheus/PVE Exporter at 192.168.2.114 |
| 104 | ubuntu-dev | Ubuntu development environment | Additional dev workstation |
| 105 | dev | Development environment | General-purpose development workstation |
| 106 | Ansible-Control | Automation control node | IaC orchestration, configuration management |
| 107 | ubuntu-docker | Ubuntu Docker host | Docker-focused environment |
| 108 | CML | Cisco Modeling Labs | Network simulation/testing environment |
| 109 | web-server-01 | Web application server | Production-like web tier (clustered) |
| 110 | web-server-02 | Web application server | Load-balanced pair with web-server-01 |
| 111 | db-server-01 | Database server | Backend data tier |

### Containers (LXC)

Lightweight services leveraging LXC for reduced overhead and faster provisioning:

| CT ID | Name | Purpose | Notes |
|-------|------|---------|-------|
| 102 | nginx | Reverse proxy/load balancer | Front-end traffic management (NPM) |
| 103 | netbox | Network documentation/IPAM | Infrastructure source of truth |
| 112 | twingate-connector | Zero-trust network access | Secure remote access connector |
| 113 | n8n | Workflow automation | n8n.io platform at 192.168.2.107 |

### Storage Architecture

The storage layout demonstrates a well-organized approach to data separation:

| Storage Pool | Type | Usage | Purpose |
|--------------|------|-------|---------|
| local | Directory | 15.13% | System files, ISOs, templates |
| local-lvm | LVM-Thin | 0.0% | VM disk images (thin provisioned) |
| Vault | NFS/Directory | 10.88% | Secure storage for sensitive data |
| PBS-Backups | Proxmox Backup Server | 27.43% | Automated backup repository |
| iso-share | NFS/CIFS | 1.4% | Installation media library |
| localnetwork | Network share | N/A | Shared resources across infrastructure |

**Note**: Infrastructure details change frequently. Always reference `CLAUDE_STATUS.md` for accurate counts, IPs, and status.

### Architecture Patterns & Design Decisions

**Tiered Application Architecture**: The infrastructure implements a classic three-tier design with dedicated web servers (109, 110), database server (111), and reverse proxy (102), suggesting this lab is used for practicing production-like deployments.

**Automation-First Approach**: The presence of Ansible-Control (106), GitLab (101), and NetBox (103) indicates a focus on Infrastructure as Code and proper documentation practices—rather civilized.
**Automation-First Approach**: The presence of Ansible-Control (106), Gitea (100), and NetBox (103) indicates a focus on Infrastructure as Code and proper documentation practices—rather civilized.

**Network Simulation Capability**: CML (108) suggests network engineering activities, possibly testing configurations before production deployment.

@@ -69,6 +114,8 @@ The storage layout demonstrates a well-organized approach to data separation:

**Zero-Trust Security**: Implementation of Twingate connector (CT 112) demonstrates modern security practices, providing secure remote access without traditional VPN complexity.

**Backup Strategy**: PBS-Backups utilization is at 27.43% (see CLAUDE_STATUS.md for current metrics). Automated daily incremental backups with weekly full backups ensure data protection across all VMs and containers.

## Working with This Environment

### Universal Workflow
@@ -78,38 +125,43 @@ For every complex task, every Agent must follow this loop:
3. **Update**: Edit `CLAUDE_STATUS.md` to mark your step as `[x]` and update the "Current Context".
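
Step 3 can be scripted when the checklist uses Markdown task items (`- [ ]` / `- [x]`), as the Quick Resume section of `CLAUDE_STATUS.md` does. A minimal sketch, assuming a step is identified by the text that immediately follows its checkbox; `mark_done` is a hypothetical helper, and updating the "Current Context" text is left to the agent.

```bash
# Hypothetical helper: tick the first unchecked item whose text starts
# with the given step description. awk's index() does a plain substring
# match, so regex metacharacters in the step text are not special.
mark_done() {
    local file="$1" step="$2" tmp
    tmp=$(mktemp)
    awk -v step="$step" '
        !done && index($0, "- [ ] " step) == 1 {
            $0 = "- [x] " substr($0, 7)   # keep the text after the 6-char "- [ ] " prefix
            done = 1
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # rewrite in place, keeping file permissions
    rm -f "$tmp"
}
```

For example, `mark_done CLAUDE_STATUS.md "Twingate: Add resource"` would tick only the Twingate item.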

### Status File Template
If `CLAUDE_STATUS.md` is missing, initialize it with:
- **Goal**: [User Goal]
- **Phase**: [Planning / Dev / Deploy]
- **Checklist**: [List of steps]

If `CLAUDE_STATUS.md` is missing or corrupted, recover it from the latest disaster recovery export:
- **Location**: `disaster-recovery/homelab-export-YYYYMMDD-HHMMSS/CLAUDE_STATUS.md`
- **Alternative**: Use the scribe agent to recreate from current infrastructure state

**Minimum required structure:**
```markdown
# Homelab Infrastructure Status
**Last Updated**: YYYY-MM-DD HH:MM:SS
**Export Reference**: disaster-recovery/homelab-export-YYYYMMDD-HHMMSS

## Current Infrastructure Snapshot
- Proxmox VE 8.3.3 on serviceslab (192.168.2.200)
- 8 VMs, 2 Templates, 4 LXC containers

## Current Initiative
**Goal**: [Initiative description]
**Phase**: [Planning / Implementation / Testing]
**Progress Checklist**: [Task list with checkboxes]

## Recent Infrastructure Changes
[Chronological log of changes with dates]
```
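
Recovering the file from the newest export can be scripted. Because export directories are named `homelab-export-YYYYMMDD-HHMMSS`, a plain lexical sort puts them in chronological order. A minimal sketch, assuming the exports sit under `disaster-recovery/` in the repository root; `restore_status_file` is a hypothetical helper.

```bash
# Hypothetical helper: copy CLAUDE_STATUS.md from the newest export.
# Export names embed YYYYMMDD-HHMMSS, so lexical order is chronological.
restore_status_file() {
    local repo="${1:-/home/jramos/homelab}"
    local latest
    latest=$(ls -d "$repo"/disaster-recovery/homelab-export-*/ 2>/dev/null | sort | tail -n 1)
    if [ -z "$latest" ] || [ ! -f "${latest}CLAUDE_STATUS.md" ]; then
        echo "No usable export found under $repo/disaster-recovery" >&2
        return 1
    fi
    cp "${latest}CLAUDE_STATUS.md" "$repo/CLAUDE_STATUS.md"
    echo "Restored from ${latest}"
}
```

If the newest export lacks the file, the helper stops rather than silently falling back to an older snapshot.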

### Best Practices

1. **Backup Strategy**: With PBS-Backups at 21.6% utilization and excellent uptime (27-68 days), ensure regular backup schedules are maintained. Consider implementing the 3-2-1 rule if not already in place.

2. **Resource Management**: Monitor the local-lvm pool (currently 0.0%)—this appears to be reserved capacity. Ensure thin provisioning doesn't lead to overcommitment.

3. **Configuration Management**: Utilize the Ansible-Control node (106) for infrastructure changes. Avoid manual configuration drift.

4. **Documentation**: NetBox (103) should be the single source of truth for IP addressing, VLANs, and service inventory. Keep it updated.

5. **Version Control**: GitLab (101) should house all Infrastructure as Code, scripts, and configuration files from this repository.

6. **Load Balancing**: The paired web servers (109, 110) suggest HA testing—ensure nginx (102) is properly configured for failover.

### Access Patterns

- **Proxmox Web UI**: Primary management interface for VM/CT lifecycle operations
- **Ansible**: Automated configuration deployment and orchestration
- **GitLab**: CI/CD pipelines for infrastructure testing and deployment
- **Gitea**: CI/CD pipelines for infrastructure testing and deployment
- **NetBox**: Network documentation and IP address management

### Maintenance Considerations

- **Uptime**: Services showing 27-68 days uptime—schedule maintenance windows for kernel updates
- **Storage Growth**: PBS-Backups at 21.6% allows healthy retention; review backup policies quarterly
- **Capacity Planning**: Current utilization suggests comfortable headroom; monitor trends via Proxmox metrics
- **Uptime**: Track uptime metrics in disaster recovery exports for trend analysis
- **Storage Growth**: PBS-Backups at 27.43%, Vault at 10.88%, local at 15.13% (see CLAUDE_STATUS.md for current metrics)
- **Capacity Planning**: Current utilization suggests comfortable headroom; monitor trends via Proxmox metrics in monitoring-docker (101)

## Development Setup

@@ -123,7 +175,6 @@ The repository structure will house:
## Notes

- This is a Windows Subsystem for Linux (WSL2) environment
- Working directory: /mnt/c/Users/fam1n/Documents/homelab
- This repository is not yet initialized as a git repository
- Working directory: /home/jramos/homelab
- Proxmox node `serviceslab` is the single point of management
- Infrastructure demonstrates production-like patterns suitable for learning and testing

**CLAUDE_STATUS.md** (772 lines changed)
@@ -1,16 +1,40 @@
# Homelab Infrastructure Status

**Last Updated**: 2025-12-07 12:00:40
**Export Reference**: disaster-recovery/homelab-export-20251207-120040
**Last Updated**: 2026-02-03
**Export Reference**: disaster-recovery/homelab-export-20251211-144345
**Current Session:** OpenClaw Deployment - VM 120

## Quick Resume (Current Session Context)

**Where We Are:** OpenClaw deployed and healthy on VM 120. Container running with full security hardening. Backups configured. Manual steps remain for NPM proxy host, Twingate resource, and Prometheus config on VM 101.

**Completed:**
- [x] Config files created (`services/openclaw/`)
- [x] VM 120 created and hardened (UFW, fail2ban, node-exporter, openclaw user)
- [x] OpenClaw container deployed and healthy (v2026.2.1)
- [x] Security verified (cap_drop ALL, non-root, read-only FS, no docker.sock)
- [x] Prometheus scrape target added to repo copy
- [x] PBS backup job created (daily 02:00, snapshot, zstd)
- [x] Application backup script + weekly cron configured
- [x] Documentation updated (README, services/README, CLAUDE_STATUS, INDEX)
- [x] node_exporter installed and serving metrics on 192.168.2.120:9100
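
The node_exporter item can be re-checked from any LAN host. A minimal sketch, assuming `curl` is installed and the exporter serves its standard `/metrics` path, where sample lines begin with `node_`; `check_node_exporter` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if the target serves node_exporter metrics.
# grep -q exits at the first node_* sample line; without pipefail the
# pipeline's status is grep's, so an early curl exit does not matter.
check_node_exporter() {
    local target="${1:-192.168.2.120:9100}"
    curl -fsS --max-time 5 "http://${target}/metrics" | grep -q '^node_'
}
```

Usage: `check_node_exporter && echo "node_exporter OK"`.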

**Manual Steps Remaining:**
- [ ] NPM: Create proxy host for openclaw.apophisnetworking.net -> 192.168.2.120:18789 (WebSocket support, SSL, TinyAuth)
- [ ] Twingate: Add resource for 192.168.2.120 ports 18789/18790/1455
- [ ] VM 101: Deploy updated prometheus.yml via Proxmox web console (SSH not configured)
- [ ] Configure at least one LLM provider API key in /opt/openclaw/.env

---

## Current Infrastructure Snapshot

### Proxmox Environment
- **Node**: serviceslab
- **Version**: Proxmox VE 8.3.3
- **Management IP**: 192.168.2.200
- **Version**: Proxmox VE 8.4.0
- **Management IP**: 192.168.2.100
- **Architecture**: Single-node cluster
- **Total Resources**: 10 VMs, 4 LXC Containers
- **Total Resources**: 10 VMs, 2 Templates, 5 LXC Containers

---

@@ -18,33 +42,47 @@

| VM ID | Name | IP Address | Status | Purpose |
|-------|------|------------|--------|---------|
| 100 | docker-hub | 192.168.2.XXX | Running | Container registry/Docker hub mirror |
| 100 | docker-hub | 192.168.2.102 | Running | Container registry/Docker hub mirror |
| 101 | monitoring-docker | 192.168.2.114 | Running | Monitoring stack (Grafana/Prometheus/PVE Exporter) |
| 104 | ubuntu-dev | - | Stopped | Ubuntu development environment |
| 105 | dev | - | Stopped | General-purpose development workstation |
| 106 | Ansible-Control | 192.168.2.XXX | Running | IaC orchestration, configuration management |
| 107 | ubuntu-docker | - | Stopped | Ubuntu Docker host |
| 108 | CML | - | Stopped | Cisco Modeling Labs - network simulation |
| 109 | web-server-01 | 192.168.2.XXX | Running | Web application server (clustered) |
| 110 | web-server-02 | 192.168.2.XXX | Running | Load-balanced pair with web-server-01 |
| 111 | db-server-01 | 192.168.2.XXX | Running | Backend database server |
| 114 | haos | 192.168.2.XXX | Running | Home Assistant OS - smart home automation platform |
| 120 | openclaw | 192.168.2.120 | Running | OpenClaw AI chatbot gateway |

**Recent Changes**:
- Added VM 120 (openclaw) for multi-platform AI chatbot gateway (2026-02-03)
- Added VM 101 (monitoring-docker) for dedicated monitoring infrastructure
- Removed VM 101 (gitlab) - service decommissioned

---

## Containers (LXC) - 4 Containers
## VM Templates - 2 Templates

| Template ID | Name | Purpose |
|-------------|------|---------|
| 104 | ubuntu-dev | Ubuntu development environment template for cloning |
| 107 | ubuntu-docker | Ubuntu Docker host template for rapid deployment |

**Note**: Templates are immutable base images used for cloning new VMs, not running workloads. They provide standardized configurations for consistent infrastructure provisioning.

---

## Containers (LXC) - 5 Containers

| CT ID | Name | IP Address | Status | Purpose |
|-------|------|------------|--------|---------|
| 102 | nginx | 192.168.2.101 | Running | Reverse proxy/load balancer & NPM |
| 103 | netbox | 192.168.2.XXX | Stopped | Network documentation/IPAM |
| 103 | netbox | 192.168.2.XXX | Running | Network documentation/IPAM |
| 112 | twingate-connector | 192.168.2.XXX | Running | Zero-trust network access connector |
| 113 | n8n | 192.168.2.107 | Running | Workflow automation platform |
| 113 | n8n | 192.168.2.113 | Running | Workflow automation platform |
| 115 | tinyauth | 192.168.2.10 | Running | SSO authentication layer for NetBox |

**Recent Changes**:
- Added CT 115 (tinyauth) for SSO authentication integration with NetBox
- Added CT 112 (twingate-connector) for zero-trust network security
- Added CT 113 (n8n) for workflow automation
- Removed CT 112 (Anytype) - replaced by n8n

@@ -55,17 +93,17 @@

| Storage Pool | Type | Total | Used | % Used | Purpose |
|--------------|------|-------|------|--------|---------|
| local | Directory | - | - | 15.13% | System files, ISOs, templates |
| local-lvm | LVM-Thin | - | - | 0.0% | VM disk images (thin provisioned) |
| Vault | NFS/Directory | - | - | 10.88% | Secure storage for sensitive data |
| PBS-Backups | PBS | - | - | 27.43% | Automated backup repository |
| iso-share | NFS/CIFS | - | - | 1.4% | Installation media library |
| local | Directory | - | - | 19.11% | System files, ISOs, templates |
| local-lvm | LVM-Thin | - | - | 0.01% | VM disk images (thin provisioned) |
| Vault | NFS/Directory | - | - | 12.13% | Secure storage for sensitive data |
| PBS-Backups | PBS | - | - | 28.27% | Automated backup repository |
| iso-share | NFS/CIFS | - | - | 1.45% | Installation media library |
| localnetwork | Network Share | - | - | N/A | Shared resources across infrastructure |

**Capacity Notes**:
- PBS-Backups utilization increased to 27.43% (healthy retention)
- Vault utilization decreased to 10.88% (space optimization)
- local storage at 15.13% (system overhead normal)
- PBS-Backups utilization increased to 28.27% (healthy retention)
- Vault utilization increased to 12.13% (data growth monitored)
- local storage at 19.11% (system overhead within normal range)
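
The percentages above can be refreshed on the Proxmox host with `pvesm status`, which prints one row per storage with the usage percentage in its last column. A minimal sketch, assuming that column layout (`Name Type Status Total Used Available %`); `pools_over_threshold` is a hypothetical helper and the threshold is given in percent.

```bash
# Hypothetical helper: list pools whose usage exceeds a threshold (percent).
# Reads `pvesm status`-style output on stdin; the last column looks like "28.27%".
pools_over_threshold() {
    local threshold="${1:-80}"
    awk -v limit="$threshold" '
        NR == 1 { next }                  # skip the header row
        {
            pct = $NF
            sub(/%$/, "", pct)            # "28.27%" -> "28.27"
            if (pct + 0 > limit + 0) printf "%s %s%%\n", $1, pct
        }
    '
}
```

Usage on the node: `pvesm status | pools_over_threshold 25`.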

---

@@ -87,7 +125,7 @@
- **Integration**: Connects homelab to Twingate network

### Automation & Integration
**CT 113** - n8n (192.168.2.107)
**CT 113** - n8n (192.168.2.113)
- **Purpose**: Workflow automation platform
- **Technology**: n8n.io
- **Database**: PostgreSQL 15+

@@ -95,6 +133,29 @@
- **Documentation**: `/home/jramos/homelab/services/README.md#n8n-workflow-automation`
- **Status**: Operational (resolved database locale issues)

### Authentication & SSO
**CT 115** - tinyauth (192.168.2.10)
- **Purpose**: Lightweight SSO authentication layer
- **Technology**: TinyAuth v4 (Docker container)
- **Port**: 8000
- **Domain**: tinyauth.apophisnetworking.net
- **Integration**: Authentication gateway for NetBox via Nginx Proxy Manager
- **Security**: Bcrypt-hashed credentials, HTTPS enforcement
- **Documentation**: `/home/jramos/homelab/services/tinyauth/README.md`
- **Status**: Operational

### AI Chatbot Gateway
**VM 120** - openclaw (192.168.2.120)
- **Purpose**: Multi-platform AI chatbot gateway
- **Technology**: OpenClaw (Docker container)
- **Ports**: 18789 (Gateway WS+UI), 18790 (Bridge), 1455 (OAuth)
- **Domain**: openclaw.apophisnetworking.net
- **LLM Providers**: Anthropic, OpenAI, Ollama
- **Messaging**: Discord, Telegram, Slack, WhatsApp
- **Security**: CVE-2026-25253 patched (v2026.2.1), cap_drop ALL, non-root, read-only FS
- **Documentation**: `/home/jramos/homelab/services/openclaw/README.md`
- **Status**: Operational - Container healthy

### Infrastructure Documentation
**CT 103** - netbox
- **Purpose**: Network documentation and IPAM

@@ -187,9 +248,248 @@ Hybrid approach balancing performance and resource efficiency:

---

## Recent Infrastructure Changes (2025-12-07)
## Recent Infrastructure Changes

### Additions
### 2026-02-03: OpenClaw AI Chatbot Gateway Deployment (In Progress)

**Service**: VM 120 - OpenClaw multi-platform AI chatbot gateway

**Purpose**: Bridge messaging platforms (Discord, Telegram, Slack, WhatsApp) with LLM providers (Anthropic, OpenAI, Ollama) through a unified gateway.

**Specifications**:
- **VM**: 120 (cloned from template 107, ubuntu-docker)
- **IP**: 192.168.2.120
- **Resources**: 4 vCPUs, 16GB RAM, 50GB disk on Vault (ZFS)
- **Ports**: 18789 (Gateway WS+UI), 18790 (Bridge), 1455 (OAuth)
- **Domain**: openclaw.apophisnetworking.net
- **Image**: ghcr.io/openclaw/openclaw:2026.2.1

**Security Hardening**:
- Version >= 2026.2.1 (patches CVE-2026-25253, CVSS 8.8 1-click RCE)
- All ports bound to 127.0.0.1 (reverse proxy required)
- Docker: cap_drop ALL, no-new-privileges, read-only filesystem, non-root user (1001:1001)
- UFW: deny-all + whitelist 192.168.2.0/24 + 192.168.1.91 (desktop PC)
- fail2ban on SSH (3 retries), unattended-upgrades
- Prometheus node_exporter at port 9100
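
The UFW policy above (deny by default, allow the LAN and the desktop PC) reduces to a short rule list. The sketch below prints the commands instead of running them, so they can be reviewed before being piped to `sudo sh` on VM 120. The function name is hypothetical, and allowing all outbound traffic is an assumption (it matches UFW's usual default).

```bash
# Hypothetical helper: print the UFW commands for VM 120 for review.
# Allowed sources come from the hardening notes: the 192.168.2.0/24 LAN
# and the desktop PC at 192.168.1.91. Outbound "allow" is an assumption.
openclaw_ufw_rules() {
    echo "ufw default deny incoming"
    echo "ufw default allow outgoing"
    for src in 192.168.2.0/24 192.168.1.91; do
        echo "ufw allow from ${src}"
    done
    echo "ufw --force enable"   # --force skips the interactive confirmation
}
```

Usage: `openclaw_ufw_rules` to review, then `openclaw_ufw_rules | sudo sh` to apply.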
|
||||
|
||||
**Completed Steps**:
|
||||
- [x] Docker Compose configuration files created
|
||||
- [x] Security hardening overlay (docker-compose.override.yml)
|
||||
- [x] Environment variable template (.env.example)
|
||||
- [x] Prometheus scrape target added
|
||||
- [x] Documentation created (README, services/README, CLAUDE_STATUS, INDEX)
|
||||
- [x] VM 120 Creation & SSH Setup
|
||||
- [x] OS Hardening (UFW, user creation)
|
||||
|
||||
**Pending Steps**:
|
||||
- [ ] NPM reverse proxy configuration (manual - web UI)
|
||||
- [ ] Twingate resource creation (manual - admin console)
|
||||
- [ ] Prometheus config on VM 101 (manual - no SSH access)
|
||||
- [ ] Configure LLM provider API key in .env
|
||||
|
||||
**Status**: Container healthy - Manual network integration remaining
|
||||
|
||||
---
|
||||
|
||||
### 2025-12-20: Comprehensive Security Audit Completed
|
||||
|
||||
**Activity:** Complete infrastructure security assessment and remediation planning
|
||||
|
||||
**Audit Scope:**
|
||||
- All Docker Compose services (Portainer, NPM, Paperless-ngx, ByteStash, Speedtest Tracker, FileBrowser)
|
||||
- Proxmox VE infrastructure and API access
|
||||
- Network security and segmentation
|
||||
- Credential management and storage
|
||||
- SSL/TLS configuration
|
||||
- Container security and runtime configuration
|
||||
|
||||
**Findings Summary:**
|
||||
- **CRITICAL (6)**: Docker socket exposure, hardcoded credentials, database passwords in git
|
||||
- **HIGH (3)**: Missing SSL/TLS, weak passwords, containers running as root
|
||||
- **MEDIUM (2)**: SSL verification disabled, missing authentication
|
||||
- **LOW (20)**: Documentation gaps, monitoring improvements, backup encryption
|
||||
|
||||
**Deliverables:**
|
||||
1. **Security Policy** (`SECURITY.md`): 864 lines - Comprehensive security best practices
|
||||
2. **Audit Report** (`troubleshooting/SECURITY_AUDIT_2025-12-20.md`): 2,350 lines - Detailed findings and remediation plan
|
||||
3. **Security Checklist** (`templates/SECURITY_CHECKLIST.md`): 750 lines - Pre-deployment validation template
|
||||
4. **Validation Report** (`scripts/security/VALIDATION_REPORT.md`): 2,092 lines - Script safety assessment
|
||||
5. **Container Fixes** (`scripts/security/CONTAINER_NAME_FIXES.md`): 621 lines - Container name verification
|
||||
6. **Security Scripts** (8 total):
|
||||
- `verify-service-status.sh` - Service health checker
|
||||
- `backup-before-remediation.sh` - Comprehensive backup utility
|
||||
- `rotate-pve-credentials.sh` - Proxmox credential rotation
|
||||
- `rotate-paperless-password.sh` - Database password rotation
|
||||
- `rotate-bytestash-jwt.sh` - JWT secret rotation
|
||||
- `rotate-logward-credentials.sh` - Multi-service credential rotation
|
||||
- `docker-socket-proxy/docker-compose.yml` - Security proxy deployment
|
||||
- `portainer/docker-compose.socket-proxy.yml` - Portainer migration config
|
||||
|
||||
**Script Validation:**
|
||||
- **Ready for execution**: 5/8 scripts (verify-service-status.sh, rotate-pve-credentials.sh, rotate-bytestash-jwt.sh, backup-before-remediation.sh, docker-socket-proxy)
|
||||
- **Needs container name fixes**: 3/8 scripts (see CONTAINER_NAME_FIXES.md)
|
||||
|
||||

**4-Phase Remediation Roadmap:**
- Phase 1 (Week 1): Immediate actions - Backups, secrets migration
- Phase 2 (Weeks 2-3): Low-risk changes - Socket proxy, credential rotation
- Phase 3 (Month 2): High-risk changes - Service migrations, SSL/TLS
- Phase 4 (Quarter 1): Infrastructure - Network segmentation, scanning pipelines

**Estimated Timeline:**
- Total downtime: 6-13 minutes (sequential script execution)
- Full remediation: 8-16 weeks

**Risk Assessment:**
- Current risk: HIGH - Multiple CRITICAL vulnerabilities active
- Post-Phase 1 risk: MEDIUM - Credential exposure mitigated
- Post-Phase 3 risk: LOW - All CRITICAL/HIGH findings remediated
- Post-Phase 4 risk: VERY LOW - Defense-in-depth implemented

**Status:** Documentation complete, awaiting remediation execution approval

---

### 2025-12-18: TinyAuth SSO Deployment

**Service Deployed:** CT 115 - TinyAuth authentication layer

**Purpose:** Centralized SSO authentication for NetBox and future homelab services

**Specifications:**
- **Container**: CT 115 (LXC with Docker)
- **IP Address**: 192.168.2.10
- **Domain**: tinyauth.apophisnetworking.net
- **Port**: 8000 (external), 3000 (internal)
- **Docker Image**: ghcr.io/steveiliop56/tinyauth:v4
- **Resource Usage**: ~50-100 MB memory, <1% CPU

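Docker's port syntax is `HOST:CONTAINER`. TinyAuth listens on 3000 inside the container, so an `8000:8000` mapping forwards host port 8000 to a container port where nothing is listening (issue 3 below). The sketch checks a mapping against the in-container listen port. It is illustrative only. `parse_port_mapping` and `check_mapping` are hypothetical helpers, and they handle only the plain two-field form, not variants such as `127.0.0.1:8000:3000` or a `/udp` suffix.

```python
from typing import NamedTuple


class PortMapping(NamedTuple):
    host_port: int
    container_port: int


def parse_port_mapping(spec: str) -> PortMapping:
    """Parse a plain Docker 'HOST:CONTAINER' port spec such as '8000:3000'."""
    host, container = spec.split(":")
    return PortMapping(int(host), int(container))


def check_mapping(spec: str, app_listen_port: int) -> bool:
    """True if the container side of the mapping matches the app's listen port."""
    return parse_port_mapping(spec).container_port == app_listen_port


print(check_mapping("8000:8000", 3000))  # original mapping
print(check_mapping("8000:3000", 3000))  # corrected mapping
```
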

**Integration Architecture:**
- Internet → Nginx Proxy Manager (CT 102) → NetBox (CT 103), with each request first validated by TinyAuth (CT 115)
- NPM uses the `auth_request` directive to send an authentication subrequest to TinyAuth before proxying
- Bcrypt-hashed password storage for security
- HTTPS enforcement via NPM SSL termination

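With `auth_request`, NPM does not route traffic through TinyAuth. It issues a subrequest and decides what to do from that subrequest's status code. The nginx documentation specifies the mapping:
- A 2xx status allows the request.
- A 401 or 403 denies it with that same code.
- Any other status is treated as an error.

This last rule is one way a misbehaving auth location can surface to the client as a 500 Internal Server Error. The function below models only that rule. It is not NPM or TinyAuth code, and the names are hypothetical.

```python
ALLOW = 0  # sentinel: access granted, nginx proxies the original request upstream


def auth_request_outcome(subrequest_status: int) -> int:
    """Model nginx auth_request: map the auth subrequest status to the client result."""
    if 200 <= subrequest_status < 300:
        return ALLOW  # authenticated: request continues to NetBox
    if subrequest_status in (401, 403):
        return subrequest_status  # denied with the same status code
    return 500  # any other status is treated as an error


for status in (200, 401, 403, 302, 502):
    print(status, "->", auth_request_outcome(status))
```

In practice a 401 is usually paired with an `error_page` rule that redirects the browser to the TinyAuth login page. That behavior depends on the NPM advanced config and is not modeled here.
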

**Issues Resolved During Deployment:**
1. **500 Internal Server Error**: Fixed Nginx advanced config syntax
2. **IP addresses not allowed**: Changed `APP_URL` from an IP address to the domain name
3. **Port mapping**: Corrected Docker port mapping from `8000:8000` to `8000:3000`
4. **Invalid password**: Switched to a bcrypt-hashed password, as required by TinyAuth v4

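Issue 2 is easy to catch before deployment. The sketch below flags an `APP_URL` whose host is a literal IPv4 or IPv6 address, using only the standard library. It mirrors the behavior observed during deployment rather than TinyAuth's own validation code. `app_url_uses_ip` is a hypothetical helper name.

```python
import ipaddress
from urllib.parse import urlparse


def app_url_uses_ip(app_url: str) -> bool:
    """Return True if the URL's host is a literal IP address rather than a domain."""
    host = urlparse(app_url).hostname
    if host is None:
        raise ValueError(f"no host in {app_url!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # not parseable as an IP, so treat it as a domain name
    return True


print(app_url_uses_ip("http://192.168.2.10:8000"))                # rejected form
print(app_url_uses_ip("https://tinyauth.apophisnetworking.net"))  # working form
```
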

**Integration Impact:**
- NetBox now protected by centralized authentication
- Foundation for extending SSO to other services (future candidates: Grafana, Proxmox UI)
- Authentication logs available for security auditing

**Documentation:** Complete guide at `/home/jramos/homelab/services/tinyauth/README.md`

**Status:** ✅ Operational - Successfully authenticating NetBox access

---

### 2025-12-11: Loki-Stack Monitoring Fully Operational

**Issue Resolved:** Centralized logging pipeline now receiving syslog from the UniFi router

**Root Cause:** The rsyslog filter in `/etc/rsyslog.d/unifi-router.conf` was configured for the wrong source IP (192.168.1.1 instead of 192.168.2.1)

**Fix Applied:** Updated the rsyslog filter to match the VLAN 2 gateway IP (192.168.2.1)

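The failure mode is easy to model. An exact-match filter on the sender address silently drops every message from any other address, so the router's logs never made it into the rest of the pipeline. The sketch assumes the config filters on rsyslog's `$fromhost-ip` property (for example `if $fromhost-ip == '192.168.2.1' then ...`). The file's actual contents are not reproduced here, and `SyslogMessage` and `route_router_logs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyslogMessage:
    fromhost_ip: str  # sender address as rsyslog sees it ($fromhost-ip)
    text: str


def route_router_logs(messages: List[SyslogMessage], router_ip: str) -> List[SyslogMessage]:
    """Keep only messages whose sender IP exactly equals router_ip."""
    return [m for m in messages if m.fromhost_ip == router_ip]


incoming = [SyslogMessage("192.168.2.1", "UniFi firewall event")]
print(len(route_router_logs(incoming, "192.168.1.1")))  # old filter
print(len(route_router_logs(incoming, "192.168.2.1")))  # fixed filter
```
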

**Status:** ✅ Complete - Logs flowing UniFi → rsyslog → Promtail → Loki → Grafana

**Services Affected:**
- VM 101 (monitoring-docker): rsyslog configuration updated
- Loki-stack: All components operational
- Grafana: Dashboards receiving real-time syslog data

**Technical Details:** See `troubleshooting/loki-stack-bugfix.md` for the complete 5-phase troubleshooting history

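You can confirm without opening Grafana that entries are reaching Loki. Loki's HTTP API exposes `/loki/api/v1/query_range`, which takes a LogQL `query` plus `start`/`end` as Unix epoch nanoseconds. The sketch only builds the URL. Two assumptions need adjusting for this setup:
- The base URL. Loki's default port is 3100, but the host shown is a placeholder.
- The stream selector `{job="syslog"}`. Its label depends on the Promtail scrape config.

`loki_query_range_url` is a hypothetical helper.

```python
import time
from typing import Optional
from urllib.parse import urlencode


def loki_query_range_url(
    base_url: str,
    logql: str,
    minutes: int = 15,
    limit: int = 100,
    now_ns: Optional[int] = None,
) -> str:
    """Build a Loki query_range URL covering the last `minutes` minutes."""
    end_ns = now_ns if now_ns is not None else time.time_ns()
    start_ns = end_ns - minutes * 60 * 1_000_000_000
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Placeholder host: substitute the Loki address on VM 101.
print(loki_query_range_url("http://loki.example:3100", '{job="syslog"}'))
```

Fetching that URL should return JSON whose `data.result` lists the matching streams. An empty list for a recent window points back at the rsyslog/Promtail side.
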

---

### 2025-12-11: Infrastructure Expansion & System Updates

#### Proxmox VE Platform Upgrade
- **Upgraded**: Proxmox VE 8.3.3 → 8.4.0
- **Kernel**: 6.8.12-8-pve
- **pve-manager**: 8.4.14
- **Impact**: Enhanced performance, security updates, bug fixes
- **Status**: ✅ Complete - All VMs and containers operating normally

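These versions can be re-checked after a future upgrade by parsing `pveversion -v`. That command prints one `package: version` line per component, and the `proxmox-ve` line also carries the running kernel. The sample text in the sketch is assembled from the values above rather than copied from the host. `parse_pveversion`, `running_kernel` and `version_tuple` are hypothetical helpers.

```python
import re
from typing import Dict, Optional, Tuple


def parse_pveversion(output: str) -> Dict[str, str]:
    """Map package name -> version from 'name: version ...' lines."""
    versions = {}
    for line in output.splitlines():
        name, sep, rest = line.partition(": ")
        if sep and rest.strip():
            versions[name.strip()] = rest.split()[0]
    return versions


def running_kernel(output: str) -> Optional[str]:
    """Extract the kernel from the '(running kernel: ...)' annotation, if present."""
    match = re.search(r"running kernel: ([^)\s]+)", output)
    return match.group(1) if match else None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Numeric dotted versions only, e.g. '8.4.0' -> (8, 4, 0)."""
    return tuple(int(part) for part in version.split("."))


sample = "proxmox-ve: 8.4.0 (running kernel: 6.8.12-8-pve)\npve-manager: 8.4.14\n"
versions = parse_pveversion(sample)
print(versions["proxmox-ve"], versions["pve-manager"], running_kernel(sample))
print(version_tuple(versions["proxmox-ve"]) >= (8, 4, 0))
```
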

#### New VM 114: Home Assistant OS Deployment
- **Service**: haos (Home Assistant Operating System)
- **Purpose**: Smart home automation and integration platform
- **Specifications**:
  - Memory: 4 GB (87% utilized)
  - CPU: 2 vCPUs
  - Boot Disk: 50 GB
  - Status: Running (~3 days uptime)
- **Rationale**: Centralized home automation hub for IoT device management
- **Integration**: Will integrate with the monitoring stack for infrastructure metrics

#### CT 103: NetBox IPAM Activated
- **Service**: netbox (Network Documentation & IPAM)
- **Status Change**: Stopped → Running
- **Uptime**: ~3.1 days
- **Resource Usage**: 1.28 GB / 2 GB memory (64%)
- **Purpose**: Active network documentation and IP address management
- **Rationale**: Required for ongoing infrastructure expansion planning

#### Storage Utilization Trends
- **PBS-Backups**: 27.43% → 28.27% (+0.84%) - Normal backup retention growth
- **Vault (ZFS)**: 10.88% → 12.13% (+1.25%) - Data accumulation monitored
- **local**: 15.13% → 19.11% (+3.98%) - New VM deployment and system updates
- **iso-share**: 1.4% → 1.45% (+0.05%) - Minimal change
- **local-lvm**: 0.0% → 0.01% (+0.01%) - Thin provisioned storage baseline

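The changes in parentheses are percentage-point differences between the two snapshots, not relative growth. The sketch reproduces them from the listed before/after values and ranks the pools by growth. Rounding to two decimals hides the floating-point noise in the subtractions. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# (before %, after %) usage per storage, copied from the list above.
USAGE_PCT: Dict[str, Tuple[float, float]] = {
    "PBS-Backups": (27.43, 28.27),
    "Vault (ZFS)": (10.88, 12.13),
    "local": (15.13, 19.11),
    "iso-share": (1.4, 1.45),
    "local-lvm": (0.0, 0.01),
}


def usage_deltas(usage: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Percentage-point change per storage, rounded to 2 decimals."""
    return {name: round(after - before, 2) for name, (before, after) in usage.items()}


def ranked_by_growth(deltas: Dict[str, float]) -> List[str]:
    """Storage names ordered from largest to smallest growth."""
    return sorted(deltas, key=deltas.get, reverse=True)


deltas = usage_deltas(USAGE_PCT)
print(deltas)
print(ranked_by_growth(deltas))
```
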

---

### 2025-12-25: RAG Vector Search - Phase 3 Complete

**Activity:** Implemented and debugged a production-ready vector search system for AI-powered documentation retrieval

**Deliverables:**
1. **Production Module** (`n8n/vector_search.py`): Complete API for semantic search
   - `search_similar_documents()` - Query with natural language
   - `insert_document()` - Add documents with embeddings
   - `get_stats()` - Database statistics
   - `delete_by_repo()` - Bulk cleanup
   - CLI interface for testing and manual operations

2. **Documentation Suite:**
   - `SESSION_HANDOFF_PHASE4_READY.md` (17KB) - Comprehensive learning guide for the next session
   - `PHASE3_COMPLETE.md` (12KB) - Complete debugging summary and deployment guide
   - `VECTOR_SEARCH_DEBUG.md` (4.7KB) - Technical root cause analysis
   - `VECTOR_SEARCH_COMPARISON.md` (2.5KB) - Before/after code comparison

3. **Diagnostic Scripts** (8 total):
   - Embedding storage repair, parameter binding tests, SQL validation
   - All scripts validated and preserved for reference

**Technical Achievement:**
- PostgreSQL 16.11 + pgvector 0.8.1 fully operational on CT 113
- Vector similarity search returning accurate scores (0.5765 for related concepts)
- Resolved 2 critical bugs:
  1. psycopg2 parameter handling for pgvector types (must cast in SQL, not Python)
  2. ORDER BY with vector operations (subquery pattern required)

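Both fixes can be illustrated in one query builder:
- The embedding is passed to psycopg2 as a plain string in pgvector's text format (`[x1,x2,...]`) and cast with `%s::vector` inside the SQL.
- The similarity is computed in a subquery, and the outer query orders by it.

This is a sketch under assumptions, not the code in `n8n/vector_search.py`. It assumes cosine distance (pgvector's `<=>` operator), with similarity defined as `1 - distance`. It uses the `rag_embeddings` columns listed under Infrastructure below. `to_pgvector_literal` and `build_search` are hypothetical names.

```python
from typing import Sequence, Tuple

# Similarity is computed in the subquery; the outer query only sorts and limits.
SEARCH_SQL = """
SELECT id, file_path, chunk_text, similarity
FROM (
    SELECT id, file_path, chunk_text,
           1 - (embedding <=> %s::vector) AS similarity
    FROM rag_embeddings
) AS scored
ORDER BY similarity DESC
LIMIT %s
"""


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format floats as pgvector's text input form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search(embedding: Sequence[float], limit: int = 5) -> Tuple[str, Tuple[str, int]]:
    """Return (sql, params) ready for cursor.execute(sql, params)."""
    return SEARCH_SQL, (to_pgvector_literal(embedding), limit)


sql, params = build_search([0.5, 1, -2], limit=3)
print(params)
```

With psycopg2 this would run as `cur.execute(*build_search(query_embedding))`. If an HNSW or IVFFlat index is added later, ordering by a derived column may need revisiting. pgvector's documentation describes index use for queries of the form `ORDER BY embedding <=> ... LIMIT n`.
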

**Validation Results:**
- Query: "How do I create snapshots of virtual machines?"
- Result: 0.5765 similarity to backup documentation
- Interpretation: Correctly identifies the semantic relationship between "snapshots" and "backups"

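This helps when reading scores like 0.5765. Assume the search uses cosine distance and reports `1 - distance`, as with pgvector's `<=>` operator. Then the score is the cosine of the angle between the query embedding and the chunk embedding:
- 1.0 for vectors pointing the same way
- 0.0 for orthogonal vectors
- negative for vectors pointing apart

The pure-Python function below computes the same quantity for two vectors, which is handy for spot-checking a stored embedding. The function name is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```
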

**Infrastructure:**
- Database: `n8n_db` on CT 113
- Table: `rag_embeddings` (id, source_repo, file_path, chunk_text, embedding vector(768), metadata jsonb)
- Embedding API: Ollama at 192.168.1.81:11434 (nomic-embed-text, 768 dimensions)
- Storage overhead: ~3KB per vector, ~5KB per document total

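The embedding call and the ~3KB figure can both be sketched. Ollama's `/api/embeddings` endpoint takes a JSON body with `model` and `prompt` and returns an `embedding` array. Newer Ollama releases also offer `/api/embed`, which uses `input` and `embeddings` instead. These notes do not state which endpoint the project calls, so treat that choice as an assumption, along with the helper names.

pgvector documents vector storage as 4 bytes per dimension plus 8 bytes. A 768-dimension vector therefore takes 3,080 bytes, which matches the ~3KB estimate.

```python
import json
import urllib.request
from typing import List

OLLAMA_EMBED_URL = "http://192.168.1.81:11434/api/embeddings"  # host/port from the notes above
EMBED_MODEL = "nomic-embed-text"
EMBED_DIM = 768


def build_embedding_request(text: str) -> urllib.request.Request:
    """POST request for Ollama's /api/embeddings endpoint."""
    body = json.dumps({"model": EMBED_MODEL, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_EMBED_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text: str, timeout: float = 30.0) -> List[float]:
    """Call Ollama and check the vector matches the vector(768) column."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=timeout) as resp:
        embedding = json.load(resp)["embedding"]
    if len(embedding) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} dimensions, got {len(embedding)}")
    return embedding


def pgvector_storage_bytes(dim: int) -> int:
    """Size of one pgvector value: 4 bytes per dimension plus 8 bytes of overhead."""
    return 4 * dim + 8


print(pgvector_storage_bytes(EMBED_DIM))  # 3080
```

`fetch_embedding` needs the Ollama host, so it is not exercised here. Its dimension check fails early with a clear message if a model change ever produces vectors that no longer fit `vector(768)`.
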

**Status:** ✅ Phase 3 Complete | Phase 4 Ready to Start

**Next Steps:** Build an n8n ingestion workflow to load homelab documentation from Gitea

---

### 2025-12-07: Infrastructure Documentation & Monitoring Stack

#### Additions
1. **VM 101 (monitoring-docker)**: New dedicated monitoring infrastructure
   - Grafana for visualization
   - Prometheus for metrics collection
@@ -201,8 +501,9 @@ Hybrid approach balancing performance and resource efficiency:
   - Secure remote access without VPN

3. **CT 113 (n8n)**: Workflow automation platform
   - PostgreSQL 16.11 backend (upgraded from 15+)
   - pgvector 0.8.1 extension for vector search
   - IP: 192.168.2.113
   - Resolved database locale issues

#### Modifications
@@ -227,36 +528,334 @@ Hybrid approach balancing performance and resource efficiency:
```
homelab/