feat(openclaw): deploy OpenClaw AI chatbot gateway on VM 120

- Add Docker Compose configs with security hardening (cap_drop ALL, non-root, read-only FS)
- Add Prometheus node_exporter scrape target for 192.168.2.120:9100
- Update services/README.md, INDEX.md, and CLAUDE_STATUS.md with VM 120
- Image pinned to v2026.2.1 (patches CVE-2026-25253)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:07:09 -07:00
parent e481c95da4
commit e08951de21
9 changed files with 1031 additions and 20 deletions


@@ -1,24 +1,48 @@
# Homelab Infrastructure Status
**Last Updated**: 2025-12-18 17:00:00
**Last Updated**: 2026-02-03
**Export Reference**: disaster-recovery/homelab-export-20251211-144345
**Current Session:** OpenClaw Deployment - VM 120
## Quick Resume (Current Session Context)
**Where We Are:** OpenClaw deployed and healthy on VM 120. Container running with full security hardening. Backups configured. Manual steps remain for NPM proxy host, Twingate resource, and Prometheus config on VM 101.
**Completed:**
- [x] Config files created (`services/openclaw/`)
- [x] VM 120 created and hardened (UFW, fail2ban, node-exporter, openclaw user)
- [x] OpenClaw container deployed and healthy (v2026.2.1)
- [x] Security verified (cap_drop ALL, non-root, read-only FS, no docker.sock)
- [x] Prometheus scrape target added to repo copy
- [x] PBS backup job created (daily 02:00, snapshot, zstd)
- [x] Application backup script + weekly cron configured
- [x] Documentation updated (README, services/README, CLAUDE_STATUS, INDEX)
- [x] node_exporter installed and serving metrics on 192.168.2.120:9100
**Manual Steps Remaining:**
- [ ] NPM: Create proxy host for openclaw.apophisnetworking.net -> 192.168.2.120:18789 (WebSocket support, SSL, TinyAuth)
- [ ] Twingate: Add resource for 192.168.2.120 ports 18789/18790/1455
- [ ] VM 101: Deploy updated prometheus.yml via Proxmox web console (SSH not configured)
- [ ] Configure at least one LLM provider API key in /opt/openclaw/.env
---
## Current Infrastructure Snapshot
### Proxmox Environment
- **Node**: serviceslab
- **Version**: Proxmox VE 8.4.0
- **Management IP**: 192.168.2.200
- **Management IP**: 192.168.2.100
- **Architecture**: Single-node cluster
- **Total Resources**: 9 VMs, 2 Templates, 5 LXC Containers
- **Total Resources**: 10 VMs, 2 Templates, 5 LXC Containers
---
## Virtual Machines (QEMU/KVM) - 9 VMs
## Virtual Machines (QEMU/KVM) - 10 VMs
| VM ID | Name | IP Address | Status | Purpose |
|-------|------|------------|--------|---------|
| 100 | docker-hub | 192.168.2.XXX | Running | Container registry/Docker hub mirror |
| 100 | docker-hub | 192.168.2.102 | Running | Container registry/Docker hub mirror |
| 101 | monitoring-docker | 192.168.2.114 | Running | Monitoring stack (Grafana/Prometheus/PVE Exporter) |
| 105 | dev | - | Stopped | General-purpose development workstation |
| 106 | Ansible-Control | 192.168.2.XXX | Running | IaC orchestration, configuration management |
@@ -27,8 +51,10 @@
| 110 | web-server-02 | 192.168.2.XXX | Running | Load-balanced pair with web-server-01 |
| 111 | db-server-01 | 192.168.2.XXX | Running | Backend database server |
| 114 | haos | 192.168.2.XXX | Running | Home Assistant OS - smart home automation platform |
| 120 | openclaw | 192.168.2.120 | Running | OpenClaw AI chatbot gateway |
**Recent Changes**:
- Added VM 120 (openclaw) for multi-platform AI chatbot gateway (2026-02-03)
- Added VM 101 (monitoring-docker) for dedicated monitoring infrastructure
- Removed VM 101 (gitlab) - service decommissioned
@@ -52,7 +78,7 @@
| 102 | nginx | 192.168.2.101 | Running | Reverse proxy/load balancer & NPM |
| 103 | netbox | 192.168.2.XXX | Running | Network documentation/IPAM |
| 112 | twingate-connector | 192.168.2.XXX | Running | Zero-trust network access connector |
| 113 | n8n | 192.168.2.107 | Running | Workflow automation platform |
| 113 | n8n | 192.168.2.113 | Running | Workflow automation platform |
| 115 | tinyauth | 192.168.2.10 | Running | SSO authentication layer for NetBox |
**Recent Changes**:
@@ -99,7 +125,7 @@
- **Integration**: Connects homelab to Twingate network
### Automation & Integration
**CT 113** - n8n (192.168.2.107)
**CT 113** - n8n (192.168.2.113)
- **Purpose**: Workflow automation platform
- **Technology**: n8n.io
- **Database**: PostgreSQL 15+
@@ -118,6 +144,18 @@
- **Documentation**: `/home/jramos/homelab/services/tinyauth/README.md`
- **Status**: Operational
### AI Chatbot Gateway
**VM 120** - openclaw (192.168.2.120)
- **Purpose**: Multi-platform AI chatbot gateway
- **Technology**: OpenClaw (Docker container)
- **Ports**: 18789 (Gateway WS+UI), 18790 (Bridge), 1455 (OAuth)
- **Domain**: openclaw.apophisnetworking.net
- **LLM Providers**: Anthropic, OpenAI, Ollama
- **Messaging**: Discord, Telegram, Slack, WhatsApp
- **Security**: CVE-2026-25253 patched (v2026.2.1), cap_drop ALL, non-root, read-only FS
- **Documentation**: `/home/jramos/homelab/services/openclaw/README.md`
- **Status**: Operational - Container healthy
### Infrastructure Documentation
**CT 103** - netbox
- **Purpose**: Network documentation and IPAM
@@ -212,6 +250,47 @@ Hybrid approach balancing performance and resource efficiency:
## Recent Infrastructure Changes
### 2026-02-03: OpenClaw AI Chatbot Gateway Deployment (In Progress)
**Service**: VM 120 - OpenClaw multi-platform AI chatbot gateway
**Purpose**: Bridge messaging platforms (Discord, Telegram, Slack, WhatsApp) with LLM providers (Anthropic, OpenAI, Ollama) through a unified gateway.
**Specifications**:
- **VM**: 120 (cloned from template 107, ubuntu-docker)
- **IP**: 192.168.2.120
- **Resources**: 4 vCPUs, 16GB RAM, 50GB disk on Vault (ZFS)
- **Ports**: 18789 (Gateway WS+UI), 18790 (Bridge), 1455 (OAuth)
- **Domain**: openclaw.apophisnetworking.net
- **Image**: ghcr.io/openclaw/openclaw:2026.2.1
**Security Hardening**:
- Version >= 2026.2.1 (patches CVE-2026-25253, CVSS 8.8 1-click RCE)
- All ports bound to 127.0.0.1 (reverse proxy required)
- Docker: cap_drop ALL, no-new-privileges, read-only filesystem, non-root user (1001:1001)
- UFW: deny-all + whitelist 192.168.2.0/24 + 192.168.1.91 (desktop PC)
- fail2ban on SSH (3 retries), unattended-upgrades
- Prometheus node_exporter at port 9100
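The container-level items in this list can be checked mechanically. The following is a minimal sketch, assuming the merged Compose service definition has already been parsed into a Python dict; the function name `missing_hardening` and the `example_service` values are hypothetical illustrations, not the contents of the committed `docker-compose.override.yml`.

```python
def missing_hardening(service: dict) -> list[str]:
    """Return the hardening settings that a Compose service definition lacks."""
    problems = []
    # cap_drop must include ALL so the container starts with no capabilities.
    if "ALL" not in service.get("cap_drop", []):
        problems.append("cap_drop: ALL")
    # read_only: true mounts the container root filesystem read-only.
    if service.get("read_only") is not True:
        problems.append("read_only: true")
    # no-new-privileges blocks privilege escalation via setuid binaries.
    security_opts = service.get("security_opt", [])
    if not any(str(opt).startswith("no-new-privileges") for opt in security_opts):
        problems.append("security_opt: no-new-privileges")
    # The status notes list 1001:1001 as the non-root user.
    if service.get("user") != "1001:1001":
        problems.append("user: 1001:1001")
    # Every published port should be bound to loopback only.
    for port in service.get("ports", []):
        if not str(port).startswith("127.0.0.1:"):
            problems.append(f"port not bound to loopback: {port}")
    return problems


# Hypothetical service definition mirroring the settings listed above.
example_service = {
    "cap_drop": ["ALL"],
    "read_only": True,
    "security_opt": ["no-new-privileges:true"],
    "user": "1001:1001",
    "ports": [
        "127.0.0.1:18789:18789",
        "127.0.0.1:18790:18790",
        "127.0.0.1:1455:1455",
    ],
}
```

An empty list from `missing_hardening(example_service)` means every listed setting is present; the check does not cover the host-level items (UFW, fail2ban, unattended-upgrades).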
**Completed Steps**:
- [x] Docker Compose configuration files created
- [x] Security hardening overlay (docker-compose.override.yml)
- [x] Environment variable template (.env.example)
- [x] Prometheus scrape target added
- [x] Documentation created (README, services/README, CLAUDE_STATUS, INDEX)
- [x] VM 120 Creation & SSH Setup
- [x] OS Hardening (UFW, user creation)
**Pending Steps**:
- [ ] NPM reverse proxy configuration (manual - web UI)
- [ ] Twingate resource creation (manual - admin console)
- [ ] Prometheus config on VM 101 (manual - no SSH access)
- [ ] Configure LLM provider API key in .env
**Status**: Container healthy - Manual network integration remaining
---
### 2025-12-20: Comprehensive Security Audit Completed
**Activity:** Complete infrastructure security assessment and remediation planning
@@ -363,6 +442,51 @@ Hybrid approach balancing performance and resource efficiency:
---
### 2025-12-25: RAG Vector Search - Phase 3 Complete
**Activity:** Implemented and debugged production-ready vector search system for AI-powered documentation retrieval
**Deliverables:**
1. **Production Module** (`n8n/vector_search.py`): Complete API for semantic search
- `search_similar_documents()` - Query with natural language
- `insert_document()` - Add documents with embeddings
- `get_stats()` - Database statistics
- `delete_by_repo()` - Bulk cleanup
- CLI interface for testing and manual operations
2. **Documentation Suite:**
- `SESSION_HANDOFF_PHASE4_READY.md` (17KB) - Comprehensive learning guide for next session
- `PHASE3_COMPLETE.md` (12KB) - Complete debugging summary and deployment guide
- `VECTOR_SEARCH_DEBUG.md` (4.7KB) - Technical root cause analysis
- `VECTOR_SEARCH_COMPARISON.md` (2.5KB) - Before/after code comparison
3. **Diagnostic Scripts** (8 total):
- Embedding storage repair, parameter binding tests, SQL validation
- All scripts validated and preserved for reference
**Technical Achievement:**
- PostgreSQL 16.11 + pgvector 0.8.1 fully operational on CT 113
- Vector similarity search returning accurate scores (0.5765 for related concepts)
- Resolved 2 critical bugs:
1. psycopg2 parameter handling for pgvector types (must cast in SQL, not Python)
2. ORDER BY with vector operations (subquery pattern required)
**Validation Results:**
- Query: "How do I create snapshots of virtual machines?"
- Result: 0.5765 similarity to backup documentation
- Interpretation: Correctly identifies semantic relationship between "snapshots" and "backups"
**Infrastructure:**
- Database: n8n_db on CT 113
- Table: rag_embeddings (id, source_repo, file_path, chunk_text, embedding vector(768), metadata jsonb)
- Embedding API: Ollama at 192.168.1.81:11434 (nomic-embed-text, 768 dimensions)
- Storage overhead: ~3KB per vector, ~5KB per document total
**Status:** ✅ Phase 3 Complete | Phase 4 Ready to Start
**Next Steps:** Build n8n ingestion workflow to load homelab documentation from Gitea
---
### 2025-12-07: Infrastructure Documentation & Monitoring Stack
#### Additions
@@ -377,8 +501,9 @@ Hybrid approach balancing performance and resource efficiency:
- Secure remote access without VPN
3. **CT 113 (n8n)**: Workflow automation platform
- PostgreSQL 15+ backend
- IP: 192.168.2.107
- PostgreSQL 16.11 backend (upgraded from 15+)
- pgvector 0.8.1 extension for vector search
- IP: 192.168.2.113
- Resolved database locale issues
### Modifications
@@ -403,7 +528,19 @@ Hybrid approach balancing performance and resource efficiency:
```
homelab/
monitoring/ # NEW: Monitoring stack configurations
n8n/ # RAG Vector Search Implementation (NEW)
vector_search.py # Production module for vector operations
SESSION_HANDOFF_PHASE4_READY.md # Learning guide for next session
PHASE3_COMPLETE.md # Phase 3 debugging and achievements summary
fix_embedding_storage.py # Diagnostic script (embedding repair)
test_direct_sql.py # Diagnostic script (query testing)
test_vector_search_working.py # Validated working implementation
test_parameter_binding.py # Diagnostic script (psycopg2 debugging)
test_pgvector_direct.sql # Raw SQL tests for pgvector
VECTOR_SEARCH_DEBUG.md # Technical debugging documentation
VECTOR_SEARCH_COMPARISON.md # Before/after code comparison
README_VECTOR_SEARCH.md # Comprehensive setup guide
monitoring/ # Monitoring stack configurations
README.md # Comprehensive monitoring documentation
grafana/
docker-compose.yml
@@ -417,6 +554,8 @@ homelab/
services/ # Docker Compose service configurations
n8n/ # n8n workflow automation
netbox/ # Network documentation & IPAM
openclaw/ # OpenClaw AI chatbot gateway (VM 120)
tinyauth/ # SSO authentication layer
README.md # Services overview (updated)
disaster-recovery/
homelab-export-20251207-120040/ # Latest infrastructure export
@@ -424,7 +563,16 @@ homelab/
crawlers-exporters/ # Infrastructure collection scripts
fixers/ # Problem-solving scripts
qol/ # Quality of life improvements
security/ # Security audit and remediation scripts (NEW)
verify-service-status.sh
backup-before-remediation.sh
rotate-*.sh # Credential rotation scripts
QUICK_REFERENCE.md # Security operations guide
troubleshooting/
SECURITY_AUDIT_2025-12-20.md # Comprehensive security assessment
loki-stack-bugfix.md # Loki logging troubleshooting
CLAUDE.md # AI assistant guidance (updated)
SECURITY.md # Security policy and best practices (NEW)
INDEX.md # Navigation index (updated)
README.md # Repository overview (updated)
CLAUDE_STATUS.md # This file - current infrastructure status
@@ -454,7 +602,116 @@ homelab/
---
## Current Initiative: Security Audit Remediation - Q4 2025
## Current Initiative: n8n RAG Workflow for Homelab Documentation - Q4 2025
### Goal
Build an interactive n8n workflow that implements Retrieval-Augmented Generation (RAG) to query homelab documentation stored in Gitea using local AI (Ollama). This is a learning-focused project to understand RAG architecture, embeddings, vector storage, and LLM integration.
### Phase
Phase 3 Complete - Vector Storage Operational | Moving to Phase 4 - n8n Workflow Development
### Infrastructure Components
- **AI Backend**: Ollama running on Windows 11 PC (192.168.1.81)
- Hardware: AMD 7900 GRE GPU, i7-12700KF, 32GB RAM @ 4000MHz, 2TB NVMe
- Installation: Native Windows application (not Docker)
- Open-WebUI: Running in Docker Desktop on same machine (port 3000)
- **Orchestrator**: n8n workflow automation (CT 113, 192.168.2.113)
- **Data Source**: Gitea repositories (192.168.2.102:3060)
- Repositories: homelab, truenas
- **Vector Storage**: PostgreSQL 16.11 + pgvector 0.8.1 (operational on CT 113)
### Progress Checklist
**Phase 1: Network & Connectivity Setup**
- [x] Verify Gitea API accessibility (working: http://192.168.2.102:3060/api/v1)
- [x] Verify n8n instance running (CT 113, 192.168.2.113)
- [x] Configure Ollama network binding (set OLLAMA_HOST=0.0.0.0 via environment variables)
- [x] Verify Ollama API accessible from homelab (curl http://192.168.1.81:11434/api/tags)
- [x] Identify available Ollama models (LLMs: deepseek-r1:8.2B, gpt-oss:20.9B, llama3.2:3.2B, phi3:3.8B)
- [x] Pull embedding model (nomic-embed-text - 768 dimensions, 274MB)
**Phase 2: Understanding Embeddings (Learning Phase)**
- [x] Pull sample document from Gitea API
- [x] Send text to Ollama for embedding generation
- [x] Examine vector output (768-dimensional vectors for each text)
- [x] Understand semantic similarity concept (cosine similarity demo: 0.5764 for related topics)
**Phase 3: Vector Storage Implementation** ✅ COMPLETE
- [x] Evaluate PostgreSQL + pgvector (uses existing n8n database)
- [x] Evaluate Qdrant (lightweight Docker deployment)
- [x] Choose storage backend based on learning goals (PostgreSQL + pgvector selected)
- [x] Install pgvector extension on CT 113 (PostgreSQL 16.11, pgvector 0.8.1)
- [x] Create rag_embeddings table with vector(768) column
- [x] Debug and fix vector insertion (corrected string→vector conversion)
- [x] Debug and fix ORDER BY issue (subquery approach working)
- [x] Verify cosine similarity search (working: 0.5765 similarity for related concepts)
- [x] Create production-ready vector_search.py module with insert/search/stats functions
**Phase 4: Build Ingestion Workflow (n8n)** - READY TO START
- [ ] Deploy vector_search.py production module to CT 113
- [ ] Test manual document insertion via CLI
- [ ] Implement text chunking strategy (500 char chunks, 100 char overlap)
- [ ] Create minimal n8n workflow: Manual Trigger → Gitea API → Chunk → Ollama → PostgreSQL
- [ ] Test workflow with single README.md file from homelab repo
- [ ] Scale to process all .md files in homelab repository
- [ ] Add error handling and deduplication logic
- [ ] Schedule automated daily ingestion runs
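The chunking item above (500-character chunks with 100-character overlap) could look roughly like the sketch below. This is an illustration only, not the planned implementation: the function name `chunk_text` and the choice to split on raw character offsets rather than sentence or heading boundaries are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so no chunk
        # consisting purely of already-seen overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the default values, consecutive chunks start 400 characters apart, so a 1000-character document yields three chunks; the final chunk is shorter than `chunk_size`.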
**Phase 5: Build Query Workflow (n8n)** - NOT STARTED
- [ ] Create workflow: Webhook → User question
- [ ] Generate embedding for user query
- [ ] Implement vector similarity search (threshold >0.5)
- [ ] Retrieve top 3-5 relevant chunks
- [ ] Construct prompt with retrieved context
- [ ] Call Ollama LLM for answer generation (llama3.2 or deepseek-r1)
- [ ] Return formatted response with source references
- [ ] Add webhook endpoint for external integrations
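The retrieval and prompt-construction items above can be sketched as follows. This assumes search results arrive as dicts with `file_path`, `chunk_text`, and `similarity` keys, which mirrors the `rag_embeddings` columns but is not a confirmed return format of `vector_search.py`; the function names and the prompt wording are hypothetical.

```python
def select_context(results: list[dict], threshold: float = 0.5, top_k: int = 5) -> list[dict]:
    """Keep results above the similarity threshold, best first, capped at top_k."""
    relevant = [r for r in results if r["similarity"] > threshold]
    relevant.sort(key=lambda r: r["similarity"], reverse=True)
    return relevant[:top_k]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble an LLM prompt from the question and the retrieved chunks."""
    # Label each chunk with its source file so the answer can cite it.
    context = "\n\n".join(
        f"[{c['file_path']}]\n{c['chunk_text']}" for c in chunks
    )
    return (
        "Answer the question using only the context below. "
        "Cite the source file for each fact.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Filtering before truncating means a query with no sufficiently similar chunks produces an empty context, which the workflow can treat as "no answer found" instead of sending unrelated text to the LLM.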
### Context
**RAG Architecture Overview:**
1. **Ingestion Pipeline**: Gitea API → Text Chunking → Ollama Embeddings → Vector Database
2. **Query Pipeline**: User Question → Embedding → Vector Search → Context Retrieval → LLM Generation → Answer
**Phase 3 Achievements (2025-12-25):**
- ✅ PostgreSQL + pgvector fully operational on CT 113
- ✅ Vector search working with 0.5765 similarity for related concepts
- ✅ Production-ready Python module (`vector_search.py`) with insert/search/stats functions
- ✅ Debugged and resolved 2 critical issues:
1. Embedding storage: Fixed psycopg2 parameter handling (must cast to `::vector(768)` in SQL, not Python)
2. ORDER BY bug: Subquery approach works, CTE approach fails (use `ORDER BY similarity DESC` instead of vector operation)
**Key Learnings:**
- ✅ Embeddings convert text to 768-dimensional vectors representing semantic meaning
- ✅ Vector databases enable semantic search (meaning-based, not keyword-based)
- ✅ pgvector cosine distance operator (`<=>`) returns distance, not similarity: 0=identical, 2=opposite (similarity = 1 - distance)
- ✅ Similarity scores: >0.7=highly relevant, 0.5-0.7=related, 0.3-0.5=somewhat related, <0.3=unrelated
- ✅ psycopg2 doesn't natively support pgvector - must format vectors as strings and cast in SQL
- ✅ Reusing vector parameters in ORDER BY causes silent failures - use subqueries instead
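The relationship between cosine distance and the similarity scores quoted above can be reproduced in plain Python. This is a small sketch for intuition only: pgvector's `<=>` returns cosine distance, so similarity is `1 - distance`. The function names are hypothetical, and the exact handling of the band boundaries (0.7, 0.5, 0.3) is an assumption, since the list above does not say which side each boundary belongs to.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same quantity as pgvector's <=> operator: 0 = identical, 2 = opposite."""
    return 1.0 - cosine_similarity(a, b)


def relevance_band(similarity: float) -> str:
    """Map a similarity score to the bands listed above (boundary handling assumed)."""
    if similarity > 0.7:
        return "highly relevant"
    if similarity >= 0.5:
        return "related"
    if similarity >= 0.3:
        return "somewhat related"
    return "unrelated"
```

Under these bands, the 0.5765 score reported for the snapshot query falls in the "related" range.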
**Technical Stack Validated:**
- Ollama API (192.168.1.81:11434) ✅ Accessible across subnets
- nomic-embed-text model ✅ 768 dimensions, fast generation
- PostgreSQL 16.11 + pgvector 0.8.1 ✅ Operators working correctly
- Python psycopg2 ✅ With workarounds for vector handling
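The psycopg2 workarounds mentioned above (format the vector as a string, cast it in SQL, and order by the computed similarity in an outer query) can be sketched as below. This is an illustration under stated assumptions, not the contents of `vector_search.py`: the names `to_vector_literal`, `SEARCH_SQL`, and `build_search_params` are hypothetical, and only the table and column names come from the `rag_embeddings` schema described earlier.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. "[0.1,0.2]"."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# The vector is passed as a plain string and cast inside SQL with
# ::vector(768). The inner query computes similarity once; the outer query
# orders by that column, so the vector parameter is not reused in ORDER BY.
SEARCH_SQL = """
SELECT file_path, chunk_text, similarity
FROM (
    SELECT file_path,
           chunk_text,
           1 - (embedding <=> %s::vector(768)) AS similarity
    FROM rag_embeddings
) AS scored
ORDER BY similarity DESC
LIMIT %s
"""


def build_search_params(embedding: list[float], limit: int = 5) -> tuple:
    """Build the parameter tuple matching the two %s placeholders in SEARCH_SQL."""
    return (to_vector_literal(embedding), limit)


# With an open psycopg2 cursor, the query would be run as:
#   cursor.execute(SEARCH_SQL, build_search_params(query_embedding))
```

Keeping the literal formatting in one helper means insert and search paths produce the same text representation of a vector.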
**Success Metrics - Phase 3:**
- ✅ Successfully query "how to backup VM" and retrieve relevant homelab documentation (0.5765 similarity)
- ✅ Understand each component of the vector storage pipeline
- ✅ Create reusable Python module for n8n integration
**Next Steps - Phase 4:**
- Deploy vector_search.py to CT 113 and test CLI interface
- Create text chunking function (500 char chunks, 100 char overlap)
- Build minimal n8n workflow: Manual Trigger → Gitea API → Chunk → Ollama → PostgreSQL
- Scale to process all .md files in homelab repository
- Add error handling and deduplication logic
**Session Handoff Document:** `/home/jramos/homelab/n8n/SESSION_HANDOFF_PHASE4_READY.md`
**Learning Resources:** Step-by-step lessons with examples, mental models, troubleshooting guide
---
## Previous Initiative: Security Audit Remediation - Q4 2025
### Goal
Remediate 31 security findings identified in comprehensive security audit (2025-12-20), addressing critical vulnerabilities in Docker socket exposure, credential management, and SSL/TLS configuration.
@@ -632,16 +889,18 @@ Documentation & Maintenance
- **Grafana**: http://192.168.2.114:3000
- **Prometheus**: http://192.168.2.114:9090
- **Nginx Proxy Manager**: http://192.168.2.101:81
- **n8n**: http://192.168.2.107:5678
- **n8n**: http://192.168.2.113:5678
- **TinyAuth**: https://tinyauth.apophisnetworking.net (internal: http://192.168.2.10:8000)
- **OpenClaw**: https://openclaw.apophisnetworking.net (internal: http://192.168.2.120:18789)
### Key Network Segments
- **Management Network**: 192.168.2.0/24
- **Proxmox Host**: 192.168.2.200
- **Reverse Proxy**: 192.168.2.101 (CT 102)
- **TinyAuth**: 192.168.2.10 (CT 115)
- **n8n**: 192.168.2.107 (CT 113)
- **n8n**: 192.168.2.113 (CT 113)
- **Monitoring**: 192.168.2.114 (VM 101)
- **OpenClaw**: 192.168.2.120 (VM 120)
---
@@ -726,5 +985,5 @@ Documentation & Maintenance
**Maintained by**: jramos
**Repository**: Homelab Infrastructure Configuration
**Platform**: Proxmox VE 8.4.0
**Infrastructure Scale**: 9 VMs, 2 Templates, 4 Containers
**Current Status**: Operational - Home Automation Integration Deployed
**Infrastructure Scale**: 10 VMs, 2 Templates, 5 Containers
**Current Status**: Operational - OpenClaw Deployment In Progress