feat(openclaw): deploy OpenClaw AI chatbot gateway on VM 120

- Add Docker Compose configs with security hardening (cap_drop ALL, non-root, read-only FS)
- Add Prometheus node_exporter scrape target for 192.168.2.120:9100
- Update services/README.md, INDEX.md, and CLAUDE_STATUS.md with VM 120
- Image pinned to v2026.2.1 (patches CVE-2026-25253)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:07:09 -07:00
parent e481c95da4
commit e08951de21
9 changed files with 1031 additions and 20 deletions
--- a/CLAUDE_STATUS.md
+++ b/CLAUDE_STATUS.md
@@ -1,24 +1,48 @@
 # Homelab Infrastructure Status

-**Last Updated**: 2025-12-18 17:00:00
+**Last Updated**: 2026-02-03
 **Export Reference**: disaster-recovery/homelab-export-20251211-144345
+**Current Session:** OpenClaw Deployment - VM 120
+
+## Quick Resume (Current Session Context)
+
+**Where We Are:** OpenClaw deployed and healthy on VM 120. Container running with full security hardening. Backups configured. Manual steps remain for NPM proxy host, Twingate resource, and Prometheus config on VM 101.
+
+**Completed:**
+- [x] Config files created (`services/openclaw/`)
+- [x] VM 120 created and hardened (UFW, fail2ban, node-exporter, openclaw user)
+- [x] OpenClaw container deployed and healthy (v2026.2.1)
+- [x] Security verified (cap_drop ALL, non-root, read-only FS, no docker.sock)
+- [x] Prometheus scrape target added to repo copy
+- [x] PBS backup job created (daily 02:00, snapshot, zstd)
+- [x] Application backup script + weekly cron configured
+- [x] Documentation updated (README, services/README, CLAUDE_STATUS, INDEX)
+- [x] node_exporter installed and serving metrics on 192.168.2.120:9100
+
+**Manual Steps Remaining:**
+- [ ] NPM: Create proxy host for openclaw.apophisnetworking.net -> 192.168.2.120:18789 (WebSocket support, SSL, TinyAuth)
+- [ ] Twingate: Add resource for 192.168.2.120 ports 18789/18790/1455
+- [ ] VM 101: Deploy updated prometheus.yml via Proxmox web console (SSH not configured)
+- [ ] Configure at least one LLM provider API key in /opt/openclaw/.env
+
+---

 ## Current Infrastructure Snapshot

 ### Proxmox Environment
 - **Node**: serviceslab
 - **Version**: Proxmox VE 8.4.0
-- **Management IP**: 192.168.2.200
+- **Management IP**: 192.168.2.100
 - **Architecture**: Single-node cluster
-- **Total Resources**: 9 VMs, 2 Templates, 5 LXC Containers
+- **Total Resources**: 10 VMs, 2 Templates, 5 LXC Containers

 ---

-## Virtual Machines (QEMU/KVM) - 9 VMs
+## Virtual Machines (QEMU/KVM) - 10 VMs

 | VM ID | Name | IP Address | Status | Purpose |
 |-------|------|------------|--------|---------|
-| 100 | docker-hub | 192.168.2.XXX | Running | Container registry/Docker hub mirror |
+| 100 | docker-hub | 192.168.2.102 | Running | Container registry/Docker hub mirror |
 | 101 | monitoring-docker | 192.168.2.114 | Running | Monitoring stack (Grafana/Prometheus/PVE Exporter) |
 | 105 | dev | - | Stopped | General-purpose development workstation |
 | 106 | Ansible-Control | 192.168.2.XXX | Running | IaC orchestration, configuration management |
@@ -27,8 +51,10 @@
 | 110 | web-server-02 | 192.168.2.XXX | Running | Load-balanced pair with web-server-01 |
 | 111 | db-server-01 | 192.168.2.XXX | Running | Backend database server |
 | 114 | haos | 192.168.2.XXX | Running | Home Assistant OS - smart home automation platform |
+| 120 | openclaw | 192.168.2.120 | Running | OpenClaw AI chatbot gateway |

 **Recent Changes**:
+- Added VM 120 (openclaw) for multi-platform AI chatbot gateway (2026-02-03)
 - Added VM 101 (monitoring-docker) for dedicated monitoring infrastructure
 - Removed VM 101 (gitlab) - service decommissioned

@@ -52,7 +78,7 @@
 | 102 | nginx | 192.168.2.101 | Running | Reverse proxy/load balancer & NPM |
 | 103 | netbox | 192.168.2.XXX | Running | Network documentation/IPAM |
 | 112 | twingate-connector | 192.168.2.XXX | Running | Zero-trust network access connector |
-| 113 | n8n | 192.168.2.107 | Running | Workflow automation platform |
+| 113 | n8n | 192.168.2.113 | Running | Workflow automation platform |
 | 115 | tinyauth | 192.168.2.10 | Running | SSO authentication layer for NetBox |

 **Recent Changes**:
@@ -99,7 +125,7 @@
 - **Integration**: Connects homelab to Twingate network

 ### Automation & Integration
-**CT 113** - n8n (192.168.2.107)
+**CT 113** - n8n (192.168.2.113)
 - **Purpose**: Workflow automation platform
 - **Technology**: n8n.io
 - **Database**: PostgreSQL 15+
@@ -118,6 +144,18 @@
 - **Documentation**: `/home/jramos/homelab/services/tinyauth/README.md`
 - **Status**: Operational

+### AI Chatbot Gateway
+**VM 120** - openclaw (192.168.2.120)
+- **Purpose**: Multi-platform AI chatbot gateway
+- **Technology**: OpenClaw (Docker container)
+- **Ports**: 18789 (Gateway WS+UI), 18790 (Bridge), 1455 (OAuth)
+- **Domain**: openclaw.apophisnetworking.net
+- **LLM Providers**: Anthropic, OpenAI, Ollama
+- **Messaging**: Discord, Telegram, Slack, WhatsApp
+- **Security**: CVE-2026-25253 patched (v2026.2.1), cap_drop ALL, non-root, read-only FS
+- **Documentation**: `/home/jramos/homelab/services/openclaw/README.md`
+- **Status**: Operational - Container healthy
+
 ### Infrastructure Documentation
 **CT 103** - netbox
 - **Purpose**: Network documentation and IPAM
@@ -212,6 +250,47 @@ Hybrid approach balancing performance and resource efficiency:

 ## Recent Infrastructure Changes

+### 2026-02-03: OpenClaw AI Chatbot Gateway Deployment (In Progress)
+
+**Service**: VM 120 - OpenClaw multi-platform AI chatbot gateway
+
+**Purpose**: Bridge messaging platforms (Discord, Telegram, Slack, WhatsApp) with LLM providers (Anthropic, OpenAI, Ollama) through a unified gateway.
+
+**Specifications**:
+- **VM**: 120 (cloned from template 107, ubuntu-docker)
+- **IP**: 192.168.2.120
+- **Resources**: 4 vCPUs, 16GB RAM, 50GB disk on Vault (ZFS)
+- **Ports**: 18789 (Gateway WS+UI), 18790 (Bridge), 1455 (OAuth)
+- **Domain**: openclaw.apophisnetworking.net
+- **Image**: ghcr.io/openclaw/openclaw:2026.2.1
+
+**Security Hardening**:
+- Version >= 2026.2.1 (patches CVE-2026-25253, CVSS 8.8 1-click RCE)
+- All ports bound to 127.0.0.1 (reverse proxy required)
+- Docker: cap_drop ALL, no-new-privileges, read-only filesystem, non-root user (1001:1001)
+- UFW: deny-all + whitelist 192.168.2.0/24 + 192.168.1.91 (desktop PC)
+- fail2ban on SSH (3 retries), unattended-upgrades
+- Prometheus node_exporter at port 9100
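Reviewer note: the hardening claims above can be checked mechanically. The following is a minimal sketch, assuming the Compose service definition has already been parsed into a Python dict (for example with a YAML loader); `hardening_findings` and the `example` service are hypothetical and are not part of this commit.

```python
from typing import Any


def hardening_findings(service: dict[str, Any]) -> list[str]:
    """Return a list of hardening gaps for one Compose service (empty = OK)."""
    findings = []

    # cap_drop must drop every Linux capability.
    if "ALL" not in service.get("cap_drop", []):
        findings.append("cap_drop does not include ALL")

    # read_only makes the container root filesystem immutable.
    if not service.get("read_only", False):
        findings.append("root filesystem is not read-only")

    # Accept both "no-new-privileges:true" and "no-new-privileges=true".
    opts = [str(o).replace("=", ":") for o in service.get("security_opt", [])]
    if not any(o in ("no-new-privileges", "no-new-privileges:true") for o in opts):
        findings.append("no-new-privileges is not set")

    # A missing user is treated conservatively as root.
    uid = str(service.get("user", "0")).split(":")[0]
    if uid in ("", "0", "root"):
        findings.append("container runs as root")

    # Every published port should be bound to loopback only.
    for port in service.get("ports", []):
        if not str(port).startswith("127.0.0.1:"):
            findings.append(f"port {port} is not bound to 127.0.0.1")

    # The Docker socket must never be mounted.
    for volume in service.get("volumes", []):
        if "docker.sock" in str(volume):
            findings.append("docker.sock is mounted")

    return findings


# Hypothetical service mirroring the settings listed in this status entry.
example = {
    "cap_drop": ["ALL"],
    "read_only": True,
    "security_opt": ["no-new-privileges:true"],
    "user": "1001:1001",
    "ports": ["127.0.0.1:18789:18789", "127.0.0.1:18790:18790", "127.0.0.1:1455:1455"],
}
print(hardening_findings(example))
```

An empty list means every listed control is present; each string names one missing control.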
+
+**Completed Steps**:
+- [x] Docker Compose configuration files created
+- [x] Security hardening overlay (docker-compose.override.yml)
+- [x] Environment variable template (.env.example)
+- [x] Prometheus scrape target added
+- [x] Documentation created (README, services/README, CLAUDE_STATUS, INDEX)
+- [x] VM 120 Creation & SSH Setup
+- [x] OS Hardening (UFW, user creation)
+
+**Pending Steps**:
+- [ ] NPM reverse proxy configuration (manual - web UI)
+- [ ] Twingate resource creation (manual - admin console)
+- [ ] Prometheus config on VM 101 (manual - no SSH access)
+- [ ] Configure LLM provider API key in .env
+
+**Status**: Container healthy - Manual network integration remaining
+
+---
+
 ### 2025-12-20: Comprehensive Security Audit Completed

 **Activity:** Complete infrastructure security assessment and remediation planning
@@ -363,6 +442,51 @@ Hybrid approach balancing performance and resource efficiency:

 ---

+### 2025-12-25: RAG Vector Search - Phase 3 Complete
+
+**Activity:** Implemented and debugged production-ready vector search system for AI-powered documentation retrieval
+
+**Deliverables:**
+1. **Production Module** (`n8n/vector_search.py`): Complete API for semantic search
+   - `search_similar_documents()` - Query with natural language
+   - `insert_document()` - Add documents with embeddings
+   - `get_stats()` - Database statistics
+   - `delete_by_repo()` - Bulk cleanup
+   - CLI interface for testing and manual operations
+
+2. **Documentation Suite:**
+   - `SESSION_HANDOFF_PHASE4_READY.md` (17KB) - Comprehensive learning guide for next session
+   - `PHASE3_COMPLETE.md` (12KB) - Complete debugging summary and deployment guide
+   - `VECTOR_SEARCH_DEBUG.md` (4.7KB) - Technical root cause analysis
+   - `VECTOR_SEARCH_COMPARISON.md` (2.5KB) - Before/after code comparison
+
+3. **Diagnostic Scripts** (8 total):
+   - Embedding storage repair, parameter binding tests, SQL validation
+   - All scripts validated and preserved for reference
+
+**Technical Achievement:**
+- PostgreSQL 16.11 + pgvector 0.8.1 fully operational on CT 113
+- Vector similarity search returning accurate scores (0.5765 for related concepts)
+- Resolved 2 critical bugs:
+  1. psycopg2 parameter handling for pgvector types (must cast in SQL, not Python)
+  2. ORDER BY with vector operations (subquery pattern required)
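Reviewer note: the two fixes above can be illustrated together. This is a sketch only, not the contents of `vector_search.py`; the helper names `to_vector_literal` and `search_params` are hypothetical, and the column names are taken from the `rag_embeddings` table described in this entry.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Fix 1: the vector is passed as a plain string and cast in SQL (::vector(768)).
# Fix 2: similarity is computed in a subquery, and the outer query orders by
# the resulting column instead of repeating the vector expression.
SEARCH_SQL = """
SELECT file_path, chunk_text, similarity
FROM (
    SELECT file_path,
           chunk_text,
           1 - (embedding <=> %s::vector(768)) AS similarity
    FROM rag_embeddings
) AS scored
ORDER BY similarity DESC
LIMIT %s
"""


def search_params(embedding, limit=5):
    """Build the parameter tuple matching the two placeholders in SEARCH_SQL."""
    return (to_vector_literal(embedding), limit)


# With an open psycopg2 cursor `cur` (assumed, not created here):
#     cur.execute(SEARCH_SQL, search_params(query_embedding, limit=5))
#     rows = cur.fetchall()
```

Keeping the query vector as a single bound parameter avoids binding it twice, which is the pattern the entry reports as failing.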
+
+**Validation Results:**
+- Query: "How do I create snapshots of virtual machines?"
+- Result: 0.5765 similarity to backup documentation
+- Interpretation: Correctly identifies semantic relationship between "snapshots" and "backups"
+
+**Infrastructure:**
+- Database: n8n_db on CT 113
+- Table: rag_embeddings (id, source_repo, file_path, chunk_text, embedding vector(768), metadata jsonb)
+- Embedding API: Ollama at 192.168.1.81:11434 (nomic-embed-text, 768 dimensions)
+- Storage overhead: ~3KB per vector, ~5KB per document total
+
+**Status:** ✅ Phase 3 Complete | Phase 4 Ready to Start
+**Next Steps:** Build n8n ingestion workflow to load homelab documentation from Gitea
+
+---
+
 ### 2025-12-07: Infrastructure Documentation & Monitoring Stack

 #### Additions
@@ -377,8 +501,9 @@ Hybrid approach balancing performance and resource efficiency:
   - Secure remote access without VPN

 3. **CT 113 (n8n)**: Workflow automation platform
-   - PostgreSQL 15+ backend
-   - IP: 192.168.2.107
+   - PostgreSQL 16.11 backend (upgraded from 15+)
+   - pgvector 0.8.1 extension for vector search
+   - IP: 192.168.2.113
   - Resolved database locale issues

 ### Modifications
@@ -403,7 +528,19 @@ Hybrid approach balancing performance and resource efficiency:

 ```
 homelab/
-    monitoring/                      # NEW: Monitoring stack configurations
+    n8n/                             # RAG Vector Search Implementation (NEW)
+        vector_search.py            # Production module for vector operations
+        SESSION_HANDOFF_PHASE4_READY.md  # Learning guide for next session
+        PHASE3_COMPLETE.md          # Phase 3 debugging and achievements summary
+        fix_embedding_storage.py    # Diagnostic script (embedding repair)
+        test_direct_sql.py          # Diagnostic script (query testing)
+        test_vector_search_working.py  # Validated working implementation
+        test_parameter_binding.py   # Diagnostic script (psycopg2 debugging)
+        test_pgvector_direct.sql    # Raw SQL tests for pgvector
+        VECTOR_SEARCH_DEBUG.md      # Technical debugging documentation
+        VECTOR_SEARCH_COMPARISON.md # Before/after code comparison
+        README_VECTOR_SEARCH.md     # Comprehensive setup guide
+    monitoring/                      # Monitoring stack configurations
        README.md                   # Comprehensive monitoring documentation
        grafana/
            docker-compose.yml
@@ -417,6 +554,8 @@ homelab/
    services/                        # Docker Compose service configurations
        n8n/                        # n8n workflow automation
        netbox/                     # Network documentation & IPAM
+        openclaw/                   # OpenClaw AI chatbot gateway (VM 120)
+        tinyauth/                   # SSO authentication layer
        README.md                   # Services overview (updated)
    disaster-recovery/
        homelab-export-20251207-120040/  # Latest infrastructure export
@@ -424,7 +563,16 @@ homelab/
        crawlers-exporters/         # Infrastructure collection scripts
        fixers/                     # Problem-solving scripts
        qol/                        # Quality of life improvements
+        security/                   # Security audit and remediation scripts (NEW)
+            verify-service-status.sh
+            backup-before-remediation.sh
+            rotate-*.sh             # Credential rotation scripts
+            QUICK_REFERENCE.md      # Security operations guide
+    troubleshooting/
+        SECURITY_AUDIT_2025-12-20.md  # Comprehensive security assessment
+        loki-stack-bugfix.md        # Loki logging troubleshooting
    CLAUDE.md                        # AI assistant guidance (updated)
+    SECURITY.md                      # Security policy and best practices (NEW)
    INDEX.md                         # Navigation index (updated)
    README.md                        # Repository overview (updated)
    CLAUDE_STATUS.md                # This file - current infrastructure status
@@ -454,7 +602,116 @@ homelab/

 ---

-## Current Initiative: Security Audit Remediation - Q4 2025
+## Current Initiative: n8n RAG Workflow for Homelab Documentation - Q4 2025
+
+### Goal
+Build an interactive n8n workflow that implements Retrieval-Augmented Generation (RAG) to query homelab documentation stored in Gitea using local AI (Ollama). This is a learning-focused project to understand RAG architecture, embeddings, vector storage, and LLM integration.
+
+### Phase
+Phase 3 Complete - Vector Storage Operational | Moving to Phase 4 - n8n Workflow Development
+
+### Infrastructure Components
+- **AI Backend**: Ollama running on Windows 11 PC (192.168.1.81)
+  - Hardware: AMD 7900 GRE GPU, i7-12700KF, 32GB RAM @ 4000MHz, 2TB NVMe
+  - Installation: Native Windows application (not Docker)
+  - Open-WebUI: Running in Docker Desktop on same machine (port 3000)
+- **Orchestrator**: n8n workflow automation (CT 113, 192.168.2.113)
+- **Data Source**: Gitea repositories (192.168.2.102:3060)
+  - Repositories: homelab, truenas
+- **Vector Storage**: PostgreSQL 16.11 + pgvector 0.8.1 (operational on CT 113)
+
+### Progress Checklist
+
+**Phase 1: Network & Connectivity Setup**
+- [x] Verify Gitea API accessibility (working: http://192.168.2.102:3060/api/v1)
+- [x] Verify n8n instance running (CT 113, 192.168.2.113)
+- [x] Configure Ollama network binding (set OLLAMA_HOST=0.0.0.0 via environment variables)
+- [x] Verify Ollama API accessible from homelab (curl http://192.168.1.81:11434/api/tags)
+- [x] Identify available Ollama models (LLMs: deepseek-r1:8.2B, gpt-oss:20.9B, llama3.2:3.2B, phi3:3.8B)
+- [x] Pull embedding model (nomic-embed-text - 768 dimensions, 274MB)
+
+**Phase 2: Understanding Embeddings (Learning Phase)**
+- [x] Pull sample document from Gitea API
+- [x] Send text to Ollama for embedding generation
+- [x] Examine vector output (768-dimensional vectors for each text)
+- [x] Understand semantic similarity concept (cosine similarity demo: 0.5764 for related topics)
+
+**Phase 3: Vector Storage Implementation** ✅ COMPLETE
+- [x] Evaluate PostgreSQL + pgvector (uses existing n8n database)
+- [x] Evaluate Qdrant (lightweight Docker deployment)
+- [x] Choose storage backend based on learning goals (PostgreSQL + pgvector selected)
+- [x] Install pgvector extension on CT 113 (PostgreSQL 16.11, pgvector 0.8.1)
+- [x] Create rag_embeddings table with vector(768) column
+- [x] Debug and fix vector insertion (corrected string→vector conversion)
+- [x] Debug and fix ORDER BY issue (subquery approach working)
+- [x] Verify cosine similarity search (working: 0.5765 similarity for related concepts)
+- [x] Create production-ready vector_search.py module with insert/search/stats functions
+
+**Phase 4: Build Ingestion Workflow (n8n)** - READY TO START
+- [ ] Deploy vector_search.py production module to CT 113
+- [ ] Test manual document insertion via CLI
+- [ ] Implement text chunking strategy (500 char chunks, 100 char overlap)
+- [ ] Create minimal n8n workflow: Manual Trigger → Gitea API → Chunk → Ollama → PostgreSQL
+- [ ] Test workflow with single README.md file from homelab repo
+- [ ] Scale to process all .md files in homelab repository
+- [ ] Add error handling and deduplication logic
+- [ ] Schedule automated daily ingestion runs
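Reviewer note: the chunking step planned above (500-character chunks, 100-character overlap) could look roughly like the sketch below. `chunk_text` is a hypothetical helper, and splitting on raw character offsets is an assumption; the checklist does not say whether chunks should respect word or paragraph boundaries.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("require chunk_size > overlap >= 0")

    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end of the text, so no trailing
        # chunk is emitted that only repeats the overlap region.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(1000))
for chunk in chunk_text(sample):
    print(len(chunk), chunk[:10])
```

Each chunk after the first starts with the last 100 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.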
+
+**Phase 5: Build Query Workflow (n8n)** - NOT STARTED
+- [ ] Create workflow: Webhook → User question
+- [ ] Generate embedding for user query
+- [ ] Implement vector similarity search (threshold >0.5)
+- [ ] Retrieve top 3-5 relevant chunks
+- [ ] Construct prompt with retrieved context
+- [ ] Call Ollama LLM for answer generation (llama3.2 or deepseek-r1)
+- [ ] Return formatted response with source references
+- [ ] Add webhook endpoint for external integrations
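Reviewer note: the retrieval and prompt-construction steps planned above might be sketched as follows. Both functions and the prompt wording are hypothetical; the `(text, similarity, source)` tuple shape is an assumption about what the vector search returns.

```python
def select_context(scored_chunks, threshold=0.5, top_k=5):
    """Keep chunks scoring above the threshold, best first, at most top_k."""
    relevant = [c for c in scored_chunks if c[1] > threshold]
    relevant.sort(key=lambda c: c[1], reverse=True)
    return relevant[:top_k]


def build_prompt(question, context):
    """Assemble an LLM prompt with numbered, source-labelled context chunks."""
    sources = "\n\n".join(
        f"[{i}] ({source})\n{text}"
        for i, (text, _score, source) in enumerate(context, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by number.\n\n"
        f"Context:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical search results: (chunk text, similarity, source file).
results = [
    ("PBS backup job runs daily at 02:00.", 0.58, "backups.md"),
    ("n8n listens on port 5678.", 0.20, "n8n.md"),
    ("Snapshots are taken before upgrades.", 0.90, "snapshots.md"),
]
print(build_prompt("How do I back up a VM?", select_context(results)))
```

Numbering the chunks lets the model cite them, which supports the planned "source references" in the response.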
+
+### Context
+**RAG Architecture Overview:**
+1. **Ingestion Pipeline**: Gitea API → Text Chunking → Ollama Embeddings → Vector Database
+2. **Query Pipeline**: User Question → Embedding → Vector Search → Context Retrieval → LLM Generation → Answer
+
+**Phase 3 Achievements (2025-12-25):**
+- ✅ PostgreSQL + pgvector fully operational on CT 113
+- ✅ Vector search working with 0.5765 similarity for related concepts
+- ✅ Production-ready Python module (`vector_search.py`) with insert/search/stats functions
+- ✅ Debugged and resolved 2 critical issues:
+  1. Embedding storage: Fixed psycopg2 parameter handling (must cast to `::vector(768)` in SQL, not Python)
+  2. ORDER BY bug: Subquery approach works, CTE approach fails (use `ORDER BY similarity DESC` instead of vector operation)
+
+**Key Learnings:**
+- ✅ Embeddings convert text to 768-dimensional vectors representing semantic meaning
+- ✅ Vector databases enable semantic search (meaning-based, not keyword-based)
+- ✅ pgvector cosine distance operator (`<=>`) returns distance, not similarity: 0=identical, 2=opposite; cosine similarity is `1 - distance`
+- ✅ Similarity scores: >0.7=highly relevant, 0.5-0.7=related, 0.3-0.5=somewhat related, <0.3=unrelated
+- ✅ psycopg2 doesn't natively support pgvector - must format vectors as strings and cast in SQL
+- ✅ Reusing vector parameters in ORDER BY causes silent failures - use subqueries instead
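Reviewer note: the relationship between cosine distance and the similarity bands listed above can be worked through in plain Python. This is an illustrative sketch; the function names are hypothetical, and the handling of the exact boundary values (0.3, 0.5, 0.7) is an assumption because the list does not specify it.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def cosine_distance(a, b):
    """Cosine distance as used by pgvector's <=> operator: 1 - similarity."""
    return 1 - cosine_similarity(a, b)


def relevance_label(similarity):
    """Map a similarity score to the bands from the key learnings."""
    if similarity > 0.7:
        return "highly relevant"
    if similarity >= 0.5:
        return "related"
    if similarity >= 0.3:
        return "somewhat related"
    return "unrelated"


print(cosine_distance([1, 0], [1, 0]))   # identical direction
print(cosine_distance([1, 0], [-1, 0]))  # opposite direction
print(relevance_label(0.5765))           # score reported for the backup query
```

Identical vectors give distance 0 and opposite vectors give distance 2, so a reported similarity of 0.5765 corresponds to a `<=>` distance of about 0.4235.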
+
+**Technical Stack Validated:**
+- Ollama API (192.168.1.81:11434) ✅ Accessible across subnets
+- nomic-embed-text model ✅ 768 dimensions, fast generation
+- PostgreSQL 16.11 + pgvector 0.8.1 ✅ Operators working correctly
+- Python psycopg2 ✅ With workarounds for vector handling
+
+**Success Metrics - Phase 3:**
+- ✅ Successfully query "how to backup VM" and retrieve relevant homelab documentation (0.5765 similarity)
+- ✅ Understand each component of the vector storage pipeline
+- ✅ Create reusable Python module for n8n integration
+
+**Next Steps - Phase 4:**
+- Deploy vector_search.py to CT 113 and test CLI interface
+- Create text chunking function (500 char chunks, 100 char overlap)
+- Build minimal n8n workflow: Manual Trigger → Gitea API → Chunk → Ollama → PostgreSQL
+- Scale to process all .md files in homelab repository
+- Add error handling and deduplication logic
+
+**Session Handoff Document:** `/home/jramos/homelab/n8n/SESSION_HANDOFF_PHASE4_READY.md`
+**Learning Resources:** Step-by-step lessons with examples, mental models, troubleshooting guide
+
+---
+
+## Previous Initiative: Security Audit Remediation - Q4 2025

 ### Goal
 Remediate 31 security findings identified in comprehensive security audit (2025-12-20), addressing critical vulnerabilities in Docker socket exposure, credential management, and SSL/TLS configuration.
@@ -632,16 +889,18 @@ Documentation & Maintenance
 - **Grafana**: http://192.168.2.114:3000
 - **Prometheus**: http://192.168.2.114:9090
 - **Nginx Proxy Manager**: http://192.168.2.101:81
-- **n8n**: http://192.168.2.107:5678
+- **n8n**: http://192.168.2.113:5678
 - **TinyAuth**: https://tinyauth.apophisnetworking.net (internal: http://192.168.2.10:8000)
+- **OpenClaw**: https://openclaw.apophisnetworking.net (internal: http://192.168.2.120:18789)

 ### Key Network Segments
 - **Management Network**: 192.168.2.0/24
 - **Proxmox Host**: 192.168.2.200
 - **Reverse Proxy**: 192.168.2.101 (CT 102)
 - **TinyAuth**: 192.168.2.10 (CT 115)
-- **n8n**: 192.168.2.107 (CT 113)
+- **n8n**: 192.168.2.113 (CT 113)
 - **Monitoring**: 192.168.2.114 (VM 101)
+- **OpenClaw**: 192.168.2.120 (VM 120)

 ---

@@ -726,5 +985,5 @@ Documentation & Maintenance
 **Maintained by**: jramos
 **Repository**: Homelab Infrastructure Configuration
 **Platform**: Proxmox VE 8.4.0
-**Infrastructure Scale**: 9 VMs, 2 Templates, 4 Containers
-**Current Status**: Operational - Home Automation Integration Deployed
+**Infrastructure Scale**: 10 VMs, 2 Templates, 5 Containers
+**Current Status**: Operational - OpenClaw Deployment In Progress