Root Cause:
- N8N_ENCRYPTION_KEY in /opt/n8n/.env contained literal shell command string $(openssl rand -hex 32) instead of executed value
- .env files do not execute shell commands, only parse literal strings
- Caused n8n service crash loop preventing startup
Troubleshooting Process:
- Identified service crash loop via journalctl logs
- Backend-Builder diagnosed invalid encryption key issue
- Multiple heredoc script attempts failed due to Windows/Linux line ending issues in WSL environment
- Created simple fix script using echo statements (no heredoc)
Solution:
- Fix script created at /tmp/fix_n8n_simple.sh
- Generates proper encryption key using openssl rand -hex 32
- Recreates .env with corrected configuration including missing N8N_LISTEN_ADDRESS=0.0.0.0 and NODE_ENV=production
- Backs up existing .env before changes
- Sets proper permissions (600, n8n:n8n)
Reviews:
- Backend-Builder: APPROVED (95% confidence, technically sound)
- Lab-Operator: APPROVED with safeguards (ZFS snapshot, DB backup)
Status: Ready for deployment by user on CT 113 tomorrow
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Homelab Status Tracker
Last Updated: 2025-11-30 17:37:00
Goal: Document and commit recent infrastructure planning and integration documentation
Phase: Completed
Current Context: All documentation corrections committed. Architecture updates for Debian 12 and NPM committed to repository. Latest commit hash: c16d521070
Current Tasks
Pre-Commit Security & Sanitization
- Step 1: Sanitize API key in OBSIDIAN-MCP-SETUP.md
  - Status: Completed at 2025-11-30 13:20:00
  - Owner: Librarian
  - Action: Replaced all 5 occurrences of real API key with placeholder
  - Result: Verified no production secrets remain in file
- Step 2: Update .gitignore to exclude Claude config files
  - Status: Completed at 2025-11-30 13:21:00
  - Owner: Librarian
  - Action: Added .claude.json, *.claude.json, and .claude/ patterns
  - Result: Claude configuration files will not be committed to repository
- Step 3: Stage all changes for commit
  - Status: Completed at 2025-11-30 13:22:00
  - Owner: Librarian
  - Action: Executed git add -A
  - Result: Staged 6 files (1 deleted, 2 modified, 3 new)
- Step 4: Create commit with proper message
  - Status: Completed at 2025-11-30 13:24:29
  - Owner: Librarian
  - Action: Created commit with comprehensive conventional commit message
  - Result: Commit hash a1841f1c41
  - Changes: 6 files changed, 2,849 insertions(+), 73 deletions(-)
Completed Reviews
- Scribe Review: Documented all changes comprehensively
- Librarian Security Review: Identified security concerns
- Lab-Operator Infrastructure Review: Validated operational impact
Changes Being Committed
Modified Files
- CLAUDE.md: Enhanced with Universal Workflow sections
Deleted Files
- .claude/agents/homelab-steve.md: Removed legacy agent definition
New Files
- CLAUDE_STATUS.md: Status tracking file
- OBSIDIAN-MCP-SETUP.md: Obsidian MCP guide (820 lines)
- n8n/N8N-SETUP-PLAN.md: n8n deployment plan (1,948 lines)
Post-Commit Documentation Corrections
- Fix PostgreSQL Installation Instructions: n8n/N8N-SETUP-PLAN.md
  - Status: Completed at 2025-11-30 13:30:00
  - Owner: Scribe
  - Issue: PostgreSQL 16 installation failed - package not in standard repos
  - Action: Added PostgreSQL official repository setup steps (lines 587-605)
  - Result: Installation instructions now work correctly
  - Reported by: User (real-world deployment feedback)
- Architecture Corrections - Batch Updates: n8n/N8N-SETUP-PLAN.md
  - Status: Completed at 2025-11-30 14:00:00
  - Owners: Scribe (documentation), Lab-Operator (validation)
  - Issues Identified:
    - OS mismatch: Document referenced Ubuntu, actual deployment is Debian 12
    - Reverse proxy mismatch: Document described standalone nginx, actual is Nginx Proxy Manager (NPM)
  - Total Changes Applied: 30+ corrections across 4 batches
Batch 1 - OS Corrections (2 changes):
- Line 200: Updated OS template "Debian 12 or Ubuntu" → "Debian 12"
- Line 588: Updated comment "Ubuntu repositories" → "Debian repositories"
Batch 2 - NPM Terminology Updates (10 changes):
- Line 12: Executive summary updated to reference NPM
- Lines 112-113: CT 102 specs updated (2 cores, 4GB RAM, 10GB disk) and renamed to nginx-proxy-mgr
- Line 170: LXC consistency reference updated to NPM
- Lines 260, 286, 308-309: Network diagrams updated (nginx → NPM, added port 81)
- Line 320: Firewall comment updated
- Lines 583-584: Removed nginx-light and certbot from prerequisites
- Line 893: Firewall rule comment updated to NPM
Batch 3 - Major Section Rewrites (2 sections):
- Lines 379-437: Section VI-A completely rewritten for NPM architecture
  - Added NPM overview with GitHub link
  - Replaced manual nginx config with NPM web UI instructions
  - Documented NPM admin access (port 81)
  - Updated SSL configuration approach (GUI vs certbot)
- Lines 765-917: Phase 7 completely rewritten (reduced from 20min to 10min)
  - Replaced SSH/manual config with browser-based NPM UI steps
  - Added step-by-step proxy host creation guide
  - Included SSL certificate request via NPM interface
  - Added NPM-specific troubleshooting section
Batch 4 - Remaining Updates (15+ changes):
- Line 1093: "HTTPS through nginx" → "HTTPS through NPM"
- Lines 1360-1372: Troubleshooting section updated for NPM (Docker commands, UI access)
- Line 1376: Firewall check comment updated
- Line 1392: Timeout check reference updated to NPM Advanced settings
- Line 1444: Security hardening checklist updated
- Lines 1478-1487: Rate limiting implementation updated for NPM
- Line 1575: Workflow diagram updated
- Line 1801: Architecture diagram updated (nginx → NPM)
- Line 1868: Deployment checklist updated
Key Architecture Changes Documented:
- Debian 12 vs Ubuntu: Package repositories differ, PostgreSQL requires official apt repo
- NPM vs Standalone Nginx:
  - Configuration: Web UI at :81 vs manual config files
  - SSL Management: Automatic via UI vs manual certbot commands
  - Monitoring: Built-in dashboard vs log file review
  - Architecture: Docker-based NPM vs system nginx service
  - Maintenance: GUI-based vs SSH/command-line
Lab-Operator Validation: ✅ APPROVED
- All changes verified against actual Proxmox infrastructure
- NPM compatibility confirmed (Docker on LXC with nesting=1)
- Security implications reviewed and documented
- No operational risks identified
Impact:
- Phase 7 time reduced: 20 minutes → 10 minutes
- Deployment complexity reduced (no SSH to CT 102 required)
- Maintenance simplified (web UI vs config files)
- Documentation accuracy: Aligned with real deployment environment
- Commit Architecture Corrections to Repository
  - Status: Completed at 2025-11-30 17:37:00
  - Owner: Librarian
  - Action: Created commit with conventional commit message for n8n architecture corrections
  - Result: Commit hash c16d521070
  - Changes: 2 files changed, 325 insertions(+), 194 deletions(-)
    - CLAUDE_STATUS.md: Updated with detailed change log
    - n8n/N8N-SETUP-PLAN.md: 30+ architecture corrections (Debian 12 + NPM)
Active Troubleshooting: n8n 502 Bad Gateway
Started: 2025-11-30
Updated: 2025-12-01
Status: Ready for Deployment
Issue: n8n returns 502 Bad Gateway - Root cause identified and fix script prepared
Problem Summary
Symptoms:
- ❌ External access: https://n8n.apophisnetworking.net returns 502 Bad Gateway (from mobile)
- ❌ Internal access: Returns nginx default page or connection issues
- ✅ Comparison: beszel.apophisnetworking.net works perfectly (both internal and external)
Configuration Context:
- n8n location: CT 113 at 192.168.2.113:5678
- NPM location: CT 102 at 192.168.2.101
- Beszel location: 192.168.2.102:8090 (working reference)
- All services behind same NPM, same Cloudflare DNS setup
n8n Configuration (from /opt/n8n/.env)
# n8n Configuration
N8N_PROTOCOL=https
N8N_HOST=n8n.apophisnetworking.net
N8N_PORT=5678
N8N_PATH=/
WEBHOOK_URL=https://n8n.apophisnetworking.net/
# Database
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=localhost
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=n8n_db
DB_POSTGRESDB_USER=n8n_user
NPM Proxy Host Configuration (from screenshots)
Details Tab:
- Domain: n8n.apophisnetworking.net
- Scheme: http
- Forward to: 192.168.2.113:5678
- Websockets: ✓ Enabled
- Status: Online (green)
SSL Tab:
- Certificate: *.apophisnetworking.net (wildcard)
- Force SSL: ✓ Enabled
- HTTP/2: ✓ Enabled
- HSTS: ✓ Enabled
Diagnostic Steps Completed
- Verify n8n service status (Lab-Operator)
  - Status: Service in crash loop - repeatedly starting and failing
  - Command: systemctl status n8n (showed "activating (auto-restart)")
- Review service logs (Lab-Operator)
  - Command: journalctl -u n8n -n 100
  - Errors found: Encryption key validation failures
  - Log showed: n8n exiting immediately after start attempt
- Analyze .env configuration (Backend-Builder)
  - Found: N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)
  - Issue: .env files don't execute shell commands - this is a literal string
  - Missing: N8N_LISTEN_ADDRESS=0.0.0.0
  - Missing: NODE_ENV=production
  - Password needs quoting: DB_POSTGRESDB_PASSWORD="Nbkx4mdmay1)"
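The .env analysis step above can be partly automated. The following is a minimal sketch, not a tool the agents actually used: `find_unexpanded` is a hypothetical helper that flags any KEY=VALUE line still containing command-substitution syntax. It runs against a throwaway sample file rather than the real /opt/n8n/.env.

```shell
# Hypothetical helper: list .env entries whose value still contains
# command-substitution syntax, i.e. "$(" or a backtick.
find_unexpanded() {
    # $1: path to a .env file; prints "line-number:line" for each match
    grep -nE '^[A-Za-z_][A-Za-z0-9_]*=.*(\$\(|`)' "$1"
}

# Demo against a throwaway file rather than the real /opt/n8n/.env
sample=$(mktemp)
printf '%s\n' 'N8N_PORT=5678' 'N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)' > "$sample"
found=$(find_unexpanded "$sample")
echo "$found"
rm -f "$sample"
```

Note that `grep` exits non-zero when nothing matches, so a clean file produces no output and a failing exit status.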
Root Cause Analysis
PRIMARY ISSUE: Invalid N8N_ENCRYPTION_KEY in /opt/n8n/.env
Technical Explanation:
The .env file contained N8N_ENCRYPTION_KEY=$(openssl rand -hex 32) which was intended to generate a random encryption key. However, .env files are not shell scripts - they don't execute commands. The variable was set to the literal string $(openssl rand -hex 32) instead of an actual 64-character hexadecimal key.
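To illustrate the point, here is a small sketch that reads the value the way a dotenv-style loader typically does (split on the first "=", no expansion) and then checks it against the 64-hex-character shape that openssl rand -hex 32 prints. The helper name `looks_like_hex_key` is hypothetical, and the check describes only the expected output format of the openssl command, not n8n's own validation logic.

```shell
# Read the value the way a dotenv-style loader would: everything after the
# first "=" is taken verbatim, with no shell expansion.
env_file=$(mktemp)
printf '%s\n' 'N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)' > "$env_file"
value=$(grep '^N8N_ENCRYPTION_KEY=' "$env_file" | cut -d= -f2-)
rm -f "$env_file"

# Hypothetical check: does the value have the 64-hex-character shape that
# `openssl rand -hex 32` prints?
looks_like_hex_key() {
    printf '%s' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

if looks_like_hex_key "$value"; then
    verdict="hex key"
else
    verdict="not a hex key"
fi
echo "value=$value length=${#value} verdict=$verdict"
```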
Impact:
- n8n service fails encryption key validation on startup
- Service enters crash loop (start → fail → restart → fail)
- NPM returns 502 Bad Gateway because backend service is down
- Issue was NOT hairpin NAT or NPM misconfiguration (beszel works fine with same setup)
Additional Configuration Issues Identified:
- Missing N8N_LISTEN_ADDRESS=0.0.0.0 - would cause service to listen only on localhost
- Missing NODE_ENV=production - affects performance and security
- Database password not quoted - special characters need proper escaping
Attempted Solutions & Lessons Learned
Attempt 1-3: Heredoc Script Failures
- Created fix script using heredoc syntax for .env generation
- Error: warning: here-document at line 22 delimited by end-of-file (wanted 'ENVEOF')
- Root cause: Windows/Linux line ending issues when copying script from WSL to LXC container
- Backend-Builder's first attempt incorrectly changed to SQLite (corrected to maintain PostgreSQL)
- Lesson: Heredoc syntax fragile in cross-platform environments
Final Solution: Simple Echo-Based Script
- Replaced heredoc with simple echo statements
- More robust to copy-paste and line ending issues
- Avoids CRLF/LF conversion problems
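Independently of the heredoc-versus-echo choice, a script copied between Windows and Linux can be checked for stray carriage returns before it is run. The sketch below is one way to do that; `has_crlf` and `strip_crlf` are hypothetical helper names, and the demo uses temporary files rather than the real fix script.

```shell
cr=$(printf '\r')

has_crlf() {
    # Succeeds if any line of file $1 ends with a carriage return
    grep -q "${cr}\$" "$1"
}

strip_crlf() {
    # Writes a copy of file $1 to $2 with all carriage returns removed
    tr -d '\r' < "$1" > "$2"
}

# Demo: a two-line script saved with Windows line endings
pasted=$(mktemp)
cleaned=$(mktemp)
printf 'echo one\r\necho two\r\n' > "$pasted"

if has_crlf "$pasted"; then before="crlf"; else before="lf"; fi
strip_crlf "$pasted" "$cleaned"
if has_crlf "$cleaned"; then after="crlf"; else after="lf"; fi
echo "before=$before after=$after"
rm -f "$pasted" "$cleaned"
```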
Solution: Fix Script Ready for Deployment
Script Location: /tmp/fix_n8n_simple.sh (on WSL, ready to transfer to CT 113)
Script Actions:
- Generates proper encryption key: ENCRYPTION_KEY=$(openssl rand -hex 32)
- Backs up existing .env with timestamp: /opt/n8n/.env.backup.YYYYMMDD_HHMMSS
- Creates new .env file with corrected configuration:
  - Actual generated encryption key (not shell command)
  - Adds N8N_LISTEN_ADDRESS=0.0.0.0
  - Adds NODE_ENV=production
  - Properly quotes DB_POSTGRESDB_PASSWORD
  - Maintains PostgreSQL database configuration
- Sets secure permissions: chmod 600 and chown n8n:n8n
- Restarts n8n service
- Verifies service status and local connectivity
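The actions listed above could look roughly like the following. This is a hypothetical sketch, not the contents of /tmp/fix_n8n_simple.sh: the function name `write_env`, its arguments, and the placeholder values are assumptions, only a subset of the settings is written, and it uses printf in place of echo so that backslashes in a password are written verbatim. The chown and service restart are left out because they need root on CT 113.

```shell
# Hypothetical sketch; the real /tmp/fix_n8n_simple.sh is not reproduced here.
write_env() {
    # $1: target .env path, $2: encryption key, $3: database password
    target=$1
    key=$2
    db_pass=$3

    # Keep a timestamped copy of any existing file before overwriting it
    if [ -f "$target" ]; then
        cp -p "$target" "$target.backup.$(date +%Y%m%d_%H%M%S)"
    fi

    # One printf per line, no heredoc; only a subset of the settings is shown
    {
        printf '%s\n' "N8N_ENCRYPTION_KEY=$key"
        printf '%s\n' "N8N_LISTEN_ADDRESS=0.0.0.0"
        printf '%s\n' "NODE_ENV=production"
        printf '%s\n' "DB_TYPE=postgresdb"
        printf '%s\n' "DB_POSTGRESDB_PASSWORD=\"$db_pass\""
    } > "$target"
    chmod 600 "$target"
}

# Demo in a scratch directory with placeholder values
demo_dir=$(mktemp -d)
printf '%s\n' 'N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)' > "$demo_dir/.env"
write_env "$demo_dir/.env" "placeholder-key" "placeholder-password"
key_line=$(grep '^N8N_ENCRYPTION_KEY=' "$demo_dir/.env")
backup_count=$(ls "$demo_dir"/.env.backup.* | wc -l | tr -d ' ')
perms=$(ls -l "$demo_dir/.env" | cut -c1-10)
echo "$key_line backups=$backup_count perms=$perms"
rm -rf "$demo_dir"
```

On the container, the key argument would come from the shell evaluating openssl rand -hex 32 first, so the file receives the generated value rather than the command text.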
Reviews Completed:
- ✅ Backend-Builder: Code review APPROVED (95% confidence, technically sound)
- ✅ Lab-Operator: Operational review APPROVED with safeguards documented
  - Minimal downtime (~13 seconds)
  - No database corruption risk
  - Rollback procedures documented
  - Security recommendations provided
Pre-Execution Safeguards:
- Create ZFS snapshot of CT 113: pct snapshot 113 pre-n8n-fix
- Backup PostgreSQL database: pg_dump n8n_db > /tmp/n8n_db_pre_fix_backup.sql
- Verify no encrypted credentials exist (likely none since service never started)
Security Notes:
- Script contains hardcoded password - delete after use: shred -u /tmp/fix_n8n_simple.sh
- Do NOT commit script to git repository
- Encryption key properly secured in .env with 600 permissions
Next Actions
- User to deploy fix script on CT 113 tomorrow (2025-12-02)
- Test external access after fix: https://n8n.apophisnetworking.net
- Verify service stability for 24 hours
- Update this status file to RESOLVED after successful deployment
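For the post-deployment checks, a small polling helper can save re-running commands by hand while the service comes up. The sketch below is an assumption about how that might be done, not part of the prepared fix script. `wait_for` is a hypothetical name, and the demo uses `true` and `false` as stand-in commands so it runs anywhere.

```shell
wait_for() {
    # $1: maximum attempts, $2: seconds between attempts, remaining args: command
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Demo with stand-in commands so the sketch runs anywhere
if wait_for 3 0 true; then up="yes"; else up="no"; fi
if wait_for 3 0 false; then down="yes"; else down="no"; fi
echo "true=$up false=$down"
```

On CT 113 the command could be systemctl is-active --quiet n8n, or a curl -fsS request to the local port 5678. Which URL path to request is left open here, since the source does not name a health endpoint.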
Files Referenced
- /home/jramos/homelab/n8n/N8N-SETUP-PLAN.md - Phase 5 configuration
- /opt/n8n/.env - n8n configuration (on CT 113)
- /tmp/fix_n8n_simple.sh - Fix script (NOT committed to git - contains password)
- /data/nginx/proxy_host/*.conf - NPM proxy configs (on CT 102)
Repository: /home/jramos/homelab | Branch: main