homelab/CLAUDE_STATUS.md
Jordan Ramos fe75402738 docs(n8n): document troubleshooting session for 502 Bad Gateway issue
Root Cause:
- N8N_ENCRYPTION_KEY in /opt/n8n/.env contained literal shell command
  string $(openssl rand -hex 32) instead of executed value
- .env files do not execute shell commands, only parse literal strings
- Caused n8n service crash loop preventing startup

Troubleshooting Process:
- Identified service crash loop via journalctl logs
- Backend-Builder diagnosed invalid encryption key issue
- Multiple heredoc script attempts failed due to Windows/Linux line
  ending issues in WSL environment
- Created simple fix script using echo statements (no heredoc)

Solution:
- Fix script created at /tmp/fix_n8n_simple.sh
- Generates proper encryption key using openssl rand -hex 32
- Recreates .env with corrected configuration including missing
  N8N_LISTEN_ADDRESS=0.0.0.0 and NODE_ENV=production
- Backs up existing .env before changes
- Sets proper permissions (600, n8n:n8n)

Reviews:
- Backend-Builder: APPROVED (95% confidence, technically sound)
- Lab-Operator: APPROVED with safeguards (ZFS snapshot, DB backup)

Status: Ready for deployment by user on CT 113 tomorrow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 00:17:55 -07:00


Homelab Status Tracker

Last Updated: 2025-11-30 17:37:00
Goal: Document and commit recent infrastructure planning and integration documentation
Phase: Completed
Current Context: All documentation corrections committed. Architecture updates for Debian 12 and NPM committed to repository. Latest commit hash: c16d521070


Current Tasks

Pre-Commit Security & Sanitization

  • Step 1: Sanitize API key in OBSIDIAN-MCP-SETUP.md

    • Status: Completed at 2025-11-30 13:20:00
    • Owner: Librarian
    • Action: Replaced all 5 occurrences of real API key with placeholder
    • Result: Verified no production secrets remain in file
  • Step 2: Update .gitignore to exclude Claude config files

    • Status: Completed at 2025-11-30 13:21:00
    • Owner: Librarian
    • Action: Added .claude.json, *.claude.json, and .claude/ patterns
    • Result: Claude configuration files will not be committed to repository
  • Step 3: Stage all changes for commit

    • Status: Completed at 2025-11-30 13:22:00
    • Owner: Librarian
    • Action: Executed git add -A
    • Result: Staged 6 files (1 deleted, 2 modified, 3 new)
  • Step 4: Create commit with proper message

    • Status: Completed at 2025-11-30 13:24:29
    • Owner: Librarian
    • Action: Created commit with comprehensive conventional commit message
    • Result: Commit hash a1841f1c41
    • Changes: 6 files changed, 2,849 insertions(+), 73 deletions(-)
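
The "no production secrets remain" result in Step 1 could be re-checked with a small counting helper. This is only a sketch: `count_occurrences` is a hypothetical name, and the key string passed to it would be the real key held outside the repository.

```shell
#!/bin/sh
# count_occurrences STRING FILE: print how many times the literal STRING
# appears in FILE (prints 0 when it is absent).
count_occurrences() {
    # -F: match a fixed string, not a regex; -o: one output line per match
    grep -F -o -e "$1" "$2" | wc -l | tr -d ' '
}

# Hypothetical usage (REAL_KEY is assumed to be set in the caller's shell):
#   count_occurrences "$REAL_KEY" OBSIDIAN-MCP-SETUP.md
# A result of 0 means the placeholder replaced every occurrence.
```

Counting matches rather than lines catches a key that appears twice on one line.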

Completed Reviews

  • Scribe Review: Documented all changes comprehensively
  • Librarian Security Review: Identified security concerns
  • Lab-Operator Infrastructure Review: Validated operational impact

Changes Being Committed

Modified Files

  • CLAUDE.md: Enhanced with Universal Workflow sections

Deleted Files

  • .claude/agents/homelab-steve.md: Removed legacy agent definition

New Files

  • CLAUDE_STATUS.md: Status tracking file
  • OBSIDIAN-MCP-SETUP.md: Obsidian MCP guide (820 lines)
  • n8n/N8N-SETUP-PLAN.md: n8n deployment plan (1,948 lines)

Post-Commit Documentation Corrections

  • Fix PostgreSQL Installation Instructions: n8n/N8N-SETUP-PLAN.md

    • Status: Completed at 2025-11-30 13:30:00
    • Owner: Scribe
    • Issue: PostgreSQL 16 installation failed - package not in standard repos
    • Action: Added PostgreSQL official repository setup steps (lines 587-605)
    • Result: Installation instructions now work correctly
    • Reported by: User (real-world deployment feedback)
  • Architecture Corrections - Batch Updates: n8n/N8N-SETUP-PLAN.md

    • Status: Completed at 2025-11-30 14:00:00
    • Owners: Scribe (documentation), Lab-Operator (validation)
    • Issues Identified:
      1. OS mismatch: Document referenced Ubuntu, actual deployment is Debian 12
      2. Reverse proxy mismatch: Document described standalone nginx, actual is Nginx Proxy Manager (NPM)
    • Total Changes Applied: 30+ corrections across 4 batches

    Batch 1 - OS Corrections (2 changes):

    • Line 200: Updated OS template "Debian 12 or Ubuntu" → "Debian 12"
    • Line 588: Updated comment "Ubuntu repositories" → "Debian repositories"

    Batch 2 - NPM Terminology Updates (10 changes):

    • Line 12: Executive summary updated to reference NPM
    • Lines 112-113: CT 102 specs updated (2 cores, 4GB RAM, 10GB disk) and renamed to nginx-proxy-mgr
    • Line 170: LXC consistency reference updated to NPM
    • Lines 260, 286, 308-309: Network diagrams updated (nginx → NPM, added port 81)
    • Line 320: Firewall comment updated
    • Lines 583-584: Removed nginx-light and certbot from prerequisites
    • Line 893: Firewall rule comment updated to NPM

    Batch 3 - Major Section Rewrites (2 sections):

    • Lines 379-437: Section VI-A completely rewritten for NPM architecture
      • Added NPM overview with GitHub link
      • Replaced manual nginx config with NPM web UI instructions
      • Documented NPM admin access (port 81)
      • Updated SSL configuration approach (GUI vs certbot)
    • Lines 765-917: Phase 7 completely rewritten (reduced from 20min to 10min)
      • Replaced SSH/manual config with browser-based NPM UI steps
      • Added step-by-step proxy host creation guide
      • Included SSL certificate request via NPM interface
      • Added NPM-specific troubleshooting section

    Batch 4 - Remaining Updates (15+ changes):

    • Line 1093: "HTTPS through nginx" → "HTTPS through NPM"
    • Lines 1360-1372: Troubleshooting section updated for NPM (Docker commands, UI access)
    • Line 1376: Firewall check comment updated
    • Line 1392: Timeout check reference updated to NPM Advanced settings
    • Line 1444: Security hardening checklist updated
    • Lines 1478-1487: Rate limiting implementation updated for NPM
    • Line 1575: Workflow diagram updated
    • Line 1801: Architecture diagram updated (nginx → NPM)
    • Line 1868: Deployment checklist updated

    Key Architecture Changes Documented:

    1. Debian 12 vs Ubuntu: Package repositories differ, PostgreSQL requires official apt repo
    2. NPM vs Standalone Nginx:
      • Configuration: Web UI at :81 vs manual config files
      • SSL Management: Automatic via UI vs manual certbot commands
      • Monitoring: Built-in dashboard vs log file review
      • Architecture: Docker-based NPM vs system nginx service
      • Maintenance: GUI-based vs SSH/command-line

    Lab-Operator Validation: APPROVED

    • All changes verified against actual Proxmox infrastructure
    • NPM compatibility confirmed (Docker on LXC with nesting=1)
    • Security implications reviewed and documented
    • No operational risks identified

    Impact:

    • Phase 7 time reduced: 20 minutes → 10 minutes
    • Deployment complexity reduced (no SSH to CT 102 required)
    • Maintenance simplified (web UI vs config files)
    • Documentation accuracy: Aligned with real deployment environment
  • Commit Architecture Corrections to Repository

    • Status: Completed at 2025-11-30 17:37:00
    • Owner: Librarian
    • Action: Created commit with conventional commit message for n8n architecture corrections
    • Result: Commit hash c16d521070
    • Changes: 2 files changed, 325 insertions(+), 194 deletions(-)
      • CLAUDE_STATUS.md: Updated with detailed change log
      • n8n/N8N-SETUP-PLAN.md: 30+ architecture corrections (Debian 12 + NPM)


Active Troubleshooting: n8n 502 Bad Gateway

Started: 2025-11-30
Updated: 2025-12-01
Status: Ready for Deployment
Issue: n8n returns 502 Bad Gateway - Root cause identified and fix script prepared

Problem Summary

Symptoms:

  • External access: https://n8n.apophisnetworking.net returns 502 Bad Gateway (from mobile)
  • Internal access: Returns nginx default page or connection issues
  • Comparison: beszel.apophisnetworking.net works perfectly (both internal and external)

Configuration Context:

  • n8n location: CT 113 at 192.168.2.113:5678
  • NPM location: CT 102 at 192.168.2.101
  • Beszel location: 192.168.2.102:8090 (working reference)
  • All services behind same NPM, same Cloudflare DNS setup

n8n Configuration (from /opt/n8n/.env)

# n8n Configuration
N8N_PROTOCOL=https
N8N_HOST=n8n.apophisnetworking.net
N8N_PORT=5678
N8N_PATH=/
WEBHOOK_URL=https://n8n.apophisnetworking.net/

# Database
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=localhost
DB_POSTGRESDB_PORT=5432
DB_POSTGRESDB_DATABASE=n8n_db
DB_POSTGRESDB_USER=n8n_user

NPM Proxy Host Configuration (from screenshots)

Details Tab:

  • Domain: n8n.apophisnetworking.net
  • Scheme: http
  • Forward to: 192.168.2.113:5678
  • Websockets: ✓ Enabled
  • Status: Online (green)

SSL Tab:

  • Certificate: *.apophisnetworking.net (wildcard)
  • Force SSL: ✓ Enabled
  • HTTP/2: ✓ Enabled
  • HSTS: ✓ Enabled

Diagnostic Steps Completed

  • Verify n8n service status (Lab-Operator)

    • Status: Service in crash loop - repeatedly starting and failing
    • Command: systemctl status n8n showed "activating (auto-restart)"
  • Review service logs (Lab-Operator)

    • Command: journalctl -u n8n -n 100
    • Errors found: Encryption key validation failures
    • Log showed: n8n exiting immediately after start attempt
  • Analyze .env configuration (Backend-Builder)

    • Found: N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)
    • Issue: .env files don't execute shell commands - this is a literal string
    • Missing: N8N_LISTEN_ADDRESS=0.0.0.0
    • Missing: NODE_ENV=production
    • Password needs quoting: DB_POSTGRESDB_PASSWORD="<REDACTED>" (value contains a ")" character)

Root Cause Analysis

PRIMARY ISSUE: Invalid N8N_ENCRYPTION_KEY in /opt/n8n/.env

Technical Explanation: The .env file contained N8N_ENCRYPTION_KEY=$(openssl rand -hex 32) which was intended to generate a random encryption key. However, .env files are not shell scripts - they don't execute commands. The variable was set to the literal string $(openssl rand -hex 32) instead of an actual 64-character hexadecimal key.
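
The difference between a literal string and an executed command can be reproduced without touching the real file. The sketch below writes the broken line to a temporary file and reads it back the way a non-shell parser would. `read_env_value` and `is_hex64` are hypothetical helper names, not part of n8n.

```shell
#!/bin/sh
# read_env_value KEY FILE: print the raw text after "KEY=" on the first
# matching line, which is what a parser that does no shell expansion receives.
read_env_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# is_hex64 VALUE: succeed only if VALUE is exactly 64 lowercase hex
# characters, the shape that `openssl rand -hex 32` prints.
is_hex64() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

demo=$(mktemp)
# Single quotes keep $(...) literal, reproducing the broken line.
printf '%s\n' 'N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)' > "$demo"

value=$(read_env_value N8N_ENCRYPTION_KEY "$demo")
if is_hex64 "$value"; then
    echo "key looks valid"
else
    echo "key is not a 64-character hex string: $value"
fi
rm -f "$demo"
```

The value read back is the text `$(openssl rand -hex 32)` itself, so the shape check fails. Running the command in a shell first and writing its output into the file is what produces a usable key.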

Impact:

  • n8n service fails encryption key validation on startup
  • Service enters crash loop (start → fail → restart → fail)
  • NPM returns 502 Bad Gateway because backend service is down
  • Issue was NOT hairpin NAT or NPM misconfiguration (beszel works fine with same setup)

Additional Configuration Issues Identified:

  1. Missing N8N_LISTEN_ADDRESS=0.0.0.0 - bind address not set explicitly; NPM on CT 102 must be able to reach the service on a non-loopback interface
  2. Missing NODE_ENV=production - affects performance and security
  3. Database password not quoted - special characters need proper escaping
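
Missing settings like the first two items above could be caught before a restart with a simple presence check. A minimal sketch, assuming one `KEY=value` pair per line; `missing_keys` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_keys FILE KEY...: print each KEY that has no "KEY=" line in FILE.
# Note: plain POSIX sh has no `local`, so `file` and `key` are global.
missing_keys() {
    file=$1
    shift
    for key in "$@"; do
        grep -q "^${key}=" "$file" || printf '%s\n' "$key"
    done
}

# Hypothetical usage on CT 113:
#   missing_keys /opt/n8n/.env N8N_ENCRYPTION_KEY N8N_LISTEN_ADDRESS NODE_ENV
```

An empty result means every listed key is present. It says nothing about whether the values are valid.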

Attempted Solutions & Lessons Learned

Attempts 1-3: Heredoc Script Failures

  • Created fix script using heredoc syntax for .env generation
  • Error: warning: here-document at line 22 delimited by end-of-file (wanted 'ENVEOF')
  • Root cause: Windows/Linux line ending issues when copying script from WSL to LXC container
  • Backend-Builder's first attempt also switched the database to SQLite by mistake (corrected to keep PostgreSQL)
  • Lesson: Heredoc syntax is fragile in cross-platform environments
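
A heredoc ends only when a line matches the delimiter exactly, so a closing ENVEOF line that picks up a trailing carriage return (or any other stray character) during transfer no longer terminates it. A sketch for detecting and stripping carriage returns before running a transferred script; `has_crlf` and `strip_crlf` are hypothetical helper names.

```shell
#!/bin/sh
# Hold a literal carriage return; command substitution strips only newlines.
CR=$(printf '\r')

# has_crlf FILE: succeed if any line in FILE ends with a carriage return.
has_crlf() {
    grep -q "${CR}\$" "$1"
}

# strip_crlf SRC DST: copy SRC to DST with every carriage return removed.
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}

# Hypothetical usage after copying a script from WSL to the container:
#   if has_crlf fix.sh; then strip_crlf fix.sh fix.unix.sh; fi
```

Writing to a second file leaves the original available for comparison if the cleaned copy still misbehaves.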

Final Solution: Simple Echo-Based Script

  • Replaced heredoc with simple echo statements
  • More robust to copy-paste and line ending issues
  • Avoids CRLF/LF conversion problems

Solution: Fix Script Ready for Deployment

Script Location: /tmp/fix_n8n_simple.sh (on WSL, ready to transfer to CT 113)

Script Actions:

  1. Generates proper encryption key: ENCRYPTION_KEY=$(openssl rand -hex 32)
  2. Backs up existing .env with timestamp: /opt/n8n/.env.backup.YYYYMMDD_HHMMSS
  3. Creates new .env file with corrected configuration:
    • Actual generated encryption key (not shell command)
    • Adds N8N_LISTEN_ADDRESS=0.0.0.0
    • Adds NODE_ENV=production
    • Properly quotes DB_POSTGRESDB_PASSWORD
    • Maintains PostgreSQL database configuration
  4. Sets secure permissions: chmod 600 and chown n8n:n8n
  5. Restarts n8n service
  6. Verifies service status and local connectivity
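
The actions above can be sketched roughly as follows. This is not the contents of /tmp/fix_n8n_simple.sh, which is not reproduced here. It is an illustration under two assumptions that differ from the real script: the key and the database password are passed as arguments instead of being generated or hardcoded inside the script, and the directory is a parameter so the function can be tried on a scratch copy. `rewrite_env` is a hypothetical name. The configuration values are the ones listed earlier in this file.

```shell
#!/bin/sh
# rewrite_env ENV_DIR KEY DB_PASSWORD
# Back up ENV_DIR/.env, then write a fresh one line by line (no heredoc).
# Note: plain POSIX sh has no `local`, so these variables are global.
rewrite_env() {
    env_dir=$1
    enc_key=$2
    db_password=$3
    env_file="$env_dir/.env"
    new_file="$env_file.new"

    # Refuse anything that is not 64 hex characters, e.g. a literal "$(...)".
    if ! printf '%s\n' "$enc_key" | grep -Eq '^[0-9a-f]{64}$'; then
        echo "rewrite_env: encryption key is not 64 hex characters" >&2
        return 1
    fi

    # Back up the existing file as .env.backup.YYYYMMDD_HHMMSS.
    if [ -f "$env_file" ]; then
        cp -p "$env_file" "$env_file.backup.$(date +%Y%m%d_%H%M%S)" || return 1
    fi

    # Write inside a subshell so the umask change does not leak out.
    (
        umask 077
        {
            printf '%s\n' 'N8N_PROTOCOL=https'
            printf '%s\n' 'N8N_HOST=n8n.apophisnetworking.net'
            printf '%s\n' 'N8N_PORT=5678'
            printf '%s\n' 'N8N_PATH=/'
            printf '%s\n' 'WEBHOOK_URL=https://n8n.apophisnetworking.net/'
            printf '%s\n' 'N8N_LISTEN_ADDRESS=0.0.0.0'
            printf '%s\n' 'NODE_ENV=production'
            printf '%s\n' 'DB_TYPE=postgresdb'
            printf '%s\n' 'DB_POSTGRESDB_HOST=localhost'
            printf '%s\n' 'DB_POSTGRESDB_PORT=5432'
            printf '%s\n' 'DB_POSTGRESDB_DATABASE=n8n_db'
            printf '%s\n' 'DB_POSTGRESDB_USER=n8n_user'
            # Quoted so characters such as ")" stay part of the value.
            printf 'DB_POSTGRESDB_PASSWORD="%s"\n' "$db_password"
            printf 'N8N_ENCRYPTION_KEY=%s\n' "$enc_key"
        } > "$new_file"
    ) || return 1

    chmod 600 "$new_file" && mv "$new_file" "$env_file"
}

# The remaining actions need root on CT 113 and are shown only as comments:
#   rewrite_env /opt/n8n "$(openssl rand -hex 32)" "$DB_PASSWORD"
#   chown n8n:n8n /opt/n8n/.env
#   systemctl restart n8n
#   systemctl status n8n --no-pager
```

Taking the key as an argument means the command substitution runs in the calling shell, and the shape check rejects the original failure mode if a literal string is ever passed again.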

Reviews Completed:

  • Backend-Builder: Code review APPROVED (95% confidence, technically sound)
  • Lab-Operator: Operational review APPROVED with safeguards documented
    • Minimal downtime (~13 seconds)
    • No database corruption risk
    • Rollback procedures documented
    • Security recommendations provided

Pre-Execution Safeguards:

  1. Create ZFS snapshot of CT 113: pct snapshot 113 pre-n8n-fix
  2. Backup PostgreSQL database: pg_dump n8n_db > /tmp/n8n_db_pre_fix_backup.sql
  3. Verify no encrypted credentials exist (likely none since service never started)

Security Notes:

  • Script contains hardcoded password - delete after use: shred -u /tmp/fix_n8n_simple.sh
  • Do NOT commit script to git repository
  • Encryption key properly secured in .env with 600 permissions

Next Actions

  • User to deploy fix script on CT 113 on 2025-12-02
  • Test external access after fix: https://n8n.apophisnetworking.net
  • Verify service stability for 24 hours
  • Update this status file to RESOLVED after successful deployment
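
The access test above could be scripted with a short polling helper. A sketch, assuming `curl` is installed and that an HTTP 200 from the base URL is an adequate signal that n8n is serving. `http_status` and `wait_until_up` are hypothetical names.

```shell
#!/bin/sh
# http_status URL: print the HTTP status code for URL (curl prints 000 when
# no response arrives within 5 seconds).
http_status() {
    curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

# wait_until_up URL TRIES DELAY: poll URL until it answers 200, at most TRIES
# times with DELAY seconds between attempts. Succeeds on the first 200.
wait_until_up() {
    url=$1
    tries=$2
    delay=$3
    while [ "$tries" -gt 0 ]; do
        code=$(http_status "$url") || code=000
        if [ "$code" = "200" ]; then
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Hypothetical usage, internal first, then through NPM:
#   wait_until_up http://192.168.2.113:5678/ 10 3
#   wait_until_up https://n8n.apophisnetworking.net/ 3 5
```

Checking the internal address first separates "n8n is down" from "the proxy path is broken", the same distinction the diagnosis above relied on.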

Files Referenced

  • /home/jramos/homelab/n8n/N8N-SETUP-PLAN.md - Phase 5 configuration
  • /opt/n8n/.env - n8n configuration (on CT 113)
  • /tmp/fix_n8n_simple.sh - Fix script (NOT committed to git - contains password)
  • /data/nginx/proxy_host/*.conf - NPM proxy configs (on CT 102)

Repository: /home/jramos/homelab | Branch: main