homelab/SECURITY.md
Jordan Ramos e481c95da4 docs(security): comprehensive security audit and remediation documentation
- Add SECURITY.md policy with credential management, Docker security, SSL/TLS guidance
- Add security audit report (2025-12-20) with 31 findings across 4 severity levels
- Add pre-deployment security checklist template
- Update CLAUDE_STATUS.md with security audit initiative
- Expand services/README.md with comprehensive security sections
- Add script validation report and container name fix guide

Audit identified 6 CRITICAL, 3 HIGH, 2 MEDIUM findings
4-phase remediation roadmap created (estimated 6-13 min downtime)
All security scripts validated and ready for execution

Related: Security Audit Q4 2025, CRITICAL-001 through CRITICAL-006

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 13:52:34 -07:00


Security Policy

Version: 1.0
Last Updated: 2025-12-20
Effective Date: 2025-12-20

Overview

This document establishes the security policy and best practices for the homelab infrastructure environment running on Proxmox VE. The policy applies to all virtual machines (VMs), LXC containers, Docker services, and network resources deployed within the homelab.

Scope

This security policy covers:

  • Proxmox VE infrastructure (serviceslab node at 192.168.2.200)
  • All virtual machines and LXC containers
  • Docker containers and compose stacks
  • Network services and reverse proxies
  • Authentication and access control systems
  • Data storage and backup systems
  • Monitoring and logging infrastructure

Vulnerability Disclosure

Reporting Security Issues

Security vulnerabilities should be reported immediately to the infrastructure maintainer:

Contact: jramos
Repository: http://192.168.2.102:3060/jramos/homelab
Documentation: /home/jramos/homelab/troubleshooting/

Disclosure Process

  1. Report: Submit vulnerability details via secure channel
  2. Acknowledge: Receipt confirmation within 24 hours
  3. Investigate: Assessment and validation within 72 hours
  4. Remediate: Fix deployment based on severity (see SLA below)
  5. Document: Post-remediation documentation in /troubleshooting/
  6. Review: Security audit update and lessons learned

Severity Classification

| Severity | Response Time | Example |
|----------|---------------|---------|
| CRITICAL | < 4 hours | Docker socket exposure, root credential leaks |
| HIGH | < 24 hours | Unencrypted credentials, missing authentication |
| MEDIUM | < 72 hours | Weak passwords, missing SSL/TLS |
| LOW | < 7 days | Informational findings, optimization opportunities |
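
When triage is scripted, the response-time targets above can be expressed as a small lookup. The following is a minimal sketch in POSIX shell; the function name sla_hours is hypothetical and is not one of the repository's scripts.

```shell
# Hypothetical helper: map a finding's severity to its maximum response
# time in hours, using the values from the severity classification.
sla_hours() {
  case $1 in
    CRITICAL) echo 4 ;;
    HIGH)     echo 24 ;;
    MEDIUM)   echo 72 ;;
    LOW)      echo 168 ;;   # 7 days
    *)        echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

For example, sla_hours HIGH prints 24, and an unrecognized severity returns a non-zero status so a calling script can stop.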

Security Best Practices

1. Credential Management

1.1 Password Requirements

Minimum Standards:

  • Length: 16+ characters for administrative accounts
  • Complexity: Mixed case, numbers, special characters
  • Uniqueness: No password reuse across services
  • Rotation: Every 90 days for privileged accounts

Prohibited Practices:

  • Default passwords (e.g., admin/admin, password, changeme)
  • Hardcoded credentials in docker-compose files
  • Plaintext passwords in configuration files
  • Credentials committed to version control
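
The length and complexity rules above can be checked mechanically before a credential is accepted. The sketch below is an illustration only: meets_password_policy is a hypothetical name, and it covers length and character classes but not uniqueness or rotation, which need state that a one-off check does not have.

```shell
# Hypothetical helper: succeed (exit 0) only if the candidate password has
# 16+ characters and contains lowercase, uppercase, digit, and special characters.
meets_password_policy() {
  pw=$1
  [ "${#pw}" -ge 16 ] || return 1                        # minimum length
  case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac      # lowercase letter
  case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac      # uppercase letter
  case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac      # number
  case $pw in *[![:alnum:]]*) ;; *) return 1 ;; esac     # special character
  return 0
}
```

The defaults listed as prohibited (admin, password, changeme) fail the length test on their own, so they need no separate deny list here.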

1.2 Secrets Management

Docker Secrets Strategy:

# BAD: Hardcoded in docker-compose.yml
environment:
  - POSTGRES_PASSWORD=mypassword123

# GOOD: Environment file (.env)
environment:
  - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}

# BETTER: Docker secrets (for swarm mode)
secrets:
  - postgres_password

Environment File Protection:

# Ensure .env files are gitignored
echo "*.env" >> .gitignore
echo ".env.*" >> .gitignore

# Set restrictive permissions
chmod 600 /path/to/service/.env
chown root:root /path/to/service/.env

Credential Storage Locations:

  • Docker service secrets: /path/to/service/.env (gitignored)
  • Proxmox credentials: Stored in Proxmox secret storage or .env files
  • Database passwords: Environment variables, rotated quarterly
  • API tokens: Environment variables, scoped to minimum permissions
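
To confirm that the permission rule is actually applied across services, the environment files can be audited in one pass. This is a minimal sketch; find_loose_env_files is a hypothetical name, and the directory passed in is whatever root holds the compose stacks.

```shell
# Hypothetical helper: print every .env-style file under a directory whose
# mode is not exactly 600 (owner read/write only).
find_loose_env_files() {
  find "$1" -type f \( -name '.env' -o -name '*.env' -o -name '.env.*' \) ! -perm 600
}
```

An empty result means every environment file found already has the restrictive mode; any path printed should be fixed with chmod 600.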

1.3 Credential Rotation

Rotation Schedule:

| Credential Type | Frequency | Tool/Script |
|-----------------|-----------|-------------|
| Proxmox root/API users | 90 days | scripts/security/rotate-pve-credentials.sh |
| Database passwords | 90 days | scripts/security/rotate-paperless-password.sh |
| JWT secrets | 90 days | scripts/security/rotate-bytestash-jwt.sh |
| Service passwords | 90 days | scripts/security/rotate-logward-credentials.sh |
| SSH keys | 365 days | Manual rotation via Ansible |

Rotation Workflow:

  1. Backup: Create full backup before rotation (scripts/security/backup-before-remediation.sh)
  2. Generate: Create new credential using password manager or openssl rand -base64 32
  3. Update: Modify .env file or service configuration
  4. Restart: Recreate the affected service so it picks up the new value: docker compose up -d <service> (docker compose restart does not re-read a changed .env file)
  5. Verify: Test service functionality post-rotation
  6. Document: Record rotation in /troubleshooting/ log file
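
The Generate and Update steps can be sketched as a single helper that rewrites one KEY=VALUE line in an environment file. The name set_env_value is hypothetical and this is not how the repository's rotation scripts are known to work. It assumes plain KEY=VALUE lines without quoting and a value without backslashes (awk would interpret them as escapes), which holds for base64 output.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# line for that key or appending one if the key is absent.
set_env_value() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "="; done = 0 }
    $1 == k { print k "=" v; done = 1; next }   # replace the matching line
    { print }                                   # keep every other line
    END { if (!done) print k "=" v }            # append if the key was missing
  ' "$file" > "$tmp" && cat "$tmp" > "$file"    # overwrite in place, keeping the file mode
  rm -f "$tmp"
}

# Example (left commented out because it changes a real file):
# set_env_value /path/to/service/.env POSTGRES_PASSWORD "$(openssl rand -base64 32)"
```

Writing through cat rather than mv keeps the original file's owner and 600 mode intact. The service still has to be recreated afterwards for the new value to take effect.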

2. Docker Security

2.1 Docker Socket Protection

CRITICAL: The Docker socket (/var/run/docker.sock) provides root-level access to the host system.

Current Exposures (as of 2025-12-20 audit):

  • Portainer: Direct socket mount
  • Nginx Proxy Manager: Direct socket mount
  • Speedtest Tracker: Direct socket mount

Remediation Strategy:

# INSECURE: Direct socket mount
volumes:
  - /var/run/docker.sock:/var/run/docker.sock

# SECURE: Use docker-socket-proxy
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      - CONTAINERS=1
      - NETWORKS=1
      - SERVICES=1
      - TASKS=0
      - POST=0  # Read-only API; Portainer needs POST=1 to start, stop, or deploy containers
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped

  portainer:
    image: portainer/portainer-ce
    environment:
      - DOCKER_HOST=tcp://socket-proxy:2375
    # No direct socket mount

Implementation Guide: See scripts/security/docker-socket-proxy/README.md
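
Before and after remediation, it helps to list which compose files still reference the socket. The sketch below uses a hypothetical helper name, find_socket_mounts, and assumes compose files are named docker-compose*.yml or compose*.yml. It is a text search, so the socket proxy's own compose file will also be reported, which is expected.

```shell
# Hypothetical helper: list compose files under a directory that mention
# the Docker socket path.
find_socket_mounts() {
  find "$1" -type f \( -name 'docker-compose*.yml' -o -name 'compose*.yml' \) \
    -exec grep -l '/var/run/docker\.sock' {} + | sort
}
```

After the proxy is in place, the only file left in the output should be the one that defines the socket-proxy service.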

2.2 Container User Privileges

Principle: Containers should run as non-root users whenever possible.

Current Issues (2025-12-20 audit):

  • Multiple containers running as root (UID 0)
  • Missing user: directive in docker-compose files

Remediation:

# Add to docker-compose.yml
services:
  myapp:
    image: myapp:latest
    user: "1000:1000"  # Run as non-root user
    # OR use image-specific variables
    environment:
      - PUID=1000
      - PGID=1000

Verification:

# Check running container user
docker exec <container> id

# Should show non-root user:
# uid=1000(appuser) gid=1000(appuser)
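
The same check can be repeated for every running container. The sketch below assumes each container image ships an id binary; uid_from_id_output is a hypothetical helper, and the loop is commented out because it needs a running Docker daemon.

```shell
# Hypothetical helper: extract the numeric UID from `id` output such as
# "uid=1000(appuser) gid=1000(appuser)".
uid_from_id_output() {
  printf '%s\n' "$1" | sed -n 's/^uid=\([0-9][0-9]*\).*/\1/p'
}

# Example loop over running containers (requires Docker, so left commented out):
# for c in $(docker ps --format '{{.Names}}'); do
#   uid=$(uid_from_id_output "$(docker exec "$c" id)")
#   [ "$uid" = "0" ] && echo "$c runs as root"
# done
```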

2.3 Container Hardening

Security Checklist:

  • Run as non-root user
  • Use read-only root filesystem where possible: read_only: true
  • Drop unnecessary capabilities: cap_drop: [ALL]
  • Limit resources: mem_limit, cpus
  • Enable no-new-privileges: security_opt: [no-new-privileges:true]
  • Use minimal base images (Alpine, distroless)
  • Scan images for vulnerabilities: trivy image <image> (docker scan has been retired in favor of docker scout)

Example Hardened Service:

services:
  secure-app:
    image: secure-app:latest
    user: "1000:1000"
    read_only: true
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if needed
    mem_limit: 512m
    cpus: 0.5
    tmpfs:
      - /tmp:size=100M,mode=1777

2.4 Image Security

Best Practices:

  1. Pin image versions: Use specific tags, not latest

    image: nginx:1.25.3-alpine  # GOOD
    image: nginx:latest          # BAD
    
  2. Verify image signatures: Enable Docker Content Trust

    export DOCKER_CONTENT_TRUST=1
    
  3. Scan for vulnerabilities: Use Trivy or Grype

    # Run Trivy from its container image (no local install required)
    docker run aquasec/trivy image nginx:1.25.3-alpine
    
  4. Use official images: Prefer verified publishers from Docker Hub

  5. Regular updates: Monthly image update cycle

    docker compose pull
    docker compose up -d
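
The version-pinning rule can be checked with a small scan of the image: lines in a compose file. This is a sketch under stated assumptions: find_unpinned_images is a hypothetical name, image values are assumed to be unquoted, and an image with no tag is treated as unpinned because Docker resolves it to latest.

```shell
# Hypothetical helper: print images in a compose file that use :latest or
# carry no tag at all. Digest-pinned images (@sha256:...) count as pinned.
find_unpinned_images() {
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1" |
    sed 's/[[:space:]]*#.*$//' |                 # drop trailing comments
    while IFS= read -r img; do
      last=${img##*/}                            # ignore any registry host:port part
      case $last in
        *@sha256:*) ;;                           # pinned by digest
        *:latest)   printf '%s\n' "$img" ;;      # explicit latest
        *:*)        ;;                           # explicit tag
        *)          printf '%s\n' "$img" ;;      # no tag, resolves to latest
      esac
    done
}
```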
    

3. SSL/TLS Configuration

3.1 Certificate Management

Nginx Proxy Manager (NPM):

  • Primary SSL termination point for external services
  • Let's Encrypt integration for automatic certificate renewal
  • Deployed on CT 102 (192.168.2.101)

Certificate Lifecycle:

  1. Generation: Use Let's Encrypt via NPM UI (http://192.168.2.101:81)
  2. Deployment: Automatic via NPM
  3. Renewal: Automatic via NPM (about 30 days before expiry, which is roughly 60 days after issuance for a 90-day Let's Encrypt certificate)
  4. Monitoring: Check NPM dashboard for expiry warnings

Manual Certificate Installation (if needed):

# Copy certificate to service
cp /path/to/cert.pem /path/to/service/certs/
cp /path/to/key.pem /path/to/service/certs/

# Set permissions
chmod 644 /path/to/service/certs/cert.pem
chmod 600 /path/to/service/certs/key.pem

3.2 SSL/TLS Best Practices

Current Gaps (2025-12-20 audit):

  • Internal services using HTTP (Grafana, Prometheus, PVE Exporter)
  • Missing HSTS headers on some NPM proxies
  • No TLS 1.3 enforcement

Remediation Checklist:

  • Enable SSL for all web UIs (Grafana, Prometheus, Portainer)
  • Configure NPM to force HTTPS redirects
  • Enable HSTS headers: Strict-Transport-Security: max-age=31536000
  • Disable TLS 1.0 and 1.1 (use TLS 1.2+ only)
  • Use strong cipher suites (Mozilla Intermediate configuration)

NPM SSL Configuration:

# Custom Nginx Configuration (NPM Advanced tab)
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
ssl_prefer_server_ciphers on;
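
Once the headers are configured, the HSTS value a proxy actually returns can be read from the response headers. The sketch below parses headers from standard input; hsts_max_age is a hypothetical name, and the curl invocation in the comment uses a placeholder hostname.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# max-age value of the Strict-Transport-Security header, if present.
hsts_max_age() {
  tr -d '\r' |
    grep -i '^strict-transport-security:' |
    sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p' |
    head -n 1
}

# Example against a live proxy (placeholder hostname, left commented out):
# curl -sI https://service.example.internal | hsts_max_age
```

An empty result means the header is missing; a value below 31536000 means the proxy is not yet using the one-year policy given above.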

3.3 Internal Service SSL

Grafana HTTPS:

# /etc/grafana/grafana.ini
[server]
protocol = https
cert_file = /etc/grafana/certs/cert.pem
cert_key = /etc/grafana/certs/key.pem

Prometheus HTTPS:

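# web-config.yml (a separate file, passed via --web.config.file=web-config.yml;
# TLS settings do not belong in prometheus.yml)
tls_server_config:
  cert_file: /etc/prometheus/certs/cert.pem
  key_file: /etc/prometheus/certs/key.pem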

4. Network Security

4.1 Network Segmentation

Current Architecture:

  • Single flat network: 192.168.2.0/24
  • All VMs and containers on same subnet

Recommended Segmentation:

Management VLAN (VLAN 10): 192.168.10.0/24
  - Proxmox node (192.168.10.200)
  - Ansible-Control (192.168.10.106)

Services VLAN (VLAN 20): 192.168.20.0/24
  - Web servers (109, 110)
  - Database server (111)
  - Docker services

DMZ VLAN (VLAN 30): 192.168.30.0/24
  - Nginx Proxy Manager (exposed to internet)
  - Public-facing services

Monitoring VLAN (VLAN 40): 192.168.40.0/24
  - Grafana, Prometheus, PVE Exporter
  - Logging services

Implementation: Use Proxmox VLANs and firewall rules (Phase 4 remediation)
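
The plan above appears to keep each host's last octet and change only the third octet to the VLAN ID (the Proxmox node moves from 192.168.2.200 to 192.168.10.200). Assuming that convention holds for every host, the new address can be derived as in this sketch; vlan_address is a hypothetical name.

```shell
# Hypothetical helper: derive a host's address in the segmented layout from
# its current address and target VLAN ID, keeping the last octet.
vlan_address() {
  host=${1##*.}                          # last octet of the current address
  printf '192.168.%s.%s\n' "$2" "$host"
}
```

For example, vlan_address 192.168.2.200 10 prints 192.168.10.200.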

4.2 Firewall Rules

Proxmox Firewall Best Practices:

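# pveum manages users and permissions; the firewall is configured through pvesh.
# Add the allow rules first so that enabling the firewall cannot lock you out.

# Allow management access
pvesh create /cluster/firewall/rules --action ACCEPT --type in --proto tcp --dport 8006 --source 192.168.2.0/24 --enable 1

# Allow SSH (key-based only)
pvesh create /cluster/firewall/rules --action ACCEPT --type in --proto tcp --dport 22 --source 192.168.2.0/24 --enable 1

# Default deny incoming, then enable the Proxmox firewall
pvesh set /cluster/firewall/options --policy_in DROP
pvesh set /cluster/firewall/options --enable 1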

Docker Network Isolation:

# Create isolated networks per service
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # No external access

services:
  web:
    networks:
      - frontend
      - backend

  db:
    networks:
      - backend  # Database not exposed to frontend

4.3 Rate Limiting & DDoS Protection

Current Gaps:

  • No rate limiting on NPM proxies
  • No fail2ban deployment
  • No intrusion detection system (IDS)

NPM Rate Limiting:

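# Custom Nginx Configuration (NPM)
# limit_req_zone is only valid in the http context, so define the zones in an
# http-level custom include rather than in a proxy host's Advanced tab.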
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=web_limit:10m rate=100r/s;

location /api/ {
    limit_req zone=api_limit burst=20 nodelay;
}

location / {
    limit_req zone=web_limit burst=50 nodelay;
}
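
With burst and nodelay set, nginx forwards up to burst + 1 requests that arrive at the same instant from one client and rejects the rest until slots free up at the configured rate. The arithmetic is sketched below, assuming an otherwise idle zone; accepted_in_burst is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: number of requests accepted out of n that arrive
# simultaneously from one client, for a zone using the given burst with nodelay.
accepted_in_burst() {
  n=$1 burst=$2
  limit=$(( burst + 1 ))   # one request at the base rate plus the burst slots
  if [ "$n" -lt "$limit" ]; then echo "$n"; else echo "$limit"; fi
}
```

For the /api/ location above (burst=20), a spike of 30 simultaneous requests would see 21 forwarded and 9 rejected.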

Fail2ban Deployment (Phase 3 remediation):

# Install on NPM container or host
apt-get install fail2ban

# Configure jail for NPM (filter = npm requires a matching /etc/fail2ban/filter.d/npm.conf)
cat > /etc/fail2ban/jail.d/npm.conf << EOF
[npm]
enabled = true
port = http,https
filter = npm
logpath = /var/log/nginx/error.log
maxretry = 5
bantime = 3600
EOF

5. Access Control

5.1 Authentication

Multi-Factor Authentication (MFA):

  • Proxmox: Enable 2FA via TOTP (Google Authenticator, Authy)
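    # TOTP is enrolled per user in the web UI (Datacenter > Permissions > Two Factor)
    # List the factors already configured for a user
    pveum user tfa list <user@pam>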
    
  • Portainer: Enable MFA in Portainer settings
  • Grafana: No native TOTP in the open-source edition; enforce MFA through an OAuth2/OIDC identity provider
  • NPM: No native MFA (use reverse proxy authentication)

SSO Integration:

  • TinyAuth (CT 115) provides SSO for NetBox
  • Extend to other services using OAuth2/OIDC (Phase 4)

5.2 Authorization

Principle of Least Privilege:

  • Grant minimum required permissions
  • Use role-based access control (RBAC) where available
  • Regular access reviews (quarterly)

Proxmox Roles:

# Create limited user for monitoring
pveum user add monitor@pve
pveum acl modify / --users monitor@pve --roles PVEAuditor

Docker/Portainer Roles:

  • Admin: Full access to all stacks
  • User: Access to specific stacks only
  • Read-only: View-only access for monitoring

5.3 SSH Access

SSH Hardening:

# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
# Consider a non-standard port (keep comments on their own line)
Port 22
AllowUsers jramos ansible-user
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
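
A quick way to confirm these values on a host is to read them back from the configuration file. The sketch below is a simplified parser: sshd_setting is a hypothetical name, and it ignores Match blocks and Include directives. On a live system, sshd -T prints the effective configuration and is the more reliable source.

```shell
# Hypothetical helper: print the value of a keyword in an sshd_config file.
# sshd uses the first occurrence of a keyword, and keywords are case-insensitive.
sshd_setting() {
  awk -v k="$2" 'tolower($1) == tolower(k) { print $2; exit }' "$1"
}

# Example (left commented out; requires read access to the file):
# [ "$(sshd_setting /etc/ssh/sshd_config PermitRootLogin)" = "no" ] || echo "root login not disabled"
```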

SSH Key Management:

  • Use ED25519 keys: ssh-keygen -t ed25519 -C "your_email@example.com"
  • Rotate keys annually
  • Store private keys securely (password manager, SSH agent)
  • Distribute public keys via Ansible

6. Logging and Monitoring

6.1 Centralized Logging

Current State:

  • Individual service logs: docker compose logs
  • No centralized log aggregation

Recommended Stack (Phase 4):

  • Loki: Log aggregation
  • Promtail: Log shipping
  • Grafana: Log visualization

Implementation:

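# loki/docker-compose.yml
services:
  loki:
    image: grafana/loki:latest  # Pin a specific tag before deploying (see 2.4)
    # Point Loki at the mounted file; the image otherwise loads its default config
    command: -config.file=/etc/loki/loki-config.yml
    ports:
      - 3100:3100
    volumes:
      - ./loki-config.yml:/etc/loki/loki-config.yml
      - loki-data:/loki

  promtail:
    image: grafana/promtail:latest  # Pin a specific tag before deploying (see 2.4)
    command: -config.file=/etc/promtail/promtail-config.yml
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./promtail-config.yml:/etc/promtail/promtail-config.yml

# Named volumes must be declared at the top level
volumes:
  loki-data: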

6.2 Security Monitoring

Key Metrics to Monitor:

  • Failed authentication attempts (Proxmox, SSH, services)
  • Docker socket access events
  • Privilege escalation attempts
  • Network traffic anomalies
  • Resource exhaustion (CPU, memory, disk)

Alerting Rules (Prometheus):

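# alerts.yml
# Note: ssh_failed_login_total and docker_socket_access_total are not exposed by
# standard exporters; these rules assume a custom exporter or log-derived metrics.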
groups:
  - name: security
    rules:
      - alert: HighFailedSSHLogins
        expr: rate(ssh_failed_login_total[5m]) > 5
        for: 5m
        annotations:
          summary: "High rate of failed SSH logins"

      - alert: DockerSocketAccess
        expr: increase(docker_socket_access_total[1h]) > 100
        annotations:
          summary: "Unusual Docker socket activity"

6.3 Audit Logging

Proxmox Audit Log:

# View Proxmox audit log
cat /var/log/pve/tasks/index

# Monitor in real-time
tail -f /var/log/pve/tasks/index

Docker Audit Logging:

# docker-compose.yml
services:
  myapp:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        labels: "service,environment"

7. Backup and Recovery

7.1 Backup Strategy

Current Implementation:

  • Proxmox Backup Server (PBS) at 28.27% utilization
  • Automated daily incremental backups
  • Weekly full backups

Backup Scope:

  • All VMs and LXC containers
  • Docker volumes (manual backup via scripts)
  • Configuration files (version controlled in Git)

Backup Verification:

# Pre-remediation backup
/home/jramos/homelab/scripts/security/backup-before-remediation.sh

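# Confirm the backup exists (this lists backup groups; it does not check integrity)
proxmox-backup-client list --repository <repo>

# Verify chunk integrity on the PBS host
proxmox-backup-manager verify <datastore>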

7.2 Encryption at Rest

Current Gaps (2025-12-20 audit):

  • PBS backups not encrypted
  • Docker volumes not encrypted
  • Sensitive configuration files unencrypted

Remediation (Phase 4):

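# Enable PBS client-side encryption: create a key once, then pass it to each backup
# (the key path is a placeholder; keep an offline copy of the key)
proxmox-backup-client key create /path/to/pbs-encryption.key
proxmox-backup-client backup ... --keyfile /path/to/pbs-encryption.key

# LUKS encryption for sensitive volumes
# WARNING: luksFormat destroys all existing data on /dev/sdb
cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb encrypted-volume
mkfs.ext4 /dev/mapper/encrypted-volume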

7.3 Disaster Recovery

Recovery Time Objective (RTO): 4 hours
Recovery Point Objective (RPO): 24 hours

Recovery Procedure:

  1. Assess Damage: Identify failed components
  2. Restore Infrastructure: Rebuild Proxmox node if needed
  3. Restore VMs/Containers: Use PBS restore
  4. Restore Data: Mount backup volumes
  5. Verify Functionality: Test all services
  6. Document Incident: Post-mortem in /troubleshooting/

Recovery Testing: Quarterly DR drills
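
The 24-hour RPO implies that the newest usable backup must never be older than 24 hours. A monitoring job could test this as sketched below; within_rpo is a hypothetical name, and how the timestamp of the last backup is obtained is left open because it depends on the backup tooling.

```shell
# Hypothetical helper: succeed if the last backup (epoch seconds) is no older
# than the RPO (hours) relative to the current time (epoch seconds).
within_rpo() {
  age=$(( $2 - $1 ))                 # seconds since the last backup
  [ "$age" -le $(( $3 * 3600 )) ]
}

# Example (last_backup_epoch must be supplied by the backup tooling):
# within_rpo "$last_backup_epoch" "$(date +%s)" 24 || echo "RPO violated"
```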

8. Vulnerability Management

8.1 Vulnerability Scanning

Container Scanning:

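# Install Trivy (apt-key is deprecated; store the key in a dedicated keyring)
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee -a /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install trivy

# Scan the image of every running container (each image once)
docker ps --format '{{.Image}}' | sort -u | xargs -I {} trivy image {}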

# Scan docker-compose stack
trivy config docker-compose.yml

Host Scanning:

# Install OpenSCAP
apt-get install libopenscap8 openscap-scanner

# Run CIS benchmark scan
oscap xccdf eval --profile cis --results scan-results.xml /usr/share/xml/scap/ssg/content/ssg-ubuntu2004-xccdf.xml

8.2 Patch Management

Update Schedule:

  • Proxmox VE: Monthly (during maintenance window)
  • VMs/Containers: Bi-weekly (automated via Ansible)
  • Docker Images: Monthly (CI/CD pipeline)
  • Host OS: Weekly (security patches only)

Ansible Patch Playbook:

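# playbooks/patch-systems.yml
- hosts: all
  become: yes
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes

    - name: Upgrade all packages
      apt:
        upgrade: dist

    # Debian-based systems create this file when a reboot is pending
    - name: Check whether a reboot is required
      stat:
        path: /var/run/reboot-required
      register: reboot_required_file

    - name: Reboot if required
      reboot:
        msg: "Rebooting after patching"
      when: reboot_required_file.stat.exists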

8.3 Security Baseline Compliance

CIS Docker Benchmark:

  • See audit report: /home/jramos/homelab/troubleshooting/SECURITY_AUDIT_2025-12-20.md
  • Current compliance: ~40% (as of 2025-12-20)
  • Target compliance: 80% (by Q1 2026)

NIST Cybersecurity Framework:

  • Identify: Asset inventory (CLAUDE_STATUS.md)
  • Protect: Access control, encryption (this document)
  • Detect: Monitoring, logging (Grafana, Prometheus)
  • Respond: Incident response plan (Section 9)
  • Recover: Backup and DR (Section 7)

9. Incident Response

9.1 Incident Classification

| Severity | Definition | Examples |
|----------|------------|----------|
| P1 - Critical | Service outage, data breach | Proxmox node failure, credential leak |
| P2 - High | Degraded service, security vulnerability | Single VM down, HIGH severity finding |
| P3 - Medium | Non-critical issue | SSL certificate expiry warning |
| P4 - Low | Informational, enhancement | Log rotation, optimization |

9.2 Response Procedure

Phase 1: Detection

  • Monitor alerts from Grafana/Prometheus
  • Review logs for anomalies
  • User-reported issues

Phase 2: Containment

  • Isolate affected systems (firewall rules, network disconnect)
  • Preserve evidence (logs, disk images)
  • Prevent spread (patch vulnerable services)

Phase 3: Eradication

  • Remove malware/backdoors
  • Patch vulnerabilities
  • Reset compromised credentials

Phase 4: Recovery

  • Restore from clean backups
  • Verify service functionality
  • Monitor for recurrence

Phase 5: Post-Incident

  • Document incident in /troubleshooting/
  • Update security controls
  • Conduct lessons learned review

9.3 Communication Plan

Internal Communication:

  • Incident lead: jramos
  • Status updates: CLAUDE_STATUS.md
  • Documentation: /troubleshooting/INCIDENT-YYYY-MM-DD.md

External Communication:

  • For homelab: Not applicable (internal environment)
  • For production: Define stakeholder notification procedure

10. Compliance and Auditing

10.1 Security Audits

Audit Schedule:

  • Quarterly: Internal security review
  • Annually: Comprehensive security audit
  • Ad-hoc: After major infrastructure changes

Audit Scope:

  • Credential management practices
  • Docker security configuration
  • SSL/TLS certificate status
  • Access control policies
  • Backup and recovery procedures
  • Vulnerability scan results

Audit Documentation:

  • Location: /home/jramos/homelab/troubleshooting/SECURITY_AUDIT_*.md
  • Latest Audit: 2025-12-20 (31 findings)
  • Next Audit: 2026-03-20 (Q1 2026)

10.2 Compliance Standards

Applicable Standards (for reference/practice):

  • CIS Docker Benchmark v1.6.0
  • NIST Cybersecurity Framework v1.1
  • OWASP Top 10 (for web services)
  • PCI-DSS v4.0 (if handling payment data - N/A for homelab)

Compliance Tracking:

  • Checklist: /home/jramos/homelab/templates/SECURITY_CHECKLIST.md
  • Status: CLAUDE_STATUS.md (Security Status section)
  • Evidence: /troubleshooting/ and /scripts/security/

10.3 Documentation Requirements

Required Security Documentation:

  • Security Policy (this document)
  • Security Audit Reports (/troubleshooting/SECURITY_AUDIT_*.md)
  • Pre-Deployment Security Checklist (/templates/SECURITY_CHECKLIST.md)
  • Credential Rotation Procedures (/scripts/security/*.sh)
  • Incident Response Plan (Section 9 of this document)
  • Network Topology Diagram (TBD in Phase 4)
  • Data Flow Diagrams (TBD in Phase 4)
  • Risk Assessment Matrix (TBD in Q1 2026)

11. Security Checklists

Pre-Deployment Security Checklist

See comprehensive checklist: /home/jramos/homelab/templates/SECURITY_CHECKLIST.md

Quick Validation:

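# The quick validation script is a section of the checklist, not a standalone file,
# and a Markdown anchor (#quick-validation-script) is not a path bash can execute.
# Open the checklist and run the commands from that section:
less /home/jramos/homelab/templates/SECURITY_CHECKLIST.md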

Quarterly Security Review Checklist

  • Review and rotate all service credentials
  • Scan all containers for vulnerabilities (Trivy)
  • Update all Docker images to latest versions
  • Review Proxmox audit logs for anomalies
  • Verify backup integrity and test restore
  • Review firewall rules and network ACLs
  • Update SSL certificates (if manual)
  • Review user access and permissions (RBAC)
  • Patch Proxmox VE, VMs, and containers
  • Update security documentation (this file)
  • Conduct penetration testing (if applicable)
  • Review and update incident response plan

12. Security Resources

Internal Documentation

  • Security Audit Report: /home/jramos/homelab/troubleshooting/SECURITY_AUDIT_2025-12-20.md
  • Security Scripts: /home/jramos/homelab/scripts/security/
  • Security Checklist: /home/jramos/homelab/templates/SECURITY_CHECKLIST.md
  • Infrastructure Status: /home/jramos/homelab/CLAUDE_STATUS.md
  • Service Documentation: /home/jramos/homelab/services/README.md

External Resources

Docker Security:

Proxmox Security:

General Security:

Security Tools:

13. Change Log

| Date | Version | Changes | Author |
|------|---------|---------|--------|
| 2025-12-20 | 1.0 | Initial security policy creation following comprehensive security audit | jramos / Claude Sonnet 4.5 |

Document Owner: jramos
Review Frequency: Quarterly
Next Review: 2026-03-20
Classification: Internal Use
Repository: http://192.168.2.102:3060/jramos/homelab