- Add SECURITY.md policy with credential management, Docker security, SSL/TLS guidance
- Add security audit report (2025-12-20) with 31 findings across 4 severity levels
- Add pre-deployment security checklist template
- Update CLAUDE_STATUS.md with security audit initiative
- Expand services/README.md with comprehensive security sections
- Add script validation report and container name fix guide

Audit identified 6 CRITICAL, 3 HIGH, 2 MEDIUM findings
4-phase remediation roadmap created (estimated 6-13 min downtime)
All security scripts validated and ready for execution

Related: Security Audit Q4 2025, CRITICAL-001 through CRITICAL-006

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Security Policy
Version: 1.0
Last Updated: 2025-12-20
Effective Date: 2025-12-20
Overview
This document establishes the security policy and best practices for the homelab infrastructure environment running on Proxmox VE. The policy applies to all virtual machines (VMs), LXC containers, Docker services, and network resources deployed within the homelab.
Scope
This security policy covers:
- Proxmox VE infrastructure (serviceslab node at 192.168.2.200)
- All virtual machines and LXC containers
- Docker containers and compose stacks
- Network services and reverse proxies
- Authentication and access control systems
- Data storage and backup systems
- Monitoring and logging infrastructure
Vulnerability Disclosure
Reporting Security Issues
Security vulnerabilities should be reported immediately to the infrastructure maintainer:
Contact: jramos
Repository: http://192.168.2.102:3060/jramos/homelab
Documentation: /home/jramos/homelab/troubleshooting/
Disclosure Process
- Report: Submit vulnerability details via secure channel
- Acknowledge: Receipt confirmation within 24 hours
- Investigate: Assessment and validation within 72 hours
- Remediate: Fix deployment based on severity (see SLA below)
- Document: Post-remediation documentation in /troubleshooting/
- Review: Security audit update and lessons learned
Severity Classification
| Severity | Response Time | Example |
|---|---|---|
| CRITICAL | < 4 hours | Docker socket exposure, root credential leaks |
| HIGH | < 24 hours | Unencrypted credentials, missing authentication |
| MEDIUM | < 72 hours | Weak passwords, missing SSL/TLS |
| LOW | < 7 days | Informational findings, optimization opportunities |
Security Best Practices
1. Credential Management
1.1 Password Requirements
Minimum Standards:
- Length: 16+ characters for administrative accounts
- Complexity: Mixed case, numbers, special characters
- Uniqueness: No password reuse across services
- Rotation: Every 90 days for privileged accounts
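The length and complexity standards above can be checked mechanically before a credential is accepted. The following is a minimal sketch, not part of the repository's scripts; the function name `meets_password_policy` is hypothetical, and "special characters" is interpreted here as POSIX punctuation.

```bash
# Hypothetical helper: returns 0 when a candidate password meets the
# minimum standards (16+ characters, mixed case, a digit, a special character).
meets_password_policy() {
  pw=$1
  [ "${#pw}" -ge 16 ] || return 1                      # length: 16+ characters
  case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac    # at least one upper-case letter
  case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac    # at least one lower-case letter
  case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac    # at least one number
  case $pw in *[[:punct:]]*) ;; *) return 1 ;; esac    # at least one special character
  return 0
}
```

Uniqueness and rotation cannot be verified from the password alone; those still rely on a password manager and the rotation schedule.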
Prohibited Practices:
- Default passwords (e.g., admin/admin, password, changeme)
- Hardcoded credentials in docker-compose files
- Plaintext passwords in configuration files
- Credentials committed to version control
1.2 Secrets Management
Docker Secrets Strategy:
# BAD: Hardcoded in docker-compose.yml
environment:
  - POSTGRES_PASSWORD=mypassword123
# GOOD: Environment file (.env)
environment:
  - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
# BETTER: Docker secrets (for swarm mode)
secrets:
  - postgres_password
Environment File Protection:
# Ensure .env files are gitignored
echo "*.env" >> .gitignore
echo ".env.*" >> .gitignore
# Set restrictive permissions
chmod 600 /path/to/service/.env
chown root:root /path/to/service/.env
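A permission check can confirm that the `chmod 600` step was actually applied. This is a sketch under the assumption of GNU `stat` (as shipped on Debian and Proxmox); the function name `env_file_is_private` is hypothetical.

```bash
# Hypothetical helper: succeeds only when FILE is accessible by its owner
# alone (mode 600 or 400). Assumes GNU stat.
env_file_is_private() {
  mode=$(stat -c '%a' "$1") || return 2
  case $mode in
    600|400) return 0 ;;
    *) printf '%s has mode %s, expected 600\n' "$1" "$mode" >&2
       return 1 ;;
  esac
}
```

For example, `env_file_is_private /path/to/service/.env` could run as a pre-deployment step, with a non-zero exit status blocking the deployment.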
Credential Storage Locations:
- Docker service secrets: /path/to/service/.env (gitignored)
- Proxmox credentials: Stored in Proxmox secret storage or .env files
- Database passwords: Environment variables, rotated quarterly
- API tokens: Environment variables, scoped to minimum permissions
1.3 Credential Rotation
Rotation Schedule:
| Credential Type | Frequency | Tool/Script |
|---|---|---|
| Proxmox root/API users | 90 days | scripts/security/rotate-pve-credentials.sh |
| Database passwords | 90 days | scripts/security/rotate-paperless-password.sh |
| JWT secrets | 90 days | scripts/security/rotate-bytestash-jwt.sh |
| Service passwords | 90 days | scripts/security/rotate-logward-credentials.sh |
| SSH keys | 365 days | Manual rotation via Ansible |
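Whether a credential has reached the age in the schedule above is simple date arithmetic. The sketch below assumes GNU `date` and that the last rotation date is recorded somewhere as YYYY-MM-DD; both function names are hypothetical.

```bash
# Whole days between two ISO dates (YYYY-MM-DD). Assumes GNU date.
days_between() {
  start=$(date -u -d "$1" +%s) || return 2
  end=$(date -u -d "$2" +%s) || return 2
  echo $(( (end - start) / 86400 ))
}

# rotation_due LAST_ROTATED MAX_AGE_DAYS [TODAY]
# Succeeds when the credential has reached its maximum age.
rotation_due() {
  today=${3:-$(date -u +%F)}
  [ "$(days_between "$1" "$today")" -ge "$2" ]
}
```

With these definitions, `rotation_due 2025-12-20 90 2026-03-20` succeeds, because exactly 90 days separate the two dates, while the same check against a 365-day SSH key schedule does not.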
Rotation Workflow:
- Backup: Create full backup before rotation (scripts/security/backup-before-remediation.sh)
- Generate: Create new credential using password manager or openssl rand -base64 32
- Update: Modify .env file or service configuration
- Restart: Recreate the affected service so it picks up the new value: docker compose up -d <service> (docker compose restart does not re-read .env changes)
- Verify: Test service functionality post-rotation
- Document: Record rotation in /troubleshooting/ log file
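The Generate and Update steps of the workflow can be sketched as two small functions. These are illustrative only and are not the repository's rotate-*.sh scripts; `new_secret` and `set_env_var` are hypothetical names, and the sketch assumes simple KEY=VALUE lines in the .env file.

```bash
# Generate a new random credential (the command named in the workflow).
new_secret() {
  openssl rand -base64 32
}

# set_env_var FILE KEY VALUE
# Replace KEY's line in FILE, or append it. VALUE is never passed through
# sed, because base64 output can contain '/' and '+'.
set_env_var() {
  file=$1 key=$2 value=$3
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp "$file.XXXXXX") || return 1
  grep -v "^$key=" "$file" > "$tmp" || true     # keep every other line
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  chmod 600 "$tmp" && mv "$tmp" "$file"         # restrictive mode, then swap in
}
```

A rotation might then look like `set_env_var /path/to/service/.env POSTGRES_PASSWORD "$(new_secret)"`, followed by recreating the container with `docker compose up -d <service>`, since a plain restart keeps the old environment.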
2. Docker Security
2.1 Docker Socket Protection
CRITICAL: The Docker socket (/var/run/docker.sock) provides root-level access to the host system.
Current Exposures (as of 2025-12-20 audit):
- Portainer: Direct socket mount
- Nginx Proxy Manager: Direct socket mount
- Speedtest Tracker: Direct socket mount
Remediation Strategy:
# INSECURE: Direct socket mount
volumes:
  - /var/run/docker.sock:/var/run/docker.sock
# SECURE: Use docker-socket-proxy
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      - CONTAINERS=1
      - NETWORKS=1
      - SERVICES=1
      - TASKS=0
      - POST=0
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped
  portainer:
    image: portainer/portainer-ce
    environment:
      - DOCKER_HOST=tcp://socket-proxy:2375
    # No direct socket mount
Implementation Guide: See scripts/security/docker-socket-proxy/README.md
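To find which stacks still mount the socket, the compose files can be searched for the socket path. This is a rough sketch that assumes compose files are named docker-compose*.yml or compose*.yml; the function name `list_socket_mounts` is hypothetical.

```bash
# list_socket_mounts DIR
# Print compose files under DIR that mention the Docker socket path.
list_socket_mounts() {
  find "$1" -type f \( -name 'docker-compose*.yml' -o -name 'compose*.yml' \) \
    -exec grep -l -e '/var/run/docker.sock' {} + | sort
}
```

The socket proxy's own compose file will also be listed, since it mounts the socket read-only by design; every other hit is a candidate for migration.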
2.2 Container User Privileges
Principle: Containers should run as non-root users whenever possible.
Current Issues (2025-12-20 audit):
- Multiple containers running as root (UID 0)
- Missing user: directive in docker-compose files
Remediation:
# Add to docker-compose.yml
services:
  myapp:
    image: myapp:latest
    user: "1000:1000" # Run as non-root user
    # OR use image-specific variables
    environment:
      - PUID=1000
      - PGID=1000
Verification:
# Check running container user
docker exec <container> id
# Should show non-root user:
# uid=1000(appuser) gid=1000(appuser)
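The single-container check above extends to every running container by collecting name and UID pairs and filtering for UID 0. The filter below is a sketch; `root_containers` is a hypothetical name, and the Docker wiring in the comment assumes each image ships an `id` binary.

```bash
# root_containers
# Reads "NAME UID" pairs on stdin and prints the names whose UID is 0.
# Lines without a UID (for example when `id` is missing) are skipped.
root_containers() {
  awk 'NF == 2 && $2 == "0" { print $1 }'
}

# Assumed wiring to a live Docker host (not executed here):
#   docker ps --format '{{.Names}}' | while read -r name; do
#     printf '%s %s\n' "$name" "$(docker exec "$name" id -u)"
#   done | root_containers
```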
2.3 Container Hardening
Security Checklist:
- Run as non-root user
- Use read-only root filesystem where possible: read_only: true
- Drop unnecessary capabilities: cap_drop: [ALL]
- Limit resources: mem_limit, cpus
- Enable no-new-privileges: security_opt: [no-new-privileges:true]
- Use minimal base images (Alpine, distroless)
- Scan images for vulnerabilities: trivy image <image> (the older docker scan command has been retired in favor of docker scout)
Example Hardened Service:
services:
  secure-app:
    image: secure-app:latest
    user: "1000:1000"
    read_only: true
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE # Only if needed
    mem_limit: 512m
    cpus: 0.5
    tmpfs:
      - /tmp:size=100M,mode=1777
2.4 Image Security
Best Practices:
- Pin image versions: Use specific tags, not latest
  image: nginx:1.25.3-alpine # GOOD
  image: nginx:latest # BAD
- Verify image signatures: Enable Docker Content Trust
  export DOCKER_CONTENT_TRUST=1
- Scan for vulnerabilities: Use Trivy or Grype
  # Run Trivy from its container image (no local install needed)
  docker run --rm aquasec/trivy image nginx:1.25.3-alpine
- Use official images: Prefer verified publishers from Docker Hub
- Regular updates: Monthly image update cycle
  docker compose pull
  docker compose up -d
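The version-pinning rule can be checked by scanning a compose file for image references that carry no tag or use :latest. This sketch assumes the common `image: <ref>` mapping form on its own line; `unpinned_images` is a hypothetical name.

```bash
# unpinned_images COMPOSE_FILE
# Print image references that have no tag or use the :latest tag.
unpinned_images() {
  awk '$1 == "image:" { print $2 }' "$1" | tr -d "\"'" |
  while IFS= read -r ref; do
    case $ref in
      *@sha256:*) continue ;;            # pinned by digest
    esac
    name=${ref##*/}                      # last path component, e.g. nginx:1.25.3-alpine
    case $name in
      *:latest) printf '%s\n' "$ref" ;;  # explicit :latest
      *:*) ;;                            # explicit version tag, treated as pinned
      *) printf '%s\n' "$ref" ;;         # no tag means implicit :latest
    esac
  done
}
```

Taking only the last path component keeps a registry port such as localhost:5000/app from being mistaken for a tag.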
3. SSL/TLS Configuration
3.1 Certificate Management
Nginx Proxy Manager (NPM):
- Primary SSL termination point for external services
- Let's Encrypt integration for automatic certificate renewal
- Deployed on CT 102 (192.168.2.101)
Certificate Lifecycle:
- Generation: Use Let's Encrypt via NPM UI (http://192.168.2.101:81)
- Deployment: Automatic via NPM
- Renewal: Automatic via NPM (about 30 days before expiry, i.e. roughly 60 days into a 90-day Let's Encrypt certificate)
- Monitoring: Check NPM dashboard for expiry warnings
Manual Certificate Installation (if needed):
# Copy certificate to service
cp /path/to/cert.pem /path/to/service/certs/
cp /path/to/key.pem /path/to/service/certs/
# Set permissions
chmod 644 /path/to/service/certs/cert.pem
chmod 600 /path/to/service/certs/key.pem
3.2 SSL/TLS Best Practices
Current Gaps (2025-12-20 audit):
- Internal services using HTTP (Grafana, Prometheus, PVE Exporter)
- Missing HSTS headers on some NPM proxies
- No TLS 1.3 enforcement
Remediation Checklist:
- Enable SSL for all web UIs (Grafana, Prometheus, Portainer)
- Configure NPM to force HTTPS redirects
- Enable HSTS headers: Strict-Transport-Security: max-age=31536000
- Disable TLS 1.0 and 1.1 (use TLS 1.2+ only)
- Use strong cipher suites (Mozilla Intermediate configuration)
NPM SSL Configuration:
# Custom Nginx Configuration (NPM Advanced tab)
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
ssl_prefer_server_ciphers on;
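Once the headers are configured, a proxied host's response can be checked for the HSTS header. The function below is a sketch that reads response headers on stdin; `has_hsts` is a hypothetical name, and the one-year threshold mirrors the max-age value used in this policy.

```bash
# has_hsts
# Reads HTTP response headers on stdin; succeeds when a
# Strict-Transport-Security header with max-age >= 31536000 is present.
has_hsts() {
  awk -F'max-age=' '
    tolower($0) ~ /^strict-transport-security:/ {
      if ($2 + 0 >= 31536000) found = 1   # numeric prefix of the value after max-age=
    }
    END { exit (found ? 0 : 1) }
  '
}
```

A typical invocation would be `curl -sI https://<host> | has_hsts`, with `<host>` standing in for a proxied service.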
3.3 Internal Service SSL
Grafana HTTPS:
# /etc/grafana/grafana.ini
[server]
protocol = https
cert_file = /etc/grafana/certs/cert.pem
cert_key = /etc/grafana/certs/key.pem
Prometheus HTTPS:
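# web-config.yml, loaded with --web.config.file=/etc/prometheus/web-config.yml
# (TLS settings are not read from prometheus.yml)
tls_server_config:
  cert_file: /etc/prometheus/certs/cert.pem
  key_file: /etc/prometheus/certs/key.pem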
4. Network Security
4.1 Network Segmentation
Current Architecture:
- Single flat network: 192.168.2.0/24
- All VMs and containers on same subnet
Recommended Segmentation:
Management VLAN (VLAN 10): 192.168.10.0/24
- Proxmox node (192.168.10.200)
- Ansible-Control (192.168.10.106)
Services VLAN (VLAN 20): 192.168.20.0/24
- Web servers (109, 110)
- Database server (111)
- Docker services
DMZ VLAN (VLAN 30): 192.168.30.0/24
- Nginx Proxy Manager (exposed to internet)
- Public-facing services
Monitoring VLAN (VLAN 40): 192.168.40.0/24
- Grafana, Prometheus, PVE Exporter
- Logging services
Implementation: Use Proxmox VLANs and firewall rules (Phase 4 remediation)
4.2 Firewall Rules
Proxmox Firewall Best Practices:
# The Proxmox firewall is managed through pvesh (or the web UI), not pveum
# Allow management access
pvesh create /cluster/firewall/rules --type in --action ACCEPT --proto tcp --dport 8006 --source 192.168.2.0/24 --enable 1
# Allow SSH (key-based only)
pvesh create /cluster/firewall/rules --type in --action ACCEPT --proto tcp --dport 22 --source 192.168.2.0/24 --enable 1
# Default deny incoming, then enable the firewall once the allow rules exist
pvesh set /cluster/firewall/options --policy_in DROP --enable 1
Docker Network Isolation:
# Create isolated networks per service
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true # No external access
services:
  web:
    networks:
      - frontend
      - backend
  db:
    networks:
      - backend # Database not exposed to frontend
4.3 Rate Limiting & DDoS Protection
Current Gaps:
- No rate limiting on NPM proxies
- No fail2ban deployment
- No intrusion detection system (IDS)
NPM Rate Limiting:
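# Custom Nginx Configuration (NPM)
# limit_req_zone is only valid in the http context, so these two lines belong
# in an http-level include rather than a proxy host's Advanced tab
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=web_limit:10m rate=100r/s;
location /api/ {
    limit_req zone=api_limit burst=20 nodelay;
}
location / {
    limit_req zone=web_limit burst=50 nodelay;
}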
Fail2ban Deployment (Phase 3 remediation):
# Install on NPM container or host
apt-get install fail2ban
# Configure jail for NPM
cat > /etc/fail2ban/jail.d/npm.conf << EOF
[npm]
enabled = true
port = http,https
filter = npm
logpath = /var/log/nginx/error.log
maxretry = 5
bantime = 3600
EOF
5. Access Control
5.1 Authentication
Multi-Factor Authentication (MFA):
- Proxmox: Enable 2FA via TOTP (Google Authenticator, Authy)
  # TOTP is enrolled in the web UI (Datacenter > Permissions > Two Factor)
  # List the factors configured for a user
  pveum user tfa list <user@pam>
- Portainer: Enable MFA in Portainer settings
- Grafana: Enable TOTP 2FA in user preferences
- NPM: No native MFA (use reverse proxy authentication)
SSO Integration:
- TinyAuth (CT 115) provides SSO for NetBox
- Extend to other services using OAuth2/OIDC (Phase 4)
5.2 Authorization
Principle of Least Privilege:
- Grant minimum required permissions
- Use role-based access control (RBAC) where available
- Regular access reviews (quarterly)
Proxmox Roles:
# Create limited user for monitoring
pveum user add monitor@pve
pveum acl modify / --users monitor@pve --roles PVEAuditor
Docker/Portainer Roles:
- Admin: Full access to all stacks
- User: Access to specific stacks only
- Read-only: View-only access for monitoring
5.3 SSH Access
SSH Hardening:
# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
# Consider a non-standard port
Port 22
AllowUsers jramos ansible-user
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
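A hardened sshd_config can be spot-checked by reading back the values that matter most. The sketch below parses a single file and ignores Include and Match blocks, so `sshd -T` remains the more authoritative view of the effective configuration; both function names are hypothetical.

```bash
# sshd_setting FILE KEYWORD
# Print the value of the first uncommented KEYWORD line. sshd uses the first
# value it obtains for a keyword, and keywords are case-insensitive.
sshd_setting() {
  awk -v k="$2" 'tolower($1) == tolower(k) { print $2; exit }' "$1"
}

# check_sshd_hardening FILE
# Succeeds when the three authentication settings match the policy.
check_sshd_hardening() {
  [ "$(sshd_setting "$1" PermitRootLogin)" = "no" ] &&
  [ "$(sshd_setting "$1" PasswordAuthentication)" = "no" ] &&
  [ "$(sshd_setting "$1" PubkeyAuthentication)" = "yes" ]
}
```

Running `check_sshd_hardening /etc/ssh/sshd_config` before reloading sshd gives a quick pass or fail signal.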
SSH Key Management:
- Use ED25519 keys: ssh-keygen -t ed25519 -C "your_email@example.com"
- Rotate keys annually
- Store private keys securely (password manager, SSH agent)
- Distribute public keys via Ansible
6. Logging and Monitoring
6.1 Centralized Logging
Current State:
- Individual service logs: docker compose logs
- No centralized log aggregation
Recommended Stack (Phase 4):
- Loki: Log aggregation
- Promtail: Log shipping
- Grafana: Log visualization
Implementation:
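# loki/docker-compose.yml
services:
  loki:
    image: grafana/loki:latest
    command: -config.file=/etc/loki/loki-config.yml # Load the mounted config
    ports:
      - 3100:3100
    volumes:
      - ./loki-config.yml:/etc/loki/loki-config.yml
      - loki-data:/loki
  promtail:
    image: grafana/promtail:latest
    command: -config.file=/etc/promtail/promtail-config.yml # Load the mounted config
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./promtail-config.yml:/etc/promtail/promtail-config.yml
volumes:
  loki-data: # Named volume referenced by the loki service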
6.2 Security Monitoring
Key Metrics to Monitor:
- Failed authentication attempts (Proxmox, SSH, services)
- Docker socket access events
- Privilege escalation attempts
- Network traffic anomalies
- Resource exhaustion (CPU, memory, disk)
Alerting Rules (Prometheus):
# alerts.yml
# The metric names below assume an exporter that publishes them; adjust them
# to the metrics actually scraped
groups:
  - name: security
    rules:
      - alert: HighFailedSSHLogins
        expr: rate(ssh_failed_login_total[5m]) > 5
        for: 5m
        annotations:
          summary: "High rate of failed SSH logins"
      - alert: DockerSocketAccess
        expr: increase(docker_socket_access_total[1h]) > 100
        annotations:
          summary: "Unusual Docker socket activity"
6.3 Audit Logging
Proxmox Audit Log:
# View Proxmox audit log
cat /var/log/pve/tasks/index
# Monitor in real-time
tail -f /var/log/pve/tasks/index
Docker Audit Logging:
# docker-compose.yml
services:
  myapp:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        labels: "service,environment"
7. Backup and Recovery
7.1 Backup Strategy
Current Implementation:
- Proxmox Backup Server (PBS) at 28.27% utilization
- Automated daily incremental backups
- Weekly full backups
Backup Scope:
- All VMs and LXC containers
- Docker volumes (manual backup via scripts)
- Configuration files (version controlled in Git)
Backup Verification:
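# Pre-remediation backup
/home/jramos/homelab/scripts/security/backup-before-remediation.sh
# Confirm the backup appears in the repository
proxmox-backup-client list --repository <repo>
# Verify backup integrity (run on the PBS host)
proxmox-backup-manager verify <datastore>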
7.2 Encryption at Rest
Current Gaps (2025-12-20 audit):
- PBS backups not encrypted
- Docker volumes not encrypted
- Sensitive configuration files unencrypted
Remediation (Phase 4):
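# Enable PBS client-side encryption (keep a copy of the key outside the backup)
proxmox-backup-client key create /path/to/backup.key
proxmox-backup-client backup ... --keyfile /path/to/backup.key
# LUKS encryption for sensitive volumes (luksFormat erases the device)
cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb encrypted-volume
mkfs.ext4 /dev/mapper/encrypted-volume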
7.3 Disaster Recovery
Recovery Time Objective (RTO): 4 hours
Recovery Point Objective (RPO): 24 hours
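The RPO translates into a simple freshness test: the newest backup must be no older than 24 hours. The sketch below works on Unix timestamps; how the timestamp of the latest PBS snapshot is obtained is left as an assumption, and `within_rpo` is a hypothetical name.

```bash
# within_rpo LAST_BACKUP_EPOCH RPO_HOURS [NOW_EPOCH]
# Succeeds when the newest backup is no older than the RPO.
within_rpo() {
  now=${3:-$(date +%s)}
  age=$(( now - $1 ))
  [ "$age" -le $(( $2 * 3600 )) ]
}
```

A failing check, for example `within_rpo "$last_backup" 24` with `$last_backup` more than a day old, is a signal to investigate the backup job before the next change window.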
Recovery Procedure:
- Assess Damage: Identify failed components
- Restore Infrastructure: Rebuild Proxmox node if needed
- Restore VMs/Containers: Use PBS restore
- Restore Data: Mount backup volumes
- Verify Functionality: Test all services
- Document Incident: Post-mortem in /troubleshooting/
Recovery Testing: Quarterly DR drills
8. Vulnerability Management
8.1 Vulnerability Scanning
Container Scanning:
# Install Trivy
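# apt-key is deprecated; store the key in a dedicated keyring instead
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee -a /etc/apt/sources.list.d/trivy.list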
sudo apt-get update
sudo apt-get install trivy
# Scan all running containers
docker ps --format '{{.Image}}' | xargs -I {} trivy image {}
# Scan docker-compose stack
trivy config docker-compose.yml
Host Scanning:
# Install OpenSCAP
apt-get install libopenscap8 openscap-scanner
# Run CIS benchmark scan
oscap xccdf eval --profile cis --results scan-results.xml /usr/share/xml/scap/ssg/content/ssg-ubuntu2004-xccdf.xml
8.2 Patch Management
Update Schedule:
- Proxmox VE: Monthly (during maintenance window)
- VMs/Containers: Bi-weekly (automated via Ansible)
- Docker Images: Monthly (CI/CD pipeline)
- Host OS: Weekly (security patches only)
Ansible Patch Playbook:
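# playbooks/patch-systems.yml
- hosts: all
  become: yes
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
    - name: Upgrade all packages
      apt:
        upgrade: dist
    # Register the marker file so the reboot condition below has a value
    - name: Check whether a reboot is required
      stat:
        path: /var/run/reboot-required
      register: reboot_required_file
    - name: Reboot if required
      reboot:
        msg: "Rebooting after patching"
      when: reboot_required_file.stat.exists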
8.3 Security Baseline Compliance
CIS Docker Benchmark:
- See audit report: /home/jramos/homelab/troubleshooting/SECURITY_AUDIT_2025-12-20.md
- Current compliance: ~40% (as of 2025-12-20)
- Target compliance: 80% (by Q1 2026)
NIST Cybersecurity Framework:
- Identify: Asset inventory (CLAUDE_STATUS.md)
- Protect: Access control, encryption (this document)
- Detect: Monitoring, logging (Grafana, Prometheus)
- Respond: Incident response plan (Section 9)
- Recover: Backup and DR (Section 7)
9. Incident Response
9.1 Incident Classification
| Severity | Definition | Examples |
|---|---|---|
| P1 - Critical | Service outage, data breach | Proxmox node failure, credential leak |
| P2 - High | Degraded service, security vulnerability | Single VM down, HIGH severity finding |
| P3 - Medium | Non-critical issue | SSL certificate expiry warning |
| P4 - Low | Informational, enhancement | Log rotation, optimization |
9.2 Response Procedure
Phase 1: Detection
- Monitor alerts from Grafana/Prometheus
- Review logs for anomalies
- User-reported issues
Phase 2: Containment
- Isolate affected systems (firewall rules, network disconnect)
- Preserve evidence (logs, disk images)
- Prevent spread (patch vulnerable services)
Phase 3: Eradication
- Remove malware/backdoors
- Patch vulnerabilities
- Reset compromised credentials
Phase 4: Recovery
- Restore from clean backups
- Verify service functionality
- Monitor for recurrence
Phase 5: Post-Incident
- Document incident in /troubleshooting/
- Update security controls
- Conduct lessons learned review
9.3 Communication Plan
Internal Communication:
- Incident lead: jramos
- Status updates: CLAUDE_STATUS.md
- Documentation: /troubleshooting/INCIDENT-YYYY-MM-DD.md
External Communication:
- For homelab: Not applicable (internal environment)
- For production: Define stakeholder notification procedure
10. Compliance and Auditing
10.1 Security Audits
Audit Schedule:
- Quarterly: Internal security review
- Annually: Comprehensive security audit
- Ad-hoc: After major infrastructure changes
Audit Scope:
- Credential management practices
- Docker security configuration
- SSL/TLS certificate status
- Access control policies
- Backup and recovery procedures
- Vulnerability scan results
Audit Documentation:
- Location: /home/jramos/homelab/troubleshooting/SECURITY_AUDIT_*.md
- Latest Audit: 2025-12-20 (31 findings)
- Next Audit: 2026-03-20 (Q1 2026)
10.2 Compliance Standards
Applicable Standards (for reference/practice):
- CIS Docker Benchmark v1.6.0
- NIST Cybersecurity Framework v1.1
- OWASP Top 10 (for web services)
- PCI-DSS v4.0 (if handling payment data - N/A for homelab)
Compliance Tracking:
- Checklist: /home/jramos/homelab/templates/SECURITY_CHECKLIST.md
- Status: CLAUDE_STATUS.md (Security Status section)
- Evidence: /troubleshooting/ and /scripts/security/
10.3 Documentation Requirements
Required Security Documentation:
- Security Policy (this document)
- Security Audit Reports (/troubleshooting/SECURITY_AUDIT_*.md)
- Pre-Deployment Security Checklist (/templates/SECURITY_CHECKLIST.md)
- Credential Rotation Procedures (/scripts/security/*.sh)
- Incident Response Plan (Section 9 of this document)
- Network Topology Diagram (TBD in Phase 4)
- Data Flow Diagrams (TBD in Phase 4)
- Risk Assessment Matrix (TBD in Q1 2026)
11. Security Checklists
Pre-Deployment Security Checklist
See comprehensive checklist: /home/jramos/homelab/templates/SECURITY_CHECKLIST.md
Quick Validation:
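# Run quick security check
# The script is embedded in the checklist under the quick-validation-script
# section; bash cannot execute a Markdown anchor, so copy it into a file first
bash <extracted-quick-validation-script>.sh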
Quarterly Security Review Checklist
- Review and rotate all service credentials
- Scan all containers for vulnerabilities (Trivy)
- Update all Docker images to latest versions
- Review Proxmox audit logs for anomalies
- Verify backup integrity and test restore
- Review firewall rules and network ACLs
- Update SSL certificates (if manual)
- Review user access and permissions (RBAC)
- Patch Proxmox VE, VMs, and containers
- Update security documentation (this file)
- Conduct penetration testing (if applicable)
- Review and update incident response plan
12. Security Resources
Internal Documentation
- Security Audit Report: /home/jramos/homelab/troubleshooting/SECURITY_AUDIT_2025-12-20.md
- Security Scripts: /home/jramos/homelab/scripts/security/
- Security Checklist: /home/jramos/homelab/templates/SECURITY_CHECKLIST.md
- Infrastructure Status: /home/jramos/homelab/CLAUDE_STATUS.md
- Service Documentation: /home/jramos/homelab/services/README.md
External Resources
Docker Security:
Proxmox Security:
General Security:
Security Tools:
13. Change Log
| Date | Version | Changes | Author |
|---|---|---|---|
| 2025-12-20 | 1.0 | Initial security policy creation following comprehensive security audit | jramos / Claude Sonnet 4.5 |
Document Owner: jramos
Review Frequency: Quarterly
Next Review: 2026-03-20
Classification: Internal Use
Repository: http://192.168.2.102:3060/jramos/homelab