homelab/templates/SECURITY_CHECKLIST.md
Jordan Ramos e481c95da4 docs(security): comprehensive security audit and remediation documentation
- Add SECURITY.md policy with credential management, Docker security, SSL/TLS guidance
- Add security audit report (2025-12-20) with 31 findings across 4 severity levels
- Add pre-deployment security checklist template
- Update CLAUDE_STATUS.md with security audit initiative
- Expand services/README.md with comprehensive security sections
- Add script validation report and container name fix guide

Audit identified 6 CRITICAL, 3 HIGH, 2 MEDIUM findings
4-phase remediation roadmap created (estimated 6-13 min downtime)
All security scripts validated and ready for execution

Related: Security Audit Q4 2025, CRITICAL-001 through CRITICAL-006

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 13:52:34 -07:00


Security Pre-Deployment Checklist

Purpose: Ensure all new services and infrastructure components meet security standards before deployment to production.

Usage: Complete this checklist for every new service, VM, container, or infrastructure component. Archive completed checklists in /home/jramos/homelab/docs/deployment-records/.


Service Information

| Field | Value |
|-------|-------|
| Service Name | |
| Deployment Type | [ ] VM [ ] LXC Container [ ] Docker Container [ ] Bare Metal |
| Deployment Date | |
| Owner | |
| Purpose | |
| Criticality | [ ] Critical [ ] High [ ] Medium [ ] Low |
| Data Classification | [ ] Public [ ] Internal [ ] Confidential [ ] Restricted |

1. Authentication & Authorization

1.1 User Accounts

  • Default credentials changed (admin/admin, root/password, etc.)
  • Strong password policy enforced (minimum 16 characters)
  • Separate user accounts created (no shared credentials)
  • Root/administrator login disabled
  • Service accounts use principle of least privilege
  • User account list documented in /home/jramos/homelab/docs/accounts/

Default Credentials to Check:

Grafana:        admin / admin
NPM:            admin@example.com / changeme
Proxmox:        root / <install_password>
PostgreSQL:     postgres / postgres
TinyAuth:       (check .env file)
Portainer:      admin / <first_login>
n8n:            (set on first login)
Home Assistant: (set on first login)

1.2 Multi-Factor Authentication (MFA)

  • MFA enabled for administrative accounts
  • MFA method documented (TOTP, U2F, etc.)
  • Recovery codes generated and stored securely
  • MFA enforcement tested and verified

1.3 Single Sign-On (SSO)

  • SSO integration configured (if applicable via TinyAuth)
  • SSO tested with test account
  • Fallback authentication method configured
  • Direct IP access blocked (must go through SSO gateway)

1.4 SSH Access

  • Password authentication disabled
  • SSH key authentication only
  • SSH keys use passphrase protection
  • Root SSH login disabled (PermitRootLogin no)
  • SSH port changed from 22 (optional hardening)
  • SSH AllowUsers configured (whitelist approach)
  • SSH configuration validated (sshd -t)

SSH Hardening Verification:

# Verify the effective configuration (covers sshd_config.d drop-ins; run as root)
sshd -T | grep -Ei "^(permitrootlogin|passwordauthentication|allowusers) "

# Expected output (sshd -T prints keywords in lowercase):
# permitrootlogin no
# passwordauthentication no
# allowusers jramos
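The verification above can also be scripted so that it fails loudly instead of relying on a visual check. The following is a minimal sketch, not a definitive audit: the function names `check_sshd_setting` and `audit_sshd_config` are hypothetical, it reads a single config file only (no `Include` drop-ins, no `Match` blocks), and it assumes the first uncommented occurrence of a keyword is the one that applies.

```bash
# check_sshd_setting FILE KEYWORD EXPECTED
# Succeeds if the first uncommented KEYWORD line in FILE has the value EXPECTED.
# Keywords are matched case-insensitively; commented lines never match because
# their first field starts with "#".
check_sshd_setting() {
    cs_file=$1; cs_key=$2; cs_expected=$3
    cs_actual=$(awk -v k="$cs_key" \
        'tolower($1) == tolower(k) { print $2; exit }' "$cs_file")
    [ "$cs_actual" = "$cs_expected" ]
}

# audit_sshd_config [FILE]
# Prints PASS/FAIL per setting and returns non-zero if any check fails.
audit_sshd_config() {
    as_file=${1:-/etc/ssh/sshd_config}
    as_rc=0
    for as_pair in "PermitRootLogin no" "PasswordAuthentication no"; do
        as_key=${as_pair% *}        # text before the space
        as_expected=${as_pair#* }   # text after the space
        if check_sshd_setting "$as_file" "$as_key" "$as_expected"; then
            echo "PASS $as_key $as_expected"
        else
            echo "FAIL $as_key (expected $as_expected)"
            as_rc=1
        fi
    done
    return "$as_rc"
}
```

Running `audit_sshd_config` with no argument checks `/etc/ssh/sshd_config`; pass another path to check a staged copy before it is installed.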

2. Secrets Management

2.1 Credentials Storage

  • No hardcoded passwords in docker-compose.yaml
  • No secrets passed as plain environment variables where avoidable (values are visible in docker inspect)
  • Secrets stored in .env files (excluded from git)
  • Docker secrets used for production deployments
  • .env files have restrictive permissions (600)
  • Secrets documented in password manager (Vault, Bitwarden, etc.)
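The permission item above is easy to check mechanically. This is a small sketch under stated assumptions: `check_env_perms` is a hypothetical helper name, and `stat -c` is the GNU coreutils form found on Linux hosts (BSD/macOS `stat` takes different flags).

```bash
# check_env_perms FILE
# Succeeds only if FILE has mode 600 (owner read/write, nothing for group/other).
check_env_perms() {
    ce_file=$1
    # GNU stat: %a prints the octal permission bits, e.g. 600 or 644.
    ce_mode=$(stat -c '%a' "$ce_file") || return 2
    if [ "$ce_mode" = "600" ]; then
        echo "OK $ce_file mode $ce_mode"
    else
        echo "FIX $ce_file mode $ce_mode (run: chmod 600 $ce_file)"
        return 1
    fi
}
```

To confirm the file is also excluded from git, `git check-ignore -q .env` exits 0 when the path is ignored by the repository's ignore rules.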

2.2 API Keys & Tokens

  • API keys generated with minimal required permissions
  • API keys rotated regularly (document rotation schedule)
  • API key usage monitored in logs
  • Unused API keys revoked
  • API keys never logged or displayed in UI

2.3 Encryption Keys

  • Database encryption keys generated
  • TLS certificate private keys protected (600 permissions)
  • Encryption keys backed up securely
  • Key recovery procedure documented
  • LUKS encryption keys for volumes (if applicable)

2.4 JWT & Session Secrets

  • JWT secrets generated with cryptographic randomness
    openssl rand -base64 64
    
  • Session secrets rotated on schedule
  • JWT expiration configured (not indefinite)
  • Session timeout configured (30 minutes idle recommended)

Secret Generation Examples:

# PostgreSQL password
openssl rand -base64 32

# JWT secret
openssl rand -base64 64

# AES-256 encryption key
openssl rand -hex 32

# API token (random hex; a UUID is an identifier and is not guaranteed to be a strong secret)
openssl rand -hex 32
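Generating a secret and storing it safely are usually one step in practice. The sketch below combines the two, assuming a Linux host; `gen_hex` and `write_secret` are hypothetical helper names, and the `/dev/urandom` fallback is an addition for hosts without `openssl`, not something the checklist prescribes.

```bash
# gen_hex BYTES
# Prints BYTES random bytes as lowercase hex (2 * BYTES characters).
gen_hex() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex "$1"
    else
        # Fallback: read from the kernel CSPRNG and hex-encode with od.
        od -An -tx1 -N "$1" /dev/urandom | tr -d ' \n'
    fi
}

# write_secret FILE KEY [BYTES]
# Appends KEY=<random hex> to FILE, making sure FILE is mode 600 first.
write_secret() {
    ws_file=$1; ws_key=$2; ws_bytes=${3:-32}
    ws_value=$(gen_hex "$ws_bytes") || return 1
    [ -n "$ws_value" ] || return 1
    # Create the file (if missing) with owner-only permissions before writing.
    ( umask 077; : >> "$ws_file" ) || return 1
    chmod 600 "$ws_file" || return 1
    printf '%s=%s\n' "$ws_key" "$ws_value" >> "$ws_file"
}
```

For example, `write_secret .env POSTGRES_PASSWORD 32` appends a 64-character hex password to `.env` without ever echoing it to the terminal.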

3. Network Security

3.1 Port Exposure

  • Only required ports exposed to network
  • Unnecessary ports firewalled off
  • Port scan performed to verify (nmap -sS -sV <ip>)
  • Administrative ports not exposed to Internet
  • Database ports (5432, 3306, 27017) not publicly accessible

Port Exposure Rules:

Internet-facing:
  - 80 (HTTP - redirects to HTTPS)
  - 443 (HTTPS)

Internal-only:
  - 22 (SSH)
  - 8006 (Proxmox)
  - 9090 (Prometheus)
  - 3000 (Grafana)
  - 5432 (PostgreSQL)
  - All other services
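The rules above amount to an allowlist, so a scan result can be compared against it automatically. A minimal sketch follows; `unexpected_ports` is a hypothetical helper that works on plain space-separated port lists, and how the list of open ports is gathered (nmap, `ss`, or otherwise) is left to the caller.

```bash
# unexpected_ports "ALLOWED PORTS" "OPEN PORTS"
# Prints every port from the second list that is not in the first, one per line.
unexpected_ports() {
    up_allowed=" $1 "                 # pad so each port is space-delimited
    for up_port in $2; do
        case "$up_allowed" in
            *" $up_port "*) ;;         # on the allowlist: ignore
            *) echo "$up_port" ;;      # not on the allowlist: report
        esac
    done
}

# Example: an Internet-facing host should expose only 80 and 443.
# unexpected_ports "80 443" "22 80 443 8080"   prints 22 and 8080
```

An empty result means the host exposes nothing beyond the allowlist.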

3.2 Reverse Proxy Configuration

  • Service behind Nginx Proxy Manager (CT 102)
  • HTTPS configured with valid certificate
  • HTTP redirects to HTTPS (Force SSL enabled)
  • Direct IP access blocked (only accessible via proxy)
  • Proxy headers configured (X-Real-IP, X-Forwarded-For)

NPM Configuration Checklist:

Proxy Host Settings:
  ✓ Domain name configured
  ✓ Forward to internal IP:PORT
  ✓ Force SSL: Enabled
  ✓ HTTP/2 Support: Enabled
  ✓ HSTS Enabled: Yes
  ✓ HSTS Subdomains: Yes

SSL Settings:
  ✓ Let's Encrypt certificate requested
  ✓ Auto-renewal enabled
  ✓ Force SSL: Enabled

Advanced:
  ✓ Custom Nginx Configuration (security headers)
  ✓ Authentication (TinyAuth if applicable)

3.3 TLS/SSL Configuration

  • TLS 1.2 minimum (TLS 1.3 preferred)
  • Strong cipher suites only (no RC4, 3DES, MD5)
  • Certificate from trusted CA (Let's Encrypt)
  • Certificate expiration monitored
  • HSTS header configured (Strict-Transport-Security)
  • Certificate tested with SSL Labs (A+ rating)

TLS Testing:

# Test TLS configuration
testssl.sh https://service.apophisnetworking.net

# Or use SSL Labs
# https://www.ssllabs.com/ssltest/
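For the "certificate expiration monitored" item, `openssl x509 -checkend` gives a simple pass/fail signal that fits into a cron job or health check. The sketch below assumes a PEM certificate file is available locally; `cert_valid_for_days` is a hypothetical helper name and the 30-day default is an illustrative threshold, not a value from this checklist.

```bash
# cert_valid_for_days CERT_FILE [DAYS]
# Succeeds if the certificate is still valid DAYS from now (default 30).
cert_valid_for_days() {
    cv_file=$1; cv_days=${2:-30}
    # -checkend N exits non-zero if the certificate expires within N seconds.
    openssl x509 -in "$cv_file" -noout \
        -checkend $((cv_days * 86400)) >/dev/null
}

# To check a live endpoint, fetch the certificate first, for example:
#   openssl s_client -connect HOST:443 -servername HOST </dev/null 2>/dev/null \
#       | openssl x509 -out /tmp/host.pem
#   cert_valid_for_days /tmp/host.pem 14 || echo "renew soon"
```

Let's Encrypt certificates are valid for 90 days, so a threshold somewhat below the renewal window tends to catch a stalled auto-renewal without alerting during normal operation.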

3.4 Firewall Rules

  • Proxmox firewall enabled (if applicable)
  • VM/CT firewall enabled
  • iptables rules configured
  • Default deny policy for inbound traffic
  • Egress filtering configured (if applicable)
  • Firewall rules documented

Example iptables Rules:

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established connections (conntrack replaces the deprecated state match)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from management network
iptables -A INPUT -p tcp -s 192.168.2.0/24 --dport 22 -j ACCEPT

# Allow service port from proxy only
iptables -A INPUT -p tcp -s 192.168.2.101 --dport 8080 -j ACCEPT

# Log dropped packets (rate-limited so a scan cannot flood the log)
iptables -A INPUT -m limit --limit 5/min -j LOG --log-prefix "IPTABLES-DROP: "

# Default policies (set last so an active SSH session is not cut off mid-setup)
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Save rules (path used by the iptables-persistent package)
iptables-save > /etc/iptables/rules.v4

3.5 Network Segmentation

  • Service deployed on appropriate VLAN (if VLANs implemented)
  • Database servers isolated from Internet-facing services
  • Management network separated from production
  • Docker networks isolated per service stack

VLAN Assignment (if applicable):

VLAN 10 - Management: Proxmox, Ansible-Control
VLAN 20 - DMZ: Web servers, reverse proxy
VLAN 30 - Internal: Databases, monitoring
VLAN 40 - IoT: Home Assistant, isolated devices

4. Container Security

4.1 Docker Image Security

  • Base image from trusted registry (Docker Hub official, ghcr.io)
  • Image pinned to specific version tag (not latest)
  • Image scanned for vulnerabilities (Trivy, Snyk)
  • No critical or high CVEs in image
  • Image layers reviewed for suspicious content
  • Multi-stage build used to minimize image size

Image Scanning:

# Scan image with Trivy
trivy image <image-name>:<tag>

# Only show HIGH and CRITICAL
trivy image --severity HIGH,CRITICAL <image-name>:<tag>

# Generate JSON report
trivy image --format json --output results.json <image-name>:<tag>
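The "pinned to specific version tag" item in 4.1 can be checked with plain text tools before any scan runs. This is a rough sketch that assumes a conventionally formatted compose file with one `image:` key per line; `unpinned_images` is a hypothetical helper and it does not parse YAML properly.

```bash
# unpinned_images COMPOSE_FILE
# Prints image references that have no tag or use the "latest" tag.
# References pinned by digest (@sha256:...) are treated as pinned.
unpinned_images() {
    sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1" |
        tr -d "\"'" |
        awk '
            /@sha256:/ { next }
            {
                # Look for the tag only in the last path component, so a
                # registry port such as registry:5000/app is not mistaken
                # for a tag.
                n = split($1, parts, "/")
                last = parts[n]
                if (last !~ /:/ || last ~ /:latest$/) print $1
            }
        '
}
```

An empty result means every image in the file carries an explicit tag or digest.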

4.2 Container Runtime Security

  • Container runs as non-root user
    user: "1000:1000"  # Or named user
    
  • Read-only root filesystem (if applicable)
    read_only: true
    
  • No privileged mode (privileged: false)
  • Capabilities dropped to minimum required
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if needed
    
  • Security options configured
    security_opt:
      - no-new-privileges:true
      - apparmor=docker-default
    

4.3 Volume Mounts

  • No root filesystem mounts (/:/host)
  • Sensitive directories not mounted (/etc, /root, /home)
  • Docker socket not mounted (unless absolutely required)
    • If socket required, use docker-socket-proxy
  • Volume mounts use least privilege (read-only where possible)
    volumes:
      - ./config:/config:ro  # Read-only
    
  • Host paths documented and justified

Dangerous Volume Mounts to Avoid:

# NEVER DO THIS
volumes:
  - /:/srv  # Full filesystem access
  - /var/run/docker.sock:/var/run/docker.sock  # Root-equivalent
  - /etc:/host-etc  # System configuration access
  - /root:/root  # Root home directory
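The mounts listed above can be flagged automatically in a compose file. The following sketch makes some assumptions: `dangerous_mounts` is a hypothetical helper, it matches only short-syntax volume entries (`- host:container`), and it flags the exact directories named in this section rather than their subdirectories.

```bash
# dangerous_mounts COMPOSE_FILE
# Prints (with line numbers) volume entries whose host side is /, /etc,
# /root, /home, or the Docker socket. Exit status is 0 when something
# dangerous is found and 1 when the file is clean (grep semantics).
dangerous_mounts() {
    tr -d "\"'" < "$1" |
        grep -nE '^[[:space:]]*-[[:space:]]*(/|/etc|/root|/home|/var/run/docker\.sock)/?:'
}
```

A file mount such as `/etc/localtime:/etc/localtime:ro` is deliberately not flagged, since it exposes a single file rather than the whole directory.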

4.4 Resource Limits

  • Memory limits configured
    mem_limit: 512m
    mem_reservation: 256m
    
  • CPU limits configured
    cpus: '0.5'
    cpu_shares: 512
    
  • Restart policy configured appropriately
    restart: unless-stopped  # Recommended
    
  • Log limits configured (prevent disk exhaustion)
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    

4.5 Container Naming

  • Container name follows standard convention
    Format: <service>-<component>
    Example: paperless-webserver, monitoring-grafana
    
  • Container name documented in services README
  • Name does not conflict with existing containers

See: /home/jramos/homelab/scripts/security/CONTAINER_NAME_FIXES.md


5. Data Protection

5.1 Backup Configuration

  • Backup job configured in Proxmox Backup Server
  • Backup schedule documented (daily incremental + weekly full)
  • Backup retention policy configured
    Recommended:
    - Keep last 7 daily backups
    - Keep last 4 weekly backups
    - Keep last 6 monthly backups
    
  • Backup encryption enabled
  • Backup encryption key stored securely
  • Backup restoration tested successfully

Backup Job Configuration:

# Create backup job in Proxmox
# Storage: PBS-Backups
# Schedule: Daily at 0200
# Retention: 7 daily, 4 weekly, 6 monthly
# Compression: ZSTD
# Mode: Snapshot

5.2 Data Encryption

  • Data encrypted at rest (LUKS, ZFS encryption)
  • Database encryption enabled (if supported)
  • Application-level encryption configured (if available)
  • Encryption keys documented and backed up
  • Key rotation schedule documented

PostgreSQL Encryption (example):

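-- Enable pgcrypto extension
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- pgp_sym_encrypt returns bytea, so store the ciphertext in a separate bytea column
ALTER TABLE users ADD COLUMN ssn_encrypted bytea;
UPDATE users SET ssn_encrypted = pgp_sym_encrypt(ssn, 'encryption_key');

-- Decrypt when reading
SELECT pgp_sym_decrypt(ssn_encrypted, 'encryption_key') FROM users;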

5.3 Data Retention

  • Data retention policy documented
  • PII data retention compliant with regulations (GDPR, CCPA)
  • Automated data purge scripts configured
  • User data deletion procedure documented
  • Log retention configured (default: 90 days)

5.4 Sensitive Data Handling

  • No PII in logs
  • Credit card data not stored (if applicable)
  • Health information protected (HIPAA compliance if applicable)
  • Passwords never logged
  • API responses sanitized before logging

6. Monitoring & Logging

6.1 Application Logging

  • Application logs configured
  • Log level set appropriately (INFO for production)
  • Logs forwarded to centralized logging (Loki)
  • Log format standardized (JSON preferred)
  • Sensitive data redacted from logs
  • Log rotation configured

Docker Logging Configuration:

logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
    labels: "service,environment"
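For the "sensitive data redacted from logs" item, a filter in the log pipeline is one option when the application itself cannot be changed. The sketch below is illustrative only: `redact_secrets` is a hypothetical helper, the keyword list is an assumption, matching is case-sensitive, and it handles `key=value` and `key: value` text but not quoted JSON keys.

```bash
# redact_secrets
# Reads log lines on stdin and masks the value after common secret keywords.
redact_secrets() {
    sed -E 's/((password|passwd|secret|token|api_key)[[:space:]]*[=:][[:space:]]*)[^[:space:]&"]+/\1[REDACTED]/g'
}

# Example:
#   echo 'user=bob password=hunter2 action=login' | redact_secrets
#   prints: user=bob password=[REDACTED] action=login
```

Redacting at the source (in the application's logger) is more reliable; a filter like this is a safety net for third-party services.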

6.2 Security Event Logging

  • Failed authentication attempts logged
  • Privilege escalation logged
  • Configuration changes logged
  • File access logged (for sensitive data)
  • Security events forwarded to monitoring

Security Events to Log:

- Failed login attempts
- Successful privileged access (sudo, docker exec root)
- SSH key usage
- Configuration file modifications
- User account creation/deletion
- Permission changes
- Firewall rule modifications

6.3 Metrics Collection

  • Service added to Prometheus scrape targets
    # prometheus.yml
    scrape_configs:
      - job_name: 'new-service'
        static_configs:
          - targets: ['192.168.2.XXX:9090']
    
  • Service exposes metrics endpoint (if supported)
  • Grafana dashboard created for service
  • Alerting rules configured for service health

6.4 Alerting

  • Critical alerts configured (service down, high error rate)
  • Alert notification destination configured (email, Slack, etc.)
  • Alert escalation policy documented
  • Alert thresholds tested and validated

Example Alerting Rules:

# Service down alert
- alert: ServiceDown
  expr: up{job="new-service"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Service {{ $labels.instance }} is down"

# High error rate alert
- alert: HighErrorRate
  expr: sum by (instance) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (instance) (rate(http_requests_total[5m])) > 0.05
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High error rate on {{ $labels.instance }}"

7. Application Security

7.1 Security Headers

  • Content-Security-Policy configured
  • X-Frame-Options: SAMEORIGIN
  • X-Content-Type-Options: nosniff
  • X-XSS-Protection: 0 (legacy header; browser XSS auditors are deprecated, so rely on Content-Security-Policy)
  • Strict-Transport-Security configured (HSTS)
  • Referrer-Policy: strict-origin-when-cross-origin
  • Permissions-Policy configured

NPM Custom Nginx Configuration:

add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "0" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline';" always;
add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;

Verification:

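# -s hides the progress meter; -i on grep is needed because HTTP/2 responses use lowercase header names
curl -sI https://service.apophisnetworking.net | grep -iE "X-Frame-Options|Content-Security-Policy|Strict-Transport-Security"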
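A grep shows which headers are present, but a checklist item is really about which ones are missing. The sketch below inverts the check; `missing_headers` is a hypothetical helper that reads a header dump on stdin and compares names case-insensitively.

```bash
# missing_headers NAME...
# Reads HTTP response headers on stdin and prints each NAME that is absent.
missing_headers() {
    mh_dump=$(cat)
    for mh_name in "$@"; do
        # Header names are case-insensitive, and HTTP/2 sends them lowercase.
        printf '%s\n' "$mh_dump" | grep -qi "^$mh_name:" || echo "$mh_name"
    done
}

# Example:
#   curl -sI https://service.apophisnetworking.net \
#       | missing_headers Strict-Transport-Security X-Frame-Options \
#             X-Content-Type-Options Content-Security-Policy
```

No output means every listed header was returned by the server.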

7.2 Input Validation

  • SQL injection protection (parameterized queries, ORM)
  • XSS protection (input sanitization, output encoding)
  • CSRF protection (tokens, SameSite cookies)
  • File upload validation (type, size, content)
  • Rate limiting configured (prevent brute force)

7.3 Session Management

  • Secure session cookies (Secure, HttpOnly, SameSite)
  • Session timeout configured (30 minutes recommended)
  • Session invalidation on logout
  • Concurrent session limits configured

7.4 API Security

  • API authentication required (API key, OAuth, JWT)
  • API rate limiting configured
  • API input validation
  • API versioning implemented
  • API documentation does not expose sensitive endpoints

8. Compliance & Documentation

8.1 Documentation

  • Service documented in /home/jramos/homelab/services/README.md
  • Configuration files added to git repository
  • Architecture diagram updated (if applicable)
  • Dependencies documented
  • Troubleshooting guide created

Documentation Requirements:

Required sections in services/README.md:
- Service name and purpose
- Port mappings
- Environment variables
- Volume mounts
- Dependencies
- Deployment instructions
- Troubleshooting common issues
- Maintenance procedures

8.2 Change Management

  • Change request created (if required)
  • Change approved by infrastructure owner
  • Rollback plan documented
  • Change window scheduled
  • Stakeholders notified

8.3 Compliance

  • GDPR compliance verified (if handling EU data)
  • HIPAA compliance verified (if handling health data)
  • PCI-DSS compliance verified (if handling payment data)
  • License compliance checked (open-source licenses)
  • Data residency requirements met

8.4 Asset Inventory

  • Service added to NetBox (CT 103) inventory
  • IP address documented in IPAM
  • Service owner recorded
  • Criticality level assigned
  • Support contacts documented

9. Testing & Validation

9.1 Functional Testing

  • Service starts successfully
  • Service accessible via configured URL
  • Authentication works correctly
  • Core functionality tested
  • Dependencies verified (database connection, etc.)

9.2 Security Testing

  • Port scan performed (no unexpected open ports)
  • Vulnerability scan performed (Trivy, Nessus)
  • Penetration test completed (if critical service)
  • SSL/TLS configuration tested (SSL Labs A+ rating)
  • Security headers verified

Security Testing Tools:

# Port scan (SYN scan requires root)
sudo nmap -sS -sV 192.168.2.XXX

# Vulnerability scan
trivy image <image-name>

# SSL test
testssl.sh https://service.apophisnetworking.net

# Security headers
curl -I https://service.apophisnetworking.net

9.3 Performance Testing

  • Load testing performed (if applicable)
  • Resource usage monitored under load
  • Response time acceptable (<1s for web pages)
  • No memory leaks detected
  • Disk I/O acceptable

9.4 Disaster Recovery Testing

  • Backup restoration tested
  • Service recovery time measured (RTO)
  • Data loss measured (RPO)
  • Failover tested (if HA configured)

10. Operational Readiness

10.1 Monitoring Integration

  • Service health checks configured
  • Monitoring dashboard created
  • Alerts configured and tested
  • On-call rotation updated (if applicable)

10.2 Maintenance Plan

  • Update schedule documented (monthly, quarterly)
  • Maintenance window scheduled
  • Update procedure documented
  • Rollback procedure tested

10.3 Runbooks

  • Service start/stop procedure documented
  • Common troubleshooting steps documented
  • Incident response procedure documented
  • Escalation contacts documented

10.4 Access Control

  • User access provisioned
  • Admin access limited to authorized personnel
  • Access review schedule documented
  • Access revocation procedure documented

11. Final Review

11.1 Security Review

  • All CRITICAL findings addressed
  • All HIGH findings addressed
  • Medium findings have remediation plan
  • Security sign-off obtained

11.2 Stakeholder Approval

  • Infrastructure owner approval
  • Security team approval (if applicable)
  • Service owner approval
  • Documentation review complete

11.3 Go-Live Checklist

  • Production deployment scheduled
  • Rollback plan ready
  • Support team notified
  • Monitoring dashboard open
  • Incident response team on standby

11.4 Post-Deployment

  • Service confirmed operational
  • Monitoring confirms normal operations
  • No errors in logs
  • Performance metrics within acceptable range
  • Post-deployment review scheduled (1 week)

Approval Signatures

| Role | Name | Date | Signature |
|------|------|------|-----------|
| Service Owner | | | |
| Security Reviewer | | | |
| Infrastructure Owner | | | |

Deployment Record

Deployment Date: ________________

Deployment Method: [ ] Manual [ ] Ansible [ ] CI/CD

Deployment Status: [ ] Success [ ] Failed [ ] Rolled Back

Issues Encountered:

(Document any issues encountered during deployment)

Lessons Learned:

(Document lessons learned for future deployments)

Checklist Score

Total Items: 200+

Items Completed: ______ / ______

Completion Percentage: ______ %

Risk Level:

  • Low Risk (95-100% complete, all CRITICAL and HIGH items complete)
  • Medium Risk (85-94% complete, all CRITICAL items complete)
  • High Risk (70-84% complete, some CRITICAL items incomplete)
  • Unacceptable (<70% complete, deploy NOT approved)
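The scoring bands above can be computed rather than judged by eye. The sketch below is one reading of them, with assumptions stated: `risk_level` is a hypothetical helper, the percentage is rounded down, and any result that misses the conditions for a better band falls through to High Risk (for example, 90% complete with a CRITICAL item still open).

```bash
# risk_level COMPLETED TOTAL [CRITICAL_OPEN] [HIGH_OPEN]
# Prints the completion percentage and the matching risk band.
risk_level() {
    rl_done=$1; rl_total=$2; rl_crit=${3:-0}; rl_high=${4:-0}
    [ "$rl_total" -gt 0 ] || { echo "total must be > 0" >&2; return 2; }
    rl_pct=$((rl_done * 100 / rl_total))   # integer percentage, rounded down
    if [ "$rl_pct" -lt 70 ]; then
        echo "$rl_pct% Unacceptable"
    elif [ "$rl_pct" -ge 95 ] && [ "$rl_crit" -eq 0 ] && [ "$rl_high" -eq 0 ]; then
        echo "$rl_pct% Low Risk"
    elif [ "$rl_pct" -ge 85 ] && [ "$rl_crit" -eq 0 ]; then
        echo "$rl_pct% Medium Risk"
    else
        echo "$rl_pct% High Risk"
    fi
}
```

For instance, `risk_level 180 200 0 2` reports `90% Medium Risk`: the CRITICAL items are done, but open HIGH items rule out the Low band.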

Archive

After deployment, archive this completed checklist:

Location: /home/jramos/homelab/docs/deployment-records/<service-name>-<date>.md

Command:

cp SECURITY_CHECKLIST.md /home/jramos/homelab/docs/deployment-records/<service-name>-$(date +%Y%m%d).md

Template Version: 1.0

Last Updated: 2025-12-20

Maintained By: Infrastructure Security Team

Review Frequency: Quarterly