Jordan Ramos e481c95da4 docs(security): comprehensive security audit and remediation documentation

- Add SECURITY.md policy with credential management, Docker security, SSL/TLS guidance
- Add security audit report (2025-12-20) with 31 findings across 4 severity levels
- Add pre-deployment security checklist template
- Update CLAUDE_STATUS.md with security audit initiative
- Expand services/README.md with comprehensive security sections
- Add script validation report and container name fix guide

Audit identified 6 CRITICAL, 3 HIGH, 2 MEDIUM findings
4-phase remediation roadmap created (estimated 6-13 min downtime)
All security scripts validated and ready for execution

Related: Security Audit Q4 2025, CRITICAL-001 through CRITICAL-006

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-21 13:52:34 -07:00

Security Pre-Deployment Checklist

Purpose: Ensure all new services and infrastructure components meet security standards before deployment to production.

Usage: Complete this checklist for every new service, VM, container, or infrastructure component. Archive completed checklists in /home/jramos/homelab/docs/deployment-records/.

Service Information

Field	Value
Service Name
Deployment Type	[ ] VM [ ] LXC Container [ ] Docker Container [ ] Bare Metal
Deployment Date
Owner
Purpose
Criticality	[ ] Critical [ ] High [ ] Medium [ ] Low
Data Classification	[ ] Public [ ] Internal [ ] Confidential [ ] Restricted

1. Authentication & Authorization

1.1 User Accounts

Default credentials changed (admin/admin, root/password, etc.)
Strong password policy enforced (minimum 16 characters)
Separate user accounts created (no shared credentials)
Root/administrator login disabled
Service accounts use principle of least privilege
User account list documented in /home/jramos/homelab/docs/accounts/

Default Credentials to Check:

Grafana:        admin / admin
NPM:            admin@example.com / changeme
Proxmox:        root / <install_password>
PostgreSQL:     postgres / postgres
TinyAuth:       (check .env file)
Portainer:      admin / <first_login>
n8n:            (set on first login)
Home Assistant: (set on first login)

1.2 Multi-Factor Authentication (MFA)

MFA enabled for administrative accounts
MFA method documented (TOTP, U2F, etc.)
Recovery codes generated and stored securely
MFA enforcement tested and verified

1.3 Single Sign-On (SSO)

SSO integration configured (if applicable via TinyAuth)
SSO tested with test account
Fallback authentication method configured
Direct IP access blocked (must go through SSO gateway)

1.4 SSH Access

Password authentication disabled
SSH key authentication only
SSH keys use passphrase protection
Root SSH login disabled (PermitRootLogin no)
SSH port changed from 22 (optional hardening)
SSH AllowUsers configured (whitelist approach)
SSH configuration validated (sshd -t)

SSH Hardening Verification:

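# Verify the effective configuration (also covers sshd_config.d drop-ins and ignores commented lines)
sudo sshd -T | grep -Ei "^(permitrootlogin|passwordauthentication|allowusers) "

# Expected output (sshd -T prints keywords in lowercase; order may differ):
# permitrootlogin no
# passwordauthentication no
# allowusers jramos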

2. Secrets Management

2.1 Credentials Storage

No hardcoded passwords in docker-compose.yaml
No secrets in environment variables (visible in docker inspect)
Secrets stored in .env files (excluded from git)
Docker secrets used for production deployments
.env files have restrictive permissions (600)
Secrets documented in password manager (Vault, Bitwarden, etc.)
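
The .env permission item above can be spot-checked with a small helper. This is a minimal sketch, assuming GNU find is available; the function name find_loose_env_files is illustrative and not part of the existing homelab scripts.

```shell
# List .env files under a directory whose mode is not exactly 600.
# Prints "<octal mode> <path>" for each offending file (GNU find -printf).
find_loose_env_files() {
  local root="${1:-.}"
  find "$root" -type f -name '.env' ! -perm 600 -printf '%m %p\n'
}
```

Run it against the services tree, for example find_loose_env_files /home/jramos/homelab; empty output means every .env file found is already mode 600.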

2.2 API Keys & Tokens

API keys generated with minimal required permissions
API keys rotated regularly (document rotation schedule)
API key usage monitored in logs
Unused API keys revoked
API keys never logged or displayed in UI

2.3 Encryption Keys

Database encryption keys generated
TLS certificate private keys protected (600 permissions)
Encryption keys backed up securely
Key recovery procedure documented
LUKS encryption keys for volumes (if applicable)

2.4 JWT & Session Secrets

JWT secrets generated with cryptographic randomness
```
openssl rand -base64 64
```
Session secrets rotated on schedule
JWT expiration configured (not indefinite)
Session timeout configured (30 minutes idle recommended)

Secret Generation Examples:

# PostgreSQL password
openssl rand -base64 32

# JWT secret
openssl rand -base64 64

# AES-256 encryption key
openssl rand -hex 32

# API token (random hex; a UUID is an identifier and should not be relied on as a secret)
openssl rand -hex 32

3. Network Security

3.1 Port Exposure

Only required ports exposed to network
Unnecessary ports firewalled off
Port scan performed to verify (nmap -sS -sV <ip>)
Administrative ports not exposed to Internet
Database ports (5432, 3306, 27017) not publicly accessible

Port Exposure Rules:

Internet-facing:
  - 80 (HTTP - redirects to HTTPS)
  - 443 (HTTPS)

Internal-only:
  - 22 (SSH)
  - 8006 (Proxmox)
  - 9090 (Prometheus)
  - 3000 (Grafana)
  - 5432 (PostgreSQL)
  - All other services
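
One way to turn the port rules above into a repeatable check is to compare the ports a scan reports as open against an allowlist. The helper below is a sketch only; unexpected_ports is a hypothetical name, and both arguments are assumed to be space-separated port numbers.

```shell
# Print every port in the "open" list ($2) that is absent from the "allowed" list ($1).
unexpected_ports() {
  local allowed=" $1 " port
  for port in $2; do
    case "$allowed" in
      *" $port "*) ;;          # port is on the allowlist, nothing to report
      *) echo "$port" ;;       # port is open but not expected
    esac
  done
}

# Example (not run here): feed it the open ports from an nmap grepable scan.
#   open=$(nmap -sS -p- --open -oG - 192.168.2.XXX | grep -oE '[0-9]+/open' | cut -d/ -f1 | tr '\n' ' ')
#   unexpected_ports "80 443" "$open"
```

Empty output means the host exposes nothing beyond the allowlist passed as the first argument.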

3.2 Reverse Proxy Configuration

Service behind Nginx Proxy Manager (CT 102)
HTTPS configured with valid certificate
HTTP redirects to HTTPS (Force SSL enabled)
Direct IP access blocked (only accessible via proxy)
Proxy headers configured (X-Real-IP, X-Forwarded-For)

NPM Configuration Checklist:

Proxy Host Settings:
  ✓ Domain name configured
  ✓ Forward to internal IP:PORT
  ✓ Force SSL: Enabled
  ✓ HTTP/2 Support: Enabled
  ✓ HSTS Enabled: Yes
  ✓ HSTS Subdomains: Yes

SSL Settings:
  ✓ Let's Encrypt certificate requested
  ✓ Auto-renewal enabled
  ✓ Force SSL: Enabled

Advanced:
  ✓ Custom Nginx Configuration (security headers)
  ✓ Authentication (TinyAuth if applicable)

3.3 TLS/SSL Configuration

TLS 1.2 minimum (TLS 1.3 preferred)
Strong cipher suites only (no RC4, 3DES, MD5)
Certificate from trusted CA (Let's Encrypt)
Certificate expiration monitored
HSTS header configured (Strict-Transport-Security)
Certificate tested with SSL Labs (A+ rating)

TLS Testing:

# Test TLS configuration
testssl.sh https://service.apophisnetworking.net

# Or use SSL Labs
# https://www.ssllabs.com/ssltest/
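
For the "Certificate expiration monitored" item, a small script can report how many days remain on a certificate. This is a sketch under stated assumptions: it relies on GNU date to parse the notAfter string printed by openssl, and the function names days_until and cert_days_left are illustrative.

```shell
# Days between a reference time and a certificate expiry date.
# $1: date string as printed after "notAfter=" by `openssl x509 -enddate`
# $2: reference time in epoch seconds (defaults to now)
days_until() {
  local end_epoch now_epoch
  end_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch="${2:-$(date -u +%s)}"
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Fetch the leaf certificate served on host:443 and print the days remaining.
cert_days_left() {
  local host="$1" end
  end=$(openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -enddate | cut -d= -f2)
  [ -n "$end" ] || return 1     # connection or parsing failed
  days_until "$end"
}
```

For example, cert_days_left service.apophisnetworking.net could be run from cron and alert when the result drops below a chosen threshold such as 14.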

3.4 Firewall Rules

Proxmox firewall enabled (if applicable)
VM/CT firewall enabled
iptables rules configured
Default deny policy for inbound traffic
Egress filtering configured (if applicable)
Firewall rules documented

Example iptables Rules:

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from management network
iptables -A INPUT -p tcp -s 192.168.2.0/24 --dport 22 -j ACCEPT

# Allow service port from proxy only
iptables -A INPUT -p tcp -s 192.168.2.101 --dport 8080 -j ACCEPT

# Log dropped packets (rate-limited so a scan cannot flood the log)
iptables -A INPUT -m limit --limit 5/min -j LOG --log-prefix "IPTABLES-DROP: "

# Default policies (set after the allow rules so an active SSH session is not cut off)
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Save rules (path used by the iptables-persistent package)
iptables-save > /etc/iptables/rules.v4

3.5 Network Segmentation

Service deployed on appropriate VLAN (if VLANs implemented)
Database servers isolated from Internet-facing services
Management network separated from production
Docker networks isolated per service stack

VLAN Assignment (if applicable):

VLAN 10 - Management: Proxmox, Ansible-Control
VLAN 20 - DMZ: Web servers, reverse proxy
VLAN 30 - Internal: Databases, monitoring
VLAN 40 - IoT: Home Assistant, isolated devices

4. Container Security

4.1 Docker Image Security

Base image from trusted registry (Docker Hub official, ghcr.io)
Image pinned to specific version tag (not latest)
Image scanned for vulnerabilities (Trivy, Snyk)
No critical or high CVEs in image
Image layers reviewed for suspicious content
Multi-stage build used to minimize image size

Image Scanning:

# Scan image with Trivy
trivy image <image-name>:tag

# Only show HIGH and CRITICAL
trivy image --severity HIGH,CRITICAL <image-name>:tag

# Generate JSON report
trivy image --format json --output results.json <image-name>:tag
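
The "pinned to specific version tag (not latest)" item can be checked before scanning. The helper below is a minimal sketch; is_pinned_image is a hypothetical name, and it treats a digest reference or any tag other than latest as pinned.

```shell
# Return 0 if the image reference is pinned (digest or explicit non-latest tag), 1 otherwise.
is_pinned_image() {
  local ref="$1" last
  case "$ref" in
    *@sha256:*) return 0 ;;    # digest references are always pinned
  esac
  last="${ref##*/}"            # drop registry/path so a registry port is not mistaken for a tag
  case "$last" in
    *:latest) return 1 ;;
    *:?*)     return 0 ;;
    *)        return 1 ;;      # no tag at all resolves to latest
  esac
}
```

A pre-deployment script could loop over the image names in a compose file and refuse to continue when is_pinned_image fails for any of them.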

4.2 Container Runtime Security

Container runs as non-root user
```
user: "1000:1000"  # Or named user
```
Read-only root filesystem (if applicable)
```
read_only: true
```
No privileged mode (privileged: false)

Capabilities dropped to minimum required

cap_drop:
  - ALL
cap_add:
  - NET_BIND_SERVICE  # Only if needed

Security options configured

security_opt:
  - no-new-privileges:true
  - apparmor=docker-default

4.3 Volume Mounts

No root filesystem mounts (/:/host)
Sensitive directories not mounted (/etc, /root, /home)
Docker socket not mounted (unless absolutely required)
- If socket required, use docker-socket-proxy
Volume mounts use least privilege (read-only where possible)
```
volumes:
  - ./config:/config:ro  # Read-only
```
Host paths documented and justified

Dangerous Volume Mounts to Avoid:

# NEVER DO THIS
volumes:
  - /:/srv  # Full filesystem access
  - /var/run/docker.sock:/var/run/docker.sock  # Root-equivalent
  - /etc:/host-etc  # System configuration access
  - /root:/root  # Root home directory
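
A quick grep can flag the mounts listed above in a compose file. This is a sketch, not a full YAML parser: find_risky_mounts is an illustrative name, and it only recognises short-syntax volume entries such as "- /etc:/host-etc".

```shell
# Print (with line numbers) compose volume entries that bind-mount a risky host path:
# the root filesystem, /etc, /root, /home, or the Docker socket.
find_risky_mounts() {
  grep -nE '^[[:space:]]*-[[:space:]]*"?(/|/etc|/root|/home|/var/run/docker\.sock)/?:' "$1"
}
```

Mounts of individual files below those directories (for example /etc/localtime) are deliberately not matched; review those by hand.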

4.4 Resource Limits

Memory limits configured
```
mem_limit: 512m
mem_reservation: 256m
```
CPU limits configured
```
cpus: '0.5'
cpu_shares: 512
```
Restart policy configured appropriately
```
restart: unless-stopped  # Recommended
```

Log limits configured (prevent disk exhaustion)

logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

4.5 Container Naming

Container name follows standard convention

Format: <service>-<component>
Example: paperless-webserver, monitoring-grafana

Container name documented in services README
Name does not conflict with existing containers

See: /home/jramos/homelab/scripts/security/CONTAINER_NAME_FIXES.md

5. Data Protection

5.1 Backup Configuration

Backup job configured in Proxmox Backup Server
Backup schedule documented (daily incremental + weekly full)

Backup retention policy configured

Recommended:
- Keep last 7 daily backups
- Keep last 4 weekly backups
- Keep last 6 monthly backups

Backup encryption enabled
Backup encryption key stored securely
Backup restoration tested successfully

Backup Job Configuration:

# Create backup job in Proxmox
# Storage: PBS-Backups
# Schedule: Daily at 02:00
# Retention: 7 daily, 4 weekly, 6 monthly
# Compression: ZSTD
# Mode: Snapshot

5.2 Data Encryption

Data encrypted at rest (LUKS, ZFS encryption)
Database encryption enabled (if supported)
Application-level encryption configured (if available)
Encryption keys documented and backed up
Key rotation schedule documented

PostgreSQL Encryption (example):

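-- Enable pgcrypto extension
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Encrypt sensitive columns. pgp_sym_encrypt returns bytea, so the result goes
-- into a bytea column (ssn_encrypted is an illustrative name).
ALTER TABLE users ADD COLUMN ssn_encrypted bytea;
-- Supply the key from a secret store rather than a literal in the SQL.
UPDATE users SET ssn_encrypted = pgp_sym_encrypt(ssn, 'encryption_key');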

5.3 Data Retention

Data retention policy documented
PII data retention compliant with regulations (GDPR, CCPA)
Automated data purge scripts configured
User data deletion procedure documented
Log retention configured (default: 90 days)
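
The purge and log-retention items above could be implemented with a scheduled find. The function below is a minimal sketch assuming GNU find and file-based *.log logs; purge_old_logs is a hypothetical name, and the 90-day default mirrors the retention default stated in this section.

```shell
# Delete regular *.log files under $1 last modified more than $2 days ago (default 90).
# Prints each path it removes.
purge_old_logs() {
  local dir="$1" days="${2:-90}"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  find "$dir" -type f -name '*.log' -mtime +"$days" -print -delete
}
```

Run it once without -delete (or against a copy) before scheduling it, since removal is not reversible.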

5.4 Sensitive Data Handling

No PII in logs
Credit card data not stored (if applicable)
Health information protected (HIPAA compliance if applicable)
Passwords never logged
API responses sanitized before logging
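
Where an application cannot redact its own output, a filter in the log pipeline can mask common secret fields. This is a sketch only, assuming GNU sed (for the case-insensitive I flag); the key names it looks for and the function name redact_secrets are illustrative and will not cover every format.

```shell
# Mask the values of common secret-bearing keys in key=value and "key": "value" text on stdin.
redact_secrets() {
  sed -E \
    -e 's/((password|passwd|secret|token|api[_-]?key)=)[^[:space:]&]+/\1[REDACTED]/Ig' \
    -e 's/("(password|passwd|secret|token|api[_-]?key)"[[:space:]]*:[[:space:]]*")[^"]*/\1[REDACTED]/Ig'
}
```

Usage: some-command 2>&1 | redact_secrets before the output is written or forwarded.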

6. Monitoring & Logging

6.1 Application Logging

Application logs configured
Log level set appropriately (INFO for production)
Logs forwarded to centralized logging (Loki)
Log format standardized (JSON preferred)
Sensitive data redacted from logs
Log rotation configured

Docker Logging Configuration:

logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
    labels: "service,environment"

6.2 Security Event Logging

Failed authentication attempts logged
Privilege escalation logged
Configuration changes logged
File access logged (for sensitive data)
Security events forwarded to monitoring

Security Events to Log:

- Failed login attempts
- Successful privileged access (sudo, docker exec root)
- SSH key usage
- Configuration file modifications
- User account creation/deletion
- Permission changes
- Firewall rule modifications

6.3 Metrics Collection

Service added to Prometheus scrape targets

# prometheus.yml
scrape_configs:
  - job_name: 'new-service'
    static_configs:
      - targets: ['192.168.2.XXX:9090']

Service exposes metrics endpoint (if supported)
Grafana dashboard created for service
Alerting rules configured for service health

6.4 Alerting

Critical alerts configured (service down, high error rate)
Alert notification destination configured (email, Slack, etc.)
Alert escalation policy documented
Alert thresholds tested and validated

Example Alerting Rules:

# Service down alert
- alert: ServiceDown
  expr: up{job="new-service"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Service {{ $labels.instance }} is down"

# High error rate alert
- alert: HighErrorRate
  expr: sum by (instance) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (instance) (rate(http_requests_total[5m])) > 0.05
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High error rate on {{ $labels.instance }}"

7. Application Security

7.1 Security Headers

Content-Security-Policy configured
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 0 (the legacy browser XSS filter is deprecated; rely on Content-Security-Policy)
Strict-Transport-Security configured (HSTS)
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy configured

NPM Custom Nginx Configuration:

add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "0" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline';" always;
add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;

Verification:

# -i makes the match case-insensitive (HTTP/2 responses return lowercase header names)
curl -sI https://service.apophisnetworking.net | grep -iE "X-Frame-Options|Content-Security-Policy|Strict-Transport-Security"
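
To check the whole header list from 7.1 rather than a few names, the response headers can be piped through a helper that reports what is missing. This is a minimal sketch; missing_security_headers is a hypothetical name and it checks only for presence, not for the header values.

```shell
# Read HTTP response headers on stdin and print each required header that is absent.
# Names are compared case-insensitively because HTTP/2 responses use lowercase names.
missing_security_headers() {
  local headers name
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  for name in strict-transport-security content-security-policy \
              x-frame-options x-content-type-options referrer-policy permissions-policy; do
    printf '%s\n' "$headers" | grep -q "^$name:" || echo "$name"
  done
}
```

Usage: curl -sI https://service.apophisnetworking.net | missing_security_headers; empty output means all six headers were returned.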

7.2 Input Validation

SQL injection protection (parameterized queries, ORM)
XSS protection (input sanitization, output encoding)
CSRF protection (tokens, SameSite cookies)
File upload validation (type, size, content)
Rate limiting configured (prevent brute force)

7.3 Session Management

Secure session cookies (Secure, HttpOnly, SameSite)
Session timeout configured (30 minutes recommended)
Session invalidation on logout
Concurrent session limits configured

7.4 API Security

API authentication required (API key, OAuth, JWT)
API rate limiting configured
API input validation
API versioning implemented
API documentation does not expose sensitive endpoints

8. Compliance & Documentation

8.1 Documentation

Service documented in /home/jramos/homelab/services/README.md
Configuration files added to git repository
Architecture diagram updated (if applicable)
Dependencies documented
Troubleshooting guide created

Documentation Requirements:

Required sections in services/README.md:
- Service name and purpose
- Port mappings
- Environment variables
- Volume mounts
- Dependencies
- Deployment instructions
- Troubleshooting common issues
- Maintenance procedures

8.2 Change Management

Change request created (if required)
Change approved by infrastructure owner
Rollback plan documented
Change window scheduled
Stakeholders notified

8.3 Compliance

GDPR compliance verified (if handling EU data)
HIPAA compliance verified (if handling health data)
PCI-DSS compliance verified (if handling payment data)
License compliance checked (open-source licenses)
Data residency requirements met

8.4 Asset Inventory

Service added to NetBox (CT 103) inventory
IP address documented in IPAM
Service owner recorded
Criticality level assigned
Support contacts documented

9. Testing & Validation

9.1 Functional Testing

Service starts successfully
Service accessible via configured URL
Authentication works correctly
Core functionality tested
Dependencies verified (database connection, etc.)

9.2 Security Testing

Port scan performed (no unexpected open ports)
Vulnerability scan performed (Trivy, Nessus)
Penetration test completed (if critical service)
SSL/TLS configuration tested (SSL Labs A+ rating)
Security headers verified

Security Testing Tools:

# Port scan
nmap -sS -sV 192.168.2.XXX

# Vulnerability scan
trivy image <image-name>

# SSL test
testssl.sh https://service.apophisnetworking.net

# Security headers
curl -I https://service.apophisnetworking.net

9.3 Performance Testing

Load testing performed (if applicable)
Resource usage monitored under load
Response time acceptable (<1s for web pages)
No memory leaks detected
Disk I/O acceptable

9.4 Disaster Recovery Testing

Backup restoration tested
Service recovery time measured (RTO)
Data loss measured (RPO)
Failover tested (if HA configured)

10. Operational Readiness

10.1 Monitoring Integration

Service health checks configured
Monitoring dashboard created
Alerts configured and tested
On-call rotation updated (if applicable)

10.2 Maintenance Plan

Update schedule documented (monthly, quarterly)
Maintenance window scheduled
Update procedure documented
Rollback procedure tested

10.3 Runbooks

Service start/stop procedure documented
Common troubleshooting steps documented
Incident response procedure documented
Escalation contacts documented

10.4 Access Control

User access provisioned
Admin access limited to authorized personnel
Access review schedule documented
Access revocation procedure documented

11. Final Review

11.1 Security Review

All CRITICAL findings addressed
All HIGH findings addressed
Medium findings have remediation plan
Security sign-off obtained

11.2 Stakeholder Approval

Infrastructure owner approval
Security team approval (if applicable)
Service owner approval
Documentation review complete

11.3 Go-Live Checklist

Production deployment scheduled
Rollback plan ready
Support team notified
Monitoring dashboard open
Incident response team on standby

11.4 Post-Deployment

Service confirmed operational
Monitoring confirms normal operations
No errors in logs
Performance metrics within acceptable range
Post-deployment review scheduled (1 week)

Approval Signatures

Role	Name	Date	Signature
Service Owner
Security Reviewer
Infrastructure Owner

Deployment Record

Deployment Date: ________________

Deployment Method: [ ] Manual [ ] Ansible [ ] CI/CD

Deployment Status: [ ] Success [ ] Failed [ ] Rolled Back

Issues Encountered:

(Document any issues encountered during deployment)

Lessons Learned:

(Document lessons learned for future deployments)

Checklist Score

Total Items: 200+

Items Completed: ______ / ______

Completion Percentage: ______ %

Risk Level:

Low Risk (95-100% complete, all CRITICAL and HIGH items complete)
Medium Risk (85-94% complete, all CRITICAL items complete)
High Risk (70-84% complete, some CRITICAL items incomplete)
Unacceptable (<70% complete, deploy NOT approved)
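
The scoring above can be computed rather than tallied by hand. The function below is a sketch of one reading of the thresholds; risk_level is a hypothetical name, and it assumes that a checklist which meets a percentage band but misses that band's CRITICAL or HIGH condition falls through to the next band down.

```shell
# $1: completed items   $2: total items
# $3: "yes" if all CRITICAL items are complete   $4: "yes" if all HIGH items are complete
risk_level() {
  local completed="$1" total="$2" critical_ok="$3" high_ok="$4" pct
  [ "$total" -gt 0 ] || { echo "total must be positive" >&2; return 1; }
  pct=$(( completed * 100 / total ))    # integer percentage, rounded down
  if [ "$pct" -ge 95 ] && [ "$critical_ok" = yes ] && [ "$high_ok" = yes ]; then
    echo "$pct% Low Risk"
  elif [ "$pct" -ge 85 ] && [ "$critical_ok" = yes ]; then
    echo "$pct% Medium Risk"
  elif [ "$pct" -ge 70 ]; then
    echo "$pct% High Risk"
  else
    echo "$pct% Unacceptable"
  fi
}
```

For example, risk_level 180 200 yes no reports a 90% checklist with an open HIGH item as Medium Risk.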
