homelab/templates/SECURITY_CHECKLIST.md
Jordan Ramos e481c95da4 docs(security): comprehensive security audit and remediation documentation
- Add SECURITY.md policy with credential management, Docker security, SSL/TLS guidance
- Add security audit report (2025-12-20) with 31 findings across 4 severity levels
- Add pre-deployment security checklist template
- Update CLAUDE_STATUS.md with security audit initiative
- Expand services/README.md with comprehensive security sections
- Add script validation report and container name fix guide

Audit identified 6 CRITICAL, 3 HIGH, 2 MEDIUM findings
4-phase remediation roadmap created (estimated 6-13 min downtime)
All security scripts validated and ready for execution

Related: Security Audit Q4 2025, CRITICAL-001 through CRITICAL-006

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 13:52:34 -07:00

# Security Pre-Deployment Checklist

**Purpose**: Ensure all new services and infrastructure components meet security standards before deployment to production.

**Usage**: Complete this checklist for every new service, VM, container, or infrastructure component. Archive completed checklists in `/home/jramos/homelab/docs/deployment-records/`.

---
## Service Information
| Field | Value |
|-------|-------|
| **Service Name** | |
| **Deployment Type** | [ ] VM [ ] LXC Container [ ] Docker Container [ ] Bare Metal |
| **Deployment Date** | |
| **Owner** | |
| **Purpose** | |
| **Criticality** | [ ] Critical [ ] High [ ] Medium [ ] Low |
| **Data Classification** | [ ] Public [ ] Internal [ ] Confidential [ ] Restricted |
---
## 1. Authentication & Authorization
### 1.1 User Accounts
- [ ] Default credentials changed (admin/admin, root/password, etc.)
- [ ] Strong password policy enforced (minimum 16 characters)
- [ ] Separate user accounts created (no shared credentials)
- [ ] Root/administrator login disabled
- [ ] Service accounts use principle of least privilege
- [ ] User account list documented in `/home/jramos/homelab/docs/accounts/`
**Default Credentials to Check**:
```
Grafana: admin / admin
NPM: admin@example.com / changeme
Proxmox: root / <install_password>
PostgreSQL: postgres / postgres
TinyAuth: (check .env file)
Portainer: admin / <first_login>
n8n: (set on first login)
Home Assistant: (set on first login)
```
### 1.2 Multi-Factor Authentication (MFA)
- [ ] MFA enabled for administrative accounts
- [ ] MFA method documented (TOTP, U2F, etc.)
- [ ] Recovery codes generated and stored securely
- [ ] MFA enforcement tested and verified
### 1.3 Single Sign-On (SSO)
- [ ] SSO integration configured (if applicable via TinyAuth)
- [ ] SSO tested with test account
- [ ] Fallback authentication method configured
- [ ] Direct IP access blocked (must go through SSO gateway)
### 1.4 SSH Access
- [ ] Password authentication disabled
- [ ] SSH key authentication only
- [ ] SSH keys use passphrase protection
- [ ] Root SSH login disabled (`PermitRootLogin no`)
- [ ] SSH port changed from 22 (optional hardening)
- [ ] SSH AllowUsers configured (whitelist approach)
- [ ] SSH configuration validated (`sshd -t`)
**SSH Hardening Verification**:
```bash
# Verify configuration
grep -E "PermitRootLogin|PasswordAuthentication|AllowUsers" /etc/ssh/sshd_config
# Expected output:
# PermitRootLogin no
# PasswordAuthentication no
# AllowUsers jramos
```
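
The `grep` above prints the settings for a human to read. As a minimal sketch, the same check can be made to fail loudly; the function names `check_sshd_option` and `audit_sshd` are hypothetical, and the script only reads the single file it is given, so settings pulled in through `Include` or `Match` blocks are not covered (running `sshd -T` as root prints the effective configuration instead). A typical call would be `audit_sshd /etc/ssh/sshd_config`, which returns non-zero if either value differs from this checklist.

```bash
# check_sshd_option FILE KEY EXPECTED
# Compares the first uncommented occurrence of KEY with EXPECTED,
# since sshd uses the first value it reads for each keyword.
check_sshd_option() {
  local file="$1" key="$2" expected="$3" actual
  actual="$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "OK   $key $actual"
  else
    echo "FAIL $key is '${actual:-unset}', expected '$expected'"
    return 1
  fi
}

# audit_sshd [FILE]: run the checks from this checklist; non-zero on any failure.
audit_sshd() {
  local file="${1:-/etc/ssh/sshd_config}" rc=0
  check_sshd_option "$file" PermitRootLogin no || rc=1
  check_sshd_option "$file" PasswordAuthentication no || rc=1
  return "$rc"
}
```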
---
## 2. Secrets Management
### 2.1 Credentials Storage
- [ ] No hardcoded passwords in docker-compose.yaml
- [ ] No secret values written inline under `environment:` in compose files (they are visible in `docker inspect`)
- [ ] Where environment variables are unavoidable, values kept in `.env` files (excluded from git)
- [ ] Docker secrets used for production deployments
- [ ] `.env` files have restrictive permissions (600)
- [ ] Secrets documented in password manager (Vault, Bitwarden, etc.)
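
The permission item above can be checked mechanically. This is a minimal sketch, assuming GNU coreutils `stat` (as found on Linux; BSD and macOS use different flags); `check_env_perms` is a hypothetical helper name.

```bash
# check_env_perms FILE...: report every file whose mode is not exactly 600.
# Returns non-zero if any file fails the check or cannot be read.
check_env_perms() {
  local rc=0 f mode
  for f in "$@"; do
    mode="$(stat -c '%a' "$f")" || { rc=1; continue; }
    if [ "$mode" != "600" ]; then
      echo "FAIL $f has mode $mode (expected 600)"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_env_perms ./paperless/.env ./monitoring/.env` (illustrative paths). Whether a file is excluded from git can be confirmed separately with `git check-ignore -q .env`, which exits 0 when the path is ignored.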
### 2.2 API Keys & Tokens
- [ ] API keys generated with minimal required permissions
- [ ] API keys rotated regularly (document rotation schedule)
- [ ] API key usage monitored in logs
- [ ] Unused API keys revoked
- [ ] API keys never logged or displayed in UI
### 2.3 Encryption Keys
- [ ] Database encryption keys generated
- [ ] TLS certificate private keys protected (600 permissions)
- [ ] Encryption keys backed up securely
- [ ] Key recovery procedure documented
- [ ] LUKS encryption keys for volumes (if applicable)
### 2.4 JWT & Session Secrets
- [ ] JWT secrets generated with cryptographic randomness
```bash
openssl rand -base64 64
```
- [ ] Session secrets rotated on schedule
- [ ] JWT expiration configured (not indefinite)
- [ ] Session timeout configured (30 minutes idle recommended)
**Secret Generation Examples**:
```bash
# PostgreSQL password
openssl rand -base64 32
# JWT secret
openssl rand -base64 64
# AES-256 encryption key
openssl rand -hex 32
# API token (uuidgen output is an identifier format, not a secret; use random bytes)
openssl rand -hex 32
```
---
## 3. Network Security
### 3.1 Port Exposure
- [ ] Only required ports exposed to network
- [ ] Unnecessary ports firewalled off
- [ ] Port scan performed to verify (`nmap -sS -sV <ip>`)
- [ ] Administrative ports not exposed to Internet
- [ ] Database ports (5432, 3306, 27017) not publicly accessible
**Port Exposure Rules**:
```
Internet-facing:
- 80 (HTTP - redirects to HTTPS)
- 443 (HTTPS)
Internal-only:
- 22 (SSH)
- 8006 (Proxmox)
- 9090 (Prometheus)
- 3000 (Grafana)
- 5432 (PostgreSQL)
- All other services
```
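
To compare a host against these rules, the listening ports can be filtered through an allowlist. The sketch below is one way to do that; `unexpected_ports` is a hypothetical helper, and the allowlist passed to it is whatever applies to the host being checked.

```bash
# unexpected_ports ALLOWED...: read one port number per line on stdin and
# print every port that is not in the allowlist given as arguments.
# Returns non-zero if at least one unexpected port was seen.
unexpected_ports() {
  local allowed=" $* " port rc=0
  while read -r port; do
    [ -n "$port" ] || continue
    case "$allowed" in
      *" $port "*) ;;                 # port is allowed
      *) echo "$port"; rc=1 ;;        # port is not on the list
    esac
  done
  return "$rc"
}
```

On a Linux host, listening TCP ports can be fed in with `ss -Htln | awk '{ sub(/.*:/, "", $4); print $4 }' | sort -un | unexpected_ports 22 80 443`. This only sees the host's own sockets; the `nmap` scan in 3.1 remains the check for what is reachable from the network.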
### 3.2 Reverse Proxy Configuration
- [ ] Service behind Nginx Proxy Manager (CT 102)
- [ ] HTTPS configured with valid certificate
- [ ] HTTP redirects to HTTPS (`Force SSL` enabled)
- [ ] Direct IP access blocked (only accessible via proxy)
- [ ] Proxy headers configured (`X-Real-IP`, `X-Forwarded-For`)
**NPM Configuration Checklist**:
```
Proxy Host Settings:
✓ Domain name configured
✓ Forward to internal IP:PORT
✓ Force SSL: Enabled
✓ HTTP/2 Support: Enabled
✓ HSTS Enabled: Yes
✓ HSTS Subdomains: Yes
SSL Settings:
✓ Let's Encrypt certificate requested
✓ Auto-renewal enabled
✓ Force SSL: Enabled
Advanced:
✓ Custom Nginx Configuration (security headers)
✓ Authentication (TinyAuth if applicable)
```
### 3.3 TLS/SSL Configuration
- [ ] TLS 1.2 minimum (TLS 1.3 preferred)
- [ ] Strong cipher suites only (no RC4, 3DES, MD5)
- [ ] Certificate from trusted CA (Let's Encrypt)
- [ ] Certificate expiration monitored
- [ ] HSTS header configured (Strict-Transport-Security)
- [ ] Certificate tested with SSL Labs (A+ rating)
**TLS Testing**:
```bash
# Test TLS configuration
testssl.sh https://service.apophisnetworking.net
# Or use SSL Labs
# https://www.ssllabs.com/ssltest/
```
### 3.4 Firewall Rules
- [ ] Proxmox firewall enabled (if applicable)
- [ ] VM/CT firewall enabled
- [ ] iptables rules configured
- [ ] Default deny policy for inbound traffic
- [ ] Egress filtering configured (if applicable)
- [ ] Firewall rules documented
**Example iptables Rules**:
```bash
# Allow established connections (accept rules go first so an active SSH session survives)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
# Allow SSH from management network
iptables -A INPUT -p tcp -s 192.168.2.0/24 --dport 22 -j ACCEPT
# Allow service port from proxy only
iptables -A INPUT -p tcp -s 192.168.2.101 --dport 8080 -j ACCEPT
# Log dropped packets
iptables -A INPUT -j LOG --log-prefix "IPTABLES-DROP: "
# Default policies (set after the accept rules to avoid locking yourself out)
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
# Save rules
iptables-save > /etc/iptables/rules.v4
```
### 3.5 Network Segmentation
- [ ] Service deployed on appropriate VLAN (if VLANs implemented)
- [ ] Database servers isolated from Internet-facing services
- [ ] Management network separated from production
- [ ] Docker networks isolated per service stack
**VLAN Assignment** (if applicable):
```
VLAN 10 - Management: Proxmox, Ansible-Control
VLAN 20 - DMZ: Web servers, reverse proxy
VLAN 30 - Internal: Databases, monitoring
VLAN 40 - IoT: Home Assistant, isolated devices
```
---
## 4. Container Security
### 4.1 Docker Image Security
- [ ] Base image from trusted registry (Docker Hub official, ghcr.io)
- [ ] Image pinned to specific version tag (not `latest`)
- [ ] Image scanned for vulnerabilities (Trivy, Snyk)
- [ ] No critical or high CVEs in image
- [ ] Image layers reviewed for suspicious content
- [ ] Multi-stage build used to minimize image size
**Image Scanning**:
```bash
# Scan image with Trivy
trivy image <image-name>:tag
# Only show HIGH and CRITICAL
trivy image --severity HIGH,CRITICAL <image-name>:tag
# Generate JSON report
trivy image --format json --output results.json <image-name>:tag
```
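
The version-pinning item in 4.1 can be checked before an image is ever pulled. The following is a rough sketch that scans a compose file for `image:` values with no tag or with `latest`; `unpinned_images` is a hypothetical helper, and it assumes each image is written on a single `image:` line.

```bash
# unpinned_images FILE: print every image reference that has no tag or uses
# ":latest". References pinned by digest (@sha256:...) are accepted.
unpinned_images() {
  awk -v sq="'" '
    $1 == "image:" {
      ref = $2
      gsub("[\"" sq "]", "", ref)        # strip optional quotes
      if (ref ~ /@sha256:/) next          # pinned by digest
      name = ref
      sub(/.*\//, "", name)               # drop registry/namespace (may hold host:port)
      if (name !~ /:/ || name ~ /:latest$/) print ref
    }
  ' "$1"
}
```

Running `unpinned_images docker-compose.yaml` prints nothing when every image is pinned, so the output can be used directly as a list of items to fix.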
### 4.2 Container Runtime Security
- [ ] Container runs as non-root user
```yaml
user: "1000:1000" # Or named user
```
- [ ] Read-only root filesystem (if applicable)
```yaml
read_only: true
```
- [ ] No privileged mode (`privileged: false`)
- [ ] Capabilities dropped to minimum required
```yaml
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE # Only if needed
```
- [ ] Security options configured
```yaml
security_opt:
- no-new-privileges:true
- apparmor=docker-default
```
### 4.3 Volume Mounts
- [ ] No root filesystem mounts (`/:/host`)
- [ ] Sensitive directories not mounted (`/etc`, `/root`, `/home`)
- [ ] Docker socket not mounted (unless absolutely required)
- [ ] If socket required, use docker-socket-proxy
- [ ] Volume mounts use least privilege (read-only where possible)
```yaml
volumes:
- ./config:/config:ro # Read-only
```
- [ ] Host paths documented and justified
**Dangerous Volume Mounts to Avoid**:
```yaml
# NEVER DO THIS
volumes:
- /:/srv # Full filesystem access
- /var/run/docker.sock:/var/run/docker.sock # Root-equivalent
- /etc:/host-etc # System configuration access
- /root:/root # Root home directory
```
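
A compose file can be screened for the mounts listed above with a single pattern match. This is only a sketch: `risky_mounts` is a hypothetical helper, it covers the short `host:container` volume syntax only, and it matches the exact host paths named in this section rather than files beneath them (so `/etc/localtime` is not flagged).

```bash
# risky_mounts FILE: print (with line numbers) volume entries whose host side
# is /, /etc, /root, /home, or the Docker socket.
# Exit status follows grep: 0 if a risky mount was found, 1 if none.
risky_mounts() {
  grep -nE '^[[:space:]]*-[[:space:]]*"?(/|/etc|/root|/home|/var/run/docker\.sock):' "$1"
}
```

Because the exit status is 0 when something is found, a pre-deployment check would read `if risky_mounts docker-compose.yaml; then echo "review volume mounts"; fi`.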
### 4.4 Resource Limits
- [ ] Memory limits configured
```yaml
mem_limit: 512m
mem_reservation: 256m
```
- [ ] CPU limits configured
```yaml
cpus: '0.5'
cpu_shares: 512
```
- [ ] Restart policy configured appropriately
```yaml
restart: unless-stopped # Recommended
```
- [ ] Log limits configured (prevent disk exhaustion)
```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```
### 4.5 Container Naming
- [ ] Container name follows standard convention
```
Format: <service>-<component>
Example: paperless-webserver, monitoring-grafana
```
- [ ] Container name documented in services README
- [ ] Name does not conflict with existing containers
**See**: `/home/jramos/homelab/scripts/security/CONTAINER_NAME_FIXES.md`
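
The naming convention can be expressed as a pattern and applied to running containers. In this sketch, `valid_container_name` is a hypothetical helper and the allowed character set (lowercase letters and digits) is an assumption; the checklist itself only fixes the `<service>-<component>` shape. To list non-conforming containers, pipe `docker ps --format '{{.Names}}'` into a loop that calls the function on each name.

```bash
# valid_container_name NAME: succeed if NAME looks like <service>-<component>,
# i.e. two or more lowercase alphanumeric parts joined by hyphens.
valid_container_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)+$'
}
```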
---
## 5. Data Protection
### 5.1 Backup Configuration
- [ ] Backup job configured in Proxmox Backup Server
- [ ] Backup schedule documented (daily incremental + weekly full)
- [ ] Backup retention policy configured
```
Recommended:
- Keep last 7 daily backups
- Keep last 4 weekly backups
- Keep last 6 monthly backups
```
- [ ] Backup encryption enabled
- [ ] Backup encryption key stored securely
- [ ] Backup restoration tested successfully
**Backup Job Configuration**:
```bash
# Create backup job in Proxmox
# Storage: PBS-Backups
# Schedule: Daily at 0200
# Retention: 7 daily, 4 weekly, 6 monthly
# Compression: ZSTD
# Mode: Snapshot
```
### 5.2 Data Encryption
- [ ] Data encrypted at rest (LUKS, ZFS encryption)
- [ ] Database encryption enabled (if supported)
- [ ] Application-level encryption configured (if available)
- [ ] Encryption keys documented and backed up
- [ ] Key rotation schedule documented
**PostgreSQL Encryption** (example):
```sql
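-- Enable pgcrypto extension
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Encrypt sensitive columns. pgp_sym_encrypt returns bytea, so write to a bytea column
-- (ssn_encrypted is an illustrative name; 'encryption_key' is a placeholder, never hardcode the real key).
ALTER TABLE users ADD COLUMN ssn_encrypted bytea;
UPDATE users SET ssn_encrypted = pgp_sym_encrypt(ssn, 'encryption_key');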
```
### 5.3 Data Retention
- [ ] Data retention policy documented
- [ ] PII data retention compliant with regulations (GDPR, CCPA)
- [ ] Automated data purge scripts configured
- [ ] User data deletion procedure documented
- [ ] Log retention configured (default: 90 days)
### 5.4 Sensitive Data Handling
- [ ] No PII in logs
- [ ] Credit card data not stored (if applicable)
- [ ] Health information protected (HIPAA compliance if applicable)
- [ ] Passwords never logged
- [ ] API responses sanitized before logging
---
## 6. Monitoring & Logging
### 6.1 Application Logging
- [ ] Application logs configured
- [ ] Log level set appropriately (INFO for production)
- [ ] Logs forwarded to centralized logging (Loki)
- [ ] Log format standardized (JSON preferred)
- [ ] Sensitive data redacted from logs
- [ ] Log rotation configured
**Docker Logging Configuration**:
```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
    labels: "service,environment"
```
### 6.2 Security Event Logging
- [ ] Failed authentication attempts logged
- [ ] Privilege escalation logged
- [ ] Configuration changes logged
- [ ] File access logged (for sensitive data)
- [ ] Security events forwarded to monitoring
**Security Events to Log**:
```
- Failed login attempts
- Successful privileged access (sudo, docker exec root)
- SSH key usage
- Configuration file modifications
- User account creation/deletion
- Permission changes
- Firewall rule modifications
```
### 6.3 Metrics Collection
- [ ] Service added to Prometheus scrape targets
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'new-service'
    static_configs:
      - targets: ['192.168.2.XXX:9090']
```
- [ ] Service exposes metrics endpoint (if supported)
- [ ] Grafana dashboard created for service
- [ ] Alerting rules configured for service health
### 6.4 Alerting
- [ ] Critical alerts configured (service down, high error rate)
- [ ] Alert notification destination configured (email, Slack, etc.)
- [ ] Alert escalation policy documented
- [ ] Alert thresholds tested and validated
**Example Alerting Rules**:
```yaml
# Service down alert
- alert: ServiceDown
  expr: up{job="new-service"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Service {{ $labels.instance }} is down"
# High error rate alert
- alert: HighErrorRate
  expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High error rate on {{ $labels.instance }}"
```
---
## 7. Application Security
### 7.1 Security Headers
- [ ] Content-Security-Policy configured
- [ ] X-Frame-Options: SAMEORIGIN
- [ ] X-Content-Type-Options: nosniff
- [ ] X-XSS-Protection: 0 (legacy header; current browsers no longer use the XSS filter, so rely on Content-Security-Policy)
- [ ] Strict-Transport-Security configured (HSTS)
- [ ] Referrer-Policy: strict-origin-when-cross-origin
- [ ] Permissions-Policy configured
**NPM Custom Nginx Configuration**:
```nginx
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "0" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline';" always;
add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;
```
**Verification**:
```bash
# -i makes the match case-insensitive: HTTP/2 responses return lowercase header names
curl -sI https://service.apophisnetworking.net | grep -iE "X-Frame-Options|Content-Security-Policy|Strict-Transport-Security"
```
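
The `grep` above shows which headers are present; for a checklist it is often more useful to list the ones that are missing. The sketch below does that for the headers named in 7.1; `missing_headers` is a hypothetical helper, and the required list is taken from this section.

```bash
# missing_headers: read raw HTTP response headers on stdin and print each
# required header that is absent. Names are compared in lowercase because
# HTTP/2 responses carry lowercase header names.
missing_headers() {
  local headers required h rc=0
  headers="$(tr -d '\r' | tr '[:upper:]' '[:lower:]')"
  required="strict-transport-security content-security-policy x-frame-options x-content-type-options referrer-policy permissions-policy"
  for h in $required; do
    if ! printf '%s\n' "$headers" | grep -q "^$h:"; then
      echo "$h"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `curl -sI https://service.apophisnetworking.net | missing_headers`. No output and exit status 0 mean every listed header was returned; the header values themselves are not validated.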
### 7.2 Input Validation
- [ ] SQL injection protection (parameterized queries, ORM)
- [ ] XSS protection (input sanitization, output encoding)
- [ ] CSRF protection (tokens, SameSite cookies)
- [ ] File upload validation (type, size, content)
- [ ] Rate limiting configured (prevent brute force)
### 7.3 Session Management
- [ ] Secure session cookies (Secure, HttpOnly, SameSite)
- [ ] Session timeout configured (30 minutes recommended)
- [ ] Session invalidation on logout
- [ ] Concurrent session limits configured
### 7.4 API Security
- [ ] API authentication required (API key, OAuth, JWT)
- [ ] API rate limiting configured
- [ ] API input validation
- [ ] API versioning implemented
- [ ] API documentation does not expose sensitive endpoints
---
## 8. Compliance & Documentation
### 8.1 Documentation
- [ ] Service documented in `/home/jramos/homelab/services/README.md`
- [ ] Configuration files added to git repository
- [ ] Architecture diagram updated (if applicable)
- [ ] Dependencies documented
- [ ] Troubleshooting guide created
**Documentation Requirements**:
```markdown
Required sections in services/README.md:
- Service name and purpose
- Port mappings
- Environment variables
- Volume mounts
- Dependencies
- Deployment instructions
- Troubleshooting common issues
- Maintenance procedures
```
### 8.2 Change Management
- [ ] Change request created (if required)
- [ ] Change approved by infrastructure owner
- [ ] Rollback plan documented
- [ ] Change window scheduled
- [ ] Stakeholders notified
### 8.3 Compliance
- [ ] GDPR compliance verified (if handling EU data)
- [ ] HIPAA compliance verified (if handling health data)
- [ ] PCI-DSS compliance verified (if handling payment data)
- [ ] License compliance checked (open-source licenses)
- [ ] Data residency requirements met
### 8.4 Asset Inventory
- [ ] Service added to NetBox (CT 103) inventory
- [ ] IP address documented in IPAM
- [ ] Service owner recorded
- [ ] Criticality level assigned
- [ ] Support contacts documented
---
## 9. Testing & Validation
### 9.1 Functional Testing
- [ ] Service starts successfully
- [ ] Service accessible via configured URL
- [ ] Authentication works correctly
- [ ] Core functionality tested
- [ ] Dependencies verified (database connection, etc.)
### 9.2 Security Testing
- [ ] Port scan performed (no unexpected open ports)
- [ ] Vulnerability scan performed (Trivy, Nessus)
- [ ] Penetration test completed (if critical service)
- [ ] SSL/TLS configuration tested (SSL Labs A+ rating)
- [ ] Security headers verified
**Security Testing Tools**:
```bash
# Port scan
nmap -sS -sV 192.168.2.XXX
# Vulnerability scan
trivy image <image-name>
# SSL test
testssl.sh https://service.apophisnetworking.net
# Security headers
curl -I https://service.apophisnetworking.net
```
### 9.3 Performance Testing
- [ ] Load testing performed (if applicable)
- [ ] Resource usage monitored under load
- [ ] Response time acceptable (<1s for web pages)
- [ ] No memory leaks detected
- [ ] Disk I/O acceptable
### 9.4 Disaster Recovery Testing
- [ ] Backup restoration tested
- [ ] Service recovery time measured (RTO)
- [ ] Data loss measured (RPO)
- [ ] Failover tested (if HA configured)
---
## 10. Operational Readiness
### 10.1 Monitoring Integration
- [ ] Service health checks configured
- [ ] Monitoring dashboard created
- [ ] Alerts configured and tested
- [ ] On-call rotation updated (if applicable)
### 10.2 Maintenance Plan
- [ ] Update schedule documented (monthly, quarterly)
- [ ] Maintenance window scheduled
- [ ] Update procedure documented
- [ ] Rollback procedure tested
### 10.3 Runbooks
- [ ] Service start/stop procedure documented
- [ ] Common troubleshooting steps documented
- [ ] Incident response procedure documented
- [ ] Escalation contacts documented
### 10.4 Access Control
- [ ] User access provisioned
- [ ] Admin access limited to authorized personnel
- [ ] Access review schedule documented
- [ ] Access revocation procedure documented
---
## 11. Final Review
### 11.1 Security Review
- [ ] All CRITICAL findings addressed
- [ ] All HIGH findings addressed
- [ ] Medium findings have remediation plan
- [ ] Security sign-off obtained
### 11.2 Stakeholder Approval
- [ ] Infrastructure owner approval
- [ ] Security team approval (if applicable)
- [ ] Service owner approval
- [ ] Documentation review complete
### 11.3 Go-Live Checklist
- [ ] Production deployment scheduled
- [ ] Rollback plan ready
- [ ] Support team notified
- [ ] Monitoring dashboard open
- [ ] Incident response team on standby
### 11.4 Post-Deployment
- [ ] Service confirmed operational
- [ ] Monitoring confirms normal operations
- [ ] No errors in logs
- [ ] Performance metrics within acceptable range
- [ ] Post-deployment review scheduled (1 week)
---
## Approval Signatures
| Role | Name | Date | Signature |
|------|------|------|-----------|
| **Service Owner** | | | |
| **Security Reviewer** | | | |
| **Infrastructure Owner** | | | |
---
## Deployment Record
**Deployment Date**: ________________
**Deployment Method**: [ ] Manual [ ] Ansible [ ] CI/CD
**Deployment Status**: [ ] Success [ ] Failed [ ] Rolled Back
**Issues Encountered**:
```
(Document any issues encountered during deployment)
```
**Lessons Learned**:
```
(Document lessons learned for future deployments)
```
---
## Checklist Score
**Total Items**: 200+
**Items Completed**: ______ / ______
**Completion Percentage**: ______ %
**Risk Level**:
- [ ] Low Risk (95-100% complete, all CRITICAL and HIGH items complete)
- [ ] Medium Risk (85-94% complete, all CRITICAL items complete)
- [ ] High Risk (70-84% complete, some CRITICAL items incomplete)
- [ ] Unacceptable (<70% complete, deploy NOT approved)
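
The completion percentage can be computed from a filled-in copy of this file rather than counted by hand. A minimal sketch, assuming items are marked `- [x]` at the start of a line; `checklist_score` is a hypothetical helper. Inline option boxes inside tables are not counted, but the four Risk Level boxes above are, and the percentage is rounded down. Example: `checklist_score SECURITY_CHECKLIST.md`.

```bash
# checklist_score FILE: count Markdown task-list items ("- [x]" / "- [ ]")
# that start a line and print "completed / total (percent%)".
checklist_score() {
  local file="$1" done_count total
  # grep -c exits 1 when the count is 0, so "|| true" keeps the count usable
  done_count="$(grep -cE '^[[:space:]]*- \[[xX]\]' "$file" || true)"
  total="$(grep -cE '^[[:space:]]*- \[[ xX]\]' "$file" || true)"
  if [ "$total" -eq 0 ]; then
    echo "0 / 0 (0%)"
    return 1
  fi
  echo "$done_count / $total ($((100 * done_count / total))%)"
}
```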
---
## Archive
After deployment, archive this completed checklist:
**Location**: `/home/jramos/homelab/docs/deployment-records/<service-name>-<date>.md`
**Command**:
```bash
cp SECURITY_CHECKLIST.md /home/jramos/homelab/docs/deployment-records/<service-name>-$(date +%Y%m%d).md
```
---
**Template Version**: 1.0
**Last Updated**: 2025-12-20
**Maintained By**: Infrastructure Security Team
**Review Frequency**: Quarterly