homelab/templates/SECURITY_CHECKLIST.md

# Security Pre-Deployment Checklist

**Purpose**: Ensure all new services and infrastructure components meet security standards before deployment to production.

**Usage**: Complete this checklist for every new service, VM, container, or infrastructure component. Archive completed checklists in `/home/jramos/homelab/docs/deployment-records/`.

---

## Service Information

| Field | Value |
|-------|-------|
| **Service Name** | |
| **Deployment Type** | [ ] VM [ ] LXC Container [ ] Docker Container [ ] Bare Metal |
| **Deployment Date** | |
| **Owner** | |
| **Purpose** | |
| **Criticality** | [ ] Critical [ ] High [ ] Medium [ ] Low |
| **Data Classification** | [ ] Public [ ] Internal [ ] Confidential [ ] Restricted |

---

## 1. Authentication & Authorization

### 1.1 User Accounts
- [ ] Default credentials changed (admin/admin, root/password, etc.)
- [ ] Strong password policy enforced (minimum 16 characters)
- [ ] Separate user accounts created (no shared credentials)
- [ ] Root/administrator login disabled
- [ ] Service accounts use principle of least privilege
- [ ] User account list documented in `/home/jramos/homelab/docs/accounts/`

**Default Credentials to Check**:
```
Grafana:        admin / admin
NPM:            admin@example.com / changeme
Proxmox:        root / <install_password>
PostgreSQL:     postgres / postgres
TinyAuth:       (check .env file)
Portainer:      admin / <first_login>
n8n:            (set on first login)
Home Assistant: (set on first login)
```

### 1.2 Multi-Factor Authentication (MFA)
- [ ] MFA enabled for administrative accounts
- [ ] MFA method documented (TOTP, U2F, etc.)
- [ ] Recovery codes generated and stored securely
- [ ] MFA enforcement tested and verified

### 1.3 Single Sign-On (SSO)
- [ ] SSO integration configured (if applicable via TinyAuth)
- [ ] SSO tested with test account
- [ ] Fallback authentication method configured
- [ ] Direct IP access blocked (must go through SSO gateway)

### 1.4 SSH Access
- [ ] Password authentication disabled
- [ ] SSH key authentication only
- [ ] SSH keys use passphrase protection
- [ ] Root SSH login disabled (`PermitRootLogin no`)
- [ ] SSH port changed from 22 (optional hardening)
- [ ] SSH AllowUsers configured (whitelist approach)
- [ ] SSH configuration validated (`sshd -t`)

**SSH Hardening Verification**:
```bash
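# Verify the effective configuration (covers any Include'd drop-in files and ignores commented lines)
sudo sshd -T | grep -Ei '^(permitrootlogin|passwordauthentication|allowusers) '

# Expected output (sshd -T prints keywords in lowercase):
# permitrootlogin no
# passwordauthentication no
# allowusers jramos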
```
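
The same settings can be checked mechanically. The sketch below is a hypothetical helper (`check_sshd_settings` is an illustrative name, not part of any existing tooling) that reads the lowercase `key value` lines printed by `sshd -T` on stdin and reports each hardening item that is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify SSH hardening settings from `sshd -T` style input.
# Reads lowercase "key value" lines on stdin; returns the number of failed checks.
check_sshd_settings() {
  local input want failures=0
  input=$(cat)

  # These directives must have exactly these values.
  for want in "permitrootlogin no" "passwordauthentication no"; do
    if ! grep -qix -- "$want" <<<"$input"; then
      echo "FAIL: expected '$want'"
      failures=$((failures + 1))
    fi
  done

  # AllowUsers must be present with at least one user.
  if ! grep -qiE '^allowusers[[:space:]]+[^[:space:]]' <<<"$input"; then
    echo "FAIL: AllowUsers is not set"
    failures=$((failures + 1))
  fi

  if [ "$failures" -eq 0 ]; then
    echo "OK: SSH hardening settings present"
  fi
  return "$failures"
}
```

Assuming root privileges, a typical invocation would be `sudo sshd -T | check_sshd_settings`; a non-zero exit status equals the number of failed checks.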

---

## 2. Secrets Management

### 2.1 Credentials Storage
- [ ] No hardcoded passwords in docker-compose.yaml
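- [ ] No secrets in container environment variables where the service supports file-based secrets (environment values are visible in `docker inspect`)
- [ ] Where environment variables are unavoidable, values are stored in `.env` files (excluded from git)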
- [ ] Docker secrets used for production deployments
- [ ] `.env` files have restrictive permissions (600)
- [ ] Secrets documented in password manager (Vault, Bitwarden, etc.)
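
The permission item above is easy to audit across a whole tree of service directories. A minimal sketch, assuming GNU `find` and a hypothetical helper name `find_loose_env_files`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list `.env` files whose permissions are not exactly 600.
# $1 is the directory to search (defaults to the current directory).
find_loose_env_files() {
  find "${1:-.}" -type f -name '.env' ! -perm 600
}
```

An empty result means every `.env` file found is mode 600. Whether a given file is excluded from git can be confirmed separately with `git check-ignore -q .env`, which exits 0 when the path is ignored.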

### 2.2 API Keys & Tokens
- [ ] API keys generated with minimal required permissions
- [ ] API keys rotated regularly (document rotation schedule)
- [ ] API key usage monitored in logs
- [ ] Unused API keys revoked
- [ ] API keys never logged or displayed in UI

### 2.3 Encryption Keys
- [ ] Database encryption keys generated
- [ ] TLS certificate private keys protected (600 permissions)
- [ ] Encryption keys backed up securely
- [ ] Key recovery procedure documented
- [ ] LUKS encryption keys for volumes (if applicable)

### 2.4 JWT & Session Secrets
- [ ] JWT secrets generated with cryptographic randomness
  ```bash
  openssl rand -base64 64
  ```
- [ ] Session secrets rotated on schedule
- [ ] JWT expiration configured (not indefinite)
- [ ] Session timeout configured (30 minutes idle recommended)

**Secret Generation Examples**:
```bash
# PostgreSQL password
openssl rand -base64 32

# JWT secret
openssl rand -base64 64

# AES-256 encryption key
openssl rand -hex 32

# API token (uuidgen may fall back to time-based UUIDs, so use a CSPRNG instead)
openssl rand -hex 32
```
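
Secrets printed to the terminal tend to linger in scrollback. As a sketch only, the hypothetical helper below (`write_secret` is an illustrative name) writes a hex secret straight to a file with 600 permissions; it falls back to `/dev/urandom` when `openssl` is not installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a random hex secret to a file readable only by its owner.
# $1 = output path, $2 = number of random bytes (default 32, i.e. 64 hex characters).
write_secret() {
  local path=$1 bytes=${2:-32} secret
  if command -v openssl >/dev/null 2>&1; then
    secret=$(openssl rand -hex "$bytes")
  else
    # Fallback: hex-encode bytes from the kernel CSPRNG.
    secret=$(head -c "$bytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n')
  fi
  # Create the file under a restrictive umask, then enforce the mode
  # in case the file already existed with looser permissions.
  ( umask 077; printf '%s\n' "$secret" > "$path" )
  chmod 600 "$path"
}
```

For example, `write_secret ./secrets/jwt_secret 64` would produce a 128-character hex string; the `./secrets/` path is an assumption, not an existing convention in this repository.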

---

## 3. Network Security

### 3.1 Port Exposure
- [ ] Only required ports exposed to network
- [ ] Unnecessary ports firewalled off
- [ ] Port scan performed to verify (`nmap -sS -sV <ip>`)
- [ ] Administrative ports not exposed to Internet
- [ ] Database ports (5432, 3306, 27017) not publicly accessible

**Port Exposure Rules**:
```
Internet-facing:
  - 80 (HTTP - redirects to HTTPS)
  - 443 (HTTPS)

Internal-only:
  - 22 (SSH)
  - 8006 (Proxmox)
  - 9090 (Prometheus)
  - 3000 (Grafana)
  - 5432 (PostgreSQL)
  - All other services
```
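
The rules above amount to an allowlist, so listening ports can be compared against it on the host itself. A minimal sketch with a hypothetical helper, `unexpected_ports`, which takes the allowed ports as arguments and reads one port number per line on stdin:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every port read from stdin that is not in the allowlist.
# Arguments are the allowed port numbers.
unexpected_ports() {
  local allowed=" $* " port
  while read -r port; do
    [ -z "$port" ] && continue
    case "$allowed" in
      *" $port "*) ;;        # port is allowed
      *) echo "$port" ;;     # port is not in the allowlist
    esac
  done
}
```

On a Linux host with iproute2, the listening TCP ports could be fed in with `ss -tlnH | awk '{print $4}' | sed 's/.*://' | sort -un | unexpected_ports 22 80 443`. This complements the external `nmap` scan rather than replacing it, since it does not show what a firewall actually lets through.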

### 3.2 Reverse Proxy Configuration
- [ ] Service behind Nginx Proxy Manager (CT 102)
- [ ] HTTPS configured with valid certificate
- [ ] HTTP redirects to HTTPS (`Force SSL` enabled)
- [ ] Direct IP access blocked (only accessible via proxy)
- [ ] Proxy headers configured (`X-Real-IP`, `X-Forwarded-For`)

**NPM Configuration Checklist**:
```
Proxy Host Settings:
  ✓ Domain name configured
  ✓ Forward to internal IP:PORT
  ✓ Force SSL: Enabled
  ✓ HTTP/2 Support: Enabled
  ✓ HSTS Enabled: Yes
  ✓ HSTS Subdomains: Yes

SSL Settings:
  ✓ Let's Encrypt certificate requested
  ✓ Auto-renewal enabled

Advanced:
  ✓ Custom Nginx Configuration (security headers)
  ✓ Authentication (TinyAuth if applicable)
```

### 3.3 TLS/SSL Configuration
- [ ] TLS 1.2 minimum (TLS 1.3 preferred)
- [ ] Strong cipher suites only (no RC4, 3DES, MD5)
- [ ] Certificate from trusted CA (Let's Encrypt)
- [ ] Certificate expiration monitored
- [ ] HSTS header configured (Strict-Transport-Security)
- [ ] Certificate tested with SSL Labs (A+ rating)

**TLS Testing**:
```bash
# Test TLS configuration
testssl.sh https://service.apophisnetworking.net

# Or use SSL Labs
# https://www.ssllabs.com/ssltest/
```
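
For the certificate expiration item, the remaining lifetime can be derived from the `notAfter` date that `openssl x509 -enddate` prints. The sketch below assumes GNU `date`; `days_until_expiry` is a hypothetical helper name, and the optional second argument exists only so the calculation can be checked against a fixed point in time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: whole days between now and a certificate's notAfter date.
# $1 = date string as printed by openssl (e.g. "Jan 11 00:00:00 2030 GMT")
# $2 = optional reference time as epoch seconds (defaults to the current time)
days_until_expiry() {
  local not_after=$1 now=${2:-$(date +%s)} end
  end=$(date -d "$not_after" +%s) || return 2
  echo $(( (end - now) / 86400 ))
}
```

With a certificate on disk, usage could look like `days_until_expiry "$(openssl x509 -noout -enddate -in cert.pem | cut -d= -f2)"`, where `cert.pem` is a placeholder path. A result below the renewal threshold (30 days is common for Let's Encrypt) suggests auto-renewal is not working.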

### 3.4 Firewall Rules
- [ ] Proxmox firewall enabled (if applicable)
- [ ] VM/CT firewall enabled
- [ ] iptables rules configured
- [ ] Default deny policy for inbound traffic
- [ ] Egress filtering configured (if applicable)
- [ ] Firewall rules documented

**Example iptables Rules**:
```bash
# Allow established connections first so the current session survives
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow SSH from management network
iptables -A INPUT -p tcp -s 192.168.2.0/24 --dport 22 -j ACCEPT

# Allow service port from proxy only
iptables -A INPUT -p tcp -s 192.168.2.101 --dport 8080 -j ACCEPT

# Log dropped packets
iptables -A INPUT -j LOG --log-prefix "IPTABLES-DROP: "

# Default policies, applied last so a remote session is not locked out
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Save rules
iptables-save > /etc/iptables/rules.v4
```

### 3.5 Network Segmentation
- [ ] Service deployed on appropriate VLAN (if VLANs implemented)
- [ ] Database servers isolated from Internet-facing services
- [ ] Management network separated from production
- [ ] Docker networks isolated per service stack

**VLAN Assignment** (if applicable):
```
VLAN 10 - Management: Proxmox, Ansible-Control
VLAN 20 - DMZ: Web servers, reverse proxy
VLAN 30 - Internal: Databases, monitoring
VLAN 40 - IoT: Home Assistant, isolated devices
```

---

## 4. Container Security

### 4.1 Docker Image Security
- [ ] Base image from trusted registry (Docker Hub official, ghcr.io)
- [ ] Image pinned to specific version tag (not `latest`)
- [ ] Image scanned for vulnerabilities (Trivy, Snyk)
- [ ] No critical or high CVEs in image
- [ ] Image layers reviewed for suspicious content
- [ ] Multi-stage build used to minimize image size

**Image Scanning**:
```bash
# Scan image with Trivy
trivy image <image-name>:<tag>

# Only show HIGH and CRITICAL
trivy image --severity HIGH,CRITICAL <image-name>:<tag>

# Generate JSON report
trivy image --format json --output results.json <image-name>:<tag>
```
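
The version-pinning item can be checked directly against a compose file. The following is a rough sketch, not a YAML parser: it assumes each image is declared on its own `image:` line, and `find_unpinned_images` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print image references in a compose file that carry
# no tag or the `latest` tag. References pinned by digest are accepted.
find_unpinned_images() {
  local ref name
  awk '$1 == "image:" { print $2 }' "$1" | tr -d "\"'" | while read -r ref; do
    [ -n "$ref" ] || continue
    case "$ref" in
      *@sha256:*) continue ;;    # pinned by digest
    esac
    # Inspect only the last path component so a registry port
    # (e.g. localhost:5000/app) is not mistaken for a tag.
    name=${ref##*/}
    case "$name" in
      *:latest) echo "$ref" ;;   # explicit latest
      *:*) ;;                    # explicit version tag
      *) echo "$ref" ;;          # no tag, which defaults to latest
    esac
  done
}
```

Running `find_unpinned_images docker-compose.yaml` prints nothing when every image is pinned.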

### 4.2 Container Runtime Security
- [ ] Container runs as non-root user
  ```yaml
  user: "1000:1000"  # Or named user
  ```
- [ ] Read-only root filesystem (if applicable)
  ```yaml
  read_only: true
  ```
- [ ] No privileged mode (`privileged: false`)
- [ ] Capabilities dropped to minimum required
  ```yaml
  cap_drop:
    - ALL
  cap_add:
    - NET_BIND_SERVICE  # Only if needed
  ```
- [ ] Security options configured
  ```yaml
  security_opt:
    - no-new-privileges:true
    - apparmor=docker-default
  ```

### 4.3 Volume Mounts
- [ ] No root filesystem mounts (`/:/host`)
- [ ] Sensitive directories not mounted (`/etc`, `/root`, `/home`)
- [ ] Docker socket not mounted (unless absolutely required)
  - [ ] If socket required, use docker-socket-proxy
- [ ] Volume mounts use least privilege (read-only where possible)
  ```yaml
  volumes:
    - ./config:/config:ro  # Read-only
  ```
- [ ] Host paths documented and justified

**Dangerous Volume Mounts to Avoid**:
```yaml
# NEVER DO THIS
volumes:
  - /:/srv  # Full filesystem access
  - /var/run/docker.sock:/var/run/docker.sock  # Root-equivalent
  - /etc:/host-etc  # System configuration access
  - /root:/root  # Root home directory
```
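
A quick text search catches the most obvious of these patterns before a stack is deployed. This is a hedged sketch using a hypothetical helper, `find_risky_compose_settings`; it relies on simple pattern matching, so an empty result is not proof that a compose file is safe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) compose lines that enable
# privileged mode, mount the Docker socket, or bind-mount the host root.
find_risky_compose_settings() {
  grep -nE 'privileged:[[:space:]]*true|/var/run/docker\.sock|^[[:space:]]*-[[:space:]]*.?/:' "$1"
}
```

The function exits non-zero when nothing matches, following `grep` conventions, so `find_risky_compose_settings docker-compose.yaml && echo "review required"` only prints the notice when a finding exists.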

### 4.4 Resource Limits
- [ ] Memory limits configured
  ```yaml
  mem_limit: 512m
  mem_reservation: 256m
  ```
- [ ] CPU limits configured
  ```yaml
  cpus: '0.5'
  cpu_shares: 512
  ```
- [ ] Restart policy configured appropriately
  ```yaml
  restart: unless-stopped  # Recommended
  ```
- [ ] Log limits configured (prevent disk exhaustion)
  ```yaml
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"
  ```

### 4.5 Container Naming
- [ ] Container name follows standard convention
  ```
  Format: <service>-<component>
  Example: paperless-webserver, monitoring-grafana
  ```
- [ ] Container name documented in services README
- [ ] Name does not conflict with existing containers

**See**: `/home/jramos/homelab/scripts/security/CONTAINER_NAME_FIXES.md`

---

## 5. Data Protection

### 5.1 Backup Configuration
- [ ] Backup job configured in Proxmox Backup Server
- [ ] Backup schedule documented (daily incremental + weekly full)
- [ ] Backup retention policy configured
  ```
  Recommended:
  - Keep last 7 daily backups
  - Keep last 4 weekly backups
  - Keep last 6 monthly backups
  ```
- [ ] Backup encryption enabled
- [ ] Backup encryption key stored securely
- [ ] Backup restoration tested successfully

**Backup Job Configuration**:
```bash
# Create backup job in Proxmox
# Storage: PBS-Backups
# Schedule: Daily at 0200
# Retention: 7 daily, 4 weekly, 6 monthly
# Compression: ZSTD
# Mode: Snapshot
```

### 5.2 Data Encryption
- [ ] Data encrypted at rest (LUKS, ZFS encryption)
- [ ] Database encryption enabled (if supported)
- [ ] Application-level encryption configured (if available)
- [ ] Encryption keys documented and backed up
- [ ] Key rotation schedule documented

**PostgreSQL Encryption** (example):
```sql
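-- Enable pgcrypto extension
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- pgp_sym_encrypt returns bytea, so store the result in a separate bytea column
-- (ssn_encrypted is an illustrative column name)
ALTER TABLE users ADD COLUMN ssn_encrypted bytea;
UPDATE users SET ssn_encrypted = pgp_sym_encrypt(ssn, 'encryption_key');

-- Supply the key at runtime; do not hardcode it in scripts or migrations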
```

### 5.3 Data Retention
- [ ] Data retention policy documented
- [ ] PII data retention compliant with regulations (GDPR, CCPA)
- [ ] Automated data purge scripts configured
- [ ] User data deletion procedure documented
- [ ] Log retention configured (default: 90 days)

### 5.4 Sensitive Data Handling
- [ ] No PII in logs
- [ ] Credit card data not stored (if applicable)
- [ ] Health information protected (HIPAA compliance if applicable)
- [ ] Passwords never logged
- [ ] API responses sanitized before logging
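
A spot check of existing log files can reveal credentials that are already being written. The sketch below is a heuristic only: `scan_log_for_secrets` is a hypothetical helper, and the keyword list is an assumption that should be extended per service.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print log lines (with line numbers) that appear to
# contain a credential in key=value, key: value, or JSON "key":"value" form.
scan_log_for_secrets() {
  grep -nEi '(password|passwd|secret|api[_-]?key|token)"?[[:space:]]*[=:][[:space:]]*[^[:space:]]+' "$1"
}
```

Any output warrants a look at the application's logging configuration; no output only means none of the listed keywords were found.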

---

## 6. Monitoring & Logging

### 6.1 Application Logging
- [ ] Application logs configured
- [ ] Log level set appropriately (INFO for production)
- [ ] Logs forwarded to centralized logging (Loki)
- [ ] Log format standardized (JSON preferred)
- [ ] Sensitive data redacted from logs
- [ ] Log rotation configured

**Docker Logging Configuration**:
```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
    labels: "service,environment"
```

### 6.2 Security Event Logging
- [ ] Failed authentication attempts logged
- [ ] Privilege escalation logged
- [ ] Configuration changes logged
- [ ] File access logged (for sensitive data)
- [ ] Security events forwarded to monitoring

**Security Events to Log**:
```
- Failed login attempts
- Successful privileged access (sudo, docker exec root)
- SSH key usage
- Configuration file modifications
- User account creation/deletion
- Permission changes
- Firewall rule modifications
```

### 6.3 Metrics Collection
- [ ] Service added to Prometheus scrape targets
  ```yaml
  # prometheus.yml
  scrape_configs:
    - job_name: 'new-service'
      static_configs:
        - targets: ['192.168.2.XXX:<metrics-port>']  # the service's metrics port, not Prometheus's own 9090
  ```
- [ ] Service exposes metrics endpoint (if supported)
- [ ] Grafana dashboard created for service
- [ ] Alerting rules configured for service health

### 6.4 Alerting
- [ ] Critical alerts configured (service down, high error rate)
- [ ] Alert notification destination configured (email, Slack, etc.)
- [ ] Alert escalation policy documented
- [ ] Alert thresholds tested and validated

**Example Alerting Rules**:
```yaml
# Service down alert
- alert: ServiceDown
  expr: up{job="new-service"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Service {{ $labels.instance }} is down"

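# High error rate alert (more than 5% of requests returning 5xx)
- alert: HighErrorRate
  expr: |
    sum by (instance) (rate(http_requests_total{status=~"5.."}[5m]))
      / sum by (instance) (rate(http_requests_total[5m])) > 0.05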
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High error rate on {{ $labels.instance }}"
```

---

## 7. Application Security

### 7.1 Security Headers
- [ ] Content-Security-Policy configured
- [ ] X-Frame-Options: SAMEORIGIN
- [ ] X-Content-Type-Options: nosniff
- [ ] X-XSS-Protection: 0 (the legacy browser XSS filter is deprecated; rely on Content-Security-Policy)
- [ ] Strict-Transport-Security configured (HSTS)
- [ ] Referrer-Policy: strict-origin-when-cross-origin
- [ ] Permissions-Policy configured

**NPM Custom Nginx Configuration**:
```nginx
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "0" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline';" always;
add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;
```

**Verification**:
```bash
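# -s hides the progress meter; -i on grep is needed because HTTP/2 header names arrive in lowercase
curl -sI https://service.apophisnetworking.net | grep -iE "X-Frame-Options|Content-Security-Policy|Strict-Transport-Security"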
```
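
To report which headers are absent rather than which are present, the response headers can be compared against the list in this section. A minimal sketch with a hypothetical helper, `missing_security_headers`, that reads raw response headers on stdin:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of each expected security header that is
# absent from the HTTP response headers read on stdin.
missing_security_headers() {
  local names name
  # Keep only header names, lowercased and without carriage returns.
  names=$(tr -d '\r' | cut -d: -f1 | tr '[:upper:]' '[:lower:]')
  for name in strict-transport-security content-security-policy \
              x-frame-options x-content-type-options x-xss-protection \
              referrer-policy permissions-policy; do
    grep -qx -- "$name" <<<"$names" || echo "$name"
  done
}
```

Usage would be `curl -sI https://service.apophisnetworking.net | missing_security_headers`; no output means all seven headers were returned. The check covers presence only, not whether the values are appropriate.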

### 7.2 Input Validation
- [ ] SQL injection protection (parameterized queries, ORM)
- [ ] XSS protection (input sanitization, output encoding)
- [ ] CSRF protection (tokens, SameSite cookies)
- [ ] File upload validation (type, size, content)
- [ ] Rate limiting configured (prevent brute force)

### 7.3 Session Management
- [ ] Secure session cookies (Secure, HttpOnly, SameSite)
- [ ] Session timeout configured (30 minutes recommended)
- [ ] Session invalidation on logout
- [ ] Concurrent session limits configured

### 7.4 API Security
- [ ] API authentication required (API key, OAuth, JWT)
- [ ] API rate limiting configured
- [ ] API input validation
- [ ] API versioning implemented
- [ ] API documentation does not expose sensitive endpoints

---

## 8. Compliance & Documentation

### 8.1 Documentation
- [ ] Service documented in `/home/jramos/homelab/services/README.md`
- [ ] Configuration files added to git repository
- [ ] Architecture diagram updated (if applicable)
- [ ] Dependencies documented
- [ ] Troubleshooting guide created

**Documentation Requirements**:
```markdown
Required sections in services/README.md:
- Service name and purpose
- Port mappings
- Environment variables
- Volume mounts
- Dependencies
- Deployment instructions
- Troubleshooting common issues
- Maintenance procedures
```

### 8.2 Change Management
- [ ] Change request created (if required)
- [ ] Change approved by infrastructure owner
- [ ] Rollback plan documented
- [ ] Change window scheduled
- [ ] Stakeholders notified

### 8.3 Compliance
- [ ] GDPR compliance verified (if handling EU data)
- [ ] HIPAA compliance verified (if handling health data)
- [ ] PCI-DSS compliance verified (if handling payment data)
- [ ] License compliance checked (open-source licenses)
- [ ] Data residency requirements met

### 8.4 Asset Inventory
- [ ] Service added to NetBox (CT 103) inventory
- [ ] IP address documented in IPAM
- [ ] Service owner recorded
- [ ] Criticality level assigned
- [ ] Support contacts documented

---

## 9. Testing & Validation

### 9.1 Functional Testing
- [ ] Service starts successfully
- [ ] Service accessible via configured URL
- [ ] Authentication works correctly
- [ ] Core functionality tested
- [ ] Dependencies verified (database connection, etc.)

### 9.2 Security Testing
- [ ] Port scan performed (no unexpected open ports)
- [ ] Vulnerability scan performed (Trivy, Nessus)
- [ ] Penetration test completed (if critical service)
- [ ] SSL/TLS configuration tested (SSL Labs A+ rating)
- [ ] Security headers verified

**Security Testing Tools**:
```bash
# Port scan
nmap -sS -sV 192.168.2.XXX

# Vulnerability scan
trivy image <image-name>

# SSL test
testssl.sh https://service.apophisnetworking.net

# Security headers
curl -I https://service.apophisnetworking.net
```

### 9.3 Performance Testing
- [ ] Load testing performed (if applicable)
- [ ] Resource usage monitored under load
- [ ] Response time acceptable (<1s for web pages)
- [ ] No memory leaks detected
- [ ] Disk I/O acceptable

### 9.4 Disaster Recovery Testing
- [ ] Backup restoration tested
- [ ] Service recovery time measured (RTO)
- [ ] Data loss measured (RPO)
- [ ] Failover tested (if HA configured)
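
Recovery time is easier to record when it is measured the same way each time. As a sketch, the hypothetical helper below (`measure_recovery_seconds`) polls an arbitrary health-check command once per second and prints how long it took to succeed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a health-check command until it succeeds and print
# the elapsed seconds. Returns 1 if the timeout is reached first.
# $1 = timeout in seconds, remaining arguments = command to run.
measure_recovery_seconds() {
  local timeout=$1
  shift
  local start=$SECONDS
  until "$@" >/dev/null 2>&1; do
    if [ $((SECONDS - start)) -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
  echo $((SECONDS - start))
}
```

Started right after a restore or restart is triggered, `measure_recovery_seconds 300 curl -fsS https://service.apophisnetworking.net/` gives a figure to compare with the RTO target. The 300-second timeout is an arbitrary example value.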

---

## 10. Operational Readiness

### 10.1 Monitoring Integration
- [ ] Service health checks configured
- [ ] Monitoring dashboard created
- [ ] Alerts configured and tested
- [ ] On-call rotation updated (if applicable)

### 10.2 Maintenance Plan
- [ ] Update schedule documented (monthly, quarterly)
- [ ] Maintenance window scheduled
- [ ] Update procedure documented
- [ ] Rollback procedure tested

### 10.3 Runbooks
- [ ] Service start/stop procedure documented
- [ ] Common troubleshooting steps documented
- [ ] Incident response procedure documented
- [ ] Escalation contacts documented

### 10.4 Access Control
- [ ] User access provisioned
- [ ] Admin access limited to authorized personnel
- [ ] Access review schedule documented
- [ ] Access revocation procedure documented

---

## 11. Final Review

### 11.1 Security Review
- [ ] All CRITICAL findings addressed
- [ ] All HIGH findings addressed
- [ ] Medium findings have remediation plan
- [ ] Security sign-off obtained

### 11.2 Stakeholder Approval
- [ ] Infrastructure owner approval
- [ ] Security team approval (if applicable)
- [ ] Service owner approval
- [ ] Documentation review complete

### 11.3 Go-Live Checklist
- [ ] Production deployment scheduled
- [ ] Rollback plan ready
- [ ] Support team notified
- [ ] Monitoring dashboard open
- [ ] Incident response team on standby

### 11.4 Post-Deployment
- [ ] Service confirmed operational
- [ ] Monitoring confirms normal operations
- [ ] No errors in logs
- [ ] Performance metrics within acceptable range
- [ ] Post-deployment review scheduled (1 week)

---

## Approval Signatures

| Role | Name | Date | Signature |
|------|------|------|-----------|
| **Service Owner** | | | |
| **Security Reviewer** | | | |
| **Infrastructure Owner** | | | |

---

## Deployment Record

**Deployment Date**: ________________

**Deployment Method**: [ ] Manual [ ] Ansible [ ] CI/CD

**Deployment Status**: [ ] Success [ ] Failed [ ] Rolled Back

**Issues Encountered**:
```
(Document any issues encountered during deployment)
```

**Lessons Learned**:
```
(Document lessons learned for future deployments)
```

---

## Checklist Score

**Total Items**: 200+

**Items Completed**: ______ / ______

**Completion Percentage**: ______ %

**Risk Level**:
- [ ] Low Risk (95-100% complete, all CRITICAL and HIGH items complete)
- [ ] Medium Risk (85-94% complete, all CRITICAL items complete)
- [ ] High Risk (70-84% complete, some CRITICAL items incomplete)
- [ ] Unacceptable (<70% complete, deploy NOT approved)
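
The completion figures can be computed from the filled-in file instead of counted by hand. A minimal sketch, assuming completed items are marked `- [x]` and using a hypothetical helper name, `checklist_score`; it stops at the `## Checklist Score` heading so the Risk Level boxes are not counted as items.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count checked and total checklist items in a completed
# checklist and print "checked/total (percent%)".
checklist_score() {
  awk '
    /^## Checklist Score/    { exit }                       # ignore the Risk Level boxes
    /^[ \t]*- \[[xX]\] /     { checked++; total++; next }   # completed item
    /^[ \t]*- \[ \] /        { total++ }                    # open item
    END {
      if (total == 0) { print "no checklist items found"; exit 1 }
      printf "%d/%d (%d%%)\n", checked, total, int(checked * 100 / total)
    }
  ' "$1"
}
```

The percentage is rounded down. Mapping the result to a risk level still requires confirming by hand that all CRITICAL and HIGH items are among the completed ones.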

---

## Archive

After deployment, archive this completed checklist:

**Location**: `/home/jramos/homelab/docs/deployment-records/<service-name>-<date>.md`

**Command**:
```bash
SERVICE_NAME="<service-name>"  # replace with the actual service name
cp SECURITY_CHECKLIST.md "/home/jramos/homelab/docs/deployment-records/${SERVICE_NAME}-$(date +%Y%m%d).md"
```

---

**Template Version**: 1.0

**Last Updated**: 2025-12-20

**Maintained By**: Infrastructure Security Team

**Review Frequency**: Quarterly