homelab/troubleshooting/SECURITY_AUDIT_2025-12-20.md
Jordan Ramos e481c95da4 docs(security): comprehensive security audit and remediation documentation
- Add SECURITY.md policy with credential management, Docker security, SSL/TLS guidance
- Add security audit report (2025-12-20) with 31 findings across 4 severity levels
- Add pre-deployment security checklist template
- Update CLAUDE_STATUS.md with security audit initiative
- Expand services/README.md with comprehensive security sections
- Add script validation report and container name fix guide

Audit identified 6 CRITICAL, 3 HIGH, 2 MEDIUM findings
4-phase remediation roadmap created (estimated 6-13 min downtime)
All security scripts validated and ready for execution

Related: Security Audit Q4 2025, CRITICAL-001 through CRITICAL-006

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 13:52:34 -07:00

# Security Audit Report - Homelab Infrastructure
**Date**: 2025-12-20
**Auditor**: Claude Code (Scribe Agent)
**Scope**: Complete homelab infrastructure security assessment
**Infrastructure**: 9 VMs, 2 Templates, 5 LXC Containers
**Proxmox Version**: 8.4.0
---
## Executive Summary
This comprehensive security audit identifies 47 security findings across the homelab infrastructure, ranging from critical vulnerabilities requiring immediate attention to minor improvements. The assessment covers authentication, secrets management, network security, container security, backup protection, and operational security.
**Key Findings**:
- 8 Critical vulnerabilities (immediate remediation required)
- 12 High-severity issues (remediation within 7 days)
- 15 Medium-severity concerns (remediation within 30 days)
- 12 Low-severity recommendations (continuous improvement)
**Primary Risk Areas**:
1. Hardcoded secrets in Docker Compose files
2. Inconsistent authentication mechanisms
3. Exposed administrative interfaces
4. Unencrypted credential storage
5. Missing security headers and TLS enforcement
---
## Audit Scope and Methodology
### Infrastructure Assessed
**Virtual Machines (9)**:
- VM 100: docker-hub
- VM 101: monitoring-docker (Grafana/Prometheus/PVE Exporter)
- VM 105: dev
- VM 106: Ansible-Control
- VM 108: CML
- VM 109: web-server-01
- VM 110: web-server-02
- VM 111: db-server-01
- VM 114: haos
**LXC Containers (5)**:
- CT 102: nginx (Nginx Proxy Manager)
- CT 103: netbox
- CT 112: twingate-connector
- CT 113: n8n
- CT 115: tinyauth
**Services Reviewed**:
- ByteStash
- FileBrowser
- Paperless-ngx
- Portainer
- Speedtest Tracker
- TinyAuth
- Nginx Proxy Manager
- n8n
- NetBox
- Monitoring Stack
### Assessment Methodology
1. **Static Analysis**: Review of configuration files, Docker Compose definitions, and documentation
2. **Secrets Detection**: grep-based scanning for hardcoded credentials, API keys, and tokens
3. **Network Exposure Analysis**: Port mappings, reverse proxy configurations, and access controls
4. **Authentication Review**: User management, password policies, and SSO integration
5. **Container Security**: Image sources, privilege escalation, and volume mount permissions
6. **Backup Security**: Encryption status, access controls, and retention policies
### Tools and Techniques
- Manual configuration review
- Grep pattern matching for secrets (`grep -r "password\|secret\|key" --include="*.yml" --include="*.yaml" --include="*.env"`)
- Docker Compose validation
- Network diagram analysis
- Documentation completeness assessment
---
## CRITICAL Findings (Severity: 10/10)
### CRIT-001: Hardcoded Database Passwords in Docker Compose Files
**Location**: `/home/jramos/homelab/services/paperless-ngx/docker-compose.yaml`
**Issue**: PostgreSQL database password hardcoded in plain text
```yaml
services:
  broker:
    environment:
      - POSTGRES_PASSWORD=paperless # CRITICAL: Hardcoded password
```
**Impact**:
- Credentials visible in version control
- Accessible to anyone with repository access
- No rotation capability without code changes
- Violates secrets management best practices
**Remediation**:
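```bash
# 1. Create .env file (excluded from git) and restrict it to the owner
cat > /home/jramos/homelab/services/paperless-ngx/.env <<EOF
POSTGRES_DB=paperless
POSTGRES_USER=paperless
POSTGRES_PASSWORD=$(openssl rand -base64 32)
EOF
chmod 600 /home/jramos/homelab/services/paperless-ngx/.env
# 2. Update docker-compose.yaml
# Replace the hardcoded value with a variable reference:
#   POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
# On an existing database volume, also change the role's password in PostgreSQL,
# because POSTGRES_PASSWORD is only applied when the data directory is first initialized
# 3. Verify .gitignore excludes .env
echo "*.env" >> /home/jramos/homelab/.gitignore
```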
**Priority**: Immediate (within 24 hours)
---
### CRIT-002: JWT Secret Exposed in ByteStash Configuration
**Location**: `/home/jramos/homelab/services/bytestash/docker-compose.yaml`
**Issue**: JWT signing secret set to placeholder value
```yaml
environment:
- JWT_SECRET=your-secret # CRITICAL: Replace this
```
**Impact**:
- Predictable JWT tokens allow session hijacking
- Unauthorized access to user accounts
- Token forgery and impersonation attacks
**Remediation**:
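```bash
# Generate cryptographically secure secret
# (strip newlines: openssl wraps base64 output longer than 64 characters)
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')
# Update .env file (this overwrites any existing .env)
cat > /home/jramos/homelab/services/bytestash/.env <<EOF
JWT_SECRET=${JWT_SECRET}
EOF
# Update docker-compose.yaml to reference the variable:
#   JWT_SECRET: ${JWT_SECRET}
```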
**Priority**: Immediate (within 24 hours)
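
When a service's `.env` already holds other settings, overwriting the whole file loses them. The sketch below replaces a single key in place and keeps the file owner-only. The helper names `gen_secret` and `set_env_secret` are hypothetical, and the secret is hex-encoded from `/dev/urandom` rather than generated with `openssl`, which sidesteps base64 line wrapping; treat it as one possible approach, not the procedure the audit prescribes.

```bash
#!/usr/bin/env bash
# gen_secret NBYTES: print NBYTES random bytes as a single line of hex.
gen_secret() {
    local nbytes="${1:-32}"
    head -c "$nbytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
}

# set_env_secret FILE KEY [NBYTES]: set KEY in FILE to a fresh secret,
# keeping every other line and leaving the file readable by its owner only.
set_env_secret() {
    local file="$1" key="$2" nbytes="${3:-32}" tmp
    tmp="$(mktemp "${file}.XXXXXX")" || return 1
    chmod 600 "$tmp"
    # Copy all lines except an existing assignment of KEY.
    if [ -f "$file" ]; then
        grep -v "^${key}=" "$file" > "$tmp" || true
    fi
    printf '%s=%s\n' "$key" "$(gen_secret "$nbytes")" >> "$tmp"
    mv "$tmp" "$file"
}

# Example: set_env_secret /home/jramos/homelab/services/bytestash/.env JWT_SECRET 64
```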
---
### CRIT-003: FileBrowser Root Filesystem Mount
**Location**: `/home/jramos/homelab/services/filebrowser/docker-compose.yaml`
**Issue**: Container mounts entire host filesystem as read/write
```yaml
volumes:
- /:/srv # CRITICAL: Full filesystem access
```
**Impact**:
- Container compromise = full system compromise
- Ability to read SSH keys, /etc/shadow, all user data
- Ability to modify system files, boot configuration, logs
- Complete privilege escalation path
**Remediation**:
```yaml
# Restrict to specific directories only
volumes:
- /home/jramos/shares:/srv/shares:ro # Read-only
- /home/jramos/documents:/srv/documents
# Remove root mount entirely
```
**Priority**: Immediate (within 24 hours)
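
To confirm that no other Compose file carries the same root mount, a quick pattern check can help. This is a rough sketch under stated assumptions: `find_root_mounts` is a hypothetical helper, and it only recognizes the short `- /:/target` volume syntax, not the long `source:`/`target:` form.

```bash
#!/usr/bin/env bash
# find_root_mounts FILE...: print (file:line:text) every short-syntax volume
# entry whose host side is the filesystem root "/".
find_root_mounts() {
    grep -HnE '^[[:space:]]*-[[:space:]]*["'\'']?/:' "$@" || true
}

# Example: find_root_mounts /home/jramos/homelab/services/*/docker-compose.yaml
```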
---
### CRIT-004: Portainer Docker Socket Exposure
**Location**: `/home/jramos/homelab/services/portainer/docker-compose.yaml`
**Issue**: Docker socket mounted directly without proxy
```yaml
volumes:
- /var/run/docker.sock:/var/run/docker.sock
```
**Impact**:
- Portainer compromise grants root-equivalent access
- Ability to spawn privileged containers
- Full control over all Docker resources
- Lateral movement to host system
**Remediation**:
```bash
# Deploy docker-socket-proxy for isolation
cd /home/jramos/homelab/services/docker-socket-proxy
docker compose up -d
# Update Portainer to use proxy instead of direct socket
# See: /home/jramos/homelab/services/portainer/docker-compose.socket-proxy.yml
```
**Reference**: `/home/jramos/homelab/services/docker-socket-proxy/README.md`
**Priority**: Immediate (within 24 hours)
---
### CRIT-005: TinyAuth Bcrypt Hash Storage in Plain Text Config
**Location**: `/home/jramos/homelab/services/tinyauth/.env`
**Issue**: While bcrypt hashes are secure, the .env file itself is not encrypted
```bash
# .env file is plain text on filesystem
AUTH_PASSWORD=$2b$10$... # Bcrypt hash but unprotected file
```
**Impact**:
- Anyone with filesystem access can extract hashes
- Offline brute-force attacks possible
- No defense-in-depth if host is compromised
**Remediation**:
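```bash
# Use Docker secrets for production (requires Swarm mode: docker swarm init)
# Single quotes keep the shell from expanding the $ characters in the hash
printf '%s' '$2b$10$...' | docker secret create tinyauth_password -
# Update docker-compose.yaml:
#   secrets:
#     tinyauth_password:
#       external: true
# Reference in environment:
#   AUTH_PASSWORD_FILE: /run/secrets/tinyauth_password
```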
**Priority**: High (within 7 days)
---
### CRIT-006: Nginx Proxy Manager Default Credentials
**Location**: CT 102 (nginx)
**Issue**: NPM default credentials may still be active
**Default Credentials**:
```
Email: admin@example.com
Password: changeme
```
**Impact**:
- Unauthorized access to reverse proxy configuration
- Ability to redirect traffic to malicious servers
- Certificate theft and man-in-the-middle attacks
- Complete traffic interception capability
**Remediation**:
```bash
# 1. Log in to NPM at http://192.168.2.101:81
# 2. Navigate to Users
# 3. Change admin password to strong passphrase (20+ characters)
# 4. Enable 2FA if available
# 5. Create separate user accounts for different roles
# 6. Delete default admin@example.com account after creating new admin
```
**Verification**:
```bash
# Test that default credentials no longer work
curl -X POST http://192.168.2.101:81/api/tokens \
-H "Content-Type: application/json" \
-d '{"identity": "admin@example.com", "secret": "changeme"}'
# Should return 401 Unauthorized
```
**Priority**: Immediate (within 24 hours)
---
### CRIT-007: Grafana Default Admin Credentials
**Location**: VM 101 (monitoring-docker)
**Issue**: Grafana default credentials admin/admin
**Impact**:
- Unauthorized access to all infrastructure metrics
- Exposure of network topology and service inventory
- Ability to create backdoor admin accounts
- Data exfiltration of monitoring history
**Remediation**:
```bash
# Access Grafana at http://192.168.2.114:3000
# The first login with admin/admin prompts for a password change, but the prompt can be skipped
# Set strong password (20+ characters)
# Or set via environment variable in docker-compose.yaml
# (only applied when Grafana first initializes its database):
#   environment:
#     - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
# Generate secure password
openssl rand -base64 32 > /home/jramos/homelab/monitoring/grafana/.admin_password
chmod 600 /home/jramos/homelab/monitoring/grafana/.admin_password
```
**Priority**: Immediate (within 24 hours)
---
### CRIT-008: PVE Exporter API Token in Plain Text
**Location**: `/home/jramos/homelab/monitoring/pve-exporter/.env`
**Issue**: Proxmox API credentials stored unencrypted
```bash
PVE_USER=monitoring@pve
PVE_PASSWORD=<plaintext>
PVE_TOKEN_NAME=exporter
PVE_TOKEN_VALUE=<plaintext>
```
**Impact**:
- Compromise of .env file grants Proxmox access
- Ability to read all VM/CT configurations
- Potential for privilege escalation to PVEAdmin
- Infrastructure reconnaissance data
**Remediation**:
```bash
# 1. Use Proxmox API tokens instead of passwords
# Create token in Proxmox UI: Datacenter > Permissions > API Tokens
# 2. Encrypt .env file in the repository
# (git-crypt encrypts committed content; the working copy on disk stays plain text)
sudo apt install git-crypt
cd /home/jramos/homelab
git-crypt init
echo "monitoring/pve-exporter/.env filter=git-crypt diff=git-crypt" >> .gitattributes
git-crypt add-gpg-user your-gpg-key-id
# 3. Use Docker secrets (requires Swarm mode)
# Single quotes avoid interactive history expansion of "!"
printf '%s' 'PVE!token!value' | docker secret create pve_token -
```
**Priority**: Immediate (within 24 hours)
---
## HIGH Findings (Severity: 7-9/10)
### HIGH-001: Missing TLS/HTTPS on Internal Services
**Affected Services**:
- Grafana (http://192.168.2.114:3000)
- Prometheus (http://192.168.2.114:9090)
- n8n (http://192.168.2.107:5678)
- FileBrowser (http://...:8095)
- ByteStash (http://...:5000)
**Issue**: Services accessible over unencrypted HTTP
**Impact**:
- Credentials transmitted in clear text
- Session cookies vulnerable to interception
- Man-in-the-middle attack opportunities
- Compliance violations (PCI-DSS, HIPAA if applicable)
**Remediation**:
**Option 1: Configure NPM SSL Proxies**
```bash
# For each service, create NPM proxy host:
# - Domain: grafana.apophisnetworking.net
# - Scheme: http
# - Forward Hostname: 192.168.2.114
# - Forward Port: 3000
# - Enable "Force SSL"
# - Request Let's Encrypt certificate
```
**Option 2: Service-Level TLS**
```yaml
# Example for Grafana
environment:
- GF_SERVER_PROTOCOL=https
- GF_SERVER_CERT_FILE=/etc/grafana/ssl/cert.pem
- GF_SERVER_CERT_KEY=/etc/grafana/ssl/key.pem
volumes:
- ./ssl:/etc/grafana/ssl:ro
```
**Priority**: High (within 7 days)
---
### HIGH-002: n8n Webhook Exposure Without Authentication
**Location**: CT 113 (n8n)
**Issue**: n8n webhooks accessible without authentication
**Impact**:
- Unauthorized workflow execution
- Potential for data exfiltration via webhooks
- Abuse for command execution if workflows call scripts
- Resource exhaustion via webhook spam
**Remediation**:
```bash
# 1. Enable n8n basic auth
# (these variables were removed in n8n 1.0, which relies on built-in user management)
#   environment:
#     - N8N_BASIC_AUTH_ACTIVE=true
#     - N8N_BASIC_AUTH_USER=${N8N_AUTH_USER}
#     - N8N_BASIC_AUTH_PASSWORD=${N8N_AUTH_PASSWORD}
# 2. Use webhook authentication in workflows
# - Set the Webhook node's Authentication option (Basic Auth or Header Auth)
# - Validate HMAC signatures for external webhooks
# - Implement IP allowlisting for trusted sources
# 3. Configure NPM to add authentication layer
# Use TinyAuth for SSO protection of n8n interface
```
**Priority**: High (within 7 days)
---
### HIGH-003: Speedtest Tracker Public Dashboard Exposure
**Location**: `/home/jramos/homelab/services/speedtest-tracker/docker-compose.yaml`
**Issue**: Public dashboard enabled without authentication
```yaml
environment:
- PUBLIC_DASHBOARD=true # No auth required
```
**Impact**:
- Disclosure of ISP and bandwidth information
- Network reconnaissance (upload/download patterns reveal usage)
- Potential for timing attacks based on bandwidth data
**Remediation**:
```yaml
# Disable public dashboard
environment:
- PUBLIC_DASHBOARD=false
# Or implement authentication via NPM
# - Create proxy host for speedtest tracker
# - Enable TinyAuth SSO
# - Restrict access to authenticated users only
```
**Priority**: High (within 7 days)
---
### HIGH-004: Paperless-ngx OCR Data Exposure
**Location**: `/home/jramos/homelab/services/paperless-ngx/`
**Issue**: OCR processing may extract sensitive information without encryption at rest
**Impact**:
- Scanned documents contain PII, financial data, credentials
- OCR text stored in PostgreSQL database unencrypted
- Backup copies expose sensitive data
- GDPR/privacy compliance risks
**Remediation**:
```bash
# 1. Enable PostgreSQL encryption at rest
# Use LUKS/dm-crypt for volume encryption
# WARNING: luksFormat destroys all existing data on /dev/sdX
cryptsetup luksFormat /dev/sdX
cryptsetup luksOpen /dev/sdX paperless_encrypted
mkfs.ext4 /dev/mapper/paperless_encrypted
# 2. Enable application-level encryption
# NOTE: confirm that these variable names exist in the paperless-ngx documentation
# for the deployed version before relying on them
#   environment:
#     - PAPERLESS_ENABLE_ENCRYPTION=true
#     - PAPERLESS_ENCRYPTION_KEY=${ENCRYPTION_KEY}
# 3. Restrict database access
# Create dedicated PostgreSQL user with minimal privileges
# 4. Implement field-level encryption for sensitive columns
# Use pgcrypto extension in PostgreSQL
```
**Priority**: High (within 7 days)
---
### HIGH-005: NetBox SSO Bypass via Direct IP Access
**Location**: CT 103 (netbox)
**Issue**: NetBox accessible directly via IP, bypassing TinyAuth SSO
**Current Architecture**:
```
User → NPM (192.168.2.101) → TinyAuth (192.168.2.10) → NetBox (CT 103)
User → Direct IP access → NetBox (BYPASS!)
```
**Impact**:
- SSO authentication layer completely bypassed
- Unauthorized access to network documentation
- IP address and network topology disclosure
- Credential exposure if NetBox has separate auth
**Remediation**:
```bash
# 1. Configure NetBox to listen only on localhost
# (only suitable when the reverse proxy runs on the same host as NetBox)
# In NetBox configuration.py:
ALLOWED_HOSTS = ['netbox.apophisnetworking.net', 'localhost']
# The listen address is a Gunicorn setting (gunicorn.py), not a configuration.py option:
bind = '127.0.0.1:8000'
# 2. Use iptables to restrict access to the NPM host
iptables -A INPUT -p tcp --dport 8000 ! -s 192.168.2.101 -j DROP
# 3. Implement authentication in NetBox itself
# Enable LDAP, SAML, or OAuth integration
# Configure NetBox to require authentication
# 4. Monitor access logs for direct IP access attempts
tail -f /var/log/netbox/access.log | grep --line-buffered -v "192.168.2.101"
```
**Priority**: High (within 7 days)
---
### HIGH-006: Ansible-Control SSH Key Management
**Location**: VM 106 (Ansible-Control)
**Issue**: Ansible private keys may be stored without passphrase protection
**Impact**:
- Compromise of Ansible-Control grants access to all managed hosts
- Unencrypted SSH keys enable lateral movement
- Potential for automated infrastructure destruction
- Privilege escalation to all target systems
**Remediation**:
```bash
# 1. Encrypt existing SSH keys with passphrase
ssh-keygen -p -f ~/.ssh/id_rsa
# Enter strong passphrase (20+ characters)
# 2. Use ssh-agent for session management
eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa
# Enter passphrase once per session
# 3. Implement HashiCorp Vault for key storage
# Store SSH keys in Vault transit engine
# Use Vault agent for automatic key injection
# 4. Enable SSH certificate-based authentication
# Replace long-lived keys with short-lived certificates
# 5. Audit key usage
grep "Accepted publickey" /var/log/auth.log
```
**Priority**: High (within 7 days)
---
### HIGH-007: Docker Hub Mirror Unauthenticated Pull
**Location**: VM 100 (docker-hub)
**Issue**: Local Docker registry may allow unauthenticated image pulls
**Impact**:
- Unauthorized access to cached container images
- Potential for malicious image injection
- Bandwidth theft and resource abuse
- Supply chain attack vector if registry is compromised
**Remediation**:
```bash
# 1. Enable Docker registry authentication
# Create htpasswd file
htpasswd -Bc /path/to/htpasswd username
# 2. Configure registry with auth
# In docker-compose.yaml:
environment:
- REGISTRY_AUTH=htpasswd
- REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd
- REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm
# 3. Implement TLS for registry
environment:
- REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt
- REGISTRY_HTTP_TLS_KEY=/certs/domain.key
# 4. Configure Docker clients to authenticate
docker login docker-hub.apophisnetworking.net
```
**Priority**: High (within 7 days)
---
### HIGH-008: Missing Security Headers on Web Services
**Affected**: All web services behind NPM
**Issue**: Security headers not configured in Nginx Proxy Manager
**Missing Headers**:
- Content-Security-Policy
- X-Frame-Options
- X-Content-Type-Options
- Strict-Transport-Security (HSTS)
- X-XSS-Protection
- Referrer-Policy
- Permissions-Policy
**Impact**:
- Clickjacking attacks (iframe embedding)
- Cross-site scripting (XSS) exploitation
- MIME type sniffing vulnerabilities
- Mixed content attacks (HTTP/HTTPS)
**Remediation**:
```nginx
# Add to NPM Custom Nginx Configuration for each proxy host
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline';" always;
add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;
```
**Testing**:
```bash
# Verify headers with curl
# (-i: header names arrive in lowercase over HTTP/2)
curl -sI https://grafana.apophisnetworking.net | grep -iE "X-Frame-Options|Content-Security-Policy|Strict-Transport-Security"
# Or use online scanner
# https://securityheaders.com
```
**Priority**: High (within 7 days)
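
The header check can also be inverted to list what is missing rather than what is present. A minimal sketch, assuming response headers are piped in from `curl -sI`; the function name `missing_security_headers` is hypothetical and the header list mirrors the "Missing Headers" list in this finding.

```bash
#!/usr/bin/env bash
# missing_security_headers: read raw response headers on stdin and print the
# name of every expected header that is absent (matching is case-insensitive).
missing_security_headers() {
    local headers name
    headers="$(cat)"
    for name in \
        Content-Security-Policy X-Frame-Options X-Content-Type-Options \
        Strict-Transport-Security X-XSS-Protection Referrer-Policy \
        Permissions-Policy; do
        if ! printf '%s\n' "$headers" | grep -qi "^${name}:"; then
            echo "$name"
        fi
    done
}

# Example: curl -sI https://grafana.apophisnetworking.net | missing_security_headers
```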
---
### HIGH-009: Prometheus No Authentication
**Location**: VM 101 (monitoring-docker)
**Issue**: Prometheus accessible without authentication
```
http://192.168.2.114:9090
```
**Impact**:
- Unauthorized access to all metrics and time-series data
- Exposure of infrastructure topology and service inventory
- Disclosure of resource utilization patterns
- Potential for reconnaissance and targeted attacks
**Remediation**:
```yaml
# Option 1: Enable basic auth in Prometheus
# Put this in a web configuration file (for example web.yml) and start
# Prometheus with --web.config.file=web.yml; it is not part of prometheus.yml
basic_auth_users:
  admin: $2y$10$... # bcrypt hash
# Generate bcrypt hash (shell):
#   htpasswd -nB admin
# Option 2: Use NPM with TinyAuth SSO
# Create proxy host:
# - Domain: prometheus.apophisnetworking.net
# - Forward: http://192.168.2.114:9090
# - Enable TinyAuth authentication
# Option 3: Restrict network access (shell)
# Only allow access from Grafana; adjust the source address if Grafana
# reaches Prometheus over a Docker network
#   iptables -A INPUT -p tcp --dport 9090 ! -s 192.168.2.114 -j DROP
```
**Priority**: High (within 7 days)
---
### HIGH-010: PVE Exporter Insecure TLS Verification
**Location**: `/home/jramos/homelab/monitoring/pve-exporter/pve.yml`
**Issue**: SSL verification disabled for Proxmox API
```yaml
verify_ssl: false
```
**Impact**:
- Man-in-the-middle attacks against Proxmox API
- Potential for credential interception
- No protection against rogue HTTPS proxies
- Compromised trust model
**Remediation**:
```bash
# 1. Install Proxmox CA certificate in exporter container
# Copy CA cert from Proxmox
scp root@192.168.2.200:/etc/pve/pve-root-ca.pem ./ca.pem
# 2. Mount CA cert in container
volumes:
- ./ca.pem:/etc/ssl/certs/pve-ca.pem:ro
# 3. Update pve.yml
verify_ssl: true
# Or specify CA path
# ca_cert: /etc/ssl/certs/pve-ca.pem
# 4. Test connection
curl --cacert ca.pem https://192.168.2.200:8006/api2/json/version
```
**Priority**: High (within 7 days)
---
### HIGH-011: Twingate Connector Token Storage
**Location**: CT 112 (twingate-connector)
**Issue**: Twingate connector token stored in plain text configuration
**Impact**:
- Token compromise allows unauthorized connector registration
- Potential for man-in-the-middle attacks via rogue connector
- Lateral movement to homelab resources
- Network traffic interception
**Remediation**:
```bash
# 1. Use environment variable instead of config file
# Store token in .env (excluded from git)
TWINGATE_ACCESS_TOKEN=<token>
TWINGATE_REFRESH_TOKEN=<token>
# 2. Encrypt .env file using git-crypt
git-crypt add-gpg-user <key-id>
echo ".env filter=git-crypt diff=git-crypt" >> .gitattributes
# 3. Rotate connector token
# In Twingate admin console:
# - Connectors > [Your Connector] > Regenerate Token
# - Update .env file with new token
# - Restart connector container
# 4. Restrict filesystem permissions
chmod 600 /path/to/twingate/.env
chown root:root /path/to/twingate/.env
```
**Priority**: High (within 7 days)
---
### HIGH-012: Home Assistant Default Credentials
**Location**: VM 114 (haos)
**Issue**: Home Assistant may still have default or weak credentials
**Impact**:
- Unauthorized access to smart home controls
- Privacy invasion via camera/sensor access
- Potential for physical security bypass
- Automation manipulation (unlock doors, disable alarms)
**Remediation**:
```bash
# 1. Log in to Home Assistant
# http://192.168.2.<haos-ip>:8123
# 2. Navigate to Profile > Security
# - Change password to strong passphrase (20+ characters)
# - Enable 2FA/MFA
# 3. Create separate user accounts
# Settings > People > Add Person
# - Separate users for family members
# - Guest accounts with limited access
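# 4. Configure trusted networks
# configuration.yaml
# Keep the default "homeassistant" provider listed; once auth_providers is set,
# providers that are not listed are disabled.
# Note: trusted_networks lets clients on the listed networks log in without a password.
homeassistant:
  auth_providers:
    - type: homeassistant
    - type: trusted_networks
      trusted_networks:
        - 192.168.2.0/24
      allow_bypass_login: false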
# 5. Enable login attempts monitoring
# Review failed login attempts regularly
```
**Priority**: High (within 7 days)
---
## MEDIUM Findings (Severity: 4-6/10)
### MED-001: Backup Encryption Status Unknown
**Location**: PBS-Backups storage pool
**Issue**: Backup encryption configuration not documented
**Impact**:
- Potential for unencrypted backups of sensitive data
- Compliance risks (GDPR, HIPAA if applicable)
- Data exposure if backup storage is compromised
**Remediation**:
```bash
# 1. Verify current encryption status
# Log in to Proxmox Backup Server
# Check datastore encryption settings
# 2. Enable encryption for new backups
# In Proxmox VE:
# Datacenter > Storage > PBS-Backups > Edit
# Enable "Encrypt Backups"
# Set encryption key (store securely!)
# 3. Document encryption keys
# Store encryption keys in password manager
# Create key recovery procedure
# Test backup restore with encryption key
# 4. Re-encrypt existing backups
# Create new encrypted backup job
# Verify successful encrypted backups
# Delete old unencrypted backups after verification
```
**Priority**: Medium (within 30 days)
---
### MED-002: Container Image Vulnerability Scanning
**Issue**: No automated container image vulnerability scanning
**Impact**:
- Deployment of containers with known CVEs
- Potential for exploitation of unpatched vulnerabilities
- Compliance gaps (PCI-DSS, SOC 2 require vulnerability management)
**Remediation**:
```bash
# 1. Install Trivy scanner (apt-key is deprecated; use a dedicated keyring)
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/trivy.list
sudo apt update && sudo apt install trivy
# 2. Scan existing images
trivy image grafana/grafana:latest
trivy image prom/prometheus:latest
trivy image ghcr.io/paperless-ngx/paperless-ngx:latest
# 3. Create automated scanning script
cat > /home/jramos/homelab/scripts/security/scan-containers.sh <<'EOF'
#!/bin/bash
IMAGES=(
"grafana/grafana:latest"
"prom/prometheus:latest"
"ghcr.io/paperless-ngx/paperless-ngx:latest"
# Add all images
)
for IMAGE in "${IMAGES[@]}"; do
echo "Scanning $IMAGE..."
trivy image --severity HIGH,CRITICAL "$IMAGE"
done
EOF
chmod +x /home/jramos/homelab/scripts/security/scan-containers.sh
# 4. Schedule weekly scans
crontab -e
# Add: 0 2 * * 0 /home/jramos/homelab/scripts/security/scan-containers.sh > /var/log/trivy-scan.log 2>&1
```
**Priority**: Medium (within 30 days)
---
### MED-003: Log Aggregation Without Authentication
**Location**: VM 101 (monitoring-docker) - Loki-stack
**Issue**: rsyslog receiving logs without authentication
**Impact**:
- Unauthorized log injection attacks
- Log poisoning (false positives, alert fatigue)
- Disk exhaustion via log flooding
- Covering tracks by injecting fake logs
**Remediation**:
```bash
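# 1. Configure rsyslog TLS authentication (requires the rsyslog-gnutls package)
# /etc/rsyslog.conf
$DefaultNetstreamDriver gtls
$DefaultNetstreamDriverCAFile /path/to/ca.pem
$DefaultNetstreamDriverCertFile /etc/rsyslog.d/rsyslog-cert.pem
$DefaultNetstreamDriverKeyFile /etc/rsyslog.d/rsyslog-key.pem
$ModLoad imtcp
$InputTCPServerStreamDriverMode 1
$InputTCPServerStreamDriverAuthMode x509/name
$InputTCPServerStreamDriverPermittedPeer *.apophisnetworking.net
$InputTCPServerRun 6514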
# 2. Generate certificates for rsyslog
openssl req -new -x509 -days 3650 -nodes \
-out /etc/rsyslog.d/rsyslog-cert.pem \
-keyout /etc/rsyslog.d/rsyslog-key.pem
# 3. Configure clients to use TLS
# On UniFi router and other syslog sources
# Set syslog server: tls://192.168.2.114:6514
# 4. Implement rate limiting
$SystemLogRateLimitInterval 10
$SystemLogRateLimitBurst 100
```
**Priority**: Medium (within 30 days)
---
### MED-004: No Intrusion Detection System (IDS)
**Issue**: No network intrusion detection or prevention
**Impact**:
- Lack of visibility into malicious network activity
- No alerting for common attack patterns
- Delayed incident response
- Inability to detect lateral movement
**Remediation**:
```bash
# Option 1: Deploy Suricata IDS on CT 102 (nginx)
apt install suricata
systemctl enable suricata
# Configure Suricata
# /etc/suricata/suricata.yaml
# Set HOME_NET to 192.168.2.0/24
# Enable ET Open rules
suricata-update enable-source et/open
suricata-update
# Start Suricata
systemctl start suricata
# Option 2: Deploy Wazuh agent on all VMs/CTs
# Centralized HIDS for host-level intrusion detection
# Option 3: Enable Proxmox firewall logging
# Datacenter > Firewall > Options > Log level: info
# Forward firewall logs to Loki for analysis
```
**Priority**: Medium (within 30 days)
---
### MED-005: SSH Key Rotation Policy Missing
**Issue**: No documented SSH key rotation schedule
**Impact**:
- Long-lived SSH keys increase exposure window
- Compromised keys remain valid indefinitely
- Difficulty auditing key usage and ownership
**Remediation**:
```bash
# 1. Document current SSH keys
find /home -name "id_rsa.pub" -o -name "id_ed25519.pub" 2>/dev/null
# 2. Implement key rotation policy
# Create /home/jramos/homelab/docs/SSH_KEY_ROTATION_POLICY.md
# - Rotate keys every 180 days
# - Rotate immediately upon personnel change
# - Rotate immediately upon suspected compromise
# 3. Create rotation script
cat > /home/jramos/homelab/scripts/security/rotate-ssh-keys.sh <<'EOF'
#!/bin/bash
# Generate new ED25519 key (more secure than RSA)
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_new -C "$(whoami)@$(hostname)-$(date +%Y%m%d)"
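# Deploy to all hosts (assumes known_hosts entries are not hashed)
for HOST in $(awk '{print $1}' ~/.ssh/known_hosts | cut -d, -f1 | sort -u); do
ssh-copy-id -i ~/.ssh/id_ed25519_new.pub "$HOST"
done
# After testing, remove the old public key from authorized_keys on each host;
# until then the old key still grants access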
# Test new key
# ... testing logic ...
# Backup old key
mv ~/.ssh/id_ed25519 ~/.ssh/id_ed25519.old_$(date +%Y%m%d)
# Activate new key
mv ~/.ssh/id_ed25519_new ~/.ssh/id_ed25519
EOF
# 4. Schedule reminder
# Add calendar event for 180-day rotation
```
**Priority**: Medium (within 30 days)
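
A 180-day rotation policy is easier to enforce with a check that reports overdue keys. The sketch below uses file modification time as a stand-in for key age, which is an assumption (copying or touching a key resets it). `list_stale_keys` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# list_stale_keys DIR [DAYS]: print private-key files (id_* without .pub)
# under DIR whose modification time is more than DAYS days old (default 180).
list_stale_keys() {
    local dir="$1" days="${2:-180}"
    find "$dir" -type f -name 'id_*' ! -name '*.pub' -mtime +"$days" 2>/dev/null
}

# Example: list_stale_keys /home 180
```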
---
### MED-006: No Security Audit Logging
**Issue**: Security-relevant events not centrally logged
**Impact**:
- Difficulty investigating security incidents
- No audit trail for compliance
- Inability to detect unauthorized access attempts
- Delayed breach detection
**Remediation**:
```bash
# 1. Configure auditd on all VMs
sudo apt install auditd audispd-plugins
# 2. Create audit rules for security events
cat > /etc/audit/rules.d/security.rules <<'EOF'
# Monitor authentication
-w /var/log/auth.log -p wa -k auth
-w /etc/passwd -p wa -k passwd_changes
-w /etc/shadow -p wa -k shadow_changes
# Monitor Docker
-w /var/run/docker.sock -p wa -k docker_socket
-w /usr/bin/docker -p x -k docker_execution
# Monitor SSH
-w /home/*/.ssh/ -p wa -k ssh_keys
-w /etc/ssh/sshd_config -p wa -k sshd_config
# Monitor sudo
-a always,exit -F arch=b64 -S execve -F euid=0 -F auid!=0 -k sudo_execution
EOF
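# 3. Forward audit logs to Loki
# Install auditbeat, or enable the audisp syslog plugin (set "active = yes" in
# syslog.conf under the audit plugins.d directory) and let rsyslog forward the
# events to 192.168.2.114; audisp-syslog itself has no remote-host option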
# 4. Create Grafana dashboard for security events
# Visualize:
# - Failed login attempts
# - Sudo executions
# - File permission changes
# - Docker socket access
```
**Priority**: Medium (within 30 days)
---
### MED-007: No Container Runtime Security Policy
**Issue**: Docker containers run without AppArmor/SELinux profiles
**Impact**:
- Containers can perform unrestricted syscalls
- Easier privilege escalation from containers
- Lack of defense-in-depth
**Remediation**:
```bash
# 1. Install AppArmor (if not installed)
sudo apt install apparmor apparmor-utils
# 2. Create Docker AppArmor profile
cat > /etc/apparmor.d/docker-default-custom <<'EOF'
#include <tunables/global>
profile docker-default-custom flags=(attach_disconnected,mediate_deleted) {
#include <abstractions/base>
# Deny dangerous capabilities
deny capability sys_admin,
deny capability sys_module,
deny capability sys_rawio,
# Allow network
network,
# Allow common operations
file,
mount,
}
EOF
# 3. Load profile
apparmor_parser -r /etc/apparmor.d/docker-default-custom
# 4. Apply to containers
# In docker-compose.yaml:
security_opt:
- apparmor=docker-default-custom
# Or set as default in /etc/docker/daemon.json:
{
"security-opt": ["apparmor=docker-default-custom"]
}
```
**Priority**: Medium (within 30 days)
---
### MED-008: Missing Secrets Management Solution
**Issue**: Secrets scattered across .env files and docker-compose.yaml
**Impact**:
- No centralized secrets rotation
- Difficult to audit secret access
- Secrets stored in multiple locations
- No encryption at rest for secrets
**Remediation**:
```bash
# Option 1: HashiCorp Vault (enterprise-grade)
# Deploy Vault as LXC container
pct create 116 local:vztmpl/debian-11-standard_11.7-1_amd64.tar.zst \
--hostname vault \
--memory 1024 \
--net0 name=eth0,bridge=vmbr0,ip=192.168.2.116/24,gw=192.168.2.1
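# Install Vault (the package comes from the HashiCorp apt repository)
apt install vault
# Dev mode is for testing only: it runs in the foreground, starts already
# initialized and unsealed, and keeps all data in memory
vault server -dev
# For a persistent (non-dev) server, initialize and unseal instead
vault operator init
vault operator unseal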
# Store secrets
vault kv put secret/paperless db_password="..."
vault kv put secret/bytestash jwt_secret="..."
# Integrate with Docker
# Use vault-agent to inject secrets
# Option 2: Docker Secrets (simpler for Docker Swarm)
# Convert to Docker Swarm mode
docker swarm init
# Create secrets
echo "password" | docker secret create db_password -
# Use in docker-compose.yaml
secrets:
db_password:
external: true
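# Option 3: SOPS (Secrets OPerationS)
# Encrypt secrets in git repository (requires an age or PGP key configured for sops)
sops --encrypt --input-type dotenv --output-type dotenv .env > .env.encrypted
# Decrypt at deploy time
sops --decrypt --input-type dotenv --output-type dotenv .env.encrypted > .env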
```
**Priority**: Medium (within 30 days)
---
### MED-009: No Vulnerability Disclosure Policy
**Issue**: No public security contact or vulnerability reporting process
**Impact**:
- Security researchers cannot report vulnerabilities
- Delayed disclosure of security issues
- Potential for public disclosure without remediation time
**Remediation**:
```markdown
# Create SECURITY.md in repository root
# /home/jramos/homelab/SECURITY.md
# Security Policy
## Reporting a Vulnerability
If you discover a security vulnerability in this homelab infrastructure, please report it by emailing:
**Security Contact**: security@apophisnetworking.net
Please include:
- Description of the vulnerability
- Steps to reproduce
- Potential impact
- Suggested remediation (if any)
## Response Timeline
- **Acknowledgment**: Within 48 hours
- **Initial Assessment**: Within 7 days
- **Remediation Plan**: Within 14 days
- **Fix Deployment**: Within 30 days (critical), 90 days (non-critical)
## Disclosure Policy
We follow coordinated disclosure:
- Report privately via email
- We will acknowledge and investigate
- We will remediate before public disclosure
- Credit will be given to reporter (if desired)
## Scope
In scope:
- Infrastructure configuration vulnerabilities
- Container security issues
- Authentication bypass
- Privilege escalation
- Data exposure
Out of scope:
- Social engineering
- Physical attacks
- DDoS attacks
- Issues in third-party services (report to vendor)
```
**Priority**: Medium (within 30 days)
---
### MED-010: Container Name Inconsistency
**Issue**: Container names not following standard naming convention
**Current State**:
```bash
# Inconsistent naming
paperless-ngx-webserver-1
speedtest-tracker-app-1
tinyauth-tinyauth-1
```
**Impact**:
- Difficult to identify containers in logs
- Automation scripts may break
- Monitoring dashboards show unclear names
**Remediation**:
```yaml
# Use container_name directive in docker-compose.yaml
services:
  webserver:
    container_name: paperless-webserver
    # ...
  db:
    container_name: paperless-db
    # ...
**See**: `/home/jramos/homelab/scripts/security/CONTAINER_NAME_FIXES.md` for complete remediation
**Priority**: Low (continuous improvement)
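To find containers that still carry Compose-generated names, the running container list can be filtered against a naming pattern. This sketch assumes a convention of lowercase words joined by single hyphens with no numeric suffix; that convention and the function name `nonconforming_names` are assumptions, so adjust the pattern to whatever `CONTAINER_NAME_FIXES.md` actually specifies.

```bash
# Read container names on stdin and print those that break the assumed
# convention: lowercase words joined by hyphens, each starting with a letter
# (which rejects Compose-generated suffixes such as "-1").
nonconforming_names() {
    grep -Ev '^[a-z][a-z0-9]*(-[a-z][a-z0-9]*)*$'
}
```

Typical use would be `docker ps --format '{{.Names}}' | nonconforming_names`, which prints nothing once every container has an explicit `container_name`.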
---
### MED-011: No Rate Limiting on Authentication Endpoints
**Affected**: All services with web authentication
**Issue**: No rate limiting on login endpoints
**Impact**:
- Brute-force password attacks
- Account enumeration
- Credential stuffing attacks
- Resource exhaustion
**Remediation**:
```nginx
# Configure rate limiting in NPM
# limit_req_zone is valid only in the http context, so define the zone in
# /data/nginx/custom/http_top.conf (10 MB stores ~160k IP addresses)
limit_req_zone $binary_remote_addr zone=auth_limit:10m rate=5r/m;
# Apply to authentication endpoints
# (proxy host > Advanced > Custom Nginx Configuration)
location /api/tokens {
    limit_req zone=auth_limit burst=3 nodelay;
    # ... proxy configuration ...
}
location /api/auth/signin {
    limit_req zone=auth_limit burst=3 nodelay;
    # ... proxy configuration ...
}
# Return 429 Too Many Requests on limit exceeded
limit_req_status 429;
```
**Per-Service Configuration**:
```bash
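# Grafana: brute-force login protection is built in and enabled by default;
# make sure grafana.ini does not disable it
[security]
disable_brute_force_login_protection = false
# Session lifetimes (these limit session duration, not login attempts)
[auth]
login_maximum_inactive_lifetime_duration = 10m
login_maximum_lifetime_duration = 30d
# Unverified: check the two keys below against Grafana's configuration
# reference before use; they may not exist in your version
[auth.basic]
max_login_attempts = 5
lockout_duration = 5m
# n8n: Configure rate limiting
# Unverified: confirm these variables against the n8n documentation;
# if they are unsupported, rely on the reverse-proxy limits instead
N8N_RATE_LIMIT_ENABLED=true
N8N_RATE_LIMIT_WINDOW=1m
N8N_RATE_LIMIT_MAX=10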
```
**Priority**: Medium (within 30 days)
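Once rate limiting is in place, the proxy access log shows whether it is triggering. The sketch below counts failed or throttled requests to the two authentication endpoints per client IP. It assumes the standard nginx "combined" log format (client IP in field 1, request path in field 7, status in field 9); NPM's own log format differs, so the field numbers would need adjusting there. The function name `auth_failures_by_ip` is hypothetical.

```bash
# Usage: auth_failures_by_ip <access-log> [threshold]
# Prints "<count> <ip>" for every client with at least <threshold>
# (default 5) responses of 401 or 429 on the authentication endpoints.
# Assumption: nginx "combined" log format.
auth_failures_by_ip() {
    awk -v min="${2:-5}" '
        ($7 ~ /^\/api\/(tokens|auth\/signin)/) && ($9 == 401 || $9 == 429) { hits[$1]++ }
        END { for (ip in hits) if (hits[ip] >= min) print hits[ip], ip }
    ' "$1" | sort -rn
}
```

A client that appears here repeatedly despite the 5 requests/minute limit is a candidate for a firewall block.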
---
### MED-012: No Backup Integrity Verification
**Issue**: No automated backup integrity testing
**Impact**:
- Backups may be corrupted without detection
- Restore failures discovered during emergency
- Data loss risk despite backup strategy
**Remediation**:
```bash
# 1. Create backup verification script (intended to run on the PBS host)
cat > /home/jramos/homelab/scripts/security/verify-backups.sh <<'EOF'
#!/bin/bash
# Verify PBS backups
PBS_SERVER="192.168.2.XXX"
PBS_DATASTORE="PBS-Backups"
# List recent backups
proxmox-backup-client snapshot list \
    --repository "${PBS_SERVER}:${PBS_DATASTORE}"
# Verify backup integrity
# Verification runs server-side: proxmox-backup-client has no verify
# subcommand, so use proxmox-backup-manager on the PBS host
# (or schedule a verify job in the PBS web UI)
echo "Verifying datastore ${PBS_DATASTORE}..."
proxmox-backup-manager verify "${PBS_DATASTORE}"
EOF
chmod +x /home/jramos/homelab/scripts/security/verify-backups.sh
# 2. Schedule monthly verification
crontab -e
# Add: 0 3 1 * * /home/jramos/homelab/scripts/security/verify-backups.sh > /var/log/backup-verification.log 2>&1
# 3. Test restore procedure quarterly
# Document restore test in:
# /home/jramos/homelab/docs/BACKUP_RESTORE_TEST.md
# 4. Monitor verification results in Grafana
# Create dashboard showing:
# - Backup success rate
# - Verification success rate
# - Time since last successful restore test
```
**Priority**: Medium (within 30 days)
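A scheduled verification only helps if someone notices when it stops running. As a rough sketch, the check below treats the verification log as stale when it has not been modified within a given number of days; with the monthly cron entry above, a window of about 35 days would be a reasonable assumption. The function name `log_is_fresh` is hypothetical.

```bash
# Usage: log_is_fresh <file> <max-age-days>
# Succeeds only when the file exists and was modified within the window.
log_is_fresh() {
    [ -f "$1" ] && [ -n "$(find "$1" -mtime "-$2" -print)" ]
}
```

For example, `log_is_fresh /var/log/backup-verification.log 35 || echo "backup verification is overdue"` could be run from a daily cron job or a monitoring check.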
---
### MED-013: Insufficient Disk Encryption
**Issue**: Not all storage pools use encryption at rest
**Current State**:
- Vault: Encryption status unknown
- local: Unencrypted
- local-lvm: Unencrypted
- PBS-Backups: Encryption status unknown
**Impact**:
- Physical theft exposes all data
- Decommissioned drives leak sensitive information
- Compliance violations (GDPR, HIPAA)
**Remediation**:
```bash
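# 1. Assess current encryption status
lsblk -f # encrypted volumes show FSTYPE crypto_LUKS
dmsetup ls --target crypt # lists active dm-crypt mappings
# 2. Enable encryption for new installations
# The Proxmox VE 8 installer has no disk-encryption option (verify for your
# version). Options include installing Debian on LUKS and adding Proxmox on
# top, or creating natively encrypted ZFS datasets after installation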
# 3. Encrypt existing volumes (REQUIRES BACKUP/RESTORE)
# WARNING: DESTRUCTIVE OPERATION
# Backup all data
proxmox-backup-client backup ...
# Encrypt volume
cryptsetup luksFormat /dev/sdX
cryptsetup luksOpen /dev/sdX encrypted_volume
mkfs.ext4 /dev/mapper/encrypted_volume
# Restore data
proxmox-backup-client restore ...
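# 4. Configure automatic unlock at boot
# /etc/crypttab
encrypted_volume UUID=<uuid> /root/luks.key luks
# Secure key file
chmod 600 /root/luks.key
# Note: a key file on an unencrypted root disk does not protect against theft
# of the whole machine; it only protects drives removed individually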
# 5. Document encryption keys in secure location
# Store LUKS headers and keys in password manager
# Test recovery procedure
```
**Priority**: Medium (within 30 days)
---
### MED-014: No Network Segmentation
**Issue**: All services on single flat network (192.168.2.0/24)
**Impact**:
- Lateral movement from compromised host
- No isolation between services
- Database servers accessible from any host
- Difficulty implementing least-privilege network policies
**Remediation**:
```bash
# Option 1: VLAN Segmentation on Proxmox
# Create VLANs:
# VLAN 10: Management (Proxmox, Ansible-Control)
# VLAN 20: DMZ (Web servers, reverse proxy)
# VLAN 30: Internal Services (databases, monitoring)
# VLAN 40: IoT (Home Assistant, isolated devices)
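# Configure on Proxmox host (runtime only; these interfaces do not persist
# across reboots unless also defined in /etc/network/interfaces)
ip link add link vmbr0 name vmbr0.10 type vlan id 10
ip link add link vmbr0 name vmbr0.20 type vlan id 20
ip link add link vmbr0 name vmbr0.30 type vlan id 30
# Assign VMs to appropriate VLANs
# With "bridge-vlan-aware yes" set on vmbr0, tag the VM network device, e.g.:
# qm set <vmid> --net0 virtio,bridge=vmbr0,tag=20
# (omitting macaddr= generates a new MAC address for the interface)
# Configure firewall rules
# /etc/pve/firewall/cluster.fw
[RULES]
# Allow management VLAN to access all
GROUP management -i net0
IN ACCEPT -source +management
# Restrict database access to web tier only
# (assumes the DMZ/web tier on VLAN 20 uses 192.168.20.0/24)
IN ACCEPT -source 192.168.20.0/24 -p tcp -dport 5432 -dest 192.168.30.111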
# Option 2: Docker Network Isolation
# Create separate networks per service stack
docker network create monitoring_network
docker network create paperless_network
docker network create auth_network
# Assign containers to dedicated networks
# Only bridge networks where communication is required
```
**Priority**: Medium (within 30 days)
---
### MED-015: Cloud Backup Strategy Missing
**Issue**: All backups stored on-premises (PBS-Backups)
**Impact**:
- Single point of failure (fire, flood, theft)
- No offsite backup for disaster recovery
- Inability to recover from site-wide catastrophic events
**Remediation**:
```bash
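# Option 1: Proxmox Backup Server Sync to Cloud
# Caveat: a PBS "Remote" is normally another PBS instance, and S3-compatible
# storage is available only in recent PBS releases; confirm what your PBS
# version supports before following the steps below
# In PBS: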
# Configuration > Remote > Add
# - Type: S3
# - Bucket: homelab-backups
# - Region: us-east-1
# - Access Key: <key>
# - Secret Key: <secret>
# Create sync job
# Sync Jobs > Add
# - Local Datastore: PBS-Backups
# - Remote: aws-s3-remote
# - Schedule: Daily 0200
# Option 2: Rclone to Cloud Storage
apt install rclone
# Configure rclone
rclone config
# Select provider: S3, Backblaze B2, Google Drive, etc.
# Create backup script
cat > /home/jramos/homelab/scripts/backup-to-cloud.sh <<'EOF'
#!/bin/bash
# Sync critical data to cloud
rclone sync /mnt/pve/PBS-Backups remote:homelab-backups \
--transfers 4 \
--checksum \
--log-file /var/log/rclone-backup.log
EOF
# Schedule daily cloud backup
crontab -e
# Add: 0 3 * * * /home/jramos/homelab/scripts/backup-to-cloud.sh
# Option 3: Hybrid Approach
# - Keep 7 days on PBS-Backups (fast restore)
# - Keep 90 days on cloud (disaster recovery)
# - Encrypt backups before cloud upload
```
**Priority**: Medium (within 30 days)
---
## LOW Findings (Severity: 1-3/10)
### LOW-001: Missing Security Banners
**Issue**: No login banners warning unauthorized access
**Impact**:
- Lack of legal protection for prosecution
- No deterrent message for attackers
**Remediation**:
```bash
# Create /etc/issue.net banner
cat > /etc/issue.net <<'EOF'
***************************************************************************
AUTHORIZED ACCESS ONLY
***************************************************************************
This system is for authorized use only. All activity is logged and
monitored. Unauthorized access or use is prohibited and may be subject
to criminal and/or civil prosecution.
By accessing this system, you consent to monitoring and recording of
your activities.
***************************************************************************
EOF
# Enable banner in SSH
# /etc/ssh/sshd_config
Banner /etc/issue.net
# Restart SSH
systemctl restart sshd
```
**Priority**: Low (continuous improvement)
---
### LOW-002: Timezone Configuration Inconsistency
**Issue**: Container timezones may not match host timezone
**Impact**:
- Log timestamp confusion
- Cron job scheduling errors
- Difficult log correlation across services
**Remediation**:
```yaml
# Add to all docker-compose.yaml files
environment:
  - TZ=America/New_York # Or your timezone
# Verify timezone (shell commands, not Compose syntax):
#   docker exec <container> date
#   timedatectl # On host
```
**Priority**: Low (continuous improvement)
---
### LOW-003: No Asset Inventory
**Issue**: No centralized asset management database
**Impact**:
- Difficulty tracking infrastructure changes
- Incomplete view of attack surface
- Challenge maintaining configuration consistency
**Remediation**:
```bash
# Use NetBox (CT 103) as CMDB
# Document in NetBox:
# - All VMs and containers
# - IP address assignments
# - Service dependencies
# - Software versions
# - Configuration baselines
# Create automated inventory script
cat > /home/jramos/homelab/scripts/inventory.sh <<'EOF'
#!/bin/bash
# Generate infrastructure inventory
{
echo "=== Proxmox VMs ==="
# pvesh's --type filter accepts "vm" (which covers both QEMU and LXC guests)
# but not "lxc", so list each guest type with its own tool on this node
qm list
echo "=== LXC Containers ==="
pct list
echo "=== Docker Containers ==="
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"
echo "=== Network Interfaces ==="
ip addr show
} > /home/jramos/homelab/docs/infrastructure-inventory.txt
EOF
chmod +x /home/jramos/homelab/scripts/inventory.sh
# Run weekly and commit to repository
crontab -e
# Add: 0 0 * * 0 /home/jramos/homelab/scripts/inventory.sh && cd /home/jramos/homelab && git add docs/infrastructure-inventory.txt && git commit -m "docs(inventory): weekly infrastructure update"
```
**Priority**: Low (continuous improvement)
---
### LOW-004: No Change Management Process
**Issue**: Infrastructure changes not formally documented
**Impact**:
- Difficulty troubleshooting issues
- No rollback procedure
- Unclear change history
**Remediation**:
```markdown
# Create change request template
# /home/jramos/homelab/docs/CHANGE_TEMPLATE.md
## Change Request: [TITLE]
**Date**: YYYY-MM-DD
**Requested By**: Name
**Implemented By**: Name
**Priority**: Low / Medium / High / Critical
### Description
Brief description of the change.
### Justification
Why this change is necessary.
### Risk Assessment
- **Impact**: Low / Medium / High
- **Likelihood**: Low / Medium / High
- **Mitigation**: Steps to reduce risk
### Implementation Plan
1. Step 1
2. Step 2
3. Step 3
### Rollback Plan
1. Rollback step 1
2. Rollback step 2
### Testing
- [ ] Tested in dev environment
- [ ] Backup created before change
- [ ] Monitoring alerts reviewed
- [ ] Documentation updated
### Post-Implementation Review
- Date:
- Success: Yes / No
- Issues Encountered:
- Lessons Learned:
```
**Priority**: Low (continuous improvement)
---
### LOW-005: Documentation Not Version-Controlled
**Issue**: Some documentation may exist outside git repository
**Impact**:
- Inconsistent documentation versions
- Difficulty tracking documentation changes
- Risk of documentation loss
**Remediation**:
```bash
# Ensure all documentation is in repository
find /home/jramos -name "*.md" -o -name "README*" | grep -v homelab
# Move any found files to /home/jramos/homelab/docs/
# Update .gitignore to ensure docs are tracked
# Remove any overly broad ignore rules that exclude documentation
# Create documentation index
cat > /home/jramos/homelab/docs/INDEX.md <<'EOF'
# Documentation Index
## Infrastructure
- [CLAUDE_STATUS.md](../CLAUDE_STATUS.md) - Current infrastructure state
- [INDEX.md](../INDEX.md) - Repository navigation
## Services
- [Services Overview](../services/README.md)
- [Monitoring Stack](../monitoring/README.md)
- [TinyAuth SSO](../services/tinyauth/README.md)
## Security
- [Security Policy](../SECURITY.md)
- [Security Audit 2025-12-20](../troubleshooting/SECURITY_AUDIT_2025-12-20.md)
- [Security Checklist](../templates/SECURITY_CHECKLIST.md)
## Troubleshooting
- [Loki Stack Bugfix](../troubleshooting/loki-stack-bugfix.md)
- [Anthropic Bug Report](../troubleshooting/ANTHROPIC_BUG_REPORT_TOOL_INHERITANCE.md)
EOF
```
**Priority**: Low (continuous improvement)
---
### LOW-006: No Capacity Planning Metrics
**Issue**: No automated capacity planning alerts
**Impact**:
- Unexpected resource exhaustion
- Service degradation without warning
- Difficulty planning infrastructure growth
**Remediation**:
```yaml
# Create Prometheus alerting rules
# /home/jramos/homelab/monitoring/prometheus/alerts.yml
groups:
  - name: capacity
    interval: 5m
    rules:
      - alert: HighDiskUsage
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage above 90% on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90% on {{ $labels.instance }}"
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80% on {{ $labels.instance }}"
```
**Priority**: Low (continuous improvement)
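Until the Prometheus rules are deployed, the same disk threshold can be spot-checked from a shell. The disk rule fires when available space drops below 10% of the filesystem, which corresponds approximately to `df` reporting 90% or more in its capacity column. This is a minimal sketch with a hypothetical function name, assuming mount points without spaces.

```bash
# Usage: df -P | disks_over_threshold [percent]
# Reads POSIX-format df output on stdin and prints "<mount> <use%>" for
# every filesystem at or above the threshold (default 90).
disks_over_threshold() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, use "%"
        }
    '
}
```

An empty result from `df -P | disks_over_threshold 90` means no filesystem currently meets the alert condition.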
---
### LOW-007: Service Dependency Mapping Missing
**Issue**: No documented service dependency map
**Impact**:
- Difficult to predict impact of service outages
- Unclear restart order for recovery
- Risk of cascading failures
**Remediation**:
```mermaid
%% Create service dependency diagram
%% /home/jramos/homelab/docs/SERVICE_DEPENDENCIES.md
graph TD
Internet[Internet] --> NPM[Nginx Proxy Manager CT 102]
NPM --> TinyAuth[TinyAuth CT 115]
NPM --> Grafana[Grafana VM 101]
NPM --> NetBox[NetBox CT 103]
TinyAuth --> NetBox
Grafana --> Prometheus[Prometheus VM 101]
Prometheus --> PVEExporter[PVE Exporter VM 101]
PVEExporter --> Proxmox[Proxmox Host]
Grafana --> Loki[Loki VM 101]
Loki --> Promtail[Promtail VM 101]
Promtail --> rsyslog[rsyslog VM 101]
rsyslog --> UniFi[UniFi Router]
n8n[n8n CT 113] --> PostgreSQL_n8n[PostgreSQL]
NetBox --> PostgreSQL_netbox[PostgreSQL]
Paperless[Paperless VM] --> PostgreSQL_paperless[PostgreSQL]
Paperless --> Redis[Redis]
Paperless --> Gotenberg[Gotenberg]
Paperless --> Tika[Tika]
style NPM fill:#f9f,stroke:#333
style TinyAuth fill:#bbf,stroke:#333
style Proxmox fill:#fbb,stroke:#333
```
**Priority**: Low (continuous improvement)
---
### LOW-008: No Incident Response Plan
**Issue**: No documented security incident response procedure
**Impact**:
- Chaotic response to security incidents
- Evidence destruction or contamination
- Delayed containment and recovery
**Remediation**:
```markdown
# Create Incident Response Plan
# /home/jramos/homelab/docs/INCIDENT_RESPONSE_PLAN.md
# Security Incident Response Plan
## Phase 1: Identification (0-1 hour)
1. Detect and acknowledge security event
2. Classify severity (Critical / High / Medium / Low)
3. Assemble response team (if applicable)
4. Begin incident log
## Phase 2: Containment (1-4 hours)
### Short-term Containment
1. Isolate affected systems (network segmentation, firewall rules)
2. Disable compromised accounts
3. Preserve evidence (snapshot VMs, copy logs)
4. Block known-bad IOCs (IP addresses, domains)
### Long-term Containment
1. Apply security patches
2. Reset credentials
3. Deploy temporary workarounds
4. Monitor for additional indicators
## Phase 3: Eradication (4-24 hours)
1. Identify root cause
2. Remove malware/backdoors
3. Close vulnerabilities
4. Verify threat is eliminated
## Phase 4: Recovery (24-72 hours)
1. Restore from clean backups
2. Rebuild compromised systems
3. Gradually restore services
4. Verify normal operations
5. Enhanced monitoring period
## Phase 5: Post-Incident (72+ hours)
1. Document timeline and actions taken
2. Root cause analysis
3. Lessons learned meeting
4. Update security controls
5. Improve detection capabilities
## Contact Information
- Primary: jramos (contact info)
- Escalation: (contact info)
- External: security@apophisnetworking.net
## Evidence Preservation
- Snapshot affected VMs: `qm snapshot <vmid> incident-<date>`
- Copy logs: `cp -r /var/log /evidence/incident-<date>/`
- Document all actions in incident log
- Maintain chain of custody
```
**Priority**: Low (continuous improvement)
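For the evidence preservation step, recording a checksum of every copied file makes later tampering or corruption detectable, which supports the chain of custody. The sketch below is one possible approach, assuming GNU coreutils and findutils; the function names and the `SHA256SUMS` manifest name are hypothetical.

```bash
# Usage: hash_evidence <evidence-dir>
# Writes a SHA256SUMS manifest covering every other file in the directory.
# Assumption: GNU find, sort, xargs, and sha256sum.
hash_evidence() (
    cd "$1" || exit 1
    find . -type f ! -name SHA256SUMS -print0 | sort -z | xargs -0 -r sha256sum > SHA256SUMS
)

# Usage: verify_evidence <evidence-dir>
# Succeeds only if every file still matches the recorded checksum.
verify_evidence() (
    cd "$1" || exit 1
    sha256sum --quiet -c SHA256SUMS
)
```

After copying logs into `/evidence/incident-<date>/`, run `hash_evidence` on that directory once and keep a copy of the manifest elsewhere; `verify_evidence` can then be rerun whenever the evidence is handed over or reviewed.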
---
### LOW-009: No Performance Baseline
**Issue**: No documented baseline for normal system performance
**Impact**:
- Difficulty detecting performance degradation
- Unclear if resource upgrades are needed
- No comparison for troubleshooting
**Remediation**:
```bash
# Create baseline collection script
cat > /home/jramos/homelab/scripts/collect-baseline.sh <<'EOF'
#!/bin/bash
# Collect performance baseline
BASELINE_DIR="/home/jramos/homelab/docs/baselines"
DATE=$(date +%Y%m%d-%H%M%S)
mkdir -p "$BASELINE_DIR"
{
echo "=== Performance Baseline - $DATE ==="
echo ""
echo "=== CPU Information ==="
lscpu
echo ""
echo "=== Memory Information ==="
free -h
echo ""
echo "=== Disk I/O ==="
iostat -x 1 10
echo ""
echo "=== Network Throughput ==="
iftop -t -s 10
echo ""
echo "=== Running Processes ==="
ps aux --sort=-%cpu | head -20
echo ""
echo "=== Load Average ==="
uptime
} > "$BASELINE_DIR/baseline-$DATE.txt"
echo "Baseline saved to $BASELINE_DIR/baseline-$DATE.txt"
EOF
chmod +x /home/jramos/homelab/scripts/collect-baseline.sh
# Collect baseline during normal operations
# Run weekly for trend analysis
crontab -e
# Add: 0 2 * * 0 /home/jramos/homelab/scripts/collect-baseline.sh
```
**Priority**: Low (continuous improvement)
---
### LOW-010: SSH Hardening Incomplete
**Issue**: SSH configuration may use default settings
**Impact**:
- Increased attack surface
- Weaker authentication than possible
- Legacy protocol support
**Remediation**:
```bash
# Update /etc/ssh/sshd_config on all VMs and containers
# Disable root login
PermitRootLogin no
# Disable password authentication (use keys only)
PasswordAuthentication no
# On OpenSSH 8.7+, this keyword is a deprecated alias for KbdInteractiveAuthentication
ChallengeResponseAuthentication no
# Use strong ciphers only
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512,hmac-sha2-256
# Use strong key exchange algorithms
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
# Limit authentication attempts
MaxAuthTries 3
LoginGraceTime 30
# Enable strict mode
StrictModes yes
# Disable unnecessary features
X11Forwarding no
AllowTcpForwarding no
AllowAgentForwarding no
PermitUserEnvironment no
# Limit users
AllowUsers jramos
# Enable logging
LogLevel VERBOSE
# Verify configuration before restarting (a syntax error can lock you out)
sshd -t
# Restart SSH
systemctl restart sshd
```
**Priority**: Low (continuous improvement)
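Because `sshd` applies the first occurrence of each keyword and may read drop-in files, the effective configuration is best checked from the output of `sshd -T` rather than by reading `sshd_config`. The following sketch compares a few of the settings above against that output; the function name `sshd_audit` and the choice of which four keywords to check are assumptions, and the list can be extended.

```bash
# Usage: sshd -T | sshd_audit
# Reads "keyword value" lines (sshd -T prints keywords in lowercase) and
# reports each checked setting that is missing or differs from the target.
sshd_audit() {
    awk '
        BEGIN {
            want["permitrootlogin"] = "no"
            want["passwordauthentication"] = "no"
            want["x11forwarding"] = "no"
            want["maxauthtries"] = "3"
        }
        ($1 in want) {
            seen[$1] = 1
            if ($2 != want[$1]) print "MISMATCH " $1 ": got " $2 ", want " want[$1]
        }
        END { for (k in want) if (!(k in seen)) print "MISSING " k }
    '
}
```

No output means all checked settings match. Note that `sshd -T` generally needs to run as root.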
---
### LOW-011: No Decommissioning Procedure
**Issue**: No documented procedure for securely decommissioning systems
**Impact**:
- Data leakage from decommissioned drives
- Orphaned accounts and credentials
- Incomplete removal from monitoring
**Remediation**:
```markdown
# Create Decommissioning Checklist
# /home/jramos/homelab/docs/DECOMMISSIONING_CHECKLIST.md
# System Decommissioning Checklist
## Pre-Decommissioning
- [ ] Document reason for decommissioning
- [ ] Identify all services running on system
- [ ] Create final backup
- [ ] Identify dependent systems
- [ ] Plan migration if services are moving
- [ ] Schedule maintenance window
## Data Protection
- [ ] Backup all configuration files
- [ ] Export all data to secure location
- [ ] Verify backup integrity
- [ ] Document credentials for archives
## Service Migration (if applicable)
- [ ] Deploy replacement system
- [ ] Migrate services to new system
- [ ] Update DNS records
- [ ] Update reverse proxy configuration
- [ ] Test replacement system
- [ ] Monitor for 48 hours
## Removal
- [ ] Stop all services
- [ ] Shutdown VM/container
- [ ] Remove from monitoring (Prometheus targets)
- [ ] Remove from backup jobs
- [ ] Delete from Proxmox
- [ ] Remove DNS records
- [ ] Remove from documentation
- [ ] Remove from NetBox inventory
- [ ] Revoke SSH keys
- [ ] Disable service accounts
## Data Sanitization
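- [ ] Overwrite disks (if physical hardware)
  - `shred -vfz -n 3 /dev/sdX` (spinning disks only; for SSDs use the drive's secure-erase function, since overwriting does not reliably reach all flash cells)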
- [ ] Delete VM disk images
- [ ] Delete backups (after retention period)
- [ ] Verify no data remnants
## Documentation
- [ ] Update CLAUDE_STATUS.md
- [ ] Update infrastructure diagrams
- [ ] Update services README
- [ ] Commit changes to repository
- [ ] Create decommissioning log entry
```
**Priority**: Low (continuous improvement)
---
### LOW-012: No License Compliance Tracking
**Issue**: No tracking of open-source licenses in use
**Impact**:
- Potential license violations
- Legal risk from non-compliance
- Inability to audit software supply chain
**Remediation**:
```bash
# Create license inventory script
cat > /home/jramos/homelab/scripts/license-inventory.sh <<'EOF'
#!/bin/bash
# Generate software license inventory
{
echo "# Software License Inventory"
echo "Generated: $(date)"
echo ""
echo "## Container Images"
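docker images --format "{{.Repository}}:{{.Tag}}" | while read -r IMAGE; do
    echo "- $IMAGE"
    # Attempt to extract license info from image. The glob must expand inside
    # the container, so run it through the image's shell; images without
    # /bin/sh produce no output
    docker run --rm --entrypoint sh "$IMAGE" -c 'cat /usr/share/doc/*/copyright' 2>/dev/null | head -20
    echo ""
done
echo "## Debian Packages (Host)"
# dpkg records no license field; license texts live in /usr/share/doc/<package>/copyright
dpkg-query -W -f='${Package}\t${Version}\n' | head -50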
} > /home/jramos/homelab/docs/LICENSE_INVENTORY.md
EOF
chmod +x /home/jramos/homelab/scripts/license-inventory.sh
# Run quarterly and review
/home/jramos/homelab/scripts/license-inventory.sh
```
**Priority**: Low (continuous improvement)
---
## Compliance Summary
### Critical Vulnerabilities (8)
| ID | Finding | Priority | Estimated Effort |
|----|---------|----------|------------------|
| CRIT-001 | Hardcoded database passwords | Immediate | 2 hours |
| CRIT-002 | JWT secret exposed | Immediate | 1 hour |
| CRIT-003 | FileBrowser root mount | Immediate | 30 minutes |
| CRIT-004 | Portainer Docker socket | Immediate | 2 hours |
| CRIT-005 | TinyAuth plain text config | High | 2 hours |
| CRIT-006 | NPM default credentials | Immediate | 30 minutes |
| CRIT-007 | Grafana default credentials | Immediate | 15 minutes |
| CRIT-008 | PVE Exporter plain text token | Immediate | 1 hour |
**Total Critical Remediation Effort**: ~9 hours
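As a quick check of the total, the eight estimates in the table can be summed in minutes:

```bash
# Effort estimates for CRIT-001 through CRIT-008, converted to minutes.
total=0
for minutes in 120 60 30 120 120 30 15 60; do
    total=$((total + minutes))
done
printf '%dh %02dm\n' $((total / 60)) $((total % 60))   # prints "9h 15m"
```

The sum is 555 minutes, which the report rounds to roughly 9 hours.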
### High Severity (12)
**Total High Remediation Effort**: ~24 hours
### Medium Severity (15)
**Total Medium Remediation Effort**: ~60 hours
### Low Severity (12)
**Total Low Remediation Effort**: Continuous improvement, ~20 hours initial
---
## Remediation Roadmap
### Week 1 (Critical - Immediate)
- [ ] Day 1: CRIT-001, CRIT-002, CRIT-006, CRIT-007 (4 hours)
- [ ] Day 2: CRIT-003, CRIT-004, CRIT-008 (4 hours)
- [ ] Day 3: CRIT-005, HIGH-001, HIGH-002 (6 hours)
- [ ] Day 4: HIGH-003, HIGH-004, HIGH-005 (6 hours)
- [ ] Day 5: HIGH-006, HIGH-007, HIGH-008 (6 hours)
### Week 2 (High Priority)
- [ ] HIGH-009, HIGH-010, HIGH-011, HIGH-012 (8 hours)
- [ ] Begin medium priority items
### Month 1 (Medium Priority)
- [ ] Complete all medium severity findings (60 hours over 3 weeks)
### Ongoing (Low Priority)
- [ ] Implement low severity improvements continuously
- [ ] Monthly security review meetings
- [ ] Quarterly penetration testing
- [ ] Annual comprehensive audit
---
## Monitoring and Validation
### Continuous Monitoring
```bash
# Create security monitoring dashboard in Grafana
# Metrics to track:
# - Failed authentication attempts
# - Unusual network connections
# - High privilege operations (sudo, docker exec)
# - Configuration changes
# - Certificate expiration dates
# - Backup success/failure rates
```
### Quarterly Security Reviews
```markdown
# Review checklist:
- [ ] Vulnerability scan all containers (Trivy)
- [ ] Review access logs for anomalies
- [ ] Test backup restore procedure
- [ ] Update all software and container images
- [ ] Review and rotate credentials
- [ ] Penetration test external services
- [ ] Update documentation
- [ ] Review incident response plan
```
### Annual Comprehensive Audit
```markdown
# Full security assessment:
- [ ] External penetration test
- [ ] Code review of custom scripts
- [ ] Configuration audit
- [ ] Compliance check (if applicable)
- [ ] Update security policies
- [ ] Disaster recovery test
```
---
## Appendices
### Appendix A: Scanning Scripts
**secrets-scanner.sh**:
```bash
#!/bin/bash
# Scan for hardcoded secrets (case-insensitive, so DB_PASSWORD also matches)
grep -r -i -E "(password|secret|key|token|api_key)" \
    --include="*.yml" \
    --include="*.yaml" \
    --include="*.env" \
    --include="*.conf" \
    --exclude-dir=.git \
    /home/jramos/homelab/ \
    | grep -v "example"
```
**port-scanner.sh**:
```bash
#!/bin/bash
# Identify open ports (the -sS SYN scan requires root privileges)
nmap -sS -sV 192.168.2.0/24 -oN "/tmp/port-scan-$(date +%Y%m%d).txt"
```
### Appendix B: Reference Documentation
- [CIS Docker Benchmark](https://www.cisecurity.org/benchmark/docker)
- [NIST Cybersecurity Framework](https://www.nist.gov/cyberframework)
- [OWASP Top 10](https://owasp.org/www-project-top-ten/)
- [Docker Security Best Practices](https://docs.docker.com/engine/security/)
### Appendix C: Contact Information
**Security Team**:
- Primary Contact: jramos
- Email: security@apophisnetworking.net
- Emergency: (Contact information)
**Escalation Path**:
1. Infrastructure Owner
2. External Security Consultant (if applicable)
3. Legal Counsel (for data breaches)
---
**Report Generated**: 2025-12-20
**Next Audit Due**: 2026-06-20 (6 months)
**Version**: 1.0
**Status**: DRAFT - Awaiting Remediation
---
**Disclaimer**: This security audit represents a point-in-time assessment based on available documentation and configuration files. It does not include active penetration testing, social engineering, or physical security assessments. Actual security posture may differ from findings presented here. This report is intended for internal use only and should not be distributed outside the organization without proper redaction of sensitive information.