feat(docs): update documentation for monitoring stack and infrastructure changes

- Update INDEX.md with VM 101 (monitoring-docker) and CT 112 (twingate-connector)
- Update README.md with monitoring and security sections
- Update CLAUDE.md with new architecture patterns
- Update services/README.md with monitoring stack documentation
- Update CLAUDE_STATUS.md with current infrastructure state
- Update infrastructure counts: 10 VMs, 4 Containers
- Update storage stats: PBS 27.43%, Vault 10.88%
- Create comprehensive monitoring/README.md
- Add .gitignore rules for monitoring sensitive files (pve.yml, .env)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-07 12:41:08 -07:00
parent 0366c63d51
commit f42eeaba92
7 changed files with 1367 additions and 1000 deletions


@@ -132,6 +132,205 @@ cd speedtest-tracker
docker compose up -d
```
## Monitoring Stack (VM-based)
**Deployment**: VM 101 (monitoring-docker) at 192.168.2.114
**Technology**: Docker Compose
**Components**: Grafana, Prometheus, PVE Exporter
### Overview
Monitoring and observability stack for the Proxmox homelab. The PVE Exporter reads metrics from the Proxmox API, Prometheus scrapes and stores them, and Grafana provides dashboards and alerting on top.
### Components
**Grafana** (Port 3000):
- Visualization and dashboards
- Pre-configured Proxmox VE dashboards
- User authentication and RBAC
- Alerting capabilities
- Access: http://192.168.2.114:3000
**Prometheus** (Port 9090):
- Metrics collection and time-series database
- PromQL query language
- 15-day retention (configurable)
- Service discovery
- Access: http://192.168.2.114:9090
**PVE Exporter** (Port 9221):
- Proxmox VE metrics exporter
- Connects to Proxmox API
- Exports node, VM, CT, and storage metrics
- Access: http://192.168.2.114:9221
### Key Features
- Real-time Proxmox infrastructure monitoring
- VM and container resource utilization tracking
- Storage pool capacity planning
- Network traffic analysis
- Backup job status monitoring
- Custom alerting rules
### Deployment
```bash
# Navigate to monitoring directory
cd /home/jramos/homelab/monitoring
# Deploy PVE Exporter
cd pve-exporter
docker compose up -d
# Deploy Prometheus
cd ../prometheus
docker compose up -d
# Deploy Grafana
cd ../grafana
docker compose up -d
# Verify all services
docker ps | grep -E 'grafana|prometheus|pve-exporter'
```
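After deployment, `docker ps` only shows that the containers are running. A minimal sketch of an HTTP-level check follows, assuming the host and ports listed above and the services' standard health paths (`/api/health` for Grafana, `/-/healthy` for Prometheus). The function names are illustrative and not part of the repository.

```bash
#!/bin/sh
# Sketch: probe each monitoring service over HTTP after deployment.
# Host and ports come from this README; the health paths are assumed
# to be the defaults and not remapped by a reverse proxy.
MONITORING_HOST="${MONITORING_HOST:-192.168.2.114}"

# Print the URL to probe for a given service name.
health_url() {
    case "$1" in
        grafana)      echo "http://${MONITORING_HOST}:3000/api/health" ;;
        prometheus)   echo "http://${MONITORING_HOST}:9090/-/healthy" ;;
        pve-exporter) echo "http://${MONITORING_HOST}:9221/" ;;
        *)            echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# Probe all three services; return non-zero if any probe fails.
check_stack() {
    failed=0
    for svc in pve-exporter prometheus grafana; do
        if curl -fsS --max-time 5 -o /dev/null "$(health_url "$svc")"; then
            echo "OK    $svc"
        else
            echo "FAIL  $svc"
            failed=1
        fi
    done
    return "$failed"
}
```

Source the file and run `check_stack` from any machine that can reach VM 101.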
### Configuration
**PVE Exporter**:
- Environment file: `monitoring/pve-exporter/.env`
- Configuration: `monitoring/pve-exporter/pve.yml`
- Requires Proxmox API user with PVEAuditor role
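One possible way to create the read-only API user is sketched below. The user name `monitoring@pve` is a hypothetical placeholder; use whatever name is stored in the `.env` file. The function only prints the `pveum` commands so they can be reviewed before being run as root on the Proxmox host.

```bash
# Sketch: print the commands that create a read-only Proxmox API user.
# "monitoring@pve" is a hypothetical default, not a name from this repository.
pve_auditor_commands() {
    user="${1:-monitoring@pve}"
    echo "pveum user add ${user} --comment 'PVE Exporter (read-only)'"
    echo "pveum passwd ${user}"
    echo "pveum acl modify / --users ${user} --roles PVEAuditor"
}

pve_auditor_commands
```

Granting the role on `/` lets the exporter read every node, guest, and storage; a narrower path would hide some metrics.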
**Prometheus**:
- Configuration: `monitoring/prometheus/prometheus.yml`
- Scrapes PVE Exporter every 30 seconds
- Targets: localhost:9090, pve-exporter:9221
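A `prometheus.yml` matching the settings listed above might look like the following sketch. It is not the file from the repository: the job names, the `default` module, and the use of 192.168.2.200 as the exporter's Proxmox target are assumptions. The helper writes the example to a path of your choice so it can be compared with the real configuration.

```bash
# Sketch: write an example prometheus.yml to the path given as $1.
write_prometheus_example() {
    cat > "$1" <<'EOF'
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: prometheus            # assumed job name
    static_configs:
      - targets: ['localhost:9090']

  - job_name: pve                   # assumed job name
    metrics_path: /pve              # pve-exporter serves Proxmox metrics on /pve
    params:
      module: [default]             # assumed: the "default" section of pve.yml
      target: ['192.168.2.200']     # assumed: the Proxmox host the exporter polls
    static_configs:
      - targets: ['pve-exporter:9221']
EOF
}
```

For example, `write_prometheus_example /tmp/prometheus.example.yml` followed by `promtool check config /tmp/prometheus.example.yml` validates the syntax, if `promtool` is available.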
**Grafana**:
- Default credentials: admin/admin (change on first login)
- Data source: Prometheus at http://prometheus:9090
- Recommended dashboard: Grafana ID 10347 (Proxmox VE)
### Maintenance
```bash
# Update images
cd /home/jramos/homelab/monitoring/<component>
docker compose pull
docker compose up -d
# View logs
docker compose logs -f
# Restart services
docker compose restart
```
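Because each component has its own Compose project, updating the whole stack means repeating the commands three times. A small loop can do this in one pass; the sketch below assumes the directory layout used in the deployment steps and stops at the first component that fails.

```bash
# Sketch: pull new images and recreate containers for every component.
MONITORING_DIR="${MONITORING_DIR:-/home/jramos/homelab/monitoring}"

update_stack() {
    for component in pve-exporter prometheus grafana; do
        echo "Updating ${component}"
        # Subshell keeps the caller's working directory unchanged.
        ( cd "${MONITORING_DIR}/${component}" \
            && docker compose pull \
            && docker compose up -d ) || return 1
    done
}
```

Run `update_stack` on VM 101 after sourcing the snippet.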
### Troubleshooting
**PVE Exporter connection issues**:
1. Verify Proxmox API is accessible: `curl -k https://192.168.2.200:8006`
2. Check credentials in `.env` file
3. Verify user has PVEAuditor role: `pveum acl list` (on Proxmox; `pveum user list` shows users but not their roles)
**Grafana shows no data**:
1. Verify Prometheus data source configuration
2. Check Prometheus targets: http://192.168.2.114:9090/targets
3. Test queries in Prometheus UI before using in Grafana
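The same checks can be run from a terminal through the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at the address given above; `prom_query` is an illustrative helper name.

```bash
# Sketch: run an instant PromQL query against the Prometheus HTTP API.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://192.168.2.114:9090}"

prom_query() {
    # -G sends the URL-encoded expression as a GET query parameter.
    curl -fsS --max-time 5 -G \
        --data-urlencode "query=$1" \
        "${PROMETHEUS_URL}/api/v1/query"
}
```

`prom_query up` returns one series per scrape target, with value `1` for a healthy target and `0` for a failing one. If the targets are up but Grafana is still empty, the problem is more likely in the Grafana data source than in Prometheus.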
**High memory usage**:
1. Reduce Prometheus retention period
2. Limit Grafana concurrent queries
3. Increase VM 101 memory allocation
**Complete Documentation**: See `/home/jramos/homelab/monitoring/README.md`
---
## Twingate Connector
**Deployment**: CT 112 (twingate-connector)
**Technology**: LXC Container
**Purpose**: Zero-trust network access
### Overview
Lightweight connector providing secure remote access to homelab resources without traditional VPN complexity. Part of Twingate's zero-trust network access (ZTNA) solution.
### Features
- **Zero-Trust Architecture**: Grant access to specific resources, not entire networks
- **No VPN Required**: Simplified connection without VPN client configuration
- **Identity-Based Access**: User and device authentication
- **Automatic Updates**: Connector auto-updates for security patches
- **Low Resource Overhead**: Minimal CPU and memory footprint
### Architecture
```
External User → Twingate Cloud → Twingate Connector (CT 112) → Homelab Resources
```
### Deployment Considerations
**LXC vs Docker**:
- LXC chosen for lightweight, always-on service
- Minimal resource consumption
- System-level integration
- Quick restart and recovery
**Network Placement**:
- Deployed on homelab management network (192.168.2.0/24)
- Access to all internal resources
- No inbound port forwarding required
### Configuration
The Twingate connector is configured via the Twingate Admin Console:
1. **Create Connector** in Twingate Admin Console
2. **Generate Token** for connector authentication
3. **Deploy Container** with provided token
4. **Configure Resources** to route through connector
5. **Assign Users** to resources
### Maintenance
**Health Monitoring**:
- Check connector status in Twingate Admin Console
- Monitor CPU/memory usage on CT 112
- Review connection logs
**Updates**:
- Connector auto-updates by default
- Manual updates: Restart container or redeploy
**Troubleshooting**:
- Verify network connectivity to Twingate cloud
- Check connector token validity
- Review resource routing configuration
- Ensure firewall allows outbound HTTPS
### Security Best Practices
1. **Least Privilege**: Grant access only to required resources
2. **MFA Enforcement**: Require multi-factor authentication for users
3. **Device Trust**: Enable device posture checks
4. **Audit Logs**: Regularly review access logs in Twingate Console
5. **Connector Isolation**: Consider dedicated network segment for connector
### Integration with Homelab
**Protected Resources**:
- Proxmox Web UI (192.168.2.200:8006)
- Grafana Monitoring (192.168.2.114:3000)
- Nginx Proxy Manager (192.168.2.101:81)
- n8n Workflows (192.168.2.107:5678)
- Development VMs and services
**Access Policies**:
- Admin users: Full access to all resources
- Monitoring users: Read-only Grafana access
- Developers: Access to dev VMs and services
---
## General Deployment Instructions
### Prerequisites
@@ -308,6 +507,39 @@ Several services have embedded secrets in their docker-compose.yaml files:
2. Verify host directory ownership: `chown -R <user>:<group> /path/to/volume`
3. Check SELinux context (if applicable): `ls -Z /path/to/volume`
### Monitoring Stack Issues
**Metrics Not Appearing**:
1. Verify PVE Exporter can reach Proxmox API
2. Check Prometheus scrape targets status
3. Ensure Grafana data source is configured correctly
4. Review retention policies (data may be expired)
**Authentication Failures (PVE Exporter)**:
1. Verify Proxmox user credentials in `.env` file
2. Check user has PVEAuditor role
3. Test API access: `curl -k https://192.168.2.200:8006/api2/json/version`
**High Resource Usage**:
1. Adjust Prometheus retention: `--storage.tsdb.retention.time=7d`
2. Reduce scrape frequency in prometheus.yml
3. Limit Grafana query concurrency
4. Increase VM 101 resources if needed
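To judge how much a shorter retention period saves, the Prometheus documentation gives a rough sizing rule: disk needed = retention in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. Retention mainly affects disk usage, so treat this as a disk estimate rather than a memory one. The ingestion rate of 500 samples/s below is a hypothetical figure, not a measurement from this homelab.

```bash
# Sketch: rough TSDB disk estimate in bytes.
# needed_bytes = retention_seconds * samples_per_second * bytes_per_sample
estimate_tsdb_bytes() {
    retention_days="$1"
    samples_per_second="$2"
    bytes_per_sample="${3:-2}"   # upper end of the documented 1-2 bytes per sample
    echo $(( retention_days * 86400 * samples_per_second * bytes_per_sample ))
}

# Hypothetical rate of 500 samples/s; measure the real one with
# rate(prometheus_tsdb_head_samples_appended_total[5m]).
estimate_tsdb_bytes 15 500   # 15-day retention: prints 1296000000 (about 1.3 GB)
estimate_tsdb_bytes 7 500    # 7-day retention:  prints 604800000  (about 0.6 GB)
```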
### Twingate Connector Issues
**Connector Offline**:
1. Check CT 112 is running: `pct status 112`
2. Verify network connectivity from container
3. Check connector token validity in Twingate Console
4. Review container logs for error messages
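The first and last steps can be run from the Proxmox host without logging in to the container. This sketch assumes the connector was installed inside CT 112 as the systemd service `twingate-connector`; adjust the unit name if your installation differs.

```bash
# Sketch: inspect the Twingate connector from the Proxmox host.
CTID="${CTID:-112}"

connector_checks() {
    # Is the container itself running?
    pct status "$CTID"
    # Is the connector service active? (unit name is an assumption)
    pct exec "$CTID" -- systemctl is-active twingate-connector
    # Show the most recent log lines for error messages.
    pct exec "$CTID" -- journalctl -u twingate-connector -n 50 --no-pager
}
```

Run `connector_checks` as root on the Proxmox node that hosts CT 112.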
**Cannot Access Resources**:
1. Verify resource is configured in Twingate Console
2. Check user has permission to access resource
3. Ensure connector is online and healthy
4. Verify network routes on CT 112
## Migration Notes
### Post-Migration Tasks
@@ -353,6 +585,7 @@ For homelab-specific questions or issues:
---
**Last Updated**: 2025-12-02
**Last Updated**: 2025-12-07
**Maintainer**: jramos
**Repository**: http://192.168.2.102:3060/jramos/homelab
**Infrastructure**: 10 VMs, 4 LXC Containers