feat(docs): update documentation for monitoring stack and infrastructure changes
- Update INDEX.md with VM 101 (monitoring-docker) and CT 112 (twingate-connector)
- Update README.md with monitoring and security sections
- Update CLAUDE.md with new architecture patterns
- Update services/README.md with monitoring stack documentation
- Update CLAUDE_STATUS.md with current infrastructure state
- Update infrastructure counts: 10 VMs, 4 Containers
- Update storage stats: PBS 27.43%, Vault 10.88%
- Create comprehensive monitoring/README.md
- Add .gitignore rules for monitoring sensitive files (pve.yml, .env)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
5
.gitignore
vendored
@@ -134,6 +134,11 @@ services/homepage/services.yaml
# Template files (.template) are tracked for reference
scripts/fixers/fix_n8n_db_c_locale.sh

# Monitoring Stack Sensitive Files
# --------------------------------
# Exclude exporter config files containing Proxmox credentials and local paths (templates stay tracked)
**/pve.yml

# Custom Exclusions
# ----------------
# Add any custom patterns specific to your homelab below:
21
CLAUDE.md
@@ -21,9 +21,11 @@ The infrastructure employs full VMs for services requiring kernel-level isolatio
|
||||
| VM ID | Name | Purpose | Notes |
|-------|------|---------|-------|
| 100 | docker-hub | Container registry/Docker hub mirror | Local container image caching |
| 101 | gitlab | GitLab CE/EE instance | Source control, CI/CD platform |
| 101 | monitoring-docker | Monitoring stack | Grafana/Prometheus/PVE Exporter at 192.168.2.114 |
| 104 | ubuntu-dev | Ubuntu development environment | Additional dev workstation |
| 105 | dev | Development environment | General-purpose development workstation |
| 106 | Ansible-Control | Automation control node | IaC orchestration, configuration management |
| 107 | ubuntu-docker | Ubuntu Docker host | Docker-focused environment |
| 108 | CML | Cisco Modeling Labs | Network simulation/testing environment |
| 109 | web-server-01 | Web application server | Production-like web tier (clustered) |
| 110 | web-server-02 | Web application server | Load-balanced pair with web-server-01 |
|
||||
@@ -35,9 +37,10 @@ Lightweight services leveraging LXC for reduced overhead and faster provisioning
|
||||
|
||||
| CT ID | Name | Purpose | Notes |
|-------|------|---------|-------|
| 102 | nginx | Reverse proxy/load balancer | Front-end traffic management |
| 102 | nginx | Reverse proxy/load balancer | Front-end traffic management (NPM) |
| 103 | netbox | Network documentation/IPAM | Infrastructure source of truth |
| 112 | Anytype | Knowledge management | Personal/team documentation |
| 112 | twingate-connector | Zero-trust network access | Secure remote access connector |
| 113 | n8n | Workflow automation | n8n.io platform at 192.168.2.107 |
|
||||
|
||||
### Storage Architecture
|
||||
|
||||
@@ -45,10 +48,10 @@ The storage layout demonstrates a well-organized approach to data separation:
|
||||
|
||||
| Storage Pool | Type | Usage | Purpose |
|--------------|------|-------|---------|
| local | Directory | 14.8% | System files, ISOs, templates |
| local | Directory | 15.13% | System files, ISOs, templates |
| local-lvm | LVM-Thin | 0.0% | VM disk images (thin provisioned) |
| Vault | NFS/Directory | 11.9% | Secure storage for sensitive data |
| PBS-Backups | Proxmox Backup Server | 21.6% | Automated backup repository |
| Vault | NFS/Directory | 10.88% | Secure storage for sensitive data |
| PBS-Backups | Proxmox Backup Server | 27.43% | Automated backup repository |
| iso-share | NFS/CIFS | 1.4% | Installation media library |
| localnetwork | Network share | N/A | Shared resources across infrastructure |
|
||||
|
||||
@@ -60,7 +63,11 @@ The storage layout demonstrates a well-organized approach to data separation:
|
||||
|
||||
**Network Simulation Capability**: CML (108) suggests network engineering activities, possibly testing configurations before production deployment.
|
||||
|
||||
**Container Strategy**: The selective use of LXC for stateless or lightweight services (nginx, netbox) vs full VMs for complex applications demonstrates thoughtful resource optimization.
|
||||
**Container Strategy**: The selective use of LXC for stateless or lightweight services (nginx, netbox, twingate, n8n) vs full VMs for complex applications demonstrates thoughtful resource optimization.
|
||||
|
||||
**Monitoring & Observability**: The dedicated monitoring VM (101) with Grafana, Prometheus, and PVE Exporter provides comprehensive infrastructure visibility, enabling proactive capacity planning and performance optimization.
|
||||
|
||||
**Zero-Trust Security**: Implementation of Twingate connector (CT 112) demonstrates modern security practices, providing secure remote access without traditional VPN complexity.
|
||||
|
||||
## Working with This Environment
|
||||
|
||||
|
||||
1239
CLAUDE_STATUS.md
File diff suppressed because it is too large
66
INDEX.md
@@ -309,13 +309,14 @@ cat scripts/crawlers-exporters/COLLECTION-GUIDE.md
|
||||
|
||||
## Your Infrastructure
|
||||
|
||||
Based on the latest export (2025-12-02 20:49:54), your environment includes:
|
||||
Based on the latest export (2025-12-07 12:00:40), your environment includes:
|
||||
|
||||
### Virtual Machines (QEMU/KVM) - 9 VMs
|
||||
### Virtual Machines (QEMU/KVM) - 10 VMs
|
||||
|
||||
| VM ID | Name | Status | Purpose |
|-------|------|--------|---------|
| 100 | docker-hub | Running | Container registry/Docker hub mirror |
| 101 | monitoring-docker | Running | Monitoring stack (Grafana/Prometheus/PVE Exporter) at 192.168.2.114 |
| 104 | ubuntu-dev | Stopped | Ubuntu development environment |
| 105 | dev | Stopped | General-purpose development workstation |
| 106 | Ansible-Control | Running | IaC orchestration, configuration management |
|
||||
@@ -325,23 +326,24 @@ Based on the latest export (2025-12-02 20:49:54), your environment includes:
|
||||
| 110 | web-server-02 | Running | Load-balanced pair with web-server-01 |
|
||||
| 111 | db-server-01 | Running | Backend database server |
|
||||
|
||||
**Note**: VM 101 (gitlab) has been removed from the infrastructure.
|
||||
**Recent Changes**: Added VM 101 (monitoring-docker) for dedicated observability infrastructure.
|
||||
|
||||
### Containers (LXC) - 3 Containers
|
||||
### Containers (LXC) - 4 Containers
|
||||
|
||||
| CT ID | Name | Status | Purpose |
|-------|------|--------|---------|
| 102 | nginx | Running | Reverse proxy/load balancer |
| 103 | netbox | Stopped | Network documentation/IPAM |
| 113 | n8n | Running | Workflow automation platform |
| 112 | twingate-connector | Running | Zero-trust network access connector |
| 113 | n8n | Running | Workflow automation platform at 192.168.2.107 |
|
||||
|
||||
**Note**: CT 112 (Anytype) has been replaced by CT 113 (n8n).
|
||||
**Recent Changes**: Added CT 112 (twingate-connector) for zero-trust security, CT 113 (n8n) for workflow automation.
|
||||
|
||||
### Storage Pools
|
||||
- **local** (Directory) - 14.8% used - System files, ISOs, templates
|
||||
- **local** (Directory) - 15.13% used - System files, ISOs, templates
|
||||
- **local-lvm** (LVM-Thin) - 0.0% used - VM disk images (thin provisioned)
|
||||
- **Vault** (NFS/Directory) - 11.9% used - Secure storage for sensitive data
|
||||
- **PBS-Backups** (Proxmox Backup Server) - 21.6% used - Automated backup repository
|
||||
- **Vault** (NFS/Directory) - 10.88% used - Secure storage for sensitive data
|
||||
- **PBS-Backups** (Proxmox Backup Server) - 27.43% used - Automated backup repository
|
||||
- **iso-share** (NFS/CIFS) - 1.4% used - Installation media library
|
||||
- **localnetwork** (Network share) - Shared resources across infrastructure
|
||||
|
||||
@@ -349,8 +351,8 @@ All of these are documented in your collection exports!
|
||||
|
||||
## Latest Export Information
|
||||
|
||||
- **Export Directory**: `/home/jramos/homelab/homelab-export-20251202-204939/`
|
||||
- **Collection Date**: 2025-12-02 20:49:54
|
||||
- **Export Directory**: `/home/jramos/homelab/disaster-recovery/homelab-export-20251207-120040/`
|
||||
- **Collection Date**: 2025-12-07 12:00:40
|
||||
- **Hostname**: serviceslab
|
||||
- **Collection Level**: full
|
||||
- **Script Version**: 1.0.0
|
||||
@@ -439,6 +441,40 @@ For detailed troubleshooting, see: **[troubleshooting/BUGFIX-SUMMARY.md](trouble
|
||||
| **Output (standard)** | 2-6 MB | Per collection run |
|
||||
| **Output (full)** | 5-20 MB | Per collection run |
|
||||
|
||||
## Monitoring Stack
|
||||
|
||||
The infrastructure now includes a comprehensive monitoring and observability stack deployed on VM 101 (monitoring-docker) at 192.168.2.114:
|
||||
|
||||
### Components
|
||||
- **Grafana** (Port 3000): Visualization and dashboards
|
||||
- **Prometheus** (Port 9090): Metrics collection and time-series database
|
||||
- **PVE Exporter** (Port 9221): Proxmox VE metrics exporter
|
||||
|
||||
### Features
|
||||
- Real-time Proxmox infrastructure monitoring
|
||||
- VM and container resource utilization tracking
|
||||
- Storage pool metrics and capacity planning
|
||||
- Network traffic analysis
|
||||
- Pre-configured dashboards for Proxmox VE
|
||||
- Alerting capabilities (configurable)
|
||||
|
||||
### Access
|
||||
- **Grafana UI**: http://192.168.2.114:3000
|
||||
- **Prometheus UI**: http://192.168.2.114:9090
|
||||
- **Metrics Endpoint**: http://192.168.2.114:9221/pve
|
||||
|
||||
### Documentation
|
||||
For comprehensive setup, configuration, and troubleshooting:
|
||||
- **Monitoring Guide**: `monitoring/README.md`
|
||||
- **Docker Compose Configs**: `monitoring/grafana/`, `monitoring/prometheus/`, `monitoring/pve-exporter/`
|
||||
|
||||
### Key Metrics
|
||||
- Node CPU, memory, and disk usage
|
||||
- VM/CT resource consumption
|
||||
- Storage pool utilization trends
|
||||
- Backup job success rates
|
||||
- Network interface statistics
|
||||
|
||||
## Service Management
|
||||
|
||||
### n8n Workflow Automation
|
||||
@@ -531,8 +567,8 @@ bash scripts/crawlers-exporters/collect.sh
|
||||
|
||||
---
|
||||
|
||||
**Repository Version:** 2.0.0
|
||||
**Last Updated**: 2025-12-02
|
||||
**Latest Export**: homelab-export-20251202-204939
|
||||
**Infrastructure**: 9 VMs, 3 Containers, Proxmox VE 8.3.3
|
||||
**Repository Version:** 2.1.0
|
||||
**Last Updated**: 2025-12-07
|
||||
**Latest Export**: disaster-recovery/homelab-export-20251207-120040
|
||||
**Infrastructure**: 10 VMs, 4 Containers, Proxmox VE 8.3.3
|
||||
**Maintained by**: Your homelab automation system
|
||||
|
||||
46
README.md
@@ -16,18 +16,21 @@ This repository contains configuration files, scripts, and documentation for man
|
||||
|
||||
### Virtual Machines (QEMU/KVM)
|
||||
- **100** - docker-hub: Container registry and Docker hub mirror
|
||||
- **101** - gitlab: GitLab CE/EE for source control and CI/CD
|
||||
- **101** - monitoring-docker: Monitoring stack (Grafana/Prometheus/PVE Exporter) at 192.168.2.114
|
||||
- **104** - ubuntu-dev: Ubuntu development environment
|
||||
- **105** - dev: General-purpose development environment
|
||||
- **106** - Ansible-Control: Infrastructure automation control node
|
||||
- **107** - ubuntu-docker: Ubuntu Docker host
|
||||
- **108** - CML: Cisco Modeling Labs for network simulation
|
||||
- **109** - web-server-01: Web application server (clustered)
|
||||
- **110** - web-server-02: Web application server (load-balanced)
|
||||
- **111** - db-server-01: Database server
|
||||
|
||||
### Containers (LXC)
|
||||
- **102** - nginx: Reverse proxy and load balancer
|
||||
- **102** - nginx: Reverse proxy and load balancer (Nginx Proxy Manager)
|
||||
- **103** - netbox: Network documentation and IPAM
|
||||
- **112** - Anytype: Knowledge management system
|
||||
- **112** - twingate-connector: Zero-trust network access connector
|
||||
- **113** - n8n: Workflow automation platform at 192.168.2.107
|
||||
|
||||
### Storage Pools
|
||||
- **local**: System files, ISOs, and templates
|
||||
@@ -49,6 +52,40 @@ homelab/
|
||||
└── README.md # This file
|
||||
```
|
||||
|
||||
## Monitoring & Observability
|
||||
|
||||
The infrastructure includes a comprehensive monitoring stack deployed on VM 101 (monitoring-docker) at 192.168.2.114:
|
||||
|
||||
### Components
|
||||
- **Grafana** (Port 3000): Visualization and dashboards
|
||||
- **Prometheus** (Port 9090): Metrics collection and time-series database
|
||||
- **PVE Exporter** (Port 9221): Proxmox VE metrics exporter
|
||||
|
||||
### Features
|
||||
- Real-time infrastructure monitoring
|
||||
- Resource utilization tracking for VMs and containers
|
||||
- Storage pool metrics and trends
|
||||
- Network traffic analysis
|
||||
- Pre-configured Proxmox VE dashboards
|
||||
- Alerting capabilities
|
||||
|
||||
**Documentation**: See `monitoring/README.md` for complete setup and configuration guide.
|
||||
|
||||
## Network Security
|
||||
|
||||
### Zero-Trust Access
|
||||
- **CT 112** - twingate-connector: Provides secure remote access without traditional VPN
|
||||
- **Technology**: Twingate zero-trust network access
|
||||
- **Benefits**: Simplified secure access, no complex VPN configurations
|
||||
|
||||
## Automation & Integration
|
||||
|
||||
### Workflow Automation
|
||||
- **CT 113** - n8n at 192.168.2.107
|
||||
- **Database**: PostgreSQL 15+
|
||||
- **Features**: API integrations, scheduled workflows, webhook triggers
|
||||
- **Documentation**: See `services/README.md` for n8n setup and troubleshooting
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Prerequisites
|
||||
@@ -137,5 +174,6 @@ For questions about:
|
||||
|
||||
---
|
||||
|
||||
*Last Updated: 2025-11-29*
|
||||
*Last Updated: 2025-12-07*
|
||||
*Proxmox Version: 8.3.3*
|
||||
*Infrastructure: 10 VMs, 4 LXC Containers*
|
||||
|
||||
755
monitoring/README.md
Normal file
@@ -0,0 +1,755 @@
|
||||
# Monitoring Stack
|
||||
|
||||
Comprehensive monitoring and observability stack for the Proxmox homelab environment, providing real-time metrics, visualization, and alerting capabilities.
|
||||
|
||||
## Overview
|
||||
|
||||
The monitoring stack consists of three primary components deployed on VM 101 (monitoring-docker) at 192.168.2.114:
|
||||
|
||||
- **Grafana**: Visualization and dashboards (Port 3000)
|
||||
- **Prometheus**: Metrics collection and time-series database (Port 9090)
|
||||
- **PVE Exporter**: Proxmox VE metrics exporter (Port 9221)
|
||||
|
||||
## Architecture
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────────────┐
│                   Proxmox Host (serviceslab)                    │
│                          192.168.2.200                          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             │ API (8006)
                             │
                    ┌────────▼────────┐
                    │  PVE Exporter   │
                    │   Port: 9221    │
                    │    (VM 101)     │
                    └────────┬────────┘
                             │
                             │ Metrics
                             │
                    ┌────────▼────────┐
                    │   Prometheus    │
                    │   Port: 9090    │
                    │    (VM 101)     │
                    └────────┬────────┘
                             │
                             │ Query
                             │
                    ┌────────▼────────┐
                    │     Grafana     │
                    │   Port: 3000    │
                    │    (VM 101)     │
                    └────────┬────────┘
                             │
                             │ HTTPS
                             │
                    ┌────────▼────────┐
                    │   Nginx Proxy   │
                    │    (CT 102)     │
                    │  192.168.2.101  │
                    └─────────────────┘
```
|
||||
|
||||
## Components
|
||||
|
||||
### VM 101: monitoring-docker
|
||||
|
||||
**Specifications**:
|
||||
- **IP Address**: 192.168.2.114
|
||||
- **Operating System**: Ubuntu 22.04/24.04 LTS
|
||||
- **Docker Version**: 24.0+
|
||||
- **Purpose**: Dedicated monitoring infrastructure host
|
||||
|
||||
**Resource Allocation**:
|
||||
- **CPU**: 2-4 cores
|
||||
- **Memory**: 4-8 GB
|
||||
- **Storage**: 50-100 GB (thin provisioned)
|
||||
|
||||
### Grafana
|
||||
|
||||
**Version**: Latest stable
|
||||
**Port**: 3000
|
||||
**Access**: http://192.168.2.114:3000
|
||||
|
||||
**Features**:
|
||||
- Pre-configured Proxmox VE dashboards
|
||||
- Prometheus data source integration
|
||||
- User authentication and authorization
|
||||
- Dashboard templating and variables
|
||||
- Alerting capabilities
|
||||
- Panel plugins for advanced visualizations
|
||||
|
||||
**Default Credentials**:
|
||||
- Username: `admin`
|
||||
- Password: Check `.env` file or initial setup
|
||||
|
||||
**Key Dashboards**:
|
||||
- Proxmox Host Overview
|
||||
- VM Resource Utilization
|
||||
- Container Resource Utilization
|
||||
- Storage Pool Metrics
|
||||
- Network Traffic Analysis
|
||||
|
||||
### Prometheus
|
||||
|
||||
**Version**: Latest stable
|
||||
**Port**: 9090
|
||||
**Access**: http://192.168.2.114:9090
|
||||
|
||||
**Configuration**: `/home/jramos/homelab/monitoring/prometheus/prometheus.yml`
|
||||
|
||||
**Scrape Targets**:
|
||||
```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'pve'
    static_configs:
      - targets: ['pve-exporter:9221']
    metrics_path: /pve
    params:
      module: [default]
```
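
To make the `pve` job concrete, the sketch below composes the URL that a scrape job of this shape resolves to. It is a minimal illustration, assuming the default `http` scheme and the default `/metrics` path when none is given; the helper name `scrape_url` is hypothetical and not part of Prometheus.

```python
from typing import Mapping, Optional, Sequence
from urllib.parse import urlencode


def scrape_url(target: str,
               metrics_path: str = "/metrics",
               params: Optional[Mapping[str, Sequence[str]]] = None,
               scheme: str = "http") -> str:
    """Compose the URL a scrape job points at (illustrative helper only)."""
    # doseq=True expands list values such as {"module": ["default"]}
    query = urlencode(params or {}, doseq=True)
    url = f"{scheme}://{target}{metrics_path}"
    return f"{url}?{query}" if query else url


# Values taken from the 'pve' job in the scrape configuration
pve_url = scrape_url("pve-exporter:9221", "/pve", {"module": ["default"]})
print(pve_url)
```

Requesting the composed URL by hand from inside the Prometheus container's network is a quick way to see what the scrape itself would see.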
|
||||
|
||||
**Features**:
|
||||
- Time-series metrics database
|
||||
- PromQL query language
|
||||
- Service discovery
|
||||
- Alert manager integration (configurable)
|
||||
- Data retention policies
|
||||
- Remote storage support
|
||||
|
||||
**Retention Policy**: 15 days (configurable via the `--storage.tsdb.retention.time` command-line flag)
|
||||
|
||||
### PVE Exporter
|
||||
|
||||
**Version**: prompve/prometheus-pve-exporter:latest
|
||||
**Port**: 9221
|
||||
**Access**: http://192.168.2.114:9221
|
||||
|
||||
**Configuration**:
|
||||
- File: `/home/jramos/homelab/monitoring/pve-exporter/pve.yml`
|
||||
- Environment: `/home/jramos/homelab/monitoring/pve-exporter/.env`
|
||||
|
||||
**Proxmox Connection**:
|
||||
```yaml
default:
  user: monitoring@pve
  password: <stored in .env>
  verify_ssl: false
```
|
||||
|
||||
**Metrics Exported**:
|
||||
- Proxmox cluster status
|
||||
- Node CPU, memory, disk usage
|
||||
- VM/CT status and resource usage
|
||||
- Storage pool utilization
|
||||
- Network interface statistics
|
||||
- Backup job status
|
||||
- Service health
|
||||
|
||||
**Environment Variables**:
|
||||
- `PVE_USER`: Proxmox API user (typically `monitoring@pve`)
|
||||
- `PVE_PASSWORD`: API user password
|
||||
- `PVE_VERIFY_SSL`: SSL verification (false for self-signed certs)
|
||||
|
||||
## Deployment
|
||||
|
||||
### Prerequisites
|
||||
|
||||
1. **VM 101 Setup**:
|
||||
```bash
|
||||
# Install Docker and Docker Compose
|
||||
curl -fsSL https://get.docker.com | sh
|
||||
sudo usermod -aG docker $USER
|
||||
|
||||
# Verify installation
|
||||
docker --version
|
||||
docker compose version
|
||||
```
|
||||
|
||||
2. **Proxmox API User**:
|
||||
```bash
|
||||
# On Proxmox host, create monitoring user
|
||||
pveum user add monitoring@pve
|
||||
pveum passwd monitoring@pve
|
||||
pveum aclmod / -user monitoring@pve -role PVEAuditor
|
||||
```
|
||||
|
||||
3. **Clone Repository**:
|
||||
```bash
|
||||
cd /home/jramos
|
||||
git clone <repository-url> homelab
|
||||
cd homelab/monitoring
|
||||
```
|
||||
|
||||
### Configuration
|
||||
|
||||
1. **PVE Exporter Environment**:
|
||||
```bash
|
||||
cd pve-exporter
|
||||
nano .env
|
||||
```
|
||||
|
||||
Add:
|
||||
```env
|
||||
PVE_USER=monitoring@pve
|
||||
PVE_PASSWORD=your-secure-password
|
||||
PVE_VERIFY_SSL=false
|
||||
```
|
||||
|
||||
2. **Verify Configuration Files**:
|
||||
```bash
|
||||
# Check PVE exporter config
|
||||
cat pve-exporter/pve.yml
|
||||
|
||||
# Check Prometheus config
|
||||
cat prometheus/prometheus.yml
|
||||
```
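
Before starting the containers, the `.env` contents can be sanity-checked. The following is a minimal sketch, assuming the file holds plain `KEY=VALUE` lines as in the example above; `parse_env` and `missing_keys` are hypothetical helper names, and the required keys are the three variables this guide documents.

```python
REQUIRED_KEYS = ("PVE_USER", "PVE_PASSWORD", "PVE_VERIFY_SSL")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Sample content mirroring the .env example in this guide
sample = """\
PVE_USER=monitoring@pve
PVE_PASSWORD=your-secure-password
PVE_VERIFY_SSL=false
"""
env = parse_env(sample)
print(missing_keys(env))
```

To check the real file, pass its contents instead of `sample`, for example `parse_env(open("pve-exporter/.env").read())` (the relative path is an assumption about the working directory).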
|
||||
|
||||
### Deployment Steps
|
||||
|
||||
1. **Deploy PVE Exporter**:
|
||||
```bash
|
||||
cd /home/jramos/homelab/monitoring/pve-exporter
|
||||
docker compose up -d
|
||||
docker compose logs -f
|
||||
```
|
||||
|
||||
2. **Deploy Prometheus**:
|
||||
```bash
|
||||
cd /home/jramos/homelab/monitoring/prometheus
|
||||
docker compose up -d
|
||||
docker compose logs -f
|
||||
```
|
||||
|
||||
3. **Deploy Grafana**:
|
||||
```bash
|
||||
cd /home/jramos/homelab/monitoring/grafana
|
||||
docker compose up -d
|
||||
docker compose logs -f
|
||||
```
|
||||
|
||||
4. **Verify All Services**:
|
||||
```bash
|
||||
# Check running containers
|
||||
docker ps
|
||||
|
||||
# Test PVE Exporter
|
||||
curl "http://192.168.2.114:9221/pve?target=192.168.2.200&module=default"
|
||||
|
||||
# Test Prometheus
|
||||
curl http://192.168.2.114:9090/-/healthy
|
||||
|
||||
# Test Grafana
|
||||
curl http://192.168.2.114:3000/api/health
|
||||
```
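
The three endpoint checks above can also be scripted. This is a minimal sketch, assuming the addresses documented in this guide and treating any HTTP 200 response as healthy; `http_status` and `check_all` are hypothetical helper names. The fetch function is injectable so the logic can be exercised without network access.

```python
import http.client
import urllib.request

HOST = "192.168.2.114"  # VM 101, as documented in this guide

ENDPOINTS = {
    "prometheus": f"http://{HOST}:9090/-/healthy",
    "grafana": f"http://{HOST}:3000/api/health",
    "pve-exporter": f"http://{HOST}:9221/pve?target=192.168.2.200&module=default",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses; non-2xx lands here too
        return 0


def check_all(fetch=http_status) -> dict:
    """Map each service name to True when its endpoint answers 200."""
    return {name: fetch(url) == 200 for name, url in ENDPOINTS.items()}
```

Calling `check_all()` from a machine on the homelab network returns a dictionary such as `{"prometheus": True, ...}`; a `False` entry points at the service to inspect with `docker compose logs`.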
|
||||
|
||||
### Initial Grafana Setup
|
||||
|
||||
1. **Access Grafana**:
|
||||
- Navigate to http://192.168.2.114:3000
|
||||
- Log in as `admin`; the password is `admin` on a fresh install unless a different one is set in `.env`
|
||||
- Change password when prompted
|
||||
|
||||
2. **Add Prometheus Data Source**:
|
||||
- Go to Configuration → Data Sources
|
||||
- Click "Add data source"
|
||||
- Select "Prometheus"
|
||||
- URL: `http://prometheus:9090`
|
||||
- Click "Save & Test"
|
||||
|
||||
3. **Import Proxmox Dashboard**:
|
||||
- Go to Dashboards → Import
|
||||
- Dashboard ID: 10347 (Proxmox VE)
|
||||
- Select Prometheus data source
|
||||
- Click "Import"
|
||||
|
||||
4. **Configure Alerting** (Optional):
|
||||
- Go to Alerting → Notification channels
|
||||
- Add email, Slack, or other notification methods
|
||||
- Create alert rules in dashboards
|
||||
|
||||
## Network Configuration
|
||||
|
||||
### Internal Access
|
||||
|
||||
All services are accessible within the homelab network:
|
||||
|
||||
- **Grafana**: http://192.168.2.114:3000
|
||||
- **Prometheus**: http://192.168.2.114:9090
|
||||
- **PVE Exporter**: http://192.168.2.114:9221
|
||||
|
||||
### External Access (via Nginx Proxy Manager)
|
||||
|
||||
Configure reverse proxy on CT 102 (nginx at 192.168.2.101):
|
||||
|
||||
1. **Create Proxy Host**:
|
||||
- Domain: `monitoring.yourdomain.com`
|
||||
- Scheme: `http`
|
||||
- Forward Hostname: `192.168.2.114`
|
||||
- Forward Port: `3000`
|
||||
|
||||
2. **SSL Configuration**:
|
||||
- Enable "Force SSL"
|
||||
- Request Let's Encrypt certificate
|
||||
- Enable HTTP/2
|
||||
|
||||
3. **Access List** (Optional):
|
||||
- Create access list for authentication
|
||||
- Apply to proxy host for additional security
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Update Services
|
||||
|
||||
```bash
|
||||
# Update all monitoring services
|
||||
cd /home/jramos/homelab/monitoring
|
||||
|
||||
# Update PVE Exporter
|
||||
cd pve-exporter
|
||||
docker compose pull
|
||||
docker compose up -d
|
||||
|
||||
# Update Prometheus
|
||||
cd ../prometheus
|
||||
docker compose pull
|
||||
docker compose up -d
|
||||
|
||||
# Update Grafana
|
||||
cd ../grafana
|
||||
docker compose pull
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
### Backup Grafana Dashboards
|
||||
|
||||
```bash
|
||||
# Backup Grafana data
|
||||
docker exec grafana tar czf - /var/lib/grafana > grafana-backup-$(date +%Y%m%d).tar.gz
|
||||
|
||||
# Or use Grafana's provisioning
|
||||
# Dashboards can be exported as JSON and stored in git
|
||||
```
|
||||
|
||||
### Prometheus Data Retention
|
||||
|
||||
```bash
|
||||
# Check Prometheus storage size
|
||||
docker exec prometheus du -sh /prometheus
|
||||
|
||||
# Adjust retention in docker-compose.yml:
|
||||
# command:
|
||||
# - '--storage.tsdb.retention.time=30d'
|
||||
# - '--storage.tsdb.retention.size=50GB'
|
||||
```
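
When choosing retention values, a rough disk estimate helps. The sketch below applies the sizing rule from the Prometheus storage documentation (retention time in seconds × ingested samples per second × bytes per sample, with roughly 1-2 bytes per sample). The series count and scrape interval are hypothetical inputs chosen for illustration, not measurements from this homelab.

```python
def estimate_tsdb_bytes(retention_days: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB size: retention seconds * ingest rate * bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical workload: 5,000 active series scraped every 30 seconds
series = 5_000
scrape_interval_s = 30
rate = series / scrape_interval_s  # samples ingested per second

for days in (7, 15, 30):
    gib = estimate_tsdb_bytes(days, rate) / 1024**3
    print(f"{days:>2} days: about {gib:.2f} GiB")
```

Comparing the estimate with the output of `du -sh /prometheus` shows whether the assumed series count is realistic for this environment.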
|
||||
|
||||
### View Logs
|
||||
|
||||
```bash
|
||||
# PVE Exporter logs
|
||||
cd /home/jramos/homelab/monitoring/pve-exporter
|
||||
docker compose logs -f
|
||||
|
||||
# Prometheus logs
|
||||
cd /home/jramos/homelab/monitoring/prometheus
|
||||
docker compose logs -f
|
||||
|
||||
# Grafana logs
|
||||
cd /home/jramos/homelab/monitoring/grafana
|
||||
docker compose logs -f
|
||||
|
||||
# Or follow a single container by name (each command blocks, so run one at a time)
|
||||
docker logs -f pve-exporter
|
||||
docker logs -f prometheus
|
||||
docker logs -f grafana
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### PVE Exporter Cannot Connect to Proxmox
|
||||
|
||||
**Symptoms**: No metrics from Proxmox, connection refused errors
|
||||
|
||||
**Solutions**:
|
||||
1. Verify Proxmox API is accessible:
|
||||
```bash
|
||||
curl -k https://192.168.2.200:8006/api2/json/version
|
||||
```
|
||||
|
||||
2. Check PVE Exporter environment variables:
|
||||
```bash
|
||||
cd /home/jramos/homelab/monitoring/pve-exporter
|
||||
cat .env
|
||||
docker compose config
|
||||
```
|
||||
|
||||
3. Test authentication:
|
||||
```bash
|
||||
# From VM 101
|
||||
curl -k -d "username=monitoring@pve&password=yourpassword" \
|
||||
https://192.168.2.200:8006/api2/json/access/ticket
|
||||
```
|
||||
|
||||
4. Verify user permissions on Proxmox:
|
||||
```bash
|
||||
# On Proxmox host
|
||||
pveum user list
|
||||
pveum aclmod / -user monitoring@pve -role PVEAuditor
|
||||
```
|
||||
|
||||
### Prometheus Not Scraping Targets
|
||||
|
||||
**Symptoms**: Targets shown as down in Prometheus UI
|
||||
|
||||
**Solutions**:
|
||||
1. Check Prometheus targets:
|
||||
- Navigate to http://192.168.2.114:9090/targets
|
||||
- Verify target status and error messages
|
||||
|
||||
2. Verify network connectivity:
|
||||
```bash
|
||||
docker exec prometheus wget -qO- http://pve-exporter:9221/pve
|
||||
```
|
||||
|
||||
3. Check Prometheus configuration:
|
||||
```bash
|
||||
cd /home/jramos/homelab/monitoring/prometheus
|
||||
docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml
|
||||
```
|
||||
|
||||
4. Reload Prometheus configuration:
|
||||
```bash
|
||||
docker compose restart prometheus
|
||||
```
|
||||
|
||||
### Grafana Shows No Data
|
||||
|
||||
**Symptoms**: Dashboards display "No data" or empty graphs
|
||||
|
||||
**Solutions**:
|
||||
1. Verify Prometheus data source:
|
||||
- Go to Configuration → Data Sources
|
||||
- Test connection to Prometheus
|
||||
- URL should be `http://prometheus:9090`
|
||||
|
||||
2. Check Prometheus has data:
|
||||
- Navigate to http://192.168.2.114:9090
|
||||
- Run query: `up`
|
||||
- Should show all scrape targets
|
||||
|
||||
3. Verify dashboard queries:
|
||||
- Edit panel
|
||||
- Check PromQL query syntax
|
||||
- Test query in Prometheus UI first
|
||||
|
||||
4. Check time range:
|
||||
- Ensure dashboard time range includes recent data
|
||||
- Confirm the selected range does not reach back past the Prometheus retention period
|
||||
|
||||
### Docker Compose Network Issues
|
||||
|
||||
**Symptoms**: Containers cannot communicate
|
||||
|
||||
**Solutions**:
|
||||
1. Check Docker network:
|
||||
```bash
|
||||
docker network ls
|
||||
docker network inspect monitoring_default
|
||||
```
|
||||
|
||||
2. Verify container connectivity:
|
||||
```bash
|
||||
docker exec prometheus ping pve-exporter
|
||||
docker exec grafana ping prometheus
|
||||
```
|
||||
|
||||
3. Recreate network:
|
||||
```bash
|
||||
cd /home/jramos/homelab/monitoring
|
||||
docker compose down
|
||||
docker network prune
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
### High Memory Usage
|
||||
|
||||
**Symptoms**: VM 101 running out of memory
|
||||
|
||||
**Solutions**:
|
||||
1. Check container memory usage:
|
||||
```bash
|
||||
docker stats
|
||||
```
|
||||
|
||||
2. Reduce Prometheus retention:
|
||||
   ```yaml
   # In prometheus/docker-compose.yml
   command:
     - '--storage.tsdb.retention.time=7d'
     - '--storage.tsdb.retention.size=10GB'
   ```
|
||||
|
||||
3. Limit Grafana image rendering:
|
||||
   ```yaml
   # In grafana/docker-compose.yml
   environment:
     - GF_RENDERING_SERVER_URL=
     - GF_RENDERING_CALLBACK_URL=
   ```
|
||||
|
||||
4. Increase VM memory allocation in Proxmox
|
||||
|
||||
### SSL/TLS Certificate Errors
|
||||
|
||||
**Symptoms**: PVE Exporter cannot verify SSL certificate
|
||||
|
||||
**Solutions**:
|
||||
1. Set `verify_ssl: false` in `pve.yml` (for self-signed certs)
|
||||
2. Or import Proxmox CA certificate:
|
||||
```bash
|
||||
# Copy CA from Proxmox to VM 101
|
||||
scp root@192.168.2.200:/etc/pve/pve-root-ca.pem .
|
||||
|
||||
# Add to trust store
|
||||
sudo cp pve-root-ca.pem /usr/local/share/ca-certificates/pve-root-ca.crt
|
||||
sudo update-ca-certificates
|
||||
```
|
||||
|
||||
## Metrics Reference
|
||||
|
||||
### Key Proxmox Metrics
|
||||
|
||||
**Node Metrics**:
|
||||
- `pve_node_cpu_usage_ratio`: CPU utilization (0-1)
|
||||
- `pve_node_memory_usage_bytes`: Memory used
|
||||
- `pve_node_memory_total_bytes`: Total memory
|
||||
- `pve_node_disk_usage_bytes`: Root disk used
|
||||
- `pve_node_uptime_seconds`: Node uptime
|
||||
|
||||
**VM/CT Metrics**:
|
||||
- `pve_guest_info`: Guest information (labels: id, name, type, node)
|
||||
- `pve_guest_cpu_usage_ratio`: Guest CPU usage
|
||||
- `pve_guest_memory_usage_bytes`: Guest memory used
|
||||
- `pve_guest_disk_read_bytes_total`: Disk read bytes
|
||||
- `pve_guest_disk_write_bytes_total`: Disk write bytes
|
||||
- `pve_guest_network_receive_bytes_total`: Network received
|
||||
- `pve_guest_network_transmit_bytes_total`: Network transmitted
|
||||
|
||||
**Storage Metrics**:
|
||||
- `pve_storage_usage_bytes`: Storage used
|
||||
- `pve_storage_size_bytes`: Total storage size
|
||||
- `pve_storage_info`: Storage information (labels: storage, type)
|
||||
|
||||
### Useful PromQL Queries
|
||||
|
||||
**CPU Usage by VM**:
|
||||
```promql
|
||||
pve_guest_cpu_usage_ratio{type="qemu"} * 100
|
||||
```
|
||||
|
||||
**Memory Usage Percentage**:
|
||||
```promql
|
||||
(pve_guest_memory_usage_bytes / pve_guest_memory_size_bytes) * 100
|
||||
```
|
||||
|
||||
**Storage Usage Percentage**:
|
||||
```promql
|
||||
(pve_storage_usage_bytes / pve_storage_size_bytes) * 100
|
||||
```
|
||||
|
||||
**Network Bandwidth (rate)**:
|
||||
```promql
|
||||
rate(pve_guest_network_transmit_bytes_total[5m])
|
||||
```
|
||||
|
||||
**Top 5 VMs by CPU**:
|
||||
```promql
|
||||
topk(5, pve_guest_cpu_usage_ratio{type="qemu"})
|
||||
```
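
The queries above can also be run outside the UI through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch that builds the request URL and reads an instant-vector response. The helper names are hypothetical, and `SAMPLE_RESPONSE` is a hand-written example of the response shape, not output captured from this environment.

```python
import json
from urllib.parse import urlencode

PROMETHEUS = "http://192.168.2.114:9090"
STORAGE_PCT = "(pve_storage_usage_bytes / pve_storage_size_bytes) * 100"


def instant_query_url(expr: str, base: str = PROMETHEUS) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


def values_by_id(payload: dict) -> dict:
    """Map each vector result's `id` label to its value as a float."""
    out = {}
    for item in payload["data"]["result"]:
        label = item["metric"].get("id", "unknown")
        out[label] = float(item["value"][1])  # value is [timestamp, "string"]
    return out


# Hand-written sample in the documented response format (hypothetical labels)
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"id": "storage/serviceslab/local"},
                      "value": [1733572840, "15.13"]}]}}
"""

url = instant_query_url(STORAGE_PCT)
usage = values_by_id(json.loads(SAMPLE_RESPONSE))
print(url)
print(usage)
```

Fetching `url` with `urllib.request.urlopen` and passing the decoded JSON to `values_by_id` gives the same structure for live data.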
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### API Credentials
|
||||
|
||||
1. **PVE Exporter `.env` file**:
|
||||
- Never commit to version control
|
||||
- Use strong passwords
|
||||
- Restrict file permissions: `chmod 600 .env`
|
||||
|
||||
2. **Proxmox API User**:
|
||||
- Use dedicated monitoring user
|
||||
- Grant minimal required permissions (PVEAuditor role)
|
||||
- Consider token-based authentication
|
||||
|
||||
3. **Grafana Authentication**:
|
||||
- Change default admin password
|
||||
- Enable OAuth/LDAP for user authentication
|
||||
- Use role-based access control
|
||||
|
||||
### Network Security
|
||||
|
||||
1. **Firewall Rules**:
|
||||
```bash
|
||||
# On VM 101, restrict access
|
||||
ufw allow from 192.168.2.0/24 to any port 3000
|
||||
ufw allow from 192.168.2.0/24 to any port 9090
|
||||
ufw allow from 192.168.2.0/24 to any port 9221
|
||||
```
|
||||
|
||||
2. **Reverse Proxy**:
|
||||
- Use Nginx Proxy Manager for SSL termination
|
||||
- Implement access lists
|
||||
- Enable fail2ban for brute force protection
|
||||
|
||||
3. **Docker Security**:
|
||||
- Run containers as non-root users
|
||||
- Use read-only filesystems where possible
|
||||
- Limit container capabilities
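
The intent of the firewall rules in item 1 can be expressed as a small membership check. This is a minimal sketch that models only those three allow rules and assumes everything else is denied (the usual `ufw` default for incoming traffic, which this guide does not state explicitly); `is_allowed` is a hypothetical helper name.

```python
import ipaddress

HOMELAB_NET = ipaddress.ip_network("192.168.2.0/24")
MONITORING_PORTS = {3000, 9090, 9221}  # Grafana, Prometheus, PVE Exporter


def is_allowed(source_ip: str, port: int) -> bool:
    """True when the modeled rules would admit this source to this port."""
    in_network = ipaddress.ip_address(source_ip) in HOMELAB_NET
    return in_network and port in MONITORING_PORTS


print(is_allowed("192.168.2.50", 3000))
print(is_allowed("10.0.0.5", 3000))
```

A check like this is useful for reasoning about which clients should reach the stack directly and which must go through the reverse proxy.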
|
||||
|
||||
## Performance Tuning

### Prometheus Optimization

**Scrape Interval**:
```yaml
global:
  scrape_interval: 30s      # Increase for less frequent scraping
  evaluation_interval: 30s
```

**Target Relabeling**:
```yaml
relabel_configs:
  - source_labels: [__address__]
    regex: '.*'
    action: keep  # '.*' keeps every target; narrow the regex to drop targets that do not match
```

### Grafana Optimization

**Query Optimization**:
- Use recording rules in Prometheus for complex queries
- Set appropriate refresh intervals on dashboards
- Limit time range on expensive queries

**Caching**:
```ini
# In grafana.ini or environment variables
[caching]
enabled = true
ttl = 3600
```

## Advanced Configuration

### Alerting with Alertmanager

1. **Add Alertmanager to stack**:
   ```bash
   cd /home/jramos/homelab/monitoring
   # Create alertmanager directory with docker-compose.yml
   ```

2. **Configure alerts in Prometheus**:
   ```yaml
   # In prometheus.yml
   alerting:
     alertmanagers:
       - static_configs:
           - targets: ['alertmanager:9093']

   rule_files:
     - 'alerts.yml'
   ```

3. **Example alert rules**:
   ```yaml
   # alerts.yml
   groups:
     - name: proxmox
       interval: 30s
       rules:
         - alert: HighCPUUsage
           expr: pve_node_cpu_usage_ratio > 0.9
           for: 5m
           labels:
             severity: warning
           annotations:
             summary: "High CPU usage on {{ $labels.node }}"
   ```
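
Before reloading Prometheus, both files can be validated with `promtool`, which ships with Prometheus. The following is a minimal sketch, assuming `promtool` is on the `PATH`; with a Docker-based stack it may only be available inside the Prometheus container, in which case the same subcommands would be run there. `validate_prometheus` is a hypothetical wrapper name.

```bash
#!/usr/bin/env bash
# validate_prometheus is a hypothetical wrapper around promtool.
# Arguments (both optional): path to prometheus.yml, path to the rules file.
validate_prometheus() {
  config="${1:-prometheus.yml}"
  rules="${2:-alerts.yml}"
  # "check config" validates the main configuration;
  # "check rules" validates the syntax of the alerting rules.
  promtool check config "$config" && promtool check rules "$rules"
}

# Example (not executed here):
#   validate_prometheus prometheus.yml alerts.yml
```

Running the check first means a YAML indentation mistake is reported by `promtool` instead of surfacing as a failed Prometheus restart.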

### Multi-Node Proxmox Cluster

For clustered Proxmox environments:

```yaml
# In pve.yml
cluster1:
  user: monitoring@pve
  password: ${PVE_PASSWORD}
  verify_ssl: false

cluster2:
  user: monitoring@pve
  password: ${PVE_PASSWORD}
  verify_ssl: false
```

### Dashboard Provisioning

Store dashboards as code:

```bash
# Create provisioning directory
mkdir -p grafana/provisioning/dashboards

# Add provisioning config
# grafana/provisioning/dashboards/dashboards.yml
```
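
The provisioning config referenced in the comment above could be generated as follows. This is a minimal sketch of a file-based dashboard provider; the provider name, the update interval, and the in-container dashboard path are assumptions, and the Grafana compose file would need to mount both the provisioning directory and the dashboard JSON files. `write_dashboard_provider` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# write_dashboard_provider is a hypothetical helper that writes a file-based
# dashboard provider config into the given directory.
write_dashboard_provider() {
  dir="${1:-grafana/provisioning/dashboards}"
  mkdir -p "$dir"
  cat > "$dir/dashboards.yml" <<'EOF'
apiVersion: 1
providers:
  - name: default              # arbitrary label for this provider
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30  # how often Grafana rescans the path
    options:
      # Assumption: dashboard JSON files are mounted here in the container.
      path: /var/lib/grafana/dashboards
EOF
}

# Example (not executed here):
#   write_dashboard_provider grafana/provisioning/dashboards
```

With a provider like this, exported dashboard JSON can live in the repository next to the compose file and be picked up on container start.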

## Integration with Other Services

### n8n Workflow Automation

Create workflows in n8n (CT 113) to:
- Send alerts to Slack/Discord based on Prometheus alerts
- Generate daily/weekly infrastructure reports
- Automate backup verification checks

### NetBox IPAM

Sync monitoring targets with NetBox (CT 103):
- Automatically discover new VMs/CTs
- Update service inventory
- Link metrics to network documentation

## Additional Resources

### Documentation
- [Prometheus Documentation](https://prometheus.io/docs/)
- [Grafana Documentation](https://grafana.com/docs/)
- [PVE Exporter GitHub](https://github.com/prometheus-pve/prometheus-pve-exporter)
- [Proxmox API Documentation](https://pve.proxmox.com/pve-docs/api-viewer/)

### Community Dashboards
- Grafana Dashboard 10347: Proxmox VE
- Grafana Dashboard 15356: Proxmox Cluster
- Grafana Dashboard 15362: Proxmox Summary

### Related Homelab Documentation
- [Homelab Overview](../README.md)
- [Services Documentation](../services/README.md)
- [Infrastructure Index](../INDEX.md)
- [n8n Setup Guide](../services/README.md#n8n-workflow-automation)

---

**Last Updated**: 2025-12-07
**Maintainer**: jramos
**VM**: 101 (monitoring-docker) at 192.168.2.114
**Stack Version**: Prometheus 2.x, Grafana 10.x, PVE Exporter latest
@@ -132,6 +132,205 @@ cd speedtest-tracker
|
||||
docker compose up -d
|
||||
```

## Monitoring Stack (VM-based)

**Deployment**: VM 101 (monitoring-docker) at 192.168.2.114
**Technology**: Docker Compose
**Components**: Grafana, Prometheus, PVE Exporter

### Overview
Comprehensive monitoring and observability stack for the Proxmox homelab environment providing real-time metrics, visualization, and alerting capabilities.

### Components

**Grafana** (Port 3000):
- Visualization and dashboards
- Pre-configured Proxmox VE dashboards
- User authentication and RBAC
- Alerting capabilities
- Access: http://192.168.2.114:3000

**Prometheus** (Port 9090):
- Metrics collection and time-series database
- PromQL query language
- 15-day retention (configurable)
- Service discovery
- Access: http://192.168.2.114:9090

**PVE Exporter** (Port 9221):
- Proxmox VE metrics exporter
- Connects to Proxmox API
- Exports node, VM, CT, and storage metrics
- Access: http://192.168.2.114:9221

### Key Features
- Real-time Proxmox infrastructure monitoring
- VM and container resource utilization tracking
- Storage pool capacity planning
- Network traffic analysis
- Backup job status monitoring
- Custom alerting rules

### Deployment

```bash
# Navigate to monitoring directory
cd /home/jramos/homelab/monitoring

# Deploy PVE Exporter
cd pve-exporter
docker compose up -d

# Deploy Prometheus
cd ../prometheus
docker compose up -d

# Deploy Grafana
cd ../grafana
docker compose up -d

# Verify all services
docker ps | grep -E 'grafana|prometheus|pve-exporter'
```
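
The same sequence can be wrapped in a loop so that a failure in one component stops the rollout. This is a minimal sketch under the directory layout shown above; `deploy_stack` and the `MONITORING_DIR` variable are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Root of the monitoring stack, as used in the commands above.
MONITORING_DIR="${MONITORING_DIR:-/home/jramos/homelab/monitoring}"

# deploy_stack is a hypothetical helper: it starts each component in the
# documented order and stops at the first failure.
deploy_stack() {
  for component in pve-exporter prometheus grafana; do
    echo "Deploying ${component}..."
    # The subshell keeps the caller's working directory unchanged.
    ( cd "${MONITORING_DIR}/${component}" && docker compose up -d ) || {
      echo "Failed to deploy ${component}" >&2
      return 1
    }
  done
}

# Example (not executed here):
#   deploy_stack && docker ps | grep -E 'grafana|prometheus|pve-exporter'
```

Starting the exporter first means Prometheus has a live target on its first scrape, and Grafana comes up last with its data source already available.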

### Configuration

**PVE Exporter**:
- Environment file: `monitoring/pve-exporter/.env`
- Configuration: `monitoring/pve-exporter/pve.yml`
- Requires Proxmox API user with PVEAuditor role

**Prometheus**:
- Configuration: `monitoring/prometheus/prometheus.yml`
- Scrapes PVE Exporter every 30 seconds
- Targets: localhost:9090, pve-exporter:9221

**Grafana**:
- Default credentials: admin/admin (change on first login)
- Data source: Prometheus at http://prometheus:9090
- Recommended dashboard: Grafana ID 10347 (Proxmox VE)

### Maintenance

```bash
# Update images
cd /home/jramos/homelab/monitoring/<component>
docker compose pull
docker compose up -d

# View logs
docker compose logs -f

# Restart services
docker compose restart
```

### Troubleshooting

**PVE Exporter connection issues**:
1. Verify Proxmox API is accessible: `curl -k https://192.168.2.200:8006`
2. Check credentials in `.env` file
3. Verify user has PVEAuditor role: `pveum acl list` (on Proxmox)

**Grafana shows no data**:
1. Verify Prometheus data source configuration
2. Check Prometheus targets: http://192.168.2.114:9090/targets
3. Test queries in Prometheus UI before using in Grafana
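
The checks above can be started with a quick reachability probe of all three services. This is a minimal sketch, assuming `curl` is installed and the ports listed in this section; `/-/healthy` and `/api/health` are the health endpoints of Prometheus and Grafana, while probing the exporter's root page is an assumption about how it responds. `check_http`, `check_stack`, and `MON_HOST` are hypothetical names.

```bash
#!/usr/bin/env bash
# Address of VM 101 (assumption: override MONITORING_HOST if it differs).
MON_HOST="${MONITORING_HOST:-192.168.2.114}"

# check_http is a hypothetical helper: prints OK or FAIL for one URL.
check_http() {
  name="$1"; url="$2"
  # -f makes curl return non-zero on HTTP errors; --max-time bounds the wait.
  if curl -fsS -o /dev/null --max-time 5 "$url"; then
    echo "OK   $name"
  else
    echo "FAIL $name ($url)"
    return 1
  fi
}

# check_stack probes every service and returns non-zero if any probe failed.
check_stack() {
  failed=0
  check_http "prometheus"   "http://${MON_HOST}:9090/-/healthy"  || failed=1
  check_http "grafana"      "http://${MON_HOST}:3000/api/health" || failed=1
  check_http "pve-exporter" "http://${MON_HOST}:9221/"           || failed=1
  return "$failed"
}

# Example (not executed here):
#   check_stack
```

A failing probe narrows the problem to one container before any time is spent on data source or query debugging.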

**High memory usage**:
1. Reduce Prometheus retention period
2. Limit Grafana concurrent queries
3. Increase VM 101 memory allocation

**Complete Documentation**: See `/home/jramos/homelab/monitoring/README.md`

---

## Twingate Connector

**Deployment**: CT 112 (twingate-connector)
**Technology**: LXC Container
**Purpose**: Zero-trust network access

### Overview
Lightweight connector providing secure remote access to homelab resources without traditional VPN complexity. Part of Twingate's zero-trust network access (ZTNA) solution.

### Features
- **Zero-Trust Architecture**: Grant access to specific resources, not entire networks
- **No VPN Required**: Simplified connection without VPN client configuration
- **Identity-Based Access**: User and device authentication
- **Automatic Updates**: Connector auto-updates for security patches
- **Low Resource Overhead**: Minimal CPU and memory footprint

### Architecture
```
External User → Twingate Cloud → Twingate Connector (CT 112) → Homelab Resources
```

### Deployment Considerations

**LXC vs Docker**:
- LXC chosen for lightweight, always-on service
- Minimal resource consumption
- System-level integration
- Quick restart and recovery

**Network Placement**:
- Deployed on homelab management network (192.168.2.0/24)
- Access to all internal resources
- No inbound port forwarding required

### Configuration

The Twingate connector is configured via the Twingate Admin Console:

1. **Create Connector** in Twingate Admin Console
2. **Generate Token** for connector authentication
3. **Deploy Container** with provided token
4. **Configure Resources** to route through connector
5. **Assign Users** to resources

### Maintenance

**Health Monitoring**:
- Check connector status in Twingate Admin Console
- Monitor CPU/memory usage on CT 112
- Review connection logs

**Updates**:
- Connector auto-updates by default
- Manual updates: Restart container or redeploy

**Troubleshooting**:
- Verify network connectivity to Twingate cloud
- Check connector token validity
- Review resource routing configuration
- Ensure firewall allows outbound HTTPS

### Security Best Practices

1. **Least Privilege**: Grant access only to required resources
2. **MFA Enforcement**: Require multi-factor authentication for users
3. **Device Trust**: Enable device posture checks
4. **Audit Logs**: Regularly review access logs in Twingate Console
5. **Connector Isolation**: Consider dedicated network segment for connector

### Integration with Homelab

**Protected Resources**:
- Proxmox Web UI (192.168.2.200:8006)
- Grafana Monitoring (192.168.2.114:3000)
- Nginx Proxy Manager (192.168.2.101:81)
- n8n Workflows (192.168.2.107:5678)
- Development VMs and services

**Access Policies**:
- Admin users: Full access to all resources
- Monitoring users: Read-only Grafana access
- Developers: Access to dev VMs and services

---

## General Deployment Instructions

### Prerequisites
@@ -308,6 +507,39 @@ Several services have embedded secrets in their docker-compose.yaml files:
2. Verify host directory ownership: `chown -R <user>:<group> /path/to/volume`
3. Check SELinux context (if applicable): `ls -Z /path/to/volume`

### Monitoring Stack Issues

**Metrics Not Appearing**:
1. Verify PVE Exporter can reach Proxmox API
2. Check Prometheus scrape targets status
3. Ensure Grafana data source is configured correctly
4. Review retention policies (data may be expired)

**Authentication Failures (PVE Exporter)**:
1. Verify Proxmox user credentials in `.env` file
2. Check user has PVEAuditor role
3. Test API access: `curl -k https://192.168.2.200:8006/api2/json/version`

**High Resource Usage**:
1. Adjust Prometheus retention: `--storage.tsdb.retention.time=7d`
2. Reduce scrape frequency in prometheus.yml
3. Limit Grafana query concurrency
4. Increase VM 101 resources if needed

### Twingate Connector Issues

**Connector Offline**:
1. Check CT 112 is running: `pct status 112`
2. Verify network connectivity from container
3. Check connector token validity in Twingate Console
4. Review container logs for error messages
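
The first step of the list above can be scripted on the Proxmox host. This is a minimal sketch, assuming `pct status <id>` prints a line of the form `status: running`; `ct_is_running` and `ensure_ct_running` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# ct_is_running is a hypothetical helper for the Proxmox host: it succeeds
# when `pct status <id>` reports "status: running".
ct_is_running() {
  ctid="$1"
  # Return 2 if pct itself fails (for example, unknown container ID).
  ct_state="$(pct status "$ctid" 2>/dev/null)" || return 2
  [ "$ct_state" = "status: running" ]
}

# ensure_ct_running starts the container when it is not running.
ensure_ct_running() {
  ctid="$1"
  if ct_is_running "$ctid"; then
    echo "CT $ctid is running"
  else
    echo "CT $ctid is not running, starting it"
    pct start "$ctid"
  fi
}

# Example (not executed here):
#   ensure_ct_running 112
```

If the container is running but the connector still shows offline, the remaining steps (network connectivity, token validity, logs) are the next place to look.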

**Cannot Access Resources**:
1. Verify resource is configured in Twingate Console
2. Check user has permission to access resource
3. Ensure connector is online and healthy
4. Verify network routes on CT 112

## Migration Notes

### Post-Migration Tasks
@@ -353,6 +585,7 @@ For homelab-specific questions or issues:

---

**Last Updated**: 2025-12-02
**Last Updated**: 2025-12-07
**Maintainer**: jramos
**Repository**: http://192.168.2.102:3060/jramos/homelab
**Infrastructure**: 10 VMs, 4 LXC Containers