feat(monitoring): resolve Loki-stack syslog ingestion with rsyslog filter fix

Fixed critical issue preventing UniFi router logs from reaching Loki/Promtail/Grafana.

Root Cause:
- rsyslog filter in /etc/rsyslog.d/unifi-router.conf filtered for 192.168.1.1
- VM 101 on VLAN 2, actual source IP is 192.168.2.1 (VLAN 2 gateway)
- Filter silently rejected all incoming syslog traffic

Solution:
- Updated rsyslog filter from 192.168.1.1 to 192.168.2.1
- Logs now flow: UniFi → rsyslog → Promtail → Loki → Grafana

Changes:
- Add services/loki-stack/* - Complete Loki/Promtail/Grafana stack configs
- Add services/logward/* - Logward service configuration
- Update troubleshooting/loki-stack-bugfix.md - Complete 5-phase resolution
- Update CLAUDE_STATUS.md - Document 2025-12-11 resolution
- Update sub-agents/scribe.md - Agent improvements
- Remove services/promtail-config.yml - Duplicate file cleanup

Status: ✅ Monitoring stack fully operational, syslog ingestion active

Technical Details: See troubleshooting/loki-stack-bugfix.md for complete analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
176
troubleshooting/loki-stack-bugfix.md
Normal file
@@ -0,0 +1,176 @@
Here is a summary of the troubleshooting session to build your centralized logging stack.

1. The Objective
Create a monitoring stack on Proxmox using Loki (log store) and Promtail (log collector) to ingest logs from:

Proxmox Host: Via TCP (Reliable).

UniFi Dream Router: Via UDP (Legacy RFC3164 format).

2. The Final Architecture
Because Promtail's syslog receiver strictly enforces the modern log standard (RFC5424) and UniFi sends "dirty" legacy logs (RFC3164), we adopted a "Translator" Architecture.

UniFi Router: Sends UDP logs to the Host VM.

Host Rsyslog: Receives the UDP messages, rewrites them as RFC5424, and forwards them over TCP to Promtail in Docker.

Promtail: Receives clean TCP logs and pushes them to Loki.
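
The format mismatch behind this architecture is visible in the first few bytes of each message. As a rough sketch (a heuristic, not a full parser), the helper below labels a raw line by its header; the function name `classify_syslog` is hypothetical and the sample lines are adapted from the examples in the two RFCs.

```bash
# Classify one raw syslog line by its header.
# RFC5424 puts a version digit ("1") right after <PRI>; RFC3164 goes
# straight from <PRI> to a "Mmm dd hh:mm:ss" timestamp.
classify_syslog() {
  if printf '%s\n' "$1" | grep -Eq '^<[0-9]{1,3}>1 '; then
    echo rfc5424
  elif printf '%s\n' "$1" | grep -Eq '^<[0-9]{1,3}>'; then
    echo rfc3164
  else
    echo unknown
  fi
}

# Sample lines adapted from the RFC examples.
classify_syslog "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed"
classify_syslog "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
```

Running a captured UniFi line through a check like this is one quick way to confirm that the router really is sending the legacy format before changing any configuration.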

3. Troubleshooting Timeline
Phase 1: Loki Instability
The Issue: Loki kept crashing with "Schema" and "Compactor" errors.

The Cause: You were using a legacy configuration file with the modern Loki v3.0 image.

The Fix: Updated the Loki config to use schema: v13 with the tsdb index store, and added the required delete_request_store setting for the compactor.
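
A quick pre-flight check can catch a legacy config before Loki starts. The sketch below only greps for the three settings named in the fix; it does not validate YAML structure, and the function name `check_loki_config` and the file path in the usage comment are hypothetical.

```bash
# Report which of the settings named in the Phase 1 fix are absent
# from a Loki config file. Returns 0 when all are present.
check_loki_config() {
  missing=0
  for key in 'schema: v13' 'store: tsdb' 'delete_request_store:'; do
    if ! grep -qF -- "$key" "$1"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (path is an assumption):
#   check_loki_config ./loki-config.yaml && docker-compose up -d
```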

Phase 2: Proxmox Log Ingestion (TCP)
The Issue: Promtail threw "Parsing Errors" when receiving logs from Proxmox.

The Cause: Proxmox defaults to an older syslog format, which Promtail's parser rejects.

The Fix: Reconfigured Proxmox (/etc/rsyslog.conf) to use the template RSYSLOG_SyslogProtocol23Format (RFC5424).

Phase 3: The UniFi UDP Saga (The Main Blocker)
The Issue: Promtail rejected UniFi logs.

Attempt 1: We added format: rfc3164 to the Promtail config.

Result: Crash (field format not found).

Attempt 2: We upgraded Promtail from v2.9 to v3.0.

Result: Crash persisted.

Discovery: Promtail v3.0 still does not support legacy format toggles in the syslog receiver.

The Final Fix: We moved the UDP listener out of Docker and onto the Host OS (rsyslog), letting the Host handle the "dirty" UDP work and forward clean TCP to Promtail.

Phase 4: The "Ghost" Configuration
The Issue: Promtail logs showed it trying to connect to 192.168.2.25 even though your config file said http://loki:3100.

The Cause: Docker was holding onto an old version of the configuration file.

The Fix: Used docker-compose down followed by docker-compose up -d (instead of just restart), which recreates the containers and forces a refresh of the volume mounts.
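
To tell whether a container is still reading a stale file, one option is to compare what it sees with the copy on the host. This is a minimal sketch: the function name `config_in_sync` is hypothetical, and it assumes the image ships a `cat` binary and that the container is named `promtail`.

```bash
# Compare the config file inside a running container with the host copy.
# Returns 0 when they match, non-zero when the container sees different
# (for example stale) content.
config_in_sync() {
  container="$1"
  container_path="$2"
  host_path="$3"
  docker exec "$container" cat "$container_path" | diff -q - "$host_path" >/dev/null
}

# Usage (names taken from the volume mapping in this write-up):
#   config_in_sync promtail /etc/promtail/config.yaml ./promtail-config.yaml \
#     || echo "container config differs from host; recreate with down/up"
```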

4. The "Golden State" Configuration
These are the settings that finally worked.

A. Docker Compose (docker-compose.yml)

Promtail Ports: Only TCP 1514:1514 mapped (UDP removed to prevent conflicts).

Volumes: Confirmed mapping ./promtail-config.yaml:/etc/promtail/config.yaml.

B. Promtail Config (promtail-config.yaml)

Clients: url: http://loki:3100/loki/api/v1/push (Using internal Docker DNS).

Scrape Config: Single job listening on TCP.

YAML

syslog:
  listen_address: 0.0.0.0:1514
  listen_protocol: tcp

C. Host Rsyslog (/etc/rsyslog.conf)

Inputs: imudp enabled on port 1514.

Forwarding: Rule added to send all UDP traffic to 127.0.0.1:1514 via TCP.

---

## FINAL RESOLUTION - 2025-12-11

### Root Cause Identified
**IP address mismatch in rsyslog forwarding filter**

**Problem:** `/etc/rsyslog.d/unifi-router.conf` on VM 101 was filtering for the wrong source IP
- Filter was configured for: `192.168.1.1` (incorrect)
- Actual source IP: `192.168.2.1` (VLAN 2 gateway interface)

**Explanation:** VM 101 is on VLAN 2 (192.168.2.x subnet). When the UniFi router sends syslog to 192.168.2.114, it uses its VLAN 2 interface IP (192.168.2.1) as the source address. Because the filter never matched that address, rsyslog silently skipped the forwarding action for every incoming message.
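
A mismatch like this can be caught by comparing the address the filter tests for with the sender address actually seen on the wire (for example in a capture such as `tcpdump -ni any udp port 1514`). The helpers below are a rough sketch; the names `filter_ips` and `filter_matches` are hypothetical, and the parsing assumes the simple `$fromhost-ip == '...'` form used in this file.

```bash
# Print every IPv4 address that a $fromhost-ip comparison in an rsyslog
# config file tests for, one per line. Commented-out lines are ignored.
filter_ips() {
  grep -v '^[[:space:]]*#' "$1" \
    | grep -Eo "fromhost-ip[^'\"]*['\"][0-9.]+" \
    | grep -Eo '[0-9.]+$'
}

# Return 0 if the observed sender address is one the config filters for.
filter_matches() {
  filter_ips "$1" | grep -qxF "$2"
}

# Usage:
#   filter_matches /etc/rsyslog.d/unifi-router.conf 192.168.2.1 \
#     || echo "filter does not match the observed sender"
```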

### Solution Implemented

**File Modified:** `/etc/rsyslog.d/unifi-router.conf` on VM 101

**Change:**
```bash
# Before (WRONG):
if $fromhost-ip == '192.168.1.1' then {

# After (CORRECT):
if $fromhost-ip == '192.168.2.1' then {
```

**Complete corrected configuration:**
```bash
# UniFi Router - VLAN 2 interface
if $fromhost-ip == '192.168.2.1' then {
    action(type="omfwd" Target="127.0.0.1" Port="1514" Protocol="tcp" Template="RSYSLOG_SyslogProtocol23Format")
    stop
}
```

**Service restart:**
```bash
# Check the configuration syntax before restarting
sudo rsyslogd -N1
sudo systemctl restart rsyslog
sudo systemctl status rsyslog
```

**Result:** ✅ Logs immediately began flowing: UniFi router → rsyslog → Promtail → Loki → Grafana

### Verification Steps
```bash
# 1. Verify UDP listener (rsyslog)
sudo ss -tulnp | grep -E '^udp.*:1514'
# Expected: udp UNCONN users:(("rsyslogd"))

# 2. Verify TCP listener (Promtail)
sudo ss -tulnp | grep -E '^tcp.*:1514'
# Expected: tcp LISTEN users:(("docker-proxy"))

# 3. Monitor Promtail ingestion
docker logs promtail --tail 50 -f
# Expected: "Successfully sent batch" messages

# 4. Test log injection
# Note: logger tries UDP first, so this exercises the rsyslog listener.
# It arrives from 127.0.0.1, so a filter on 192.168.2.1 alone will not forward it.
logger -n 127.0.0.1 -P 1514 "Test from monitoring-docker host"
```
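
When the pipeline is silent, it can help to test the Promtail hop on its own by sending one well-formed RFC5424 line straight to the TCP listener, bypassing rsyslog. The sketch below only builds the line; the function name `rfc5424_line` and its arguments are hypothetical, and the usage comment assumes `nc` is installed and that Promtail accepts newline-terminated messages on 127.0.0.1:1514.

```bash
# Build one RFC5424 line:
#   <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
# PRI 14 = facility 1 (user) * 8 + severity 6 (info); "-" marks an empty field.
rfc5424_line() {
  printf '<14>1 %s %s %s - - - %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3"
}

# Usage (host and app names are placeholders):
#   rfc5424_line probe-host probe "direct TCP test" | nc -w 1 127.0.0.1 1514
```

If this message shows up in Grafana while router logs do not, the fault is likely upstream of Promtail, in the rsyslog listener or filter.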

### Troubleshooting Phases Summary

This was a **5-phase troubleshooting effort**:

1. **Phase 1:** Fixed Loki schema errors (v13, tsdb, delete_request_store)
2. **Phase 2:** Fixed Proxmox log parsing (RSYSLOG_SyslogProtocol23Format)
3. **Phase 3:** Moved UDP listener from Docker to Host rsyslog (Promtail doesn't support RFC3164)
4. **Phase 4:** Fixed "ghost" configuration (192.168.2.25 stale config in Docker volumes)
5. **Phase 5:** ✅ Corrected rsyslog filter IP from 192.168.1.1 to 192.168.2.1

### Data Flow Diagram
```
UniFi Router (192.168.2.1)
    ↓ UDP syslog port 1514
Host rsyslog (192.168.2.114:1514 UDP)
    ↓ TCP forward (RFC5424 format)
Docker Promtail (127.0.0.1:1514 TCP)
    ↓ HTTP push
Loki (loki:3100)
    ↓ Query
Grafana (192.168.2.114:3000)
```

### Key Technical Details
- **VLAN Topology:** VM 101 on VLAN 2, router uses 192.168.2.1 interface for that subnet
- **rsyslog Template:** RSYSLOG_SyslogProtocol23Format (RFC5424) - required by Promtail
- **Port Binding:** UDP 1514 (rsyslog) and TCP 1514 (Promtail) coexist on same port number, different protocols
- **Stop Directive:** Prevents duplicate logging to local files after forwarding

### Status
- **Monitoring Stack:** ✅ Fully operational
- **Log Ingestion:** ✅ Active
- **Grafana Dashboards:** ✅ Receiving data
- **Resolution Date:** 2025-12-11