Add comprehensive structured logging system

Features: - JSON-formatted logs for easy parsing and analysis - Rotating log files (prevents disk space issues) * ajarbot.log: All events, 10MB rotation, 5 backups * errors.log: Errors only, 5MB rotation, 3 backups * tools.log: Tool execution tracking, 10MB rotation, 3 backups Tool Execution Tracking: - Every tool call logged with inputs, outputs, duration - Success/failure status tracking - Performance metrics (execution time in milliseconds) - Error messages captured with full context Logging Integration: - tools.py: All tool executions automatically logged - Structured logger classes with context preservation - Console output (human-readable) + file logs (JSON) - Separate error log for quick issue identification Log Analysis: - JSON format enables programmatic analysis - Easy to search for patterns (max tokens, iterations, etc.) - Performance tracking (slow tools, failure rates) - Historical debugging with full context Documentation: - LOGGING.md: Complete usage guide - Log analysis examples with jq commands - Error pattern reference - Maintenance and integration instructions Benefits: - Quick error diagnosis with separate errors.log - Performance monitoring and optimization - Historical analysis for troubleshooting - Automatic log rotation (max 95MB total) Updated .gitignore to exclude logs/ directory Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-16 16:32:18 -07:00
parent 50cf7165cb
commit 0271dea551
4 changed files with 496 additions and 21 deletions
--- a/LOGGING.md
+++ b/LOGGING.md
@@ -0,0 +1,207 @@
+# Structured Logging System
+
+## Overview
+
+Ajarbot now includes a comprehensive structured logging system to track errors, tool executions, and system behavior.
+
+## Log Files
+
+All logs are stored in the `logs/` directory (gitignored):
+
+### 1. `ajarbot.log` - Main Application Log
+- **Format**: JSON (one record per line)
+- **Level**: DEBUG and above
+- **Size**: Rotates at 10MB, keeps 5 backups
+- **Contents**: All application events, tool executions, LLM calls
+
+### 2. `errors.log` - Error-Only Log
+- **Format**: JSON
+- **Level**: ERROR and CRITICAL only
+- **Size**: Rotates at 5MB, keeps 3 backups
+- **Contents**: Only errors and critical issues for quick diagnosis
+
+### 3. `tools.log` - Tool Execution Log
+- **Format**: JSON
+- **Level**: INFO and above
+- **Size**: Rotates at 10MB, keeps 3 backups
+- **Contents**: Every tool call with inputs, outputs, duration, and success/failure
+
+## Log Format
+
+### JSON Structure
+```json
+{
+  "timestamp": "2026-02-16T12:34:56.789Z",
+  "level": "ERROR",
+  "logger": "tools",
+  "message": "Tool failed: permanent_note",
+  "module": "tools",
+  "function": "execute_tool",
+  "line": 500,
+  "extra": {
+    "tool_name": "permanent_note",
+    "inputs": {"title": "Test", "content": "..."},
+    "success": false,
+    "error": "Unknown tool error",
+    "duration_ms": 123.45
+  }
+}
+```
+
+### Tool Log Example
+```json
+{
+  "timestamp": "2026-02-16T06:00:15.234Z",
+  "level": "INFO",
+  "logger": "tools",
+  "message": "Tool executed: get_weather",
+  "extra": {
+    "tool_name": "get_weather",
+    "inputs": {"location": "Centennial, CO"},
+    "success": true,
+    "result_length": 456,
+    "duration_ms": 1234.56
+  }
+}
+```
+
+## Usage in Code
+
+### Get a Logger
+```python
+from logging_config import get_logger, get_tool_logger
+
+# General logger
+logger = get_logger("my_module")
+
+# Specialized tool logger
+tool_logger = get_tool_logger()
+```
+
+### Logging Methods
+
+**Basic logging:**
+```python
+logger.debug("Detailed debug info", key="value")
+logger.info("Informational message", user_id=123)
+logger.warning("Warning message", issue="something")
+logger.error("Error occurred", exc_info=True, error_code="E001")
+logger.critical("Critical system failure", exc_info=True)
+```
+
+**Tool execution logging:**
+```python
+tool_logger.log_tool_call(
+    tool_name="permanent_note",
+    inputs={"title": "Test", "content": "..."},
+    success=True,
+    result="Created note successfully",
+    duration_ms=123.45
+)
+```
+
+## Analyzing Logs
+
+### View Recent Errors
+```bash
+# Last 20 errors
+tail -20 logs/errors.log | jq .
+
+# Errors from specific module
+grep '"module":"tools"' logs/errors.log | jq .
+```
+
+### Tool Performance Analysis
+```bash
+# Average tool execution time
+cat logs/tools.log | jq -r '.extra.duration_ms' | awk '{sum+=$1; count++} END {print sum/count}'
+
+# Failed tools
+grep '"success":false' logs/tools.log | jq -r '.extra.tool_name' | sort | uniq -c
+
+# Slowest tool calls
+cat logs/tools.log | jq -r '[.extra.tool_name, .extra.duration_ms] | @csv' | sort -t, -k2 -rn | head -10
+```
+
+### Find Specific Errors
+```bash
+# Max token errors
+grep -i "max.*token" logs/errors.log | jq .
+
+# Tool iteration limits
+grep -i "iteration.*exceeded" logs/ajarbot.log | jq .
+
+# MCP tool failures
+grep '"tool_name":"permanent_note"' logs/tools.log | grep '"success":false' | jq .
+```
+
+## Error Patterns to Watch
+
+1. **Max Tool Iterations** - Search: `"iteration.*exceeded"`
+2. **Max Tokens** - Search: `"max.*token"`
+3. **MCP Tool Failures** - Search: `"Unknown tool"` or failed MCP tool names
+4. **Slow Tools** - Tools taking > 5000ms
+5. **Repeated Failures** - Same tool failing multiple times
+
+## Maintenance
+
+### Log Rotation
+Logs automatically rotate when they reach size limits:
+- `ajarbot.log`: 10MB → keeps 5 old files (50MB total)
+- `errors.log`: 5MB → keeps 3 old files (15MB total)
+- `tools.log`: 10MB → keeps 3 old files (30MB total)
+
+Total max disk usage: ~95MB
+
+### Manual Cleanup
+```bash
+# Remove old logs
+rm logs/*.log.*
+
+# Clear all logs (careful!)
+rm logs/*.log
+```
+
+## Integration
+
+### Automatic Integration
+The logging system is automatically integrated into:
+- ✅ `tools.py` - All tool executions logged
+- ✅ Console output - Human-readable format
+- ✅ File logs - JSON format for parsing
+
+### Adding Logging to New Modules
+```python
+from logging_config import get_logger
+
+logger = get_logger(__name__)
+
+def my_function():
+    logger.info("Starting operation", operation_id=123)
+    try:
+        # Do work
+        logger.debug("Step completed", step=1)
+    except Exception as e:
+        logger.error("Operation failed", exc_info=True, operation_id=123)
+```
+
+## Benefits
+
+1. **Quick Error Diagnosis**: Separate `errors.log` for immediate issue identification
+2. **Performance Tracking**: Tool execution times and success rates
+3. **Historical Analysis**: JSON format enables programmatic analysis
+4. **Debugging**: Full context with inputs, outputs, and stack traces
+5. **Monitoring**: Easy to parse logs for alerting systems
+
+## Future Enhancements
+
+- [ ] Web dashboard for log visualization
+- [ ] Real-time log streaming via WebSocket
+- [ ] Automatic error rate alerts (email/Telegram)
+- [ ] Integration with external monitoring (Datadog, CloudWatch)
+- [ ] Log aggregation for multi-instance deployments
+
+---
+
+**Last Updated:** 2026-02-16
+**Log System Version:** 1.0