Add comprehensive structured logging system
Features:
- JSON-formatted logs for easy parsing and analysis
- Rotating log files (prevents disk space issues)
  * ajarbot.log: All events, 10MB rotation, 5 backups
  * errors.log: Errors only, 5MB rotation, 3 backups
  * tools.log: Tool execution tracking, 10MB rotation, 3 backups

Tool Execution Tracking:
- Every tool call logged with inputs, outputs, duration
- Success/failure status tracking
- Performance metrics (execution time in milliseconds)
- Error messages captured with full context

Logging Integration:
- tools.py: All tool executions automatically logged
- Structured logger classes with context preservation
- Console output (human-readable) + file logs (JSON)
- Separate error log for quick issue identification

Log Analysis:
- JSON format enables programmatic analysis
- Easy to search for patterns (max tokens, iterations, etc.)
- Performance tracking (slow tools, failure rates)
- Historical debugging with full context

Documentation:
- LOGGING.md: Complete usage guide
- Log analysis examples with jq commands
- Error pattern reference
- Maintenance and integration instructions

Benefits:
- Quick error diagnosis with separate errors.log
- Performance monitoring and optimization
- Historical analysis for troubleshooting
- Automatic log rotation (max 95MB total)

Updated .gitignore to exclude logs/ directory

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
207
LOGGING.md
Normal file
@@ -0,0 +1,207 @@
# Structured Logging System

## Overview

Ajarbot now includes a comprehensive structured logging system to track errors, tool executions, and system behavior.

## Log Files

All logs are stored in the `logs/` directory (gitignored):

### 1. `ajarbot.log` - Main Application Log

- **Format**: JSON (one record per line)
- **Level**: DEBUG and above
- **Size**: Rotates at 10MB, keeps 5 backups
- **Contents**: All application events, tool executions, LLM calls

### 2. `errors.log` - Error-Only Log

- **Format**: JSON
- **Level**: ERROR and CRITICAL only
- **Size**: Rotates at 5MB, keeps 3 backups
- **Contents**: Only errors and critical issues for quick diagnosis

### 3. `tools.log` - Tool Execution Log

- **Format**: JSON
- **Level**: INFO and above
- **Size**: Rotates at 10MB, keeps 3 backups
- **Contents**: Every tool call with inputs, outputs, duration, and success/failure

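The three files above map naturally onto size-based rotating handlers. The following is a minimal sketch using the standard library's `RotatingFileHandler`, not the actual contents of `logging_config`; the `LOG_FILES` table and the `build_handlers` helper are hypothetical names introduced for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

MB = 1024 * 1024

# (filename, minimum level, rotation size, backups kept), copied from the list above.
LOG_FILES = [
    ("ajarbot.log", logging.DEBUG, 10 * MB, 5),
    ("errors.log", logging.ERROR, 5 * MB, 3),
    ("tools.log", logging.INFO, 10 * MB, 3),
]


def build_handlers(log_dir="logs"):
    """Create one size-rotating handler per log file (hypothetical helper)."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    handlers = {}
    for filename, level, max_bytes, backups in LOG_FILES:
        handler = RotatingFileHandler(
            directory / filename,
            maxBytes=max_bytes,      # rotate once the file reaches this size
            backupCount=backups,     # number of rotated files to keep
            encoding="utf-8",
        )
        handler.setLevel(level)      # drop records below this level
        handlers[filename] = handler
    return handlers
```

In a setup like this, the `ajarbot.log` and `errors.log` handlers would typically be attached to the root logger, while the `tools.log` handler would be attached only to the logger used for tool executions.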
## Log Format

### JSON Structure

```json
{
  "timestamp": "2026-02-16T12:34:56.789Z",
  "level": "ERROR",
  "logger": "tools",
  "message": "Tool failed: permanent_note",
  "module": "tools",
  "function": "execute_tool",
  "line": 500,
  "extra": {
    "tool_name": "permanent_note",
    "inputs": {"title": "Test", "content": "..."},
    "success": false,
    "error": "Unknown tool error",
    "duration_ms": 123.45
  }
}
```
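A record with this shape can be produced by a custom `logging.Formatter`. The sketch below is one possible approach, not the formatter shipped in `logging_config`; the class name `JsonFormatter` and the `extra_fields` record attribute are assumptions made for illustration.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object (illustrative sketch)."""

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            # ISO 8601 in UTC with millisecond precision and a trailing "Z".
            "timestamp": created.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
            # Assumption: callers attach structured fields as record.extra_fields.
            "extra": getattr(record, "extra_fields", {}),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)
```

With the standard library, the `extra_fields` attribute can be set by passing `extra={"extra_fields": {...}}` to any logging call.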

### Tool Log Example

```json
{
  "timestamp": "2026-02-16T06:00:15.234Z",
  "level": "INFO",
  "logger": "tools",
  "message": "Tool executed: get_weather",
  "extra": {
    "tool_name": "get_weather",
    "inputs": {"location": "Centennial, CO"},
    "success": true,
    "result_length": 456,
    "duration_ms": 1234.56
  }
}
```
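Because each record occupies one line, a log file can be read back as a stream of dictionaries. The helper below is a small sketch of that idea; `read_records` is a hypothetical name, and skipping malformed lines (for example, a line cut off mid-write) is a design choice rather than documented behavior.

```python
import json


def read_records(lines):
    """Yield one dict per valid JSON line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or corrupted lines
        if isinstance(record, dict):
            yield record
```

A typical call would pass an open file, e.g. `with open("logs/tools.log", encoding="utf-8") as fh: records = list(read_records(fh))`.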

## Usage in Code

### Get a Logger

```python
from logging_config import get_logger, get_tool_logger

# General logger
logger = get_logger("my_module")

# Specialized tool logger
tool_logger = get_tool_logger()
```

### Logging Methods

**Basic logging:**

```python
logger.debug("Detailed debug info", key="value")
logger.info("Informational message", user_id=123)
logger.warning("Warning message", issue="something")
logger.error("Error occurred", exc_info=True, error_code="E001")
logger.critical("Critical system failure", exc_info=True)
```
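Standard-library loggers do not accept arbitrary keyword arguments, so calls like the ones above imply a wrapper that collects them into structured fields. The following sketch shows one way such a wrapper could work; the class `StructuredLogger` and the `extra_fields` attribute are assumptions, not the implementation behind `get_logger`.

```python
import logging


class StructuredLogger:
    """Forward keyword arguments as structured fields (illustrative sketch)."""

    def __init__(self, name):
        self._logger = logging.getLogger(name)

    def _log(self, level, message, exc_info=False, **fields):
        # stacklevel=3 attributes the record to the caller of info()/error(),
        # not to this wrapper (requires Python 3.8+).
        self._logger.log(
            level,
            message,
            exc_info=exc_info,
            extra={"extra_fields": fields},
            stacklevel=3,
        )

    def debug(self, message, **fields):
        self._log(logging.DEBUG, message, **fields)

    def info(self, message, **fields):
        self._log(logging.INFO, message, **fields)

    def warning(self, message, **fields):
        self._log(logging.WARNING, message, **fields)

    def error(self, message, **fields):
        self._log(logging.ERROR, message, **fields)

    def critical(self, message, **fields):
        self._log(logging.CRITICAL, message, **fields)
```

Here `exc_info` is treated as a control flag and passed through to the underlying logger, while every other keyword ends up in the record's structured fields.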

**Tool execution logging:**

```python
tool_logger.log_tool_call(
    tool_name="permanent_note",
    inputs={"title": "Test", "content": "..."},
    success=True,
    result="Created note successfully",
    duration_ms=123.45
)
```
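The `duration_ms` value has to be measured around the tool call itself. The sketch below shows one way to time a call and report both outcomes; `run_tool` is a hypothetical helper, `log_call` stands in for `tool_logger.log_tool_call`, and the `error` keyword is an assumption based on the `error` field in the JSON example rather than a confirmed parameter.

```python
import time


def run_tool(tool_name, func, inputs, log_call):
    """Run func(**inputs), time it, and report the outcome via log_call.

    log_call is a stand-in for tool_logger.log_tool_call.
    """
    start = time.perf_counter()  # monotonic clock, suitable for durations
    try:
        result = func(**inputs)
    except Exception as exc:
        duration_ms = (time.perf_counter() - start) * 1000
        # Assumption: the logger accepts an `error` keyword for failures.
        log_call(tool_name=tool_name, inputs=inputs, success=False,
                 error=str(exc), duration_ms=duration_ms)
        raise  # let the caller decide how to handle the failure
    duration_ms = (time.perf_counter() - start) * 1000
    log_call(tool_name=tool_name, inputs=inputs, success=True,
             result=result, duration_ms=duration_ms)
    return result
```

Re-raising after logging keeps the failure visible to the caller while still guaranteeing that every call, successful or not, leaves a record.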

## Analyzing Logs

### View Recent Errors

```bash
# Last 20 errors
tail -20 logs/errors.log | jq .

# Errors from a specific module (matches regardless of JSON whitespace)
jq 'select(.module == "tools")' logs/errors.log
```

### Tool Performance Analysis

```bash
# Average tool execution time in ms (records without a duration are skipped)
jq -r '.extra.duration_ms // empty' logs/tools.log | awk '{sum+=$1; count++} END {if (count) print sum/count}'

# Failed tools, with failure counts
jq -r 'select(.extra.success == false) | .extra.tool_name' logs/tools.log | sort | uniq -c

# Slowest tool calls
jq -r '[.extra.tool_name, .extra.duration_ms] | @csv' logs/tools.log | sort -t, -k2 -rn | head -10
```
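The same analysis can be done in Python once the records are parsed. This is a minimal sketch that assumes the record layout shown in the tool log example (`extra.tool_name`, `extra.success`, `extra.duration_ms`); `summarize_tools` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_tools(records):
    """Return {tool_name: {"calls", "failures", "avg_ms"}} from parsed records."""
    totals = defaultdict(lambda: {"calls": 0, "failures": 0, "total_ms": 0.0})
    for record in records:
        extra = record.get("extra", {})
        name = extra.get("tool_name")
        if name is None:
            continue  # not a tool execution record
        stats = totals[name]
        stats["calls"] += 1
        stats["total_ms"] += float(extra.get("duration_ms") or 0.0)
        if extra.get("success") is False:
            stats["failures"] += 1
    return {
        name: {
            "calls": s["calls"],
            "failures": s["failures"],
            "avg_ms": s["total_ms"] / s["calls"],
        }
        for name, s in totals.items()
    }
```

Sorting the result by `avg_ms` or by `failures / calls` gives the slow-tool and failure-rate views in one pass over the file.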

### Find Specific Errors

```bash
# Max token errors
grep -i "max.*token" logs/errors.log | jq .

# Tool iteration limits
grep -i "iteration.*exceeded" logs/ajarbot.log | jq .

# Failures of a specific MCP tool
jq 'select(.extra.tool_name == "permanent_note" and .extra.success == false)' logs/tools.log
```

## Error Patterns to Watch

1. **Max Tool Iterations** - Search: `"iteration.*exceeded"`
2. **Max Tokens** - Search: `"max.*token"`
3. **MCP Tool Failures** - Search: `"Unknown tool"` or failed MCP tool names
4. **Slow Tools** - Tools taking > 5000ms
5. **Repeated Failures** - Same tool failing multiple times

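The watch list above can be turned into a small classifier over parsed records. The sketch below assumes the record layout from the JSON examples; the function names, the pattern labels, and the repeat threshold of 3 are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Regexes taken from the search strings in the list above.
PATTERNS = {
    "max_tool_iterations": re.compile(r"iteration.*exceeded", re.IGNORECASE),
    "max_tokens": re.compile(r"max.*token", re.IGNORECASE),
    "unknown_tool": re.compile(r"unknown tool", re.IGNORECASE),
}
SLOW_MS = 5000  # threshold from the "Slow Tools" entry


def classify(record):
    """Return the names of the watch-list patterns a parsed record matches."""
    extra = record.get("extra", {})
    text = f"{record.get('message', '')} {extra.get('error', '')}"
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    duration = extra.get("duration_ms")
    if isinstance(duration, (int, float)) and duration > SLOW_MS:
        hits.append("slow_tool")
    return hits


def repeated_failures(records, threshold=3):
    """Return {tool_name: failure_count} for tools failing at least `threshold` times."""
    counts = Counter(
        r.get("extra", {}).get("tool_name")
        for r in records
        if r.get("extra", {}).get("success") is False
    )
    return {name: n for name, n in counts.items() if name and n >= threshold}
```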

## Maintenance

### Log Rotation

Logs automatically rotate when they reach size limits:

- `ajarbot.log`: 10MB → keeps 5 old files (50MB of backups)
- `errors.log`: 5MB → keeps 3 old files (15MB of backups)
- `tools.log`: 10MB → keeps 3 old files (30MB of backups)

Backups total ~95MB. Counting the three active files as well (up to 25MB more), maximum disk usage is ~120MB.

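The disk usage figures can be checked with a few lines of arithmetic. This sketch assumes each log keeps its active file alongside the configured number of backups, as Python's `RotatingFileHandler` does; the `ROTATION` table and `disk_usage_mb` helper are hypothetical names.

```python
# (rotation size in MB, backups kept), from the list above.
ROTATION = {
    "ajarbot.log": (10, 5),
    "errors.log": (5, 3),
    "tools.log": (10, 3),
}


def disk_usage_mb(rotation, include_active=True):
    """Worst-case disk usage in MB, optionally counting the active files."""
    active = 1 if include_active else 0
    return sum(size * (backups + active) for size, backups in rotation.values())


backups_only = disk_usage_mb(ROTATION, include_active=False)  # 50 + 15 + 30
with_active = disk_usage_mb(ROTATION)                         # 60 + 20 + 40
print(backups_only, with_active)
```

The 95MB figure corresponds to the rotated backups alone; including the active files raises the worst case to 120MB.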

### Manual Cleanup

```bash
# Remove old logs
rm logs/*.log.*

# Clear all logs (careful!)
rm logs/*.log
```

## Integration

### Automatic Integration

The logging system is automatically integrated into:

- ✅ `tools.py` - All tool executions logged
- ✅ Console output - Human-readable format
- ✅ File logs - JSON format for parsing


### Adding Logging to New Modules

```python
from logging_config import get_logger

logger = get_logger(__name__)

def my_function():
    logger.info("Starting operation", operation_id=123)
    try:
        # Do work
        logger.debug("Step completed", step=1)
    except Exception:
        # exc_info=True attaches the current traceback to the record
        logger.error("Operation failed", exc_info=True, operation_id=123)
```

## Benefits

1. **Quick Error Diagnosis**: Separate `errors.log` for immediate issue identification
2. **Performance Tracking**: Tool execution times and success rates
3. **Historical Analysis**: JSON format enables programmatic analysis
4. **Debugging**: Full context with inputs, outputs, and stack traces
5. **Monitoring**: Easy to parse logs for alerting systems


## Future Enhancements

- [ ] Web dashboard for log visualization
- [ ] Real-time log streaming via WebSocket
- [ ] Automatic error rate alerts (email/Telegram)
- [ ] Integration with external monitoring (Datadog, CloudWatch)
- [ ] Log aggregation for multi-instance deployments


---

**Last Updated:** 2026-02-16
**Log System Version:** 1.0