Files
ajarbot/LOGGING.md

208 lines
5.2 KiB
Markdown
Raw Permalink Normal View History

# Structured Logging System
## Overview
Ajarbot now includes a comprehensive structured logging system to track errors, tool executions, and system behavior.
## Log Files
All logs are stored in the `logs/` directory (gitignored):
### 1. `ajarbot.log` - Main Application Log
- **Format**: JSON (one record per line)
- **Level**: DEBUG and above
- **Size**: Rotates at 10MB, keeps 5 backups
- **Contents**: All application events, tool executions, LLM calls
### 2. `errors.log` - Error-Only Log
- **Format**: JSON
- **Level**: ERROR and CRITICAL only
- **Size**: Rotates at 5MB, keeps 3 backups
- **Contents**: Only errors and critical issues for quick diagnosis
### 3. `tools.log` - Tool Execution Log
- **Format**: JSON
- **Level**: INFO and above
- **Size**: Rotates at 10MB, keeps 3 backups
- **Contents**: Every tool call with inputs, outputs, duration, and success/failure
## Log Format
### JSON Structure
```json
{
"timestamp": "2026-02-16T12:34:56.789Z",
"level": "ERROR",
"logger": "tools",
"message": "Tool failed: permanent_note",
"module": "tools",
"function": "execute_tool",
"line": 500,
"extra": {
"tool_name": "permanent_note",
"inputs": {"title": "Test", "content": "..."},
"success": false,
"error": "Unknown tool error",
"duration_ms": 123.45
}
}
```
### Tool Log Example
```json
{
"timestamp": "2026-02-16T06:00:15.234Z",
"level": "INFO",
"logger": "tools",
"message": "Tool executed: get_weather",
"extra": {
"tool_name": "get_weather",
"inputs": {"location": "Centennial, CO"},
"success": true,
"result_length": 456,
"duration_ms": 1234.56
}
}
```
## Usage in Code
### Get a Logger
```python
from logging_config import get_logger, get_tool_logger
# General logger
logger = get_logger("my_module")
# Specialized tool logger
tool_logger = get_tool_logger()
```
### Logging Methods
**Basic logging:**
```python
logger.debug("Detailed debug info", key="value")
logger.info("Informational message", user_id=123)
logger.warning("Warning message", issue="something")
logger.error("Error occurred", exc_info=True, error_code="E001")
logger.critical("Critical system failure", exc_info=True)
```
**Tool execution logging:**
```python
tool_logger.log_tool_call(
tool_name="permanent_note",
inputs={"title": "Test", "content": "..."},
success=True,
result="Created note successfully",
duration_ms=123.45
)
```
## Analyzing Logs
### View Recent Errors
```bash
# Last 20 errors
tail -20 logs/errors.log | jq .
# Errors from specific module
grep '"module":"tools"' logs/errors.log | jq .
```
### Tool Performance Analysis
```bash
# Average tool execution time
cat logs/tools.log | jq -r '.extra.duration_ms' | awk '{sum+=$1; count++} END {print sum/count}'
# Failed tools
grep '"success":false' logs/tools.log | jq -r '.extra.tool_name' | sort | uniq -c
# Slowest tool calls
cat logs/tools.log | jq -r '[.extra.tool_name, .extra.duration_ms] | @csv' | sort -t, -k2 -rn | head -10
```
### Find Specific Errors
```bash
# Max token errors
grep -i "max.*token" logs/errors.log | jq .
# Tool iteration limits
grep -i "iteration.*exceeded" logs/ajarbot.log | jq .
# MCP tool failures
grep '"tool_name":"permanent_note"' logs/tools.log | grep '"success":false' | jq .
```
## Error Patterns to Watch
1. **Max Tool Iterations** - Search: `"iteration.*exceeded"`
2. **Max Tokens** - Search: `"max.*token"`
3. **MCP Tool Failures** - Search: `"Unknown tool"` or failed MCP tool names
4. **Slow Tools** - Tools taking > 5000ms
5. **Repeated Failures** - Same tool failing multiple times
## Maintenance
### Log Rotation
Logs automatically rotate when they reach size limits:
- `ajarbot.log`: 10MB → keeps 5 old files (50MB total)
- `errors.log`: 5MB → keeps 3 old files (15MB total)
- `tools.log`: 10MB → keeps 3 old files (30MB total)
Total max disk usage: ~95MB
### Manual Cleanup
```bash
# Remove old logs
rm logs/*.log.*
# Clear all logs (careful!)
rm logs/*.log
```
## Integration
### Automatic Integration
The logging system is automatically integrated into:
-`tools.py` - All tool executions logged
- ✅ Console output - Human-readable format
- ✅ File logs - JSON format for parsing
### Adding Logging to New Modules
```python
from logging_config import get_logger
logger = get_logger(__name__)
def my_function():
logger.info("Starting operation", operation_id=123)
try:
# Do work
logger.debug("Step completed", step=1)
except Exception as e:
logger.error("Operation failed", exc_info=True, operation_id=123)
```
## Benefits
1. **Quick Error Diagnosis**: Separate `errors.log` for immediate issue identification
2. **Performance Tracking**: Tool execution times and success rates
3. **Historical Analysis**: JSON format enables programmatic analysis
4. **Debugging**: Full context with inputs, outputs, and stack traces
5. **Monitoring**: Easy to parse logs for alerting systems
## Future Enhancements
- [ ] Web dashboard for log visualization
- [ ] Real-time log streaming via WebSocket
- [ ] Automatic error rate alerts (email/Telegram)
- [ ] Integration with external monitoring (Datadog, CloudWatch)
- [ ] Log aggregation for multi-instance deployments
---
**Last Updated:** 2026-02-16
**Log System Version:** 1.0