LOGGING.md

# Structured Logging System

## Overview

Ajarbot now includes a comprehensive structured logging system to track errors, tool executions, and system behavior.

## Log Files

All logs are stored in the `logs/` directory (gitignored):

### 1. `ajarbot.log` - Main Application Log
- **Format**: JSON (one record per line)
- **Level**: DEBUG and above
- **Size**: Rotates at 10MB, keeps 5 backups
- **Contents**: All application events, tool executions, LLM calls

### 2. `errors.log` - Error-Only Log
- **Format**: JSON
- **Level**: ERROR and CRITICAL only
- **Size**: Rotates at 5MB, keeps 3 backups
- **Contents**: Only errors and critical issues for quick diagnosis

### 3. `tools.log` - Tool Execution Log
- **Format**: JSON
- **Level**: INFO and above
- **Size**: Rotates at 10MB, keeps 3 backups
- **Contents**: Every tool call with inputs, outputs, duration, and success/failure

## Log Format

### JSON Structure
```json
{
  "timestamp": "2026-02-16T12:34:56.789Z",
  "level": "ERROR",
  "logger": "tools",
  "message": "Tool failed: permanent_note",
  "module": "tools",
  "function": "execute_tool",
  "line": 500,
  "extra": {
    "tool_name": "permanent_note",
    "inputs": {"title": "Test", "content": "..."},
    "success": false,
    "error": "Unknown tool error",
    "duration_ms": 123.45
  }
}
```

### Tool Log Example
```json
{
  "timestamp": "2026-02-16T06:00:15.234Z",
  "level": "INFO",
  "logger": "tools",
  "message": "Tool executed: get_weather",
  "extra": {
    "tool_name": "get_weather",
    "inputs": {"location": "Centennial, CO"},
    "success": true,
    "result_length": 456,
    "duration_ms": 1234.56
  }
}
```

## Usage in Code

### Get a Logger
```python
from logging_config import get_logger, get_tool_logger

# General logger
logger = get_logger("my_module")

# Specialized tool logger
tool_logger = get_tool_logger()
```

### Logging Methods

**Basic logging:**
```python
logger.debug("Detailed debug info", key="value")
logger.info("Informational message", user_id=123)
logger.warning("Warning message", issue="something")
logger.error("Error occurred", exc_info=True, error_code="E001")
logger.critical("Critical system failure", exc_info=True)
```

**Tool execution logging:**
```python
tool_logger.log_tool_call(
    tool_name="permanent_note",
    inputs={"title": "Test", "content": "..."},
    success=True,
    result="Created note successfully",
    duration_ms=123.45
)
```

## Analyzing Logs

### View Recent Errors
```bash
# Last 20 errors
tail -20 logs/errors.log | jq .

# Errors from specific module
grep '"module":"tools"' logs/errors.log | jq .
```

### Tool Performance Analysis
```bash
# Average tool execution time
cat logs/tools.log | jq -r '.extra.duration_ms' | awk '{sum+=$1; count++} END {print sum/count}'

# Failed tools
grep '"success":false' logs/tools.log | jq -r '.extra.tool_name' | sort | uniq -c

# Slowest tool calls
cat logs/tools.log | jq -r '[.extra.tool_name, .extra.duration_ms] | @csv' | sort -t, -k2 -rn | head -10
```

### Find Specific Errors
```bash
# Max token errors
grep -i "max.*token" logs/errors.log | jq .

# Tool iteration limits
grep -i "iteration.*exceeded" logs/ajarbot.log | jq .

# MCP tool failures
grep '"tool_name":"permanent_note"' logs/tools.log | grep '"success":false' | jq .
```

## Error Patterns to Watch

1. **Max Tool Iterations** - Search: `"iteration.*exceeded"`
2. **Max Tokens** - Search: `"max.*token"`
3. **MCP Tool Failures** - Search: `"Unknown tool"` or failed MCP tool names
4. **Slow Tools** - Tools taking > 5000ms
5. **Repeated Failures** - Same tool failing multiple times

## Maintenance

### Log Rotation
Logs automatically rotate when they reach size limits:
- `ajarbot.log`: 10MB → keeps 5 old files (50MB total)
- `errors.log`: 5MB → keeps 3 old files (15MB total)
- `tools.log`: 10MB → keeps 3 old files (30MB total)

Total max disk usage: ~95MB

### Manual Cleanup
```bash
# Remove old logs
rm logs/*.log.*

# Clear all logs (careful!)
rm logs/*.log
```

## Integration

### Automatic Integration
The logging system is automatically integrated into:
- ✅ `tools.py` - All tool executions logged
- ✅ Console output - Human-readable format
- ✅ File logs - JSON format for parsing

### Adding Logging to New Modules
```python
from logging_config import get_logger

logger = get_logger(__name__)

def my_function():
    logger.info("Starting operation", operation_id=123)
    try:
        # Do work
        logger.debug("Step completed", step=1)
    except Exception as e:
        logger.error("Operation failed", exc_info=True, operation_id=123)
```

## Benefits

1. **Quick Error Diagnosis**: Separate `errors.log` for immediate issue identification
2. **Performance Tracking**: Tool execution times and success rates
3. **Historical Analysis**: JSON format enables programmatic analysis
4. **Debugging**: Full context with inputs, outputs, and stack traces
5. **Monitoring**: Easy to parse logs for alerting systems

## Future Enhancements

- [ ] Web dashboard for log visualization
- [ ] Real-time log streaming via WebSocket
- [ ] Automatic error rate alerts (email/Telegram)
- [ ] Integration with external monitoring (Datadog, CloudWatch)
- [ ] Log aggregation for multi-instance deployments

---

**Last Updated:** 2026-02-16
**Log System Version:** 1.0
Add comprehensive structured logging system Features: - JSON-formatted logs for easy parsing and analysis - Rotating log files (prevents disk space issues) * ajarbot.log: All events, 10MB rotation, 5 backups * errors.log: Errors only, 5MB rotation, 3 backups * tools.log: Tool execution tracking, 10MB rotation, 3 backups Tool Execution Tracking: - Every tool call logged with inputs, outputs, duration - Success/failure status tracking - Performance metrics (execution time in milliseconds) - Error messages captured with full context Logging Integration: - tools.py: All tool executions automatically logged - Structured logger classes with context preservation - Console output (human-readable) + file logs (JSON) - Separate error log for quick issue identification Log Analysis: - JSON format enables programmatic analysis - Easy to search for patterns (max tokens, iterations, etc.) - Performance tracking (slow tools, failure rates) - Historical debugging with full context Documentation: - LOGGING.md: Complete usage guide - Log analysis examples with jq commands - Error pattern reference - Maintenance and integration instructions Benefits: - Quick error diagnosis with separate errors.log - Performance monitoring and optimization - Historical analysis for troubleshooting - Automatic log rotation (max 95MB total) Updated .gitignore to exclude logs/ directory Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> 2026-02-16 16:32:18 -07:00			`# Structured Logging System`

			`## Overview`

			`Ajarbot now includes a comprehensive structured logging system to track errors, tool executions, and system behavior.`

			`## Log Files`

			All logs are stored in the `logs/` directory (gitignored):

			### 1. `ajarbot.log` - Main Application Log
			`- Format: JSON (one record per line)`
			`- Level: DEBUG and above`
			`- Size: Rotates at 10MB, keeps 5 backups`
			`- Contents: All application events, tool executions, LLM calls`

			### 2. `errors.log` - Error-Only Log
			`- Format: JSON`
			`- Level: ERROR and CRITICAL only`
			`- Size: Rotates at 5MB, keeps 3 backups`
			`- Contents: Only errors and critical issues for quick diagnosis`

			### 3. `tools.log` - Tool Execution Log
			`- Format: JSON`
			`- Level: INFO and above`
			`- Size: Rotates at 10MB, keeps 3 backups`
			`- Contents: Every tool call with inputs, outputs, duration, and success/failure`

			`## Log Format`

			`### JSON Structure`
			```json
			`{`
			`"timestamp": "2026-02-16T12:34:56.789Z",`
			`"level": "ERROR",`
			`"logger": "tools",`
			`"message": "Tool failed: permanent_note",`
			`"module": "tools",`
			`"function": "execute_tool",`
			`"line": 500,`
			`"extra": {`
			`"tool_name": "permanent_note",`
			`"inputs": {"title": "Test", "content": "..."},`
			`"success": false,`
			`"error": "Unknown tool error",`
			`"duration_ms": 123.45`
			`}`
			`}`
			```

			`### Tool Log Example`
			```json
			`{`
			`"timestamp": "2026-02-16T06:00:15.234Z",`
			`"level": "INFO",`
			`"logger": "tools",`
			`"message": "Tool executed: get_weather",`
			`"extra": {`
			`"tool_name": "get_weather",`
			`"inputs": {"location": "Centennial, CO"},`
			`"success": true,`
			`"result_length": 456,`
			`"duration_ms": 1234.56`
			`}`
			`}`
			```

			`## Usage in Code`

			`### Get a Logger`
			```python
			`from logging_config import get_logger, get_tool_logger`

			`# General logger`
			`logger = get_logger("my_module")`

			`# Specialized tool logger`
			`tool_logger = get_tool_logger()`
			```

			`### Logging Methods`

			`Basic logging:`
			```python
			`logger.debug("Detailed debug info", key="value")`
			`logger.info("Informational message", user_id=123)`
			`logger.warning("Warning message", issue="something")`
			`logger.error("Error occurred", exc_info=True, error_code="E001")`
			`logger.critical("Critical system failure", exc_info=True)`
			```

			`Tool execution logging:`
			```python
			`tool_logger.log_tool_call(`
			`tool_name="permanent_note",`
			`inputs={"title": "Test", "content": "..."},`
			`success=True,`
			`result="Created note successfully",`
			`duration_ms=123.45`
			`)`
			```

			`## Analyzing Logs`

			`### View Recent Errors`
			```bash
			`# Last 20 errors`
			`tail -20 logs/errors.log \| jq .`

			`# Errors from specific module`
			`grep '"module":"tools"' logs/errors.log \| jq .`
			```

			`### Tool Performance Analysis`
			```bash
			`# Average tool execution time`
			`cat logs/tools.log \| jq -r '.extra.duration_ms' \| awk '{sum+=$1; count++} END {print sum/count}'`

			`# Failed tools`
			`grep '"success":false' logs/tools.log \| jq -r '.extra.tool_name' \| sort \| uniq -c`

			`# Slowest tool calls`
			`cat logs/tools.log \| jq -r '[.extra.tool_name, .extra.duration_ms] \| @csv' \| sort -t, -k2 -rn \| head -10`
			```

			`### Find Specific Errors`
			```bash
			`# Max token errors`
			`grep -i "max.*token" logs/errors.log \| jq .`

			`# Tool iteration limits`
			`grep -i "iteration.*exceeded" logs/ajarbot.log \| jq .`

			`# MCP tool failures`
			`grep '"tool_name":"permanent_note"' logs/tools.log \| grep '"success":false' \| jq .`
			```

			`## Error Patterns to Watch`

			1. Max Tool Iterations - Search: `"iteration.*exceeded"`
			2. Max Tokens - Search: `"max.*token"`
			3. MCP Tool Failures - Search: `"Unknown tool"` or failed MCP tool names
			`4. Slow Tools - Tools taking > 5000ms`
			`5. Repeated Failures - Same tool failing multiple times`

			`## Maintenance`

			`### Log Rotation`
			`Logs automatically rotate when they reach size limits:`
			- `ajarbot.log`: 10MB → keeps 5 old files (50MB total)
			- `errors.log`: 5MB → keeps 3 old files (15MB total)
			- `tools.log`: 10MB → keeps 3 old files (30MB total)

			`Total max disk usage: ~95MB`

			`### Manual Cleanup`
			```bash
			`# Remove old logs`
			`rm logs/.log.`

			`# Clear all logs (careful!)`
			`rm logs/*.log`
			```

			`## Integration`

			`### Automatic Integration`
			`The logging system is automatically integrated into:`
			- ✅ `tools.py` - All tool executions logged
			`- ✅ Console output - Human-readable format`
			`- ✅ File logs - JSON format for parsing`

			`### Adding Logging to New Modules`
			```python
			`from logging_config import get_logger`

			`logger = get_logger(__name__)`

			`def my_function():`
			`logger.info("Starting operation", operation_id=123)`
			`try:`
			`# Do work`
			`logger.debug("Step completed", step=1)`
			`except Exception as e:`
			`logger.error("Operation failed", exc_info=True, operation_id=123)`
			```

			`## Benefits`

			1. Quick Error Diagnosis: Separate `errors.log` for immediate issue identification
			`2. Performance Tracking: Tool execution times and success rates`
			`3. Historical Analysis: JSON format enables programmatic analysis`
			`4. Debugging: Full context with inputs, outputs, and stack traces`
			`5. Monitoring: Easy to parse logs for alerting systems`

			`## Future Enhancements`

			`- [ ] Web dashboard for log visualization`
			`- [ ] Real-time log streaming via WebSocket`
			`- [ ] Automatic error rate alerts (email/Telegram)`
			`- [ ] Integration with external monitoring (Datadog, CloudWatch)`
			`- [ ] Log aggregation for multi-instance deployments`

			`---`

			`Last Updated: 2026-02-16`
			`Log System Version: 1.0`