Commit Graph

2 Commits

Author SHA1 Message Date
8c039b6cad Implement adaptive timeout system with activity-based loop detection

**Problem**: A single fixed 10-minute timeout cannot tell slow from stuck: it
kills legitimately slow operations (e.g., 5-minute web searches) while letting
infinite loops waste resources until the cutoff.

**Solution**: Dual-timeout strategy that distinguishes slow from stuck:

1. **Idle timeout** (5 min): No progress = kill
   - Tracks message_count growth via heartbeat
   - Only resets timer when count increases
   - Slow web searches keep progressing → allowed

2. **Total timeout** (15 min): Hard cap safety net
   - Prevents runaway tasks from consuming resources forever
   - Allows legitimately slow operations to complete

3. **Loop detection**: Kills after 5+ errors
   - Tracks error_count and last_error
   - Detects repetitive failures quickly
   - Independent of time-based checks
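
The three checks above can be sketched as a single classification function. This is a minimal illustration, not the code from this commit: `AgentState`, `hang_reason`, the timestamp fields, and the order in which the checks run are assumptions made for the example. Only the thresholds (300s idle, 900s total, 5 errors) and the `message_count`, `error_count`, and `last_error` names come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_TIMEOUT_S = 300    # idle timeout (5 min), per the commit message
TOTAL_TIMEOUT_S = 900   # total timeout (15 min), per the commit message
MAX_ERRORS = 5          # loop-detection threshold, per the commit message


@dataclass
class AgentState:
    """Hypothetical stand-in for SubAgentState; timestamp fields are assumed."""
    started_at: float          # when the sub-agent was launched (seconds)
    last_progress_at: float    # last time message_count increased (seconds)
    message_count: int = 0
    error_count: int = 0
    last_error: Optional[str] = None


def hang_reason(state: AgentState, now: float) -> Optional[str]:
    """Return a diagnostic string if the agent should be killed, else None."""
    # Loop detection is independent of the time-based checks.
    if state.error_count >= MAX_ERRORS:
        return f"Loop detected: {state.error_count} errors (last: {state.last_error})"
    # Hard cap: applies even if the agent is still making progress.
    total = now - state.started_at
    if total > TOTAL_TIMEOUT_S:
        return f"Total timeout: ran for {total:.0f}s"
    # Idle check: measured from the last observed progress, not from launch.
    idle = now - state.last_progress_at
    if idle > IDLE_TIMEOUT_S:
        return f"Idle timeout: No progress for {idle:.0f}s ({state.message_count} messages)"
    return None
```

Under these assumptions, an agent that stalled at 23 messages is reported as idle once 300s have elapsed, while an agent whose last progress was 20s ago passes all checks until the 900s cap.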

**Key Changes**:
- SubAgentState: Add message_count, error_count tracking fields
- SubAgentManager.__init__: Dual timeout params (idle=300s, total=900s)
- SubAgentManager.update_activity: Accepts message_count, smart timer reset
- SubAgentManager.update_error: NEW - tracks errors for loop detection
- SubAgentManager.get_hung_agents: 3-check system (idle/total/loop)
- SubAgentManager.cleanup_agent: Detailed error messages by type
- agent.py heartbeat: Passes sub_agent.llm.message_count every 10s
- mcp_tools._DELEGATE_TIMEOUT: Increased to 900s (15 min)
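
The "smart timer reset" in `update_activity` could look roughly like the sketch below. `ActivityTracker` and its fields are hypothetical and much simpler than the real `SubAgentManager`; the sketch only illustrates the rule stated above, that a heartbeat resets the idle timer only when `message_count` has grown.

```python
from dataclasses import dataclass


@dataclass
class ActivityTracker:
    """Hypothetical, simplified stand-in for per-agent activity tracking."""
    last_progress_at: float
    message_count: int = 0

    def update_activity(self, message_count: int, now: float) -> bool:
        """Record a heartbeat; reset the idle timer only if the count grew.

        Returns True when progress was observed, False otherwise.
        """
        if message_count > self.message_count:
            self.message_count = message_count
            self.last_progress_at = now
            return True
        # Same (or lower) count: the heartbeat arrived but nothing advanced,
        # so the idle timer keeps running.
        return False
```

With a heartbeat every 10s, a stuck agent keeps sending the same count and its `last_progress_at` never moves, so an idle check based on that timestamp eventually fires.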

**Impact**:
- Slow operations (5-12 min with progress) complete successfully
- Infinite loops killed after 5 min without progress (idle timeout), or sooner via error detection
- Clear diagnostics: "Idle timeout: No progress for 305s (23 messages)"
- Zero config needed - adaptive behavior works automatically

**Example**: CVE research taking 5 min with 117 messages now completes
instead of timing out at 10 min. A loop with repeated errors is killed at 3 min.

See ADAPTIVE_TIMEOUT_SYSTEM.md for full specification and scenarios.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-04 17:40:58 -07:00
e909cc0044 Add MCP delegation bridge and diagram tools

**Features Added**:

1. **Agent Registry (agent_registry.py)**
   - Thread-safe global singleton for MCP tool access to Agent instance
   - Enables MCP tools to call Agent.delegate() without circular imports
   - Registered at bot startup in bot_runner.py

2. **Sub-Agent Manager (sub_agent_manager.py)**
   - Watchdog system monitoring sub-agent lifecycle
   - Detects hung agents (5 min timeout, 30s check interval)
   - Auto-cleanup and status tracking

3. **delegate_task MCP Tool (mcp_tools.py)**
   - Exposes Agent.delegate() to Claude via MCP protocol
   - Enables parallel sub-agent execution via tool calls
   - Supports specialist prompts and agent ID caching

4. **Memory Write Locks (memory_system.py)**
   - Thread-safe writes to prevent file corruption
   - Protects write_memory(), update_soul(), update_user()

5. **Diagram Tools**
   - Mermaid MCP server (flowcharts, sequence diagrams, etc.)
   - Excalidraw MCP server (hand-drawn style diagrams)
   - Config files in config/ directory

6. **Adapter Improvements**
   - Enhanced error handling across all adapters
   - Unified logging patterns
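
As an illustration of the agent registry idea in item 1, a lock-guarded module-level singleton might look like the sketch below. The function names `register_agent` and `get_agent` are assumptions, not the actual contents of agent_registry.py. The point is that the registry module imports nothing from the agent module, so tool code can fetch the instance at call time without a circular import.

```python
import threading
from typing import Any, Optional

# Module-level state guarded by a lock so registration and lookup
# are safe when called from different threads.
_lock = threading.Lock()
_agent: Optional[Any] = None


def register_agent(agent: Any) -> None:
    """Store the process-wide agent instance (called once at startup)."""
    global _agent
    with _lock:
        _agent = agent


def get_agent() -> Any:
    """Return the registered agent, or raise if none has been registered."""
    with _lock:
        if _agent is None:
            raise RuntimeError("No agent registered; call register_agent() at startup")
        return _agent
```

A tool handler would then call `get_agent()` inside its function body and invoke the delegation method on the returned object, rather than importing the agent class at module load.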

**Testing**: Ready for parallel sub-agent testing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 14:34:24 -07:00