**Problem**: A single fixed 10-minute timeout kills legitimately slow operations
(e.g., 5-minute web searches), while infinite loops waste resources until it fires.
**Solution**: Dual-timeout strategy that distinguishes slow from stuck:
1. **Idle timeout** (5 min): No progress = kill
   - Tracks message_count growth via heartbeat
   - Only resets timer when count increases
   - Slow web searches keep progressing → allowed
2. **Total timeout** (15 min): Hard cap safety net
   - Prevents runaway tasks from consuming resources forever
   - Allows legitimately slow operations to complete
3. **Loop detection**: Kills after 5+ errors
   - Tracks error_count and last_error
   - Detects repetitive failures quickly
   - Independent of time-based checks
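
The three checks above can be sketched as one classification function. This is a minimal illustration, not the code in this commit: `AgentSnapshot`, `hang_reason`, the timestamp fields, the check order, and the wording of the total-timeout and loop messages are all assumptions. Only the thresholds (300s, 900s, 5 errors) and the idle-timeout message format come from this description.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_TIMEOUT_S = 300   # 5 min without progress
TOTAL_TIMEOUT_S = 900  # 15 min hard cap
MAX_ERRORS = 5         # loop-detection threshold ("5+ errors")


@dataclass
class AgentSnapshot:
    """Hypothetical view of one sub-agent's tracked state."""
    started_at: float        # when the task began, in seconds
    last_progress_at: float  # last time message_count grew, in seconds
    message_count: int
    error_count: int
    last_error: Optional[str] = None


def hang_reason(s: AgentSnapshot, now: float) -> Optional[str]:
    """Return a diagnostic string if the agent should be killed, else None."""
    # Check 1: idle. Time since the last observed message_count increase.
    idle = now - s.last_progress_at
    if idle > IDLE_TIMEOUT_S:
        return (f"Idle timeout: No progress for {idle:.0f}s "
                f"({s.message_count} messages)")
    # Check 2: total. Hard cap regardless of progress.
    total = now - s.started_at
    if total > TOTAL_TIMEOUT_S:
        return f"Total timeout: ran for {total:.0f}s"
    # Check 3: loop. Depends only on the error count, not on elapsed time.
    if s.error_count >= MAX_ERRORS:
        return f"Loop detected: {s.error_count} errors (last: {s.last_error})"
    return None
```

Under these assumptions, an agent that keeps producing messages is never flagged by the idle check, so only the 15-minute cap or the error threshold can stop it.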
**Key Changes**:
- SubAgentState: Add message_count, error_count tracking fields
- SubAgentManager.__init__: Dual timeout params (idle=300s, total=900s)
- SubAgentManager.update_activity: Accepts message_count, smart timer reset
- SubAgentManager.update_error: NEW - tracks errors for loop detection
- SubAgentManager.get_hung_agents: 3-check system (idle/total/loop)
- SubAgentManager.cleanup_agent: Detailed error messages by type
- agent.py heartbeat: Passes sub_agent.llm.message_count every 10s
- mcp_tools._DELEGATE_TIMEOUT: Increased to 900s (15 min)
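
As a rough sketch of the tracking side of these changes: the field names `message_count`, `error_count`, and `last_error` are taken from this description, while the timestamp fields, the free-function form, the `now` parameter, and the boolean return value are assumptions made so the example is self-contained.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubAgentState:
    # Timestamp fields are assumed; the counters are named in the change list.
    started_at: float = field(default_factory=time.monotonic)
    last_activity: float = field(default_factory=time.monotonic)
    message_count: int = 0
    error_count: int = 0
    last_error: Optional[str] = None


def update_activity(state: SubAgentState, message_count: int,
                    now: Optional[float] = None) -> bool:
    """Reset the idle timer only if message_count grew; return True if reset."""
    now = time.monotonic() if now is None else now
    if message_count > state.message_count:
        state.message_count = message_count
        state.last_activity = now
        return True
    # A heartbeat with an unchanged count is not progress: leave the timer alone.
    return False


def update_error(state: SubAgentState, error: str) -> None:
    """Record one more error for loop detection (assumed: every error counts)."""
    state.error_count += 1
    state.last_error = error
```

With a heartbeat calling `update_activity` every 10s, a stalled agent keeps sending the same count, so `last_activity` stops advancing and the idle check can fire.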
**Impact**:
- Slow operations (5-12 min with progress) complete successfully
- Infinite loops killed after about 5 min without progress (idle timeout), or sooner via error detection
- Clear diagnostics: "Idle timeout: No progress for 305s (23 messages)"
- Zero config needed - adaptive behavior works automatically
**Example**: CVE research taking 5 min with 117 messages now completes
instead of timing out at 10 min. Loop with repeated errors killed at 3 min.
See ADAPTIVE_TIMEOUT_SYSTEM.md for full specification and scenarios.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>