Fix critical performance issues: thread pool exhaustion and tool tracking

Root Cause Analysis:
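- delegate_task used run_in_executor with the default ThreadPoolExecutor, whose max_workers defaults to min(32, CPU count + 4), i.e. 8-12 threads on 4-8 cores
- Each delegation blocked one pool thread for 2-8 minutes (the full sub-agent conversation)
- After 6-8 parallel delegations the pool was exhausted, so later run_in_executor calls queued behind them and all work hung
- Tool tracking used hasattr(block, 'type'), but ToolUseBlock has no .type attribute, so tool-use blocks were never counted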
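
asyncio's loop.run_in_executor(None, fn) submits fn to the loop's default ThreadPoolExecutor, so every blocking delegation pins one worker thread until it returns. The sketch below reproduces that exhaustion at small scale with an explicitly sized ThreadPoolExecutor. count_started and long_delegation are hypothetical names, and a threading.Event stands in for "the sub-agent conversation finished".

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def count_started(pool_size: int, submitted: int, settle_s: float = 0.3) -> int:
    """Submit `submitted` long-blocking tasks to a pool capped at `pool_size`
    threads and return how many were running before any was allowed to finish."""
    started = 0
    lock = threading.Lock()
    release = threading.Event()  # stands in for "sub-agent conversation finished"

    def long_delegation() -> None:
        nonlocal started
        with lock:
            started += 1
        release.wait()  # holds the worker thread, like a 2-8 minute delegation

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(long_delegation) for _ in range(submitted)]
        expected = min(pool_size, submitted)
        deadline = time.monotonic() + 5.0
        while time.monotonic() < deadline:  # wait until the pool has filled up
            with lock:
                if started >= expected:
                    break
            time.sleep(0.01)
        time.sleep(settle_s)  # a queued task would start now if a thread were free
        with lock:
            snapshot = started
        release.set()  # unblock everything so the pool can shut down
        for future in futures:
            future.result()
    return snapshot
```

With pool_size=2 and three submissions, only two tasks ever start while the first two are blocked. The third waits in the executor queue, which is the same hang described above at a smaller scale.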

Changes:

1. mcp_tools.py: Replace thread pool with dedicated threads
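   - Each delegate_task call creates a dedicated daemon thread with its own isolated event loop
   - Uses an asyncio.Future, completed via loop.call_soon_threadsafe, to hand the result back to the caller's loop
   - Added a semaphore to limit concurrent delegations (4 max)
   - Eliminates pool exhaustion: delegations no longer occupy default-executor threads, and parallelism is bounded by the semaphore rather than by the pool size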

2. llm_interface.py: Fix tool tracking
   - Added TextBlock/ToolUseBlock imports from claude_agent_sdk
   - Replaced hasattr(block, 'type') checks with isinstance() checks
   - Fixes the tool_calls=0 bug: tool names are now recorded correctly

3. agent.py: Event loop isolation and thread safety
   - Added defensive sub_agent.llm._event_loop = None in spawn_sub_agent
   - Ensures sub-agents take the asyncio.run() fallback, each with its own isolated loop
   - Generates unique agent IDs with timestamps to prevent caching race conditions
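
Change 1 can be sketched as follows, assuming delegate_task wraps an async sub-agent run. run_on_dedicated_thread, fake_sub_agent and delegate_all are hypothetical names, not the actual mcp_tools.py code. The semaphore is assumed to be an asyncio.Semaphore held on the caller's loop. Each call starts a daemon thread that runs the coroutine with asyncio.run() on a fresh loop. The thread then completes an asyncio.Future on the caller's loop through call_soon_threadsafe.

```python
import asyncio
import threading
from typing import Any, Callable, Coroutine, List, Optional

MAX_CONCURRENT_DELEGATIONS = 4  # the "4 max" limit from the commit message


async def run_on_dedicated_thread(
    make_coro: Callable[[], Coroutine[Any, Any, Any]],
    semaphore: asyncio.Semaphore,
) -> Any:
    """Run make_coro() on a fresh event loop in its own daemon thread and
    await its result on the caller's loop without using any executor thread."""
    caller_loop = asyncio.get_running_loop()
    result: asyncio.Future = caller_loop.create_future()

    def _resolve(value: Any, exc: Optional[BaseException]) -> None:
        # Executes on the caller's loop; skip if the awaiting task was cancelled.
        if result.done():
            return
        if exc is not None:
            result.set_exception(exc)
        else:
            result.set_result(value)

    def _worker() -> None:
        try:
            value = asyncio.run(make_coro())  # isolated loop, closed on return
        except Exception as exc:
            caller_loop.call_soon_threadsafe(_resolve, None, exc)
        else:
            caller_loop.call_soon_threadsafe(_resolve, value, None)

    # Waiting for a slot happens on the caller's loop, so queued delegations hold
    # no thread; a thread is started only once the semaphore grants a slot.
    async with semaphore:
        threading.Thread(target=_worker, daemon=True).start()
        return await result


async def fake_sub_agent(task: str) -> str:
    # Hypothetical stand-in for a full sub-agent conversation.
    await asyncio.sleep(0.05)
    return f"done: {task}"


async def delegate_all(tasks: List[str]) -> List[str]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_DELEGATIONS)
    return await asyncio.gather(
        *(run_on_dedicated_thread(lambda t=t: fake_sub_agent(t), semaphore) for t in tasks)
    )
```

Calling asyncio.run(delegate_all(["a", "b", "c", "d", "e", "f"])) runs at most four sub-agents at a time and returns results in input order. Exceptions raised inside a sub-agent are re-raised in the awaiting caller.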
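
For change 2, the failure mode can be reproduced without the SDK installed. TextBlock and ToolUseBlock below are local stand-ins, not the claude_agent_sdk classes. Both are assumed to be plain dataclasses with no type field: the commit states this for ToolUseBlock, and the field names here are illustrative. collect_old and collect_new are hypothetical helpers that mirror the two versions of the loop in the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


# Local stand-ins (assumed shapes) for claude_agent_sdk.TextBlock / ToolUseBlock.
@dataclass
class TextBlock:
    text: str


@dataclass
class ToolUseBlock:
    id: str
    name: str
    input: Dict[str, Any] = field(default_factory=dict)


def collect_old(blocks: List[Any]) -> Tuple[List[str], List[str]]:
    """Pre-fix logic: keyed on block.type, which these blocks do not have."""
    texts: List[str] = []
    tools: List[str] = []
    for block in blocks:
        if hasattr(block, "type"):  # always False here, so nothing is recorded
            if block.type == "text" and hasattr(block, "text"):
                texts.append(block.text)
            elif block.type == "tool_use" and hasattr(block, "name"):
                tools.append(block.name)
    return texts, tools


def collect_new(blocks: List[Any]) -> Tuple[List[str], List[str]]:
    """Post-fix logic: dispatch on the block's class instead of a type field."""
    texts: List[str] = []
    tools: List[str] = []
    for block in blocks:
        if isinstance(block, TextBlock):
            texts.append(block.text)
        elif isinstance(block, ToolUseBlock):
            tools.append(block.name)
    return texts, tools
```

isinstance() ties the check to the SDK's declared types rather than to an attribute those types may not define. The trade-off is that llm_interface.py must import TextBlock and ToolUseBlock, which is exactly what the first hunk of the diff adds.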
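
Change 3 can be sketched with hypothetical stand-ins. SubAgentLLM, spawn_sub_agent and make_agent_id are illustrative names, not the code in agent.py. The shallow copy models one way a sub-agent could inherit its parent's loop. Pairing the timestamp with a process-wide counter is an added assumption, because two agents spawned within the same clock tick would otherwise get the same timestamp.

```python
import asyncio
import copy
import itertools
import time
from typing import Any, Coroutine, Optional

_spawn_counter = itertools.count()  # process-wide tie-breaker for agent IDs


def make_agent_id(prefix: str = "sub_agent") -> str:
    # Timestamp for ordering/readability; the counter keeps IDs distinct even
    # when two agents are created within the same clock tick.
    return f"{prefix}_{time.time_ns()}_{next(_spawn_counter)}"


class SubAgentLLM:
    """Hypothetical stand-in for the LLM wrapper a sub-agent holds (sub_agent.llm)."""

    def __init__(self) -> None:
        self._event_loop: Optional[asyncio.AbstractEventLoop] = None
        self.agent_id = make_agent_id("agent")

    def run(self, coro: Coroutine[Any, Any, Any]) -> Any:
        if self._event_loop is not None:
            # run_until_complete() raises RuntimeError if this inherited loop
            # is already running (for example in the parent's thread).
            return self._event_loop.run_until_complete(coro)
        # Fallback: asyncio.run() creates a fresh loop and closes it afterwards.
        return asyncio.run(coro)


def spawn_sub_agent(parent: SubAgentLLM) -> SubAgentLLM:
    sub = copy.copy(parent)  # a shallow copy also copies parent._event_loop...
    sub._event_loop = None   # ...so reset it defensively to force the fallback
    sub.agent_id = make_agent_id()
    return sub
```

Because each sub-agent's run() goes through asyncio.run(), sub-agents started from different threads never share or drive the parent's loop.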

Impact:
- Fixes the hang pattern seen after 6-8 messages (no more 10-minute timeouts)
- Enables parallel sub-agent execution via delegate_task (up to 4 concurrent delegations)
- Tool tracking now reports accurate tool usage counts
- All sub-agents remain in Agent SDK mode (as required)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-03 20:48:43 -07:00
parent cc7e623d74
commit a8f3ed40a8
3 changed files with 2101 additions and 27 deletions

@@ -24,6 +24,7 @@ from typing import Any, Dict, List, Optional, Set
 import requests
 from anthropic import Anthropic
+from claude_agent_sdk import TextBlock, ToolUseBlock
 from usage_tracker import UsageTracker
 logger = logging.getLogger(__name__)
@@ -607,12 +608,13 @@ class LLMInterface:
                 assistant_messages.append(message.content)
             elif isinstance(message.content, list):
                 for block in message.content:
-                    if hasattr(block, 'type'):
-                        if block.type == 'text' and hasattr(block, 'text'):
-                            assistant_messages.append(block.text)
-                        elif block.type == 'tool_use' and hasattr(block, 'name'):
-                            tool_names.append(block.name)
-                        self._last_tool_names = tool_names.copy()
+                    # Use isinstance() checks instead of hasattr(block, 'type')
+                    # ToolUseBlock dataclass has no .type attribute
+                    if isinstance(block, TextBlock):
+                        assistant_messages.append(block.text)
+                    elif isinstance(block, ToolUseBlock):
+                        tool_names.append(block.name)
+                    self._last_tool_names = tool_names.copy()
         if isinstance(message, ResultMessage):
             # DEBUG: Log what we captured during message processing