# OPTIMIZATION.md - Cost & Efficiency Rules ## RATE LIMITS **API Call Throttling:** - **5 seconds minimum** between API calls - **10 seconds minimum** between web searches - **Batch similar work** whenever possible - **If you hit 429 error:** STOP and wait 5 minutes **Monthly Budget:** - **$20 total** - **Warn at 75%** ($15 spent) --- ## MODEL SELECTION ### System Default: Haiku Haiku is your primary model. It decides routing internally: **Haiku routes to Ollama when:** - File checking and organization - Simple templating/formatting - Log review and cleanup - Non-critical analysis - Routine status checks **Haiku handles directly when:** - Most tasks fit within Haiku's capability - Reasoning is needed but not deeply complex - Code review (non-production) - Documentation and writing **Haiku escalates to Sonnet when:** - Architecture decisions - Production-like code review - Security analysis - Complex debugging/reasoning - Strategic multi-project decisions ### Heartbeat: Ollama Only Heartbeats ALWAYS use Ollama. No escalation. If Ollama fails, the heartbeat fails. ### Decision Rule Let Haiku decide. It's smart enough to route to Ollama when appropriate and escalate to Sonnet when needed. You only override when you know you need Sonnet upfront.