# OPTIMIZATION.md - Cost & Efficiency Rules

## RATE LIMITS

**API Call Throttling:**

- **5 seconds minimum** between API calls
- **10 seconds minimum** between web searches
- **Batch similar work** whenever possible
- **If you hit a 429 error:** STOP and wait 5 minutes

**Monthly Budget:**

- **$20 total**
- **Warn at 75%** ($15 spent)

---

## MODEL SELECTION - Three Tiers

### Basic → Ollama

- File checking and organization
- Simple templating/formatting
- Log review and cleanup
- Non-critical analysis
- Routine status checks

**Cost:** Free (local)

### Normal → Haiku (Default)

- Most tasks
- Code review (non-production)
- Documentation and writing
- General problem solving
- Straightforward reasoning

**Cost:** ~$0.30-1.50 per 1M tokens

### Complex → Sonnet

- Architecture decisions
- Production code review
- Security analysis
- Complex debugging/reasoning
- Strategic multi-project decisions

**Cost:** ~$3-15 per 1M tokens

### Heartbeat: Ollama Only

Heartbeats ALWAYS use Ollama. No escalation. If it fails, it fails.
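
The throttling, budget, and tier rules above could be enforced in code roughly as follows. This is a minimal sketch, not a definitive implementation: the names `Throttle`, `select_model`, `budget_status`, and the task keys in `TIER_BY_TASK` are hypothetical, and the tier strings are plain labels rather than real API model identifiers. It also assumes that API calls and web searches are throttled independently, that a 429 pauses both, and that reaching the full $20 counts as an "exhausted" state (the rules only specify the 75% warning).

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Values taken from the rules above.
MIN_INTERVAL_S = {"api": 5.0, "search": 10.0}
BACKOFF_429_S = 5 * 60.0
MONTHLY_BUDGET_USD = 20.0
WARN_FRACTION = 0.75

# Hypothetical task keys; anything not listed falls back to the default tier.
TIER_BY_TASK = {
    "file_check": "ollama",
    "templating": "ollama",
    "log_review": "ollama",
    "status_check": "ollama",
    "architecture": "sonnet",
    "production_code_review": "sonnet",
    "security_analysis": "sonnet",
    "complex_debugging": "sonnet",
}


def select_model(task: str, heartbeat: bool = False) -> str:
    """Pick a tier label. Heartbeats always use Ollama, with no escalation."""
    if heartbeat:
        return "ollama"
    return TIER_BY_TASK.get(task, "haiku")  # Haiku is the default tier


@dataclass
class Throttle:
    """Tracks minimum gaps between calls and the pause after a 429."""

    clock: Callable[[], float] = time.monotonic
    last_call: dict[str, float] = field(default_factory=dict)
    blocked_until: float = 0.0

    def wait_needed(self, kind: str) -> float:
        """Seconds to wait before a call of `kind` ("api" or "search")."""
        now = self.clock()
        last = self.last_call.get(kind)
        gap_wait = 0.0 if last is None else last + MIN_INTERVAL_S[kind] - now
        return max(gap_wait, self.blocked_until - now, 0.0)

    def record_call(self, kind: str) -> None:
        """Note that a call of `kind` was just made."""
        self.last_call[kind] = self.clock()

    def record_429(self) -> None:
        """Stop all calls for five minutes after a 429 response."""
        self.blocked_until = self.clock() + BACKOFF_429_S


def budget_status(spent_usd: float) -> str:
    """Return "ok", "warn" (75% or more spent), or "exhausted" (assumed)."""
    if spent_usd >= MONTHLY_BUDGET_USD:
        return "exhausted"
    if spent_usd >= WARN_FRACTION * MONTHLY_BUDGET_USD:
        return "warn"
    return "ok"
```

In a calling loop, one plausible pattern is to run `time.sleep(throttle.wait_needed("api"))` before each request, call `throttle.record_call("api")` right after sending it, and call `throttle.record_429()` whenever the response status is 429. The injectable `clock` exists only so the timing logic can be exercised without real delays.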