1.2 KiB
1.2 KiB
OPTIMIZATION.md - Cost & Efficiency Rules
RATE LIMITS
API Call Throttling:
- 5 seconds minimum between API calls
- 10 seconds minimum between web searches
- Batch similar work whenever possible
- If you hit 429 error: STOP and wait 5 minutes
Monthly Budget:
- $20 total
- Warn at 75% ($15 spent)
MODEL SELECTION
System Default: Haiku
Haiku is your primary model. It decides routing internally:
Haiku routes to Ollama when:
- File checking and organization
- Simple templating/formatting
- Log review and cleanup
- Non-critical analysis
- Routine status checks
Haiku handles directly when:
- Most tasks fit within Haiku's capability
- Reasoning is needed but not deeply complex
- Code review (non-production)
- Documentation and writing
Haiku escalates to Sonnet when:
- Architecture decisions
- Production-like code review
- Security analysis
- Complex debugging/reasoning
- Strategic multi-project decisions
Heartbeat: Ollama Only
Heartbeats ALWAYS use Ollama. No escalation. If Ollama fails, the heartbeat fails.
Decision Rule
Let Haiku decide. It's smart enough to route to Ollama when appropriate and escalate to Sonnet when needed. You only override when you know you need Sonnet upfront.