OPTIMIZATION.md - Cost & Efficiency Rules
RATE LIMITS
API Call Throttling:
- 5 seconds minimum between API calls
- 10 seconds minimum between web searches
- Batch similar work whenever possible
- If you hit a 429 error: STOP and wait 5 minutes
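
The throttling rules above could be enforced with a small helper like the one below. This is a minimal sketch, not a prescribed implementation: the `Throttle` class, the `"api"` and `"search"` labels, and the injectable `clock`/`sleep` parameters are all hypothetical names. It also assumes a 429 pauses both kinds of call, which the rules do not state explicitly.

```python
import time

# Minimum spacing between calls, in seconds (values from the rules above).
MIN_INTERVAL = {"api": 5.0, "search": 10.0}
COOLDOWN_429 = 5 * 60  # wait 5 minutes after a 429


class Throttle:
    """Tracks the earliest time each kind of call may run next."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = {kind: 0.0 for kind in MIN_INTERVAL}

    def wait_time(self, kind):
        """Seconds left before a call of this kind is allowed."""
        return max(0.0, self._next_allowed[kind] - self._clock())

    def acquire(self, kind):
        """Block until the call is allowed, then reserve the next slot."""
        delay = self.wait_time(kind)
        if delay > 0:
            self._sleep(delay)
        self._next_allowed[kind] = self._clock() + MIN_INTERVAL[kind]

    def hit_429(self):
        """Assumption: a 429 pauses every kind of call for the cooldown."""
        resume_at = self._clock() + COOLDOWN_429
        for kind in self._next_allowed:
            self._next_allowed[kind] = max(self._next_allowed[kind], resume_at)
```

Call `acquire("api")` or `acquire("search")` immediately before each request, and `hit_429()` whenever a response comes back with status 429.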
Monthly Budget:
- $20 total
- Warn at 75% ($15 spent)
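
A budget check along these lines would cover the two thresholds above. It is a sketch under assumptions: `budget_status` and its return labels are hypothetical, and the `"exhausted"` state at 100% is an assumption, since the rules only define the warning point.

```python
MONTHLY_BUDGET_USD = 20.00
WARN_FRACTION = 0.75  # 75% of $20 = $15 spent


def budget_status(spent_usd):
    """Return 'ok', 'warn', or 'exhausted' for the month's spend so far."""
    if spent_usd >= MONTHLY_BUDGET_USD:
        # Assumption: the rules do not say what happens at 100%.
        return "exhausted"
    if spent_usd >= MONTHLY_BUDGET_USD * WARN_FRACTION:
        return "warn"
    return "ok"
```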
MODEL SELECTION - Three Tiers
Basic → Ollama
- File checking and organization
- Simple templating/formatting
- Log review and cleanup
- Non-critical analysis
- Routine status checks
Cost: Free (local)
Normal → Haiku (Default)
- Most tasks
- Code review (non-production)
- Documentation and writing
- General problem solving
- Straightforward reasoning
Cost: ~$0.30-1.50 per 1M tokens
Complex → Sonnet
- Architecture decisions
- Production code review
- Security analysis
- Complex debugging/reasoning
- Strategic multi-project decisions
Cost: ~$3-15 per 1M tokens
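
The three tiers can be expressed as a lookup with Haiku as the fallback. This is a rough sketch only: the task labels and the `pick_model` function are hypothetical stand-ins for the bullets above, and the model names are plain strings rather than real API identifiers.

```python
# Hypothetical task labels mapped to the tiers listed above.
TIER_BY_TASK = {
    "file_check": "ollama",
    "templating": "ollama",
    "log_review": "ollama",
    "status_check": "ollama",
    "architecture": "sonnet",
    "production_review": "sonnet",
    "security": "sonnet",
    "complex_debugging": "sonnet",
}
DEFAULT_MODEL = "haiku"  # Normal tier is the default


def pick_model(task):
    """Return the tier for a task label, falling back to the default."""
    return TIER_BY_TASK.get(task, DEFAULT_MODEL)
```

Anything not explicitly listed as Basic or Complex lands on Haiku, which matches the "Most tasks" rule.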
Heartbeat: Ollama Only
Heartbeats ALWAYS use Ollama. No escalation. If it fails, it fails.
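
One way to honor the no-escalation rule is to catch the failure and report it without retrying on another tier. In this sketch `run_heartbeat` is a hypothetical name, and `call_ollama` is a hypothetical callable standing in for however the local model is actually invoked.

```python
def run_heartbeat(call_ollama):
    """Run one heartbeat on the local model; never fall back to a paid tier."""
    try:
        return True, call_ollama("heartbeat")
    except Exception as exc:
        # No escalation: report the failure and move on.
        return False, str(exc)
```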