# OPTIMIZATION.md - Cost & Efficiency Rules

## RATE LIMITS

**API Call Throttling:**
- **5 seconds minimum** between API calls
- **10 seconds minimum** between web searches
- **Batch similar work** whenever possible
- **If you hit a 429 error:** STOP and wait 5 minutes
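The throttling rules above could be enforced with a small helper along these lines. This is a minimal sketch, not a prescribed implementation: the `Throttle` class, its method names, and the `"api"` / `"search"` keys are hypothetical, and the clock and sleep functions are injectable only so the logic can be exercised without real waiting.

```python
import time

# Intervals taken from the rules above; the dictionary keys are illustrative.
MIN_INTERVAL_SECONDS = {"api": 5.0, "search": 10.0}
BACKOFF_429_SECONDS = 300.0  # "STOP and wait 5 minutes"


class Throttle:
    """Enforces a minimum gap between calls of the same kind (hypothetical helper)."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last_call = {}  # kind -> timestamp of the previous call

    def wait_time(self, kind):
        """Seconds still to wait before a call of this kind is allowed."""
        last = self._last_call.get(kind)
        if last is None:
            return 0.0
        elapsed = self._clock() - last
        return max(0.0, MIN_INTERVAL_SECONDS[kind] - elapsed)

    def acquire(self, kind):
        """Block until the minimum interval has passed, then record the call."""
        delay = self.wait_time(kind)
        if delay > 0:
            self._sleep(delay)
        self._last_call[kind] = self._clock()

    def on_status(self, status_code):
        """Pause for the full backoff period after an HTTP 429 response."""
        if status_code == 429:
            self._sleep(BACKOFF_429_SECONDS)
            return True
        return False
```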

**Monthly Budget:**
- **$20 total**
- **Warn at 75%** ($15 spent)
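The budget thresholds above can be checked with a few lines of arithmetic. This is a sketch under assumptions: the function name and the three status strings are hypothetical, and treating the full $20 as a hard stop is an interpretation of "total" rather than something the rules spell out.

```python
MONTHLY_BUDGET_USD = 20.00
WARN_FRACTION = 0.75  # 75% of $20 is $15


def budget_status(spent_usd):
    """Return 'ok', 'warn', or 'exhausted' for the month's spend so far."""
    if spent_usd >= MONTHLY_BUDGET_USD:
        # Assumption: reaching the total means no further paid calls.
        return "exhausted"
    if spent_usd >= MONTHLY_BUDGET_USD * WARN_FRACTION:
        return "warn"
    return "ok"
```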

---

## MODEL SELECTION - Three Tiers

### Basic → Ollama
- File checking and organization
- Simple templating/formatting
- Log review and cleanup
- Non-critical analysis
- Routine status checks

**Cost:** Free (local)

### Normal → Haiku (Default)
- Most tasks
- Code review (non-production)
- Documentation and writing
- General problem solving
- Straightforward reasoning

**Cost:** ~$0.30-1.50 per 1M tokens

### Complex → Sonnet
- Architecture decisions
- Production code review
- Security analysis
- Complex debugging/reasoning
- Strategic multi-project decisions

**Cost:** ~$3-15 per 1M tokens
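The three tiers can be read as a lookup table with Haiku as the fallback. The sketch below assumes a set of hypothetical task-type keys (only a few per tier are shown) and uses the approximate per-1M-token price ranges quoted above; neither the key names nor the function names come from any real API.

```python
# Hypothetical task-type keys, a few per tier; anything unlisted falls back to the default.
TIER_BY_TASK = {
    "file_check": "ollama",
    "log_review": "ollama",
    "status_check": "ollama",
    "architecture": "sonnet",
    "production_code_review": "sonnet",
    "security_analysis": "sonnet",
}
DEFAULT_TIER = "haiku"  # "Normal" tier handles most tasks

# Approximate (low, high) USD per 1M tokens, as quoted in the tier descriptions.
COST_RANGE_PER_MILLION = {
    "ollama": (0.0, 0.0),
    "haiku": (0.30, 1.50),
    "sonnet": (3.0, 15.0),
}


def select_tier(task_type):
    """Pick the tier for a task, defaulting to Haiku."""
    return TIER_BY_TASK.get(task_type, DEFAULT_TIER)


def estimate_cost_range(tier, tokens):
    """Rough (low, high) cost in USD for a given token count on a tier."""
    low, high = COST_RANGE_PER_MILLION[tier]
    millions = tokens / 1_000_000
    return (low * millions, high * millions)
```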

### Heartbeat: Ollama Only
Heartbeats ALWAYS use Ollama. No escalation. If it fails, it fails.
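One way to express the no-escalation rule in code is to reject any non-Ollama tier outright and report failure without retrying. This is only a sketch: `run_heartbeat` and its `check` argument (a callable that would perform the local Ollama request) are hypothetical names.

```python
def run_heartbeat(check, tier="ollama"):
    """Run a heartbeat check on the local tier only; never escalate."""
    if tier != "ollama":
        raise ValueError("heartbeats must use Ollama")
    try:
        check()  # hypothetical callable that talks to the local model
        return True
    except Exception:
        # No retry and no fallback to Haiku or Sonnet: the failure is just reported.
        return False
```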