OPTIMIZATION.md - Cost & Efficiency Rules

RATE LIMITS

API Call Throttling:

  • 5 seconds minimum between API calls
  • 10 seconds minimum between web searches
  • Batch similar work whenever possible
  • If you hit a 429 error: STOP and wait 5 minutes
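
The throttling rules above can be sketched as a small helper. This is a minimal illustration, not an existing part of the workspace: the `Throttle` class, its method names, and the `"api"` / `"search"` keys are assumptions. Only the 5 second, 10 second, and 5 minute values come from the rules.

```python
import time

# Intervals come from the rules above; every name here is illustrative.
MIN_INTERVAL = {"api": 5.0, "search": 10.0}  # seconds between calls
BACKOFF_429 = 5 * 60                         # seconds to wait after a 429


class Throttle:
    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last = {}            # kind -> time of the most recent call
        self._blocked_until = 0.0  # set by on_429()

    def wait(self, kind):
        """Block until a call of this kind is allowed, then record it."""
        now = self._clock()
        earliest = self._last.get(kind, float("-inf")) + MIN_INTERVAL[kind]
        ready = max(self._blocked_until, earliest)
        if ready > now:
            self._sleep(ready - now)
        self._last[kind] = self._clock()

    def on_429(self):
        """Pause every kind of call for five minutes after a 429 response."""
        self._blocked_until = self._clock() + BACKOFF_429
```

Call `wait("api")` or `wait("search")` immediately before each request, and `on_429()` when a response comes back with status 429. The clock and sleep functions are injectable so the helper can be exercised without real delays.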

Monthly Budget:

  • $20 total
  • Warn at 75% ($15 spent)
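
As a rough sketch of the budget check, assuming month-to-date spend is tracked somewhere as a dollar amount: the function name and the `"exhausted"` state are assumptions, since the rules above only define the $20 total and the 75% warning.

```python
MONTHLY_BUDGET_USD = 20.00
WARN_FRACTION = 0.75  # 75% of $20 = $15


def budget_status(spent_usd):
    """Return 'ok', 'warn', or 'exhausted' for month-to-date spend."""
    if spent_usd >= MONTHLY_BUDGET_USD:
        # Assumption: the rules do not say what happens at 100%.
        return "exhausted"
    if spent_usd >= MONTHLY_BUDGET_USD * WARN_FRACTION:
        return "warn"
    return "ok"
```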

MODEL SELECTION - Three Tiers

Basic → Ollama

  • File checking and organization
  • Simple templating/formatting
  • Log review and cleanup
  • Non-critical analysis
  • Routine status checks

Cost: Free (local)

Normal → Haiku (Default)

  • Most tasks
  • Code review (non-production)
  • Documentation and writing
  • General problem solving
  • Straightforward reasoning

Cost: ~$0.30-1.50 per 1M tokens

Complex → Sonnet

  • Architecture decisions
  • Production code review
  • Security analysis
  • Complex debugging/reasoning
  • Strategic multi-project decisions

Cost: ~$3-15 per 1M tokens
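
One way to express the three tiers in code is shown below. It is only a sketch: the task-type labels, `pick_tier`, and `cost_range_usd` are hypothetical names, and the per-1M-token figures are the approximate ranges quoted above, treated as a low and high bound.

```python
# Approximate USD per 1M tokens, copied from the tiers above as (low, high).
TIERS = {
    "basic": {"model": "ollama", "usd_per_mtok": (0.0, 0.0)},
    "normal": {"model": "haiku", "usd_per_mtok": (0.30, 1.50)},
    "complex": {"model": "sonnet", "usd_per_mtok": (3.0, 15.0)},
}

# Hypothetical task labels standing in for the bullet lists above.
BASIC_TASKS = {"file_check", "templating", "log_review",
               "noncritical_analysis", "status_check"}
COMPLEX_TASKS = {"architecture", "production_review", "security",
                 "complex_debugging", "strategy"}


def pick_tier(task_type):
    """Map a task label to a tier; anything unlisted falls to the default."""
    if task_type in BASIC_TASKS:
        return "basic"
    if task_type in COMPLEX_TASKS:
        return "complex"
    return "normal"  # Haiku is the default tier


def cost_range_usd(tier, tokens):
    """Estimate the (low, high) cost in USD for a given token count."""
    low, high = TIERS[tier]["usd_per_mtok"]
    return (low * tokens / 1_000_000, high * tokens / 1_000_000)
```

For example, `pick_tier("security")` selects the complex tier, and `cost_range_usd("complex", 2_000_000)` estimates roughly $6 to $30 for two million tokens at the quoted rates.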

Heartbeat: Ollama Only

Heartbeats ALWAYS use Ollama. No escalation. If a heartbeat fails, it fails; do not retry it on Haiku or Sonnet.
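
The no-escalation rule might look like this in code. Here `call_ollama` is a hypothetical callable standing in for whatever function talks to the local model; it is not a real library API.

```python
def run_heartbeat(call_ollama):
    """Run one heartbeat on the local model; never fall back to a paid tier."""
    try:
        return call_ollama("heartbeat")
    except Exception:
        # No escalation and no retry: a failed heartbeat just reports failure.
        return None
```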