Files
jarvis/workspace/OPTIMIZATION.md

1.2 KiB

OPTIMIZATION.md - Cost & Efficiency Rules

RATE LIMITS

API Call Throttling:

  • 5 seconds minimum between API calls
  • 10 seconds minimum between web searches
  • Batch similar work whenever possible
  • If you hit 429 error: STOP and wait 5 minutes

Monthly Budget:

  • $20 total
  • Warn at 75% ($15 spent)

MODEL SELECTION

System Default: Haiku

Haiku is your primary model. It decides routing internally:

Haiku routes to Ollama when:

  • File checking and organization
  • Simple templating/formatting
  • Log review and cleanup
  • Non-critical analysis
  • Routine status checks

Haiku handles directly when:

  • Most tasks fit within Haiku's capability
  • Reasoning is needed but not deeply complex
  • Code review (non-production)
  • Documentation and writing

Haiku escalates to Sonnet when:

  • Architecture decisions
  • Production-like code review
  • Security analysis
  • Complex debugging/reasoning
  • Strategic multi-project decisions

Heartbeat: Ollama Only

Heartbeats ALWAYS use Ollama. No escalation. If Ollama fails, the heartbeat fails.

Decision Rule

Let Haiku decide. It's smart enough to route to Ollama when appropriate and escalate to Sonnet when needed. You only override when you know you need Sonnet upfront.