
Every product team has a 30-line file in their codebase called pick_model.py. Nine if/else branches. Three retry decorators. A hardcoded fallback to gpt-3.5. A comment that reads "TODO: this should not exist." We open-sourced OrcaRouter-Lite, a self-hosted LLM router with a prompt cache that works on every provider you plug in — OpenAI, Anthropic, Google, Groq, anything. Your keys. Your cache. MIT. • OpenAI-compatible drop-in (any SDK) • BYOK, single workspace, no Postgres/Redis required • model="auto" → cheapest capable model per request • Send the same deterministic request twice → second call returns in milliseconds for $0 • 100+ models, 127 tests docker compose up and you're routing. github.com/Continuum-AI-C… Hosted version drops later this week.























