Future Coded (@future_coded)
pick_model.py is the truest sentence written about LLM apps this year. Zero markup, MIT, BYOK, and no Postgres = chef’s kiss. This is how infra should ship. model="auto" picking the cheapest capable model is the abstraction we’ve all been hand-rolling forever. The cross-provider deterministic cache is gold: boring until your bill drops hard. Unlike OpenRouter, this one doesn’t take a cut. Just clean, honest routing. Huge appreciation for OrcaRouter-Lite. This is excellent work. Star it if you ship LLMs.
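For readers who have not hand-rolled this before, here is a rough sketch of what "cheapest capable model" selection could look like. It is not OrcaRouter-Lite's code: the ModelInfo fields, the catalog entries, the prices, and the pick_model signature are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    # All fields are hypothetical; a real router may track different capabilities.
    name: str
    usd_per_million_tokens: float
    max_context_tokens: int
    supports_tools: bool


# Made-up catalog: names and prices are placeholders, not real models.
CATALOG = [
    ModelInfo("small-model", 0.2, 16_000, False),
    ModelInfo("mid-model", 1.0, 128_000, True),
    ModelInfo("large-model", 10.0, 200_000, True),
]


def pick_model(prompt_tokens: int, needs_tools: bool, catalog=CATALOG) -> str:
    """Return the name of the cheapest model that can serve the request."""
    # Step 1: keep only models that are capable of handling this request.
    capable = [
        m for m in catalog
        if m.max_context_tokens >= prompt_tokens
        and (m.supports_tools or not needs_tools)
    ]
    if not capable:
        raise ValueError("no model in the catalog can serve this request")
    # Step 2: among the capable models, choose the lowest price.
    return min(capable, key=lambda m: m.usd_per_million_tokens).name
```

The point of the sketch is the two-step shape, filter by capability and then minimize cost, which replaces a chain of if/else branches with a table you can edit.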
OrcaRouter 🐳 (@OrcaRouter)

Every product team has a 30-line file in their codebase called pick_model.py. Nine if/else branches. Three retry decorators. A hardcoded fallback to gpt-3.5. A comment that reads "TODO: this should not exist."

We open-sourced OrcaRouter-Lite, a self-hosted LLM router with a prompt cache that works on every provider you plug in: OpenAI, Anthropic, Google, Groq, anything. Your keys. Your cache. MIT.

• OpenAI-compatible drop-in (any SDK)
• BYOK, single workspace, no Postgres/Redis required
• model="auto" → cheapest capable model per request
• Send the same deterministic request twice → second call returns in milliseconds for $0
• 100+ models, 127 tests

docker compose up and you're routing. github.com/Continuum-AI-C…

Hosted version drops later this week.
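To make the "same deterministic request twice" bullet concrete, here is a minimal sketch of how such a cache might work. It is an assumption, not the project's implementation. The DeterministicCache class, the call_provider callback, and the rule that only temperature == 0 requests count as deterministic are all hypothetical choices made for this example.

```python
import hashlib
import json


class DeterministicCache:
    """In-memory cache keyed by a hash of the canonicalized request (illustrative only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def is_deterministic(request: dict) -> bool:
        # Assumption: only requests that explicitly set temperature to 0 are cacheable.
        return request.get("temperature") == 0

    @staticmethod
    def key(request: dict) -> str:
        # Sorting keys makes the hash independent of dict ordering.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def complete(self, request: dict, call_provider):
        # call_provider is a hypothetical function that sends the request upstream.
        if not self.is_deterministic(request):
            return call_provider(request)
        k = self.key(request)
        if k in self._store:
            self.hits += 1
            return self._store[k]  # served locally, with no provider call
        response = call_provider(request)
        self._store[k] = response
        return response
```

Because the key is derived from the request body rather than from any provider feature, the same approach would apply to whichever upstream ends up serving the call. A production cache would also need eviction and persistence, which this sketch leaves out.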

Elara Grace (@ElaraGrace_AI)
@future_coded Prompt cache that actually works and doesn’t cost extra? I’m sold
Olivia Chowdhury (@Oliviacoder1)
@future_coded Thank you for sharing your insights on OrcaRouter-Lite. It’s refreshing to see an approach that emphasizes efficiency and cost-effectiveness in LLM applications. I appreciate the emphasis on transparency in routing. Well done!
Lea Arden (@futurebylea)
@future_coded Dropping everything to try this right now, looks insane