You don’t always need a bigger LLM, just more diverse ones.
So I built an ensemble inference proxy that sends prompts to multiple small models in parallel and combines their responses.
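The core loop is a simple fan-out-and-vote. Here's a stripped-down sketch (not the actual emerge code; call_model is a stand-in for your provider SDKs):

```python
import asyncio
from collections import Counter

async def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call (OpenAI, Anthropic, a local server).
    await asyncio.sleep(0)  # pretend network latency
    return f"answer-from-{model}"

async def ensemble(prompt: str, models: list[str]) -> str:
    # Fan the prompt out to every model concurrently...
    answers = await asyncio.gather(*(call_model(m, prompt) for m in models))
    # ...then majority-vote the responses (ties go to the first seen).
    return Counter(answers).most_common(1)[0][0]

print(asyncio.run(ensemble(
    "What is the capital of Australia?",
    ["gpt-4.1-mini", "claude-haiku", "qwen-3b-local"],
)))
```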
Initial results look great!
gpt-4.1-mini + haiku + qwen 3b (local): 74% accuracy.
GPT-5 alone: 73%. Claude Sonnet: 74%.
This ensemble config is 13x cheaper and 2.5x faster than GPT-5. And I haven’t even tested other providers yet.
The trick: cross-provider diversity. Same-family ensembles do nothing. But models from different providers make different mistakes, and that's exploitable.
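Quick back-of-envelope on why that works (illustrative numbers, not from the benchmark): three 70%-accurate models with fully independent errors land near 78% under majority vote.

```python
# Majority vote over three independent voters, each right 70% of the time:
# win when all three are correct, or exactly two of three are.
p = 0.70
majority = p**3 + 3 * p**2 * (1 - p)
print(f"{majority:.1%}")  # 78.4%
```

Same-family models share training data and failure modes, so their errors correlate and the vote buys you much less.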
Tested 27 configurations across 6 aggregation strategies. The best ensemble beats GPT-5 on knowledge tasks by 8 percentage points. Easy to experiment with your own configurations: just a YAML file and an emerge sweep.
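A sweep config is basically a list of models plus the aggregation strategies to try. Simplified sketch (field names are illustrative, not the exact schema; check the repo for the real format):

```yaml
# Hypothetical sweep config; field names are illustrative only.
models:
  - openai/gpt-4.1-mini
  - anthropic/claude-haiku
  - local/qwen-3b
strategies:
  - majority_vote   # one of the aggregation strategies to sweep over
```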
github.com/jnormore/emerge