
rihim
5.8K posts

@rihim_s
incoming @EverpureData | computer engineering @UCSB class of 2027 | @ucsbNLP | prev: swe intern @Cisco


every time gpt 5.5 can't implement my ML ideas after I learned what they did to fable










i am trying to work on the closest thing possible to a true "big model smell" eval, which is to say: something that measures something that clever post training can't trivially gap, and is cheap + topically diverse

i can't test mythos for obvious reasons, but... hmm...




Notably, the budget panel was comparable to Claude Fable 5 in performance. A panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro, fused together, beat solo GPT-5.5 and solo Opus 4.8 outright. And it landed within 1% of Fable 5 at roughly half the price.

How does it work? When you send a prompt to Fusion, we fan it out to a panel of models in parallel, each with web search and bash tools enabled. A judge model reads every response and extracts the structure: consensus points, contradictions, partial coverage, unique insights, blind spots. Chatroom: openrouter.ai/fusion
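
To make the fan-out-then-judge flow above concrete, here is a minimal sketch. It is a toy, not Fusion's actual implementation: `model_a`, `model_b`, `model_c`, `fan_out`, `judge`, and `run_panel` are hypothetical names, the "models" are stubs that return fixed sets of claims instead of calling any API, and the judge is a simple counter rather than an LLM. It only covers consensus, partial coverage, and unique insights; contradictions and blind spots would need a real judge model.

```python
import asyncio
from collections import Counter

# Hypothetical stand-ins for panel models. A real panel would call model APIs
# (with web search and bash tools); these stubs just return fixed claim sets.
async def model_a(prompt: str) -> set[str]:
    return {"A", "B"}

async def model_b(prompt: str) -> set[str]:
    return {"A", "B", "C"}

async def model_c(prompt: str) -> set[str]:
    return {"A", "D"}

async def fan_out(prompt: str, panel) -> list[set[str]]:
    # Send the same prompt to every panel model concurrently.
    return list(await asyncio.gather(*(model(prompt) for model in panel)))

def judge(responses: list[set[str]]) -> dict[str, list[str]]:
    # Toy judge (assumption): count how many responses contain each claim.
    counts = Counter(claim for response in responses for claim in response)
    n = len(responses)
    return {
        "consensus": sorted(c for c, k in counts.items() if k == n),
        "partial": sorted(c for c, k in counts.items() if 1 < k < n),
        "unique": sorted(c for c, k in counts.items() if k == 1),
    }

async def run_panel(prompt: str) -> dict[str, list[str]]:
    responses = await fan_out(prompt, [model_a, model_b, model_c])
    return judge(responses)

if __name__ == "__main__":
    print(asyncio.run(run_panel("example prompt")))
```

The point of the structure is that the panel calls run in parallel, so latency is roughly that of the slowest model, and the judge only sees finished responses.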




























