
VulcanBench
381 posts

@vulcanbench
Benchmarking LLMs @, focused on real world tests, large codebases, open source, full transparency.




I am starting to realize more and more that we can’t just benchmark models without comparing different effort levels. Fable 5 is what pushed me to think about this more. I’m finding that Fable 5 at low and medium effort produces the same or better output than a lot of other models at high and xhigh. At the same time, I’m experimenting with normal routine tasks and finding that even Fable low is overkill. There are soooo many tasks that Grok Build, Composer 2.5, SWE-1.6, GLM 5.1, and other models can do at the exact same accuracy level as Fable. And that’s comparing to Fable low, on tasks where Fable Max produces the exact same output. Yes, increasing thinking depth doesn’t mean the model gets it more right; sometimes small and medium problems don’t need the most bleeding-edge frontier model in the world to reach the optimal solution. We keep benchmarking models all at the same effort level, and I think that could be a mistake. We need to look at effort as another key variable and optimize for a combination of model and effort, coupled with task complexity and codebase size. This is one of the things I’m thinking through more deeply with @vulcanbench, which I’m going to release, open source, this weekend.


Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.



We poured our hearts into the hand-painted horizons of Planet of Lana II, our love letter to the sweeping scales and quiet wonder of classic Ghibli adventures. ✨ A new odyssey of friendship and mystery awaits. Lana and Mui are ready. Are you? 🐾 #indiegame #PlanetofLana











4 more indie games you should experience at least once.



I started learning @unity about ten years ago. It's an incredible game engine. I've built five small(ish) games, just for myself, nothing I've been proud enough of to share publicly. Some day I'd love to have the time to build a game that I'm proud enough of to share with the world. But as the founder of a software company, my days, nights, and weekends are spent with our amazing team, investors, and clients, and I wouldn't have it any other way. That being said, I love playing games and continue tinkering in Unity, not because I want to make money or get a bunch of users, but just because I love games. And for those wondering, no vibe coding in Unity yet. I'm old school and still do all my coding in Unity by hand, but I'm likely going to play around with Codex and Opus to see what they can do. I've spent thousands of dollars on games over the years, plan to spend thousands more, and always like to put more money into indie games, because those devs are my heroes. If you go to @Official_GDC this year, make sure to spend a ton of time in the indie game section. That's my favorite spot, and I usually spend 90% of my time there. Here's a photo I took at GDC back in 2022: an indie game dev walking around with a laptop, super cool game, so much fun.






