Ben Davis
@davis7
managing @theo's yt channel and building whatever I'm currently nerd sniped by

@davis7 you're probably right, but man, the fact that LLMs are fuzzy means we're all okay with everyone just making anecdotal claims. It's entirely possible this isn't true

So I did and didn't lie. Ran a quick benchmark to put some actual numbers on speed vs price vs performance:

- GPT-5.5 low is ~2x faster than Grok 4.3
- Grok 4.3 is ~2x cheaper than GPT-5.5 low

Note: Grok is over 10x cheaper than GPT-5.5 in actual token cost, but since GPT did so much less (half the tool calls on avg), it only ended up being 2x more expensive.

That's the main point I wanted to make: smarter models do fewer tool calls, which means that even with the lowest TPS on the list, GPT-5.5 low was the fastest overall by a lot. It also means the 10x price increase on paper doesn't actually play out in real usage.

But also, yea, nondeterminism machines are prime for anecdotes that feel right, and usually half are (see the above). Like yesterday I was doing the same task side by side with Grok and GPT, and GPT ended up being cheaper by a lot. A lot of the time it probably is; there are just too many variables to state anything as a hard rule.

Only thing here I'm comfortable saying as a hard fact: smarter models call fewer tools, and fewer tool calls means faster and cheaper.

Also here's the source: grep.davis7.sh
Repo (5.5 wrote all of it): github.com/davis7dotsh/gr…
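The arithmetic behind "fewer tool calls beats higher TPS" can be sketched like this. Every number below is invented for illustration (the thread doesn't publish per-call figures), and the fixed per-call overhead stands in for tool execution plus round-trip latency, which doesn't scale with TPS:

```python
# Hypothetical back-of-envelope model: a session is N tool calls, each
# emitting some tokens at the model's TPS plus a fixed per-call overhead.

def session_time_s(tool_calls, tokens_per_call, tps, overhead_s):
    """Total wall time: generation time plus fixed overhead per tool call."""
    return tool_calls * (tokens_per_call / tps + overhead_s)

def session_cost_usd(tool_calls, tokens_per_call, price_per_mtok):
    """Total output-token cost across all tool calls."""
    return tool_calls * tokens_per_call * price_per_mtok / 1e6

# "Smart" model: half the TPS, 10x the paper price, but half the tool calls.
smart_time = session_time_s(tool_calls=10, tokens_per_call=500, tps=50, overhead_s=5)
smart_cost = session_cost_usd(tool_calls=10, tokens_per_call=500, price_per_mtok=10.0)

# "Fast" model: 2x the TPS, 10x cheaper per token, but 2x the tool calls.
fast_time = session_time_s(tool_calls=20, tokens_per_call=500, tps=100, overhead_s=5)
fast_cost = session_cost_usd(tool_calls=20, tokens_per_call=500, price_per_mtok=1.0)

print(smart_time, fast_time)  # 150.0 vs 200.0 — fewer calls wins on time
print(smart_cost, fast_cost)  # the 10x paper price shrinks once calls halve
```

With these made-up numbers, the slower model finishes first because every call it skips also skips a fixed round trip; the cost gap compresses for the same reason, though whether it lands at 2x or 5x depends entirely on the per-call token counts.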

i keep thinking i want the models to be cheaper/faster more than i want them to be smarter but it seems that just being smarter is still the most important thing


@davis7 here's avg cost per session from 500,000 sessions across 60 days

i love 5.5 but damn, people are spending 10x more on it than 5.4 at the moment

this probably needs time to settle, and we're seeing weirdly bad cache hit ratios, so we'll see where it lands


Third episode of Nerd Snipe is live! This time I try to get Ben in on my Anthropic conspiracy theories (and explain why Claude got dumber)

00:07:30 T3 Code Banned?
00:16:08 How Claude Caching Works
00:24:00 Why Claude Seems Dumber
00:38:03 The Conspiracies Begin...


I'm going to be so real with you guys. The podcast is performing 10x better than I expected...




We’re introducing the Cursor SDK so you can build agents with the same runtime, harness, and models that power Cursor. Run agents from CI/CD pipelines, create automations for end-to-end workflows, or embed agents directly inside your products.


tired of this misinformation, so we made a video on the truth behind the anthropic vs opencode drama
