Ben Davis
@davis7
managing @theo's yt channel and building whatever I'm currently nerd sniped by

@davis7 you're probably right, but man, the fact that LLMs are fuzzy means we're all okay with everyone just making anecdotal claims. It's entirely possible this isn't true

So I did and didn't lie. Ran a quick benchmark to put some actual numbers on speed vs price vs performance:

- GPT-5.5 low is ~2x faster than Grok 4.3
- Grok 4.3 is ~2x cheaper than GPT-5.5 low

Note: Grok is over 10x cheaper than GPT-5.5 in actual token cost, but since GPT did so much less (half the tool calls on avg), it only ended up being 2x more expensive.

That's the main point I wanted to make: smarter models do fewer tool calls, which means that even with the lowest TPS on the list, GPT-5.5 low was the fastest overall by a lot. It also means the 10x price increase on paper doesn't actually play out in real usage.

But also, yea, nondeterminism machines are prime for anecdotes that feel right, and usually half are (see the above). Like yesterday I was doing the same task side by side with Grok and GPT, and GPT ended up being cheaper by a lot. A lot of the time it probably is; there are just too many variables to state anything as a hard rule.

Only thing here I'm comfortable saying as a hard fact: smarter models call fewer tools, and fewer tool calls means faster and cheaper.

Also here's the source: grep.davis7.sh
Repo (5.5 wrote all of it): github.com/davis7dotsh/gr…
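The arithmetic behind "fewer tool calls beats higher TPS" can be sketched like this. Every number below is invented for illustration (the thread doesn't publish per-call figures), and the fixed per-call overhead stands in for tool execution plus round-trip latency, which doesn't scale with TPS:

```python
# Hypothetical back-of-envelope model: a session is N tool calls, each
# emitting some tokens at the model's TPS plus a fixed per-call overhead.

def session_time_s(tool_calls, tokens_per_call, tps, overhead_s):
    """Total wall time: generation time plus fixed overhead per tool call."""
    return tool_calls * (tokens_per_call / tps + overhead_s)

def session_cost_usd(tool_calls, tokens_per_call, price_per_mtok):
    """Total output-token cost across all tool calls."""
    return tool_calls * tokens_per_call * price_per_mtok / 1e6

# "Smart" model: half the TPS, 10x the paper price, but half the tool calls.
smart_time = session_time_s(tool_calls=10, tokens_per_call=500, tps=50, overhead_s=5)
smart_cost = session_cost_usd(tool_calls=10, tokens_per_call=500, price_per_mtok=10.0)

# "Fast" model: 2x the TPS, 10x cheaper per token, but 2x the tool calls.
fast_time = session_time_s(tool_calls=20, tokens_per_call=500, tps=100, overhead_s=5)
fast_cost = session_cost_usd(tool_calls=20, tokens_per_call=500, price_per_mtok=1.0)

print(smart_time, fast_time)  # 150.0 vs 200.0 — fewer calls wins on time
print(smart_cost, fast_cost)  # the 10x paper price shrinks once calls halve
```

With these made-up numbers, the slower model finishes first because every call it skips also skips a fixed round trip; the cost gap compresses for the same reason, though whether it lands at 2x or 5x depends entirely on the per-call token counts.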

i keep thinking i want the models to be cheaper/faster more than i want them to be smarter but it seems that just being smarter is still the most important thing


@davis7 here's avg cost per session from 500,000 sessions across 60 days

i love 5.5 but damn, people are spending 10x more on it than 5.4 at the moment

this probably needs time to settle, and we're seeing weirdly bad cache hit ratios, so we'll see where it lands


Third episode of Nerd Snipe is live! This time I try to get Ben in on my Anthropic conspiracy theories (and explain why Claude got dumber)

00:07:30 T3 Code Banned?
00:16:08 How Claude Caching Works
00:24:00 Why Claude Seems Dumber
00:38:03 The Conspiracies Begin...


I'm going to be so real with you guys. The podcast is performing 10x better than I expected...




We’re introducing the Cursor SDK so you can build agents with the same runtime, harness, and models that power Cursor. Run agents from CI/CD pipelines, create automations for end-to-end workflows, or embed agents directly inside your products.


tired of this misinformation, so we made a video on the truth behind the anthropic vs opencode drama
