Ben Davis

2.7K posts

@davis7

managing @theo's yt channel and building whatever I'm currently nerd sniped by

San Francisco · Joined July 2022
436 Following · 11.7K Followers
Ben Davis@davis7·
sry for the new studio spoiler, this clip was too good
English
1
0
71
2.6K
Ben Davis@davis7·
@complex_maths yea u can turn off reasoning. idk if u can in codex but pi and api lets u
English
0
0
8
1.3K
Ben Davis@davis7·
So I did and didn't lie

Ran a quick benchmark to put some actual numbers on speed vs price vs performance
- GPT-5.5 low is ~2x faster than Grok 4.3
- Grok 4.3 is ~2x cheaper than GPT-5.5 low

Note: Grok is over 10x cheaper than GPT-5.5 in actual token cost, but since GPT did so much less (half the tool calls on avg) it only ended up being 2x more expensive

That's the main point I wanted to make: smarter models do fewer tool calls which means that even with the lowest TPS on the list, GPT-5.5 low was the fastest overall by a lot. Also means that the 10x price increase on paper, doesn't actually end up playing out in real usage.

But also yea nondeterminism machines are prime for anecdotes that feel right, and usually half are (see the above). Like yesterday I was doing the same task side by side with grok and gpt. GPT ended up being cheaper than grok by a lot. And a lot of the time it probably is, there are just too many variables to be able to say something as a hard rule

Only thing here I'm comfortable saying as a hard fact: smarter models call fewer tools, and fewer tool calls means faster and cheaper

Also here's the source: grep.davis7.sh
Repo (5.5 wrote all of it): github.com/davis7dotsh/gr…
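A rough sketch of why a 10x per-token price gap can shrink to a much smaller gap in total session cost. This is not the benchmark's code. It assumes a toy agent loop where every tool call resends the whole context and the context grows by a fixed amount per call. The function names and all numbers are hypothetical and chosen only for illustration.

```python
def total_input_tokens(tool_calls: int, base_context: int, growth_per_call: int) -> int:
    """Input tokens billed across a session, assuming every tool call resends
    the whole context and the context grows by a fixed amount per call."""
    return sum(base_context + i * growth_per_call for i in range(tool_calls))


def cost_ratio(price_ratio: float, tokens_a: int, tokens_b: int) -> float:
    """How many times more model A costs than model B for one session, given
    that A's per-token price is price_ratio times B's."""
    return price_ratio * tokens_a / tokens_b


# Hypothetical numbers, not taken from the benchmark:
# the pricier model finishes in 10 tool calls, the cheaper one needs 20.
smart_tokens = total_input_tokens(tool_calls=10, base_context=1_000, growth_per_call=500)
cheap_tokens = total_input_tokens(tool_calls=20, base_context=1_000, growth_per_call=500)

print(smart_tokens, cheap_tokens)
print(round(cost_ratio(10.0, smart_tokens, cheap_tokens), 2))
```

Under these assumptions, halving the tool calls cuts billed input tokens by roughly 3.5x, because the later calls are the expensive ones. The 10x price gap then comes out closer to 2.8x in total cost. Real sessions also involve caching, output tokens, and reasoning tokens, which this sketch ignores.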
dax@thdxr

@davis7 you're probably right but man the fact that LLMs are fuzzy means we're all okay with everyone just making anecdotal claims
it's entirely possible this isn't true

English
20
11
472
64.9K
Ben Davis@davis7·
This also leads to the question: is GPT-5.5 "smarter" enough to offset the 2x cost increase from GPT-5.4

According to Dax's numbers x.com/thdxr/status/2… no

For my usage 5.4 was much more expensive, I think mostly because:
1. I'm using 5.5 almost entirely on low reasoning vs I always used 5.4 on high
2. I'm making new threads way more often with the new model, thus the context doesn't balloon the way it used to
3. 5.5 is averaging fewer tool calls per turn

But also I feel like Dax's numbers make sense since they're not split by reasoning level. Most people still default to high reasoning, and most people are probably doing long ass threads

So is 5.5 cheaper and faster? Idk man depends on who you ask. It's getting really fucking hard to understand and compare new models. Dozens of variables that are very impactful on top of a literal probability machine...
dax@thdxr

@davis7 here's avg cost per session from 500,000 sessions across 60 days
i love 5.5 but damn people are spending 10x more on it than 5.4 at the moment
this probably needs time to settle and we're seeing weirdly bad cache hit ratios so we'll see where it lands

English
3
0
44
4.7K
Ben Davis@davis7·
@Alykkat Not sure if most counters can go that high unfortunately
English
0
0
4
690
Alykkat (on bluesky)
I'm about to buy + build an office score counter that counts how many times @davis7 says "gstack"
English
1
0
2
845
Ben Davis@davis7·
@Glowicial ITS BARELY BEEN A MONTH (also made the updates today will put it live this weekend)
English
1
0
2
84
Limeglow@Glowicial·
@davis7 cats are cute but it is time to update your AI rankings ben
English
1
0
1
111
Ben Davis@davis7·
Going live for a couple hours to catch up on vids:
- gpt-5.5
- cursor sdk
and a couple more things: twitch.tv/bmdavis419
English
1
0
19
2K
Ben Davis@davis7·
@thdxr Giving Ryan money and compute is seeming like a worse and worse idea tbh
English
0
0
15
1.1K
Ben Davis reposted
Theo - t3.gg@theo·
A letter to my friends at Anthropic

I hate that I feel obligated to do this. I hate that I've had to be so harsh towards Anthropic for the past few months. I really, really don't want to. I know it might feel like I'm doing this for clicks or something, but I promise I'm not. My pro-Anthropic content ALWAYS outperforms my anti-Anthropic content. I have cost myself a lot of money, opportunities, sponsors, and more.

I'm doing this because you work for an evil cult. I'm begging you to wake up.

Your CEO, Dario, does not respect engineers. This is obvious. He couldn't make it more obvious if he tried (and I think he's trying pretty hard)

You know this, but you don't want to acknowledge it. It has kept you up many nights. You know that bad code is shipping to users. You know that one bad tweet might get you fired. You fear for your vesting schedules. You're afraid.

Nobody deserves what you're going through right now. You go to work afraid, you leave work afraid, and you go to YouTube to keep up on the dev world, just to hear me yelling all about how evil your company is.

You deserve better. You might not feel like you do, but you know deep down that this isn't right.

I hope you know how deeply I feel for you. I'm sorry. I know I haven't helped you much individually, and I want to be better about this.

If you're ready to leave, please hit me up. I swear I'll never tell a soul. I have friends at every lab and most startups in the AI world. Most of them would be down to match your current vesting schedules, possibly even go beyond. If you're staying for the money, I beg you to hit me up. We can make the money happen somewhere that hates you less.

I know I'm asking for a lot of trust here, and that you're scared after seeing how hard I've been on Anthropic. I can't blame you at all for that. I should have posted something like this months ago. That's my failure to own and I will own it to my best ability. If you're willing to trust me in this moment, I can make it right.

Let me help you escape. You deserve to work somewhere that you can have impact. Somewhere that listens when you feel something is wrong. Somewhere that won't fire you when you point out the things that hurt your users.

My DMs are always open to you. When you're ready, let me know. I promise to make it right.
English
257
345
7.5K
1.6M
Ben Davis reposted
sunil pai@threepointone·
didn't expect to get here, but 5.5 is just better than 4.7 now for me. faster, smarter, and doesn't try to do too much. I don't need xhigh or whatever, I prefer the shorter hops and letting me participate.
English
48
21
786
44.3K