Alex

1.3K posts

@Alex_m

Occasionally ship useful ideas

Joined April 2009
206 Following · 1.2K Followers
Alex @Alex_m
Deepseek V4 answer to the car wash question. Wasn’t tricked
Alex tweet media
0
0
0
57
banteg @banteg
what kind of personality did they put in gpt 5.5
banteg tweet media
79
76
3.4K
235.5K
Alex @Alex_m
@gdb Don’t get phished please
0
0
0
106
Alex @Alex_m
@scaling01 One is publicly available and one is only available to a handful of companies
0
0
0
127
Lisan al Gaib @scaling01
it's safe to say that Anthropic has a massive lead over OpenAI if that really was their largest model. Anthropic had Mythos since February and it's still ahead or tied in every benchmark
Lisan al Gaib tweet media
108
17
727
120.9K
Alex @Alex_m
@sama You gave mythos a very good slap on cybergym 😂
0
0
0
31
Sam Altman @sama
Really excellent work by the inference team to serve this model so efficiently! To a significant degree, we have to become an AI inference company now.
221
103
3.9K
163.1K
Alex @Alex_m
@deedydas I don’t think they can lightly make this claim without being extremely confident. Opus is benchmaxxing lol.
0
0
4
964
Deedy @deedydas
GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. Couldn't find any reported SWE-Bench scores at all and an internal benchmark is reported instead. That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.
Deedy tweet media
147
30
993
171.6K
OpenAI Developers @OpenAIDevs
With GPT-5.5, Codex now gets more of the job done across the browser, files, docs, and your computer. We've expanded browser use so Codex can interact with web apps, test flows, click through pages, capture screenshots, and iterate on what it sees until it completes the task.
96
303
3.6K
543.6K
Alex @Alex_m
@theo GPT-5.5 is trying to read your tweets lol
0
0
0
955
Alex @Alex_m
@gdb $30/m output is expensive tho. 20% more expensive than opus 4.7
0
0
0
50
Greg Brockman @gdb
GPT-5.5 is a new class of intelligence. This intelligence makes it intuitive to use; it completes challenging tasks with little micromanagement. Also very token efficient, and runs with low latency and at scale. A real step toward a new way of getting computer work done.
OpenAI @OpenAI

Introducing GPT-5.5. A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

137
116
2K
109.8K
Alex @Alex_m
@gdb $30/m output is expensive.
0
0
0
61
Alex @Alex_m
@embirico hey 5.5 is live, enable it 😂
└ Stream disconnected before completion: The model `gpt-5.5` does not exist or you do not have access to it.
0
0
2
1.1K
OpenAI @OpenAI
Introducing GPT-5.5. A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.
1.9K
5.9K
41.9K
6.6M
Alex @Alex_m
@theo They weren't even using the same version they charge us $200/month for. That's why it took a month: “we’ll ensure that a larger share of internal staff use the exact public build of Claude Code (as opposed to the version we use to test new features).”
0
0
46
1.8K
Alex @Alex_m
“we’ll ensure that a larger share of internal staff use the exact public build of Claude Code (as opposed to the version we use to test new features).” is crazy.
0
0
0
53
Alex @Alex_m
Let me get this straight: the people building Claude Code weren't even using the same version they charge us $200/month for. That's why it took a month of paying customers screaming into the void before anyone noticed it was broken. Cool beta test, thanks for letting us fund it.
ClaudeDevs @ClaudeDevs

Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.

1
1
11
2.5K