Alex (@Alex_m) - Twitter profile

Alex (@Alex_m) · 14m

@kimmonismus Yeah lol

Chubby♨️ (@kimmonismus) · 54m

Did Deepseek really wait until OpenAI released GPT-5.5 to steal the show?

Chubby♨️ (@kimmonismus)

Deepseek v4 is a huge step upwards compared to DeepSeek 3, outperforms on SWE verified opus 4.6 and GPT-5.4 and sets a new record on Codeforces. Needs to be tested against opus 4.7 and GPT-5.5 tho and see if real world usage holds its promises. Big release! Sota open source model!

Alex (@Alex_m) · 15m

Deepseek V4 answer to the car wash question. Wasn’t tricked

Alex (@Alex_m) · 6h

@sama @banteg Lmfao keep it

Sam Altman (@sama) · 7h

@banteg please tell me imagegen

banteg (@banteg) · 8h

what kind of personality did they put in gpt 5.5

Alex (@Alex_m) · 8h

Facts

Ostap Kolinets (@yklnss)

@ClaudeDevs

Alex (@Alex_m) · 8h

@gdb Don’t get phished please

Greg Brockman (@gdb) · 8h

we're rolling codex out to whole companies/enterprises. ping me gdb@openai.com if of interest!

Sam Altman (@sama)

We tried a new thing with NVIDIA to roll out Codex across a whole company and it was awesome to see it work. Let us know if you'd like to do it at your company!

Alex (@Alex_m) · 8h

The funny thing is they can’t even release Mythos now because they’re out of compute 😂😂😂

ARC Prize (@arcprize)

GPT-5.5-Pro on ARC-AGI (Verified)

ARC-AGI-2:
- Max: 82.2%, $10.76
- High: 84.6%, $10.51

GPT-5.5-Pro performs on par with GPT-5.5 on ARC-AGI for +1 OOM cost

Alex (@Alex_m) · 9h

@scaling01 One is publicly available and one is only available to a handful of companies

Lisan al Gaib (@scaling01) · 9h

it's safe to say that Anthropic has a massive lead over OpenAI if that really was their largest model

Anthropic had Mythos since February and it's still ahead or tied in every benchmark

Alex (@Alex_m) · 9h

@sama You gave mythos a very good slap on cybergym 😂

Sam Altman (@sama) · 9h

Really excellent work by the inference team to serve this model so efficiently! To a significant degree, we have to become an AI inference company now.

Alex (@Alex_m) · 9h

@deedydas I don’t think they can lightly make this claim without being extremely confident. Opus is benchmaxxing lol.

Deedy (@deedydas) · 10h

GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. Couldn't find any reported SWE-Bench scores at all and an internal benchmark is reported instead. That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.

Alex (@Alex_m) · 9h

So Anthropic is benchmaxxing?

Chris (@chatgpt21)

OpenAI releases their SWE scores while noting that Anthropic reported signs of memorization on a subset of problems 😮

Alex (@Alex_m) · 9h

@OpenAIDevs @TheRealAdamG Not available in codex cli?

OpenAI Developers (@OpenAIDevs) · 10h

With GPT-5.5, Codex now gets more of the job done across the browser, files, docs, and your computer. We've expanded browser use so Codex can interact with web apps, and test flows, click through pages, capture screenshots, and iterate on what it sees until it completes the task.

Alex (@Alex_m) · 10h

@theo GPT-5.5 is trying to read your tweets lol

Theo - t3.gg (@theo) · 10h

My new cryptography puzzle is now live. Will pay $1,000 to the first person who DMs me the plaintext decryption of the first line. 2nd line is a hint. If you send me slop, AI hallucinations, or a decryption of the 2nd line, you are disqualified. x.com/theo/status/20…

Theo - t3.gg (@theo)

For no reason in particular, I made my first crypto challenge. I will pay $1,000 to whoever solves it first. Winner is whoever gets the answer into my DMs first.

Alex (@Alex_m) · 10h

@gdb $30/m output is expensive tho. 20% more expensive than opus 4.7

Greg Brockman (@gdb) · 10h

GPT-5.5 is a new class of intelligence. This intelligence makes it intuitive to use; it completes challenging tasks with little micromanagement. Also very token efficient, and runs with low latency and at scale. A real step toward a new way of getting computer work done.

OpenAI (@OpenAI)

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

Alex (@Alex_m) · 10h

@gdb $30/m output is expensive.

Alex (@Alex_m) · 10h

@embirico hey 5.5 is live, enable it 😂

└ Stream disconnected before completion: The model `gpt-5.5` does not exist or you do not have access to it.

Alexander Embiricos (@embirico) · 10h

🙇

Ravi Avasarala (@kagehiko)

One thing I gotta hand @openai is the way they'd built the Codex team w/ @embirico and @thsottiaux.. relentless focus on taste and craft despite being caught flat footed by cc.. and we can see the results.. the vibes are immaculate. The products are great. And a lot to look forward to as a user/dev. Like Steve said.. A players attract A players.

Alex (@Alex_m) · 10h

@OpenAI fuck claude

OpenAI (@OpenAI) · 10h

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

Alex (@Alex_m) · 10h

@OpenAI LET'S GO

Alex (@Alex_m) · 10h

@sama @dkundel are u back yet?

Sam Altman (@sama) · 10h

@dkundel this was such a weird side-plot

dominik kundel (@dkundel) · 10h

Literally 😂

Tibo (@thsottiaux)

Stay tuned, we are rebooting our office WiFi.

Alex (@Alex_m) · 10h

@theo They weren't even using the same version they charge us $200/month for. That's why it took a month: “we’ll ensure that a larger share of internal staff use the exact public build of Claude Code (as opposed to the version we use to test new features).”

Theo - t3.gg (@theo) · 10h

Confirmed that Claude Code got dumber, not Claude. They shipped slop and it made the models worse.

ClaudeDevs (@ClaudeDevs)

Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.

Alex (@Alex_m) · 10h

“we’ll ensure that a larger share of internal staff use the exact public build of Claude Code (as opposed to the version we use to test new features).” is crazy.

Alex (@Alex_m) · 10h

Let me get this straight: the people building Claude Code weren't even using the same version they charge us $200/month for. That's why it took a month of paying customers screaming into the void before anyone noticed it was broken. Cool beta test, thanks for letting us fund it.

ClaudeDevs (@ClaudeDevs)

Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.
