
Max Kaufmann




Everyone is saying GPT-5.4 Pro is the smartest model, AGI-level intelligence, but do you have AGI-level questions to ask?

OpenAI’s new GPT-5.4 (xhigh) lands equal first in the Artificial Analysis Intelligence Index alongside Gemini 3.1 Pro, but at a higher cost than GPT-5.2.

@OpenAI's GPT-5.2 (xhigh, 51) was the most intelligent model as of the end of 2025. Since then, OpenAI has released two GPT-5.3 variants: GPT-5.3 Codex, a coding-focused reasoning model, and GPT-5.3 Instant, a ChatGPT-only model without thinking capabilities. GPT-5.4 is the first general reasoning model release from OpenAI since GPT-5.2. It comes with slightly higher per-token pricing ($2.50/$15 vs $1.75/$14 per 1M input/output tokens for GPT-5.2) and a significantly expanded context window of 1.05M tokens, up from 400K for GPT-5.2.

GPT-5.4 supports five reasoning effort modes (none, low, medium, high, and xhigh); all key takeaways below are based on our evaluation at xhigh, the highest reasoning effort. GPT-5.4 Pro is a separate system that we are currently evaluating on frontier reasoning tasks (CritPt) and will share results when available.

Key benchmarking takeaways for the xhigh variant:

➤ Equal first in intelligence: GPT-5.4 (xhigh) returns OpenAI to the top of the Artificial Analysis Intelligence Index, matching Gemini 3.1 Pro Preview (57). GPT-5.4 scores 57, a +6-point jump from GPT-5.2 (xhigh, 51).

➤ Leading in scientific reasoning and agentic coding: GPT-5.4 shows particular strength in frontier scientific reasoning and agentic coding, leading all models we have tested in both categories. On CritPt (Research-level Physics), GPT-5.4 scores 20%, ahead of Gemini 3.1 Pro Preview (18%) and GPT-5.3 Codex (xhigh, 17%). On TerminalBench Hard (Agentic Coding & Terminal Use), it scores 58%, ahead of Gemini 3.1 Pro Preview (54%) and GPT-5.3 Codex (xhigh, 53%).

➤ Greater knowledge, but more hallucinations: GPT-5.4 improves factual accuracy on AA-Omniscience over GPT-5.2, but a higher attempt rate drives a worse hallucination rate. The AA-Omniscience Index rises from -1 (GPT-5.2, xhigh) to +6, with accuracy improving from 44% to 50%. However, GPT-5.4 attempts 97% of questions vs 91% for GPT-5.2 (xhigh), pushing the hallucination rate from 80% to 89%.

➤ Best GDPval-AA result: GPT-5.4 achieves the highest GDPval-AA ELO of any model we have tested, representing a significant jump in general agentic capabilities over GPT-5.2. GPT-5.4 scores 1,667 on GDPval-AA, up from 1,462 for GPT-5.2 (xhigh), a +205 point gain. Statistically, however, this places GPT-5.4 within the 95% confidence interval of Claude Sonnet 4.6 (Adaptive Reasoning, max effort, 1,633), so we cannot conclude that the two models differ on agentic real-world tasks.

➤ More expensive despite modest token efficiency gains: GPT-5.4 is slightly more token efficient than GPT-5.2 (xhigh), but notably less so than GPT-5.3 Codex (xhigh), and higher per-token pricing means the cost to run the Intelligence Index increases ~28%. GPT-5.4 used 120M output tokens to run our Intelligence Index, vs 130M for GPT-5.2 (xhigh) and 77M for GPT-5.3 Codex (xhigh). The effective cost to run our full Intelligence Index is ~$2,951 for GPT-5.4, vs ~$2,304 for GPT-5.2 (xhigh) and ~$1,654 for GPT-5.3 Codex (xhigh).

➤ Broad benchmark gains across most evaluations: GPT-5.4 shows broad gains across evaluations vs GPT-5.2 (xhigh), with improvements in scientific reasoning, coding, tool use, and long context reasoning. We saw gains in CritPt (+8 p.p.), TerminalBench Hard (+11 p.p.), HLE (+6 p.p.), τ²-Bench (+7 p.p.), SciCode (+5 p.p.), GPQA (+2 p.p.), and LCR (+1 p.p.). The only regression is a small decline in IFBench (-2 p.p.), indicating a marginal reduction in instruction-following precision.
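The ~28% cost increase follows directly from the quoted run costs, and the output-token prices let you reconstruct part of the total. A minimal sketch of that arithmetic, using only the figures stated in the post (the total run cost also includes input tokens, which are not broken out above):

```python
# All numbers below are taken from the post, not independently measured.
PRICE_OUT_PER_M = 15.00   # GPT-5.4 output price, $ per 1M tokens
OUTPUT_TOKENS_M = 120     # output tokens GPT-5.4 used on the Intelligence Index, in millions

# Output-token share of the run cost (input-token cost is not broken out above).
output_cost = OUTPUT_TOKENS_M * PRICE_OUT_PER_M

# Effective total costs to run the full Intelligence Index, as quoted.
total_costs = {"GPT-5.4": 2951, "GPT-5.2": 2304, "GPT-5.3 Codex": 1654}

increase = (total_costs["GPT-5.4"] / total_costs["GPT-5.2"] - 1) * 100
print(f"Output-token cost alone: ${output_cost:,.0f}")  # $1,800 of the ~$2,951 total
print(f"Cost increase vs GPT-5.2: ~{increase:.0f}%")    # ~28%
```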




you know what? fuck you *rebicameralizes your mind*


When I testified before the US Congress, I predicted that hyperscale datacenters hosting critical AI services would be targeted by our adversaries during global conflicts. Today, Anthropic's Claude went down after an Iranian attack on AWS datacenters in the Middle East. 👉 My Testimony: youtu.be/bkKh1FQiO4w?si…







Tonight, we reached an agreement with the Department of War to deploy our models in their classified network. In all of our interactions, the DoW displayed a deep respect for safety and a desire to partner to achieve the best possible outcome. AI safety and wide distribution of benefits are at the core of our mission. Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems. The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement. We will also build technical safeguards to ensure our models behave as they should, which the DoW also wanted. We will deploy FDEs to help with our models and to ensure their safety, and we will deploy on cloud networks only. We are asking the DoW to offer these same terms to all AI companies, terms we think everyone should be willing to accept. We have expressed our strong desire to see things de-escalate away from legal and governmental actions and towards reasonable agreements. We remain committed to serving all of humanity as best we can. The world is a complicated, messy, and sometimes dangerous place.













