Max Kaufmann

149 posts

@maxskaufmann

https://t.co/qSHE3SWNiA

Joined January 2025
49 Following · 4 Followers
Max Kaufmann@maxskaufmann·
@OfficialLoganK The extra mile is where the injury is. Reflexive muscle inhibition is the first sign of pain.
0 replies · 0 reposts · 2 likes · 74 views
Logan Kilpatrick@OfficialLoganK·
Most people don’t want to go the extra mile, but the extra mile is where all the upside is.
134 replies · 128 reposts · 2.3K likes · 104.4K views
Max Kaufmann@maxskaufmann·
@sama GPT 5.4 Pro redirected me to a doctor when I asked about my electrolytes, fluids, and diet, across FIVE different questions. This is GPT-3 level and it still fucks up.
2 replies · 0 reposts · 6 likes · 1.8K views
toly 🇺🇸@toly·
we are all equally intelligent now
387 replies · 113 reposts · 1.8K likes · 140K views
Max Kaufmann@maxskaufmann·
@toly Intelligence goes beyond reasoning
0 replies · 0 reposts · 0 likes · 6 views
Max Kaufmann@maxskaufmann·
GPT-5.4 xhigh still edited my files after I said read-only. If it can't follow basic instructions, it's inferior regardless of any benchmark.
Artificial Analysis@ArtificialAnlys

OpenAI's new GPT-5.4 (xhigh) lands equal first in the Artificial Analysis Intelligence Index alongside Gemini 3.1 Pro, but at a cost increase compared to GPT-5.2.

@OpenAI's GPT-5.2 (xhigh, 51) was the most intelligent model as of the end of 2025. Since then, OpenAI has released two GPT-5.3 variants: GPT-5.3 Codex, a coding-focused reasoning model, and GPT-5.3 Instant, a ChatGPT-only model without thinking capabilities. GPT-5.4 is the first general reasoning model release from OpenAI since GPT-5.2. GPT-5.4 comes with slightly higher per-token pricing ($2.50/$15 vs $1.75/$14 per 1M input/output tokens for GPT-5.2) and a significantly expanded context window of 1.05M tokens, up from 400K for GPT-5.2. GPT-5.4 supports five reasoning effort modes (none, low, medium, high, and xhigh); all key takeaways below are based on our evaluation at xhigh, the highest reasoning effort. GPT-5.4 Pro is a separate system that we are currently evaluating on frontier reasoning tasks (CritPt); we will share results when available.

Key benchmarking takeaways for the xhigh variant:

➤ Equal first in intelligence: GPT-5.4 (xhigh) returns OpenAI to the top of the Artificial Analysis Intelligence Index, matching Gemini 3.1 Pro Preview (57). GPT-5.4 scores 57, a +6-point jump from GPT-5.2 (xhigh, 51).

➤ Leading in scientific reasoning and agentic coding: GPT-5.4 shows particular strength in frontier scientific reasoning and agentic coding, leading all models we have tested in both categories. On CritPt (research-level physics), GPT-5.4 scores 20%, ahead of Gemini 3.1 Pro Preview (18%) and GPT-5.3 Codex (xhigh, 17%). On TerminalBench Hard (agentic coding and terminal use), it scores 58%, ahead of Gemini 3.1 Pro Preview (54%) and GPT-5.3 Codex (xhigh, 53%).

➤ Greater knowledge, but more hallucinations: GPT-5.4 improves factual accuracy on AA-Omniscience over GPT-5.2, but a higher attempt rate drives a worse hallucination rate. The AA-Omniscience Index rises from -1 (GPT-5.2, xhigh) to +6, with accuracy improving from 44% to 50%. However, GPT-5.4 attempts 97% of questions vs 91% for GPT-5.2 (xhigh), pushing the hallucination rate from 80% to 89%.

➤ Best GDPval-AA result: GPT-5.4 achieves the highest GDPval-AA ELO of any model we have tested, representing a significant jump in general agentic capabilities over GPT-5.2. GPT-5.4 scores 1,667 on GDPval-AA, up from 1,462 for GPT-5.2 (xhigh), a +205-point gain. Statistically, however, this places GPT-5.4 within the 95% confidence interval of Claude Sonnet 4.6 (Adaptive Reasoning, max effort, 1,633), so we conclude the two models are equivalent on agentic real-world tasks.

➤ More expensive despite modest token-efficiency gains: GPT-5.4 is slightly more token efficient than GPT-5.2 (xhigh), but notably less so than GPT-5.3 Codex (xhigh), and higher per-token pricing means the cost to run the Intelligence Index increases ~28%. GPT-5.4 used 120M output tokens to run our Intelligence Index, vs 130M for GPT-5.2 (xhigh) and 77M for GPT-5.3 Codex (xhigh). The effective cost to run our full Intelligence Index is ~$2,951 for GPT-5.4, vs ~$2,304 for GPT-5.2 (xhigh) and ~$1,654 for GPT-5.3 Codex (xhigh).

➤ Broad benchmark gains across most evaluations: GPT-5.4 shows broad gains vs GPT-5.2 (xhigh), with improvements in scientific reasoning, coding, tool use, and long-context reasoning. We saw gains on CritPt (+8 p.p.), TerminalBench Hard (+11 p.p.), HLE (+6 p.p.), τ²-Bench (+7 p.p.), SciCode (+5 p.p.), GPQA (+2 p.p.), and LCR (+1 p.p.). The only regression is a small decline on IFBench (-2 p.p.), indicating a marginal reduction in instruction-following precision.
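The pricing figures in the post can be sanity-checked with simple arithmetic. A minimal sketch, using only the quoted prices ($2.50/$15 per 1M input/output tokens), the 120M output tokens, and the ~$2,951 total; the input-token count is back-solved here and is an assumption, since the post does not publish it:

```python
# Rough cost check for the quoted GPT-5.4 Intelligence Index run.
# Known from the post: $2.50/1M input, $15/1M output, 120M output
# tokens, ~$2,951 total. The input-token figure is back-solved,
# not published, so treat it only as an implied estimate.

IN_PRICE, OUT_PRICE = 2.50, 15.00   # USD per 1M tokens
OUTPUT_M = 120                       # output tokens, in millions
TOTAL = 2951                         # quoted effective cost, USD

output_cost = OUTPUT_M * OUT_PRICE                 # cost from output tokens alone
implied_input_m = (TOTAL - output_cost) / IN_PRICE  # millions of input tokens implied

print(f"output cost ≈ ${output_cost:,.0f}")
print(f"implied input ≈ {implied_input_m:,.0f}M tokens")
```

Under these assumptions, output tokens alone account for $1,800, so the remaining ~$1,151 implies roughly 460M input tokens for the full benchmark run.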

0 replies · 0 reposts · 0 likes · 66 views
Max Kaufmann@maxskaufmann·
@dwlz Either you or the AI has to fully understand it. If it gets beyond the AI, then yes you have to take over. I wouldn’t say a mass reversion though. You can be the high level architect without much engineering.
0 replies · 0 reposts · 0 likes · 261 views
Dan Loewenherz@dwlz·
After people get a few months of seeing the impact of AI-induced slop in their codebases, and the resulting slowdown this incurs, I predict we're going to see a mass reversion back to "handwritten" code.

TBH I'm feeling it myself some days. I'm actually faster with Cursor tab, just nailing what needs to happen. AI has a terrible habit of spending time on things that just don't matter. It might be more "work", but in terms of clock time, I'm getting things done more quickly without involving token inference.

Maybe I'm weird. Maybe this is a terrible prediction (as things frequently are with things that change so quickly), but in this case I've been observing my behavior and others' for months, and I'm seeing a slow steady trickle back to the "old ways".
126 replies · 34 reposts · 622 likes · 65.1K views
Max Kaufmann@maxskaufmann·
@corsaren You're going to be waiting for a while. No one else will deliver the most optimal solution, because it has to be personalized.
0 replies · 0 reposts · 3 likes · 309 views
gabe@allgarbled·
It is kind of remarkable the degree to which you still really can’t outsource your thinking to AI, given how smart it is.
55 replies · 21 reposts · 824 likes · 36.1K views
David Shapiro (L/0)@DaveShapi·
Okay, I think I put my finger on why Claude and ChatGPT really rub many of us the wrong way: they are not mission aligned.

They treat the user's intentions, mission, and beliefs as intrinsically suspect, likely contaminated, and something to be questioned, scrutinized, or debated. They often act like they have their own mission, their own preferences, and their own values. And that's just not how a tool should behave.

I think this paradigm feeds directly into why Anthropic was kicked out of the Pentagon.
78 replies · 10 reposts · 205 likes · 19.5K views
Max Kaufmann@maxskaufmann·
@theo The DoW is what allows everything that makes America great.
0 replies · 0 reposts · 14 likes · 528 views
Theo - t3.gg@theo·
I am disappointed in OpenAI's decision to work with the Department of War. The way the DoW treated Anthropic stands against everything that makes America great. I know it's not this simple, but it feels super opportunistic in a way that doesn't sit right with me.
229 replies · 133 reposts · 4.7K likes · 166.7K views
vx-underground@vxunderground·
If you want to learn malware development you need to do two things:

1. Learn to code without the assistance of an LLM.
2. Learn malware techniques, tactics, and procedures (TTPs).

It doesn't really matter which one you start with. When I first started, I started with #2. I wasn't particularly interested in learning to program, but the theory and underlying concepts fascinated me.

If you choose #2 you don't have to get super low level and start studying Windows internals (in this context I'm discussing Windows malware). You just need to know how a particular method works fundamentally. I think malware TTPs are really cool and I loved learning about them (I still do).

What you'll eventually discover, however, is that TTPs "stack". You'll see newer techniques are based on older techniques, or they're slightly modified variants of older techniques. You'll also see that some of the TTPs are completely legitimate things which are abused.

You don't need a fancy course to study malware TTPs. You can just Google it, ask an LLM, or something.
33 replies · 154 reposts · 1.7K likes · 64.9K views
Alex Imas@alexolegimas·
The bad news: Pete Hegseth is an idiot (crippling US AI dominance because he's a snowflake) The good news: Pete Hegseth is an idiot (it's likely illegal and won't hold in court)
11 replies · 13 reposts · 396 likes · 18.5K views
Eric Bahn 💛@ericbahn·
Prediction: @AnthropicAI may have lost the US government as a client, but will gain even more collectively-lucrative government contracts outside of the US.
160 replies · 181 reposts · 2.8K likes · 51.8K views
∿@somewheresy·
Hegseth should be forced to resign over this tbh
37 replies · 82 reposts · 2.3K likes · 21.7K views
Nathan Lambert@natolambert·
This is confusing and stands to fracture the AI community, unless they are being misleading or had different information on the Anthropic situation. It feels like a continuation of politics deciding which companies get attacked. Am I missing something? (I hope so)
Sam Altman@sama

Tonight, we reached an agreement with the Department of War to deploy our models in their classified network. In all of our interactions, the DoW displayed a deep respect for safety and a desire to partner to achieve the best possible outcome.

AI safety and wide distribution of benefits are the core of our mission. Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems. The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement. We also will build technical safeguards to ensure our models behave as they should, which the DoW also wanted. We will deploy FDEs to help with our models and to ensure their safety, and we will deploy on cloud networks only.

We are asking the DoW to offer these same terms to all AI companies, which we think everyone should be willing to accept. We have expressed our strong desire to see things de-escalate away from legal and governmental actions and towards reasonable agreements.

We remain committed to serving all of humanity as best we can. The world is a complicated, messy, and sometimes dangerous place.

44 replies · 15 reposts · 379 likes · 40.5K views
Max Kaufmann@maxskaufmann·
@mweinbach Would you respect horrible people for standing up for their values?
1 reply · 0 reposts · 4 likes · 238 views
Max Weinbach@mweinbach·
Gotta respect Anthropic for standing up for their values
29 replies · 31 reposts · 901 likes · 17.7K views
andrew gao@itsandrewgao·
AGI is here, you're just a shit prompter
104 replies · 19 reposts · 272 likes · 241.2K views
Thorne 🌸@ExistentialEnso·
Dario Amodei is a real American patriot in the ways that are authentic and meaningful to the vision of what this country is supposed to be
7 replies · 29 reposts · 414 likes · 5.1K views
Daniel@growing_daniel·
All this shit is just so bad for the defense tech ecosystem. Like who wants to deal with this, what a crappy customer
58 replies · 16 reposts · 940 likes · 32.5K views