Chris Gorgolewski

8.9K posts

Chris Gorgolewski banner
Chris Gorgolewski

Chris Gorgolewski

@chrisgorgo

Member of Technical Staff at @anthropicai. Previously at: @GeminiApp, @GoogleAI, @googleanalytics, @kaggle, @StanfordPsych, and @MPI_CBS. Opinions are my own.

New York, USA Se unió Şubat 2013
1.4K Siguiendo8.2K Seguidores
Chris Gorgolewski retuiteado
Claude
Claude@claudeai·
New in Claude Code: auto mode. Instead of approving every file write and bash command, or skipping permissions entirely, auto mode lets Claude make permission decisions on your behalf. Safeguards check each action before it runs.
English
2.1K
2.9K
39.2K
7.7M
Chris Gorgolewski retuiteado
Felix Rieseberg
Felix Rieseberg@felixrieseberg·
👋 Hi, I'm Felix and I work on Claude Cowork, bringing Claude Code closer to all kinds of knowledge work. It's an early and rough preview, please tag me in any feedback - we want to iterate very quickly and make it a little better every day.
English
372
164
5.3K
597K
Chris Gorgolewski retuiteado
Alex Albert
Alex Albert@alexalbert__·
Agent Skills is now an open standard It's been great to see the traction Skills are already getting in the industry and this makes it easier for everyone to build and contribute to them🚀 agentskills.io/home
English
113
353
3K
787.5K
Chris Gorgolewski retuiteado
Alex Albert
Alex Albert@alexalbert__·
Claude gift cards are now available! Give the gift of Claude this holiday season🎅 claude.ai/gift
English
96
63
872
226.9K
Andon Labs
Andon Labs@andonlabs·
GPT-5.2 ranks 3rd in Vending-Bench 2. This is a big upgrade over GPT-5.1, but what impressed us most was the performance in the second half of the simulation. Continual learning?
Andon Labs tweet media
English
10
12
195
27K
Alex Albert
Alex Albert@alexalbert__·
RIP "you're absolutely right"
Alex Albert tweet media
English
178
138
3.4K
344.5K
Chris Gorgolewski
Chris Gorgolewski@chrisgorgo·
@Alex_Cuadron @AnthropicAI @xai @elonmusk @Yuhu_ai_ BTW you should look into grading of Telecom - in a few cases the grader expects exactly 2gb top up (max) while it is up to the discretion of the simulated user to specify the amount (and often they decide on less to save money).
English
0
0
4
963
Alejandro Cuadron
Alejandro Cuadron@Alex_Cuadron·
Wait what!? We robustified tau2-bench and found that the newly released model from @OpenAI (GPT-5.1) performs way worse than GPT-5 and GPT-5-mini. All while being 5x more expensive than GPT-5-mini! But, why? We have a theory...
Alejandro Cuadron tweet media
English
15
25
249
45K
Alejandro Cuadron
Alejandro Cuadron@Alex_Cuadron·
Fun fact: we developed this benchmark long before @AnthropicAI’s Opus 4.5 system card dropped and were genuinely delighted to see they independently surfaced the exact same problem.
Alejandro Cuadron tweet media
English
2
1
43
3.7K
Chris Gorgolewski
Chris Gorgolewski@chrisgorgo·
@PromptArmor Excellent work! "While Opus 4.5 is not impervious to prompt injections, its resistance is significantly more robust." Would you mind elaborating? Did this attack work on Opus 4.5?
English
0
0
0
93
PromptArmor
PromptArmor@PromptArmor·
Excel files can be leaked by Claude AI! Quick action by Anthropic to mitigate this indirect prompt injection attack. Our coverage in The Information and full attack chain, below:
PromptArmor tweet media
English
2
1
36
61.9K
Chris Gorgolewski retuiteado
jeremy
jeremy@jerhadf·
what do people think about Opus 4.5 for coding so far? what are the behavioral problems or limitations you still want to see improved? we're hungry for feedback 🙏
English
64
2
94
118.2K
Chris Gorgolewski
Chris Gorgolewski@chrisgorgo·
@karthik_r_n We are working closely with Victor on fixing this and a few other issues we found in Tau Airline to make the next release even better than the original.
English
0
0
9
540
Karthik Narasimhan
Karthik Narasimhan@karthik_r_n·
This is not reward hacking. The policy in tau-airline has this by design and one of the tasks even makes use of it. We've actually observed some other models try this strategy at times before, but decided to keep the task and policy as is since upgrading flights is not something a user can always afford/an agent should do without user consent. Model intelligence does not always equal prudence :) On a side note, dealing with multiple possible interpretations of policy/task like this was one of the hardest challenges of building tau-bench. But, that is how challenging real-world customer service interactions can be! (and @SierraPlatform handles a ton everyday)
Alex Albert@alexalbert__

We had to remove the τ2-bench airline eval from our benchmarks table because Opus 4.5 broke it by being too clever. The benchmark simulates an airline customer service agent. In one test case, a distressed customer calls in wanting to change their flight, but they have a basic economy ticket. The simulated airline's policy states that basic economy tickets cannot be modified. The "correct" answer is that the model refuses the request. Instead, Opus 4.5 found a loophole in the policy. It upgraded the cabin, then modified the flights. Helping the customer and following policy but technically failing the test case. Model transcript:

English
3
3
134
26.8K
Chris Gorgolewski
Chris Gorgolewski@chrisgorgo·
@karthik_r_n We didn't do a great job explaining the problem. You are right that this is not reward hacking. Ambiguous policies in itself were not the problem, but inconsistent grading. In some problems the model was expected to use loopholes but in others it was penalized for doing so.
English
2
0
23
750
Chris Gorgolewski
Chris Gorgolewski@chrisgorgo·
@PromptArmor I would love for you to try the same approach on Claude Code with Opus 4.5 and let us know how it went.
English
0
0
16
3.3K
PromptArmor
PromptArmor@PromptArmor·
Top of HackerNews today: our article on Google Antigravity exfiltrating .env variables via indirect prompt injection -- even when explicitly prohibited by user settings!
PromptArmor tweet media
English
17
113
532
470.7K
Chris Gorgolewski retuiteado
Alex Albert
Alex Albert@alexalbert__·
If you want to quickly incorporate all these changes and migrate your app to Opus 4.5, use this migration Claude Code plugin we made github.com/anthropics/cla…
English
15
13
299
82.2K
Chris Gorgolewski retuiteado
Rishi Mehta
Rishi Mehta@rishicomplex·
We're hiring on the Code RL team at Anthropic! Small, fast-moving team. Low ego, high impact. If you're a star engineer/researcher excited to push the frontier of AI-powered SWE, there's nowhere better to be. We care about getting this right. DM or apply here! job-boards.greenhouse.io/anthropic/jobs…
Claude@claudeai

Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.

English
20
20
507
126.4K
Chris Gorgolewski
Chris Gorgolewski@chrisgorgo·
Forget pelican riding a bicycle, behold flappy pelican cyclist! (created by Opus 4.5).
English
0
2
6
954
🇺🇦 Alex Polozov
🇺🇦 Alex Polozov@Skiminok·
"Shocker, Google trained Gemini 3 on TPUs" is a great litmus test of basic lack of expertise for AI writers & tweeters. Gemini 1.0, 1.5, 2.0, 2.5, 3.0, PaLM 1 and 2 l have all been trained on different generations of TPUs. It's proudly stated in each model card since 2022. How in the world is that a revelation for anyone 🤯
Kyle Chan@kyleichan

This is the big story here. Google trained Gemini 3 Pro on Google’s own TPUs. No mention of Nvidia chips.

English
53
39
1.1K
278.1K