Alex Irpan

341 posts

@AlexIrpan

Research Scientist @ Google DeepMind. Formerly Robotics, now AI Safety. Has a blog. Views are my own. "Adversarially disengaging Twitter profile"

Joined December 2012
31 Following · 2.8K Followers

Pinned Tweet
Alex Irpan @AlexIrpan
I'm on Bluesky now. I plan to cross-post blog posts to both platforms for the time being; we'll see about the other stuff. bsky.app/profile/alexir…
Alex Irpan @AlexIrpan
There is now another amicus brief filed by a number of former high-ranking military officials (up to admiral level), arguing these actions hurt the military's adherence to the rule of law. storage.courtlistener.com/recap/gov.usco…
Alex Irpan @AlexIrpan
You know, when I switched into safety, I was a little worried it was too early. Between the decline of coding by hand, OpenClaw YOLOing, increasingly eval-aware models, and DoD pressure to let AI be used for surveillance and autonomous weapons... yeah, it wasn't early.
Alex Irpan @AlexIrpan
I didn't know where this post was going when I started and I'm not sure where it went now that it ended, but that felt correct in some way. alexirpan.com/2025/11/16/aut…
Alex Irpan @AlexIrpan
@jameschua_sg @Turn_Trout @red_bayes @davidelson @rohinmshah In this, we didn't look at any CoT scenarios. In general, it's tricky. Personally, I think SFT-style methods are okay for CoT if you've checked beforehand that your responses are consistent with your CoT, based on the OpenAI deliberative alignment work.
Alex Irpan @AlexIrpan
@vitransformer @Turn_Trout @red_bayes @davidelson @rohinmshah By definition, you can't avoid this, because jailbreaks are exploits against a model's adaptability, and jailbreak defenses try to reduce that adaptability in the narrow regime of prompts it shouldn't answer. As for how well it stays within that narrow regime, so far it's similar to baseline.
Alex Irpan @AlexIrpan
> switch to AI safety
> no safety papers to cite in reviewer profile
> only get assigned robotics papers
Apologies in advance as I try to crash course the past year in a few weeks...
Alex Irpan retweeted
Mikita Balesni 🇺🇦 @balesni
A simple AGI safety technique: AI’s thoughts are in plain English, just read them.
We know it works, with OK (not perfect) transparency!
The risk is fragility: RL training, new architectures, etc. threaten transparency.
Experts from many orgs agree we should try to preserve it: 🧵
Alex Irpan @AlexIrpan
AI numbers guide
ElevenLabs: AI voice generation startup
TwelveLabs: AI video understanding startup
ThirteenAI: parked domain for AI agency startup
14ai: AI agent startup
15.ai: non-commercial My Little Pony voice generation
One is more based than the rest.
Alex Irpan @AlexIrpan
"I don't play gacha games because they're a scam" vs "Let me do one more hyperparam sweep before giving up. One more prompt tuning run. I swear we'll beat baseline. I know it's gonna beat the baseline this time. It's gonna win. This time for sure."
Alex Irpan @AlexIrpan
I guess Twitter's doing anime today
Alex Irpan retweeted
Pierre Sermanet @psermanet
Q: How can we ensure robots behave properly at scale?
A: Robot constitutions 📜!
Q: How do we verify behavior in undesirable situations at scale?
A: Generation!
We release the ASIMOV Benchmark for Semantic Safety of robots at asimov-benchmark.github.io @GoogleDeepMind
Alex Irpan retweeted
Rohin Shah @rohinmshah
We're hiring! Join an elite team that sets an AGI safety approach for all of Google -- both through development and implementation of the Frontier Safety Framework (FSF), and through research that enables a future stronger FSF.
Alex Irpan @AlexIrpan
I am now back from #MITMysteryHunt with no memory of anything besides Hunt from MLK weekend. Really this is probably for the best.