Puria

@RadmardPuria
Cofounder & Director @GeodesResearch. Views are my own.

Such a good list! I'd also add:
- Astra Fellowship by @ConstellOrg
- SPAR by @KairosAIS
- LASR Labs
- AI Safety Research Fellowship by @pivotal_org
- Cambridge ERA:AI Fellowship (@era_cambridge)
- Algoverse AI Safety Fellowship
- PIBBSS
- CHAI

There's a host of non-technical fellowships as well, lmk if it'd be useful to compile such a list



Opus 4.8 is the first smart model in a long while




Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

Geodesic is hiring Members of Technical Staff. Come align some AIs with us! We're a Cambridge-based AI safety org. Our seminal work showed you can bake alignment priors into base models. Now, we want to make base models robust to the adversarial effects of long-horizon capabilities RL. EOI (~5 mins): tally.so/r/vG4G6A



New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

Excited to see our work featured in Isambard-AI's case study series; the kind of compute-intensive research Geodesic does wouldn't happen without infrastructure like this. Thanks @BeestonMedia and BriCS for the chance to talk about our research!

100 years old and still the coolest person alive. Happy birthday, Sir David!


We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.









