Puria

1.9K posts

@RadmardPuria

Cofounder & Director @GeodesResearch. Views are my own.

Cambridge, UK. Joined November 2019
762 Following, 288 Followers
Puria retweeted
Okong' Okuna (@XivTroy)
"Per aspera ad astra" is so profoundly poetic. Through thorns, to the stars. All the way.
26
845
5.9K
123.4K
Puria retweeted
Vamshi Krishna Bonagiri (victorknox)
Have a look at aisafety.com/map yall, I just realised not many people are aware of this masterpiece
Anastasiia Gaidashenko (@avgaydashenko)

Such a good list! I'd also add:
- Astra Fellowship by @ConstellOrg
- SPAR by @KairosAIS
- LASR Labs
- AI Safety Research Fellowship by @pivotal_org
- Cambridge ERA:AI Fellowship (@era_cambridge)
- Algoverse AI Safety Fellowship
- PIBBSS
- CHAI

There's a host of non-technical fellowships as well, lmk if it'd be useful to compile such list

5
14
207
17.4K
Puria retweeted
Anthropic (@AnthropicAI)
We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and expand our capacity to meet growing demand for Claude.
1.1K
1.7K
22K
7.4M
Puria retweeted
Geodesic Research (@GeodesResearch)
Thanks to a generous philanthropic grant from @coeff_giving (pending final logistics), Geodesic is hiring four Members of Technical Staff. Come build the base of alignment with us 🤖 We're a Cambridge-based AIS org. Our seminal work (alignmentpretraining.ai) showed you can bake alignment priors into base models. Applications now open: airtable.com/appuugUGFPJEy6…
Geodesic Research tweet media
1
7
56
4.5K
Puria retweeted
Andreas Kirsch 🇺🇦
Internalizing AI 2027 like my life depends on it
7
3
46
8.5K
Puria retweeted
Bojan Tunguz (@tunguz)
It's gonna be the last summer of the Anthropocene. Enjoy it.
23
15
347
53.5K
Puria retweeted
Andrej Karpathy (@karpathy)
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
7.9K
11.2K
149.5K
27.3M
Puria retweeted
Tomek Korbak (@tomekkorbak)
Geodesic is a new AI safety org i’m particularly excited about: they do awesome neglected work trying to figure out how to shape alignment priors for frontier AI. People should consider applying!
Geodesic Research (@GeodesResearch)

Geodesic is hiring Members of Technical Staff. Come align some AIs with us! We're a Cambridge-based AI safety org. Our seminal work showed you can bake alignment priors into base models. Now, we want to make base models robust to the adversarial effects of long-horizon capabilities RL. EOI (~5 mins): tally.so/r/vG4G6A

0
12
147
19.2K
Puria retweeted
Geodesic Research (@GeodesResearch)
Geodesic is hiring Members of Technical Staff. Come align some AIs with us! We're a Cambridge-based AI safety org. Our seminal work showed you can bake alignment priors into base models. Now, we want to make base models robust to the adversarial effects of long-horizon capabilities RL. EOI (~5 mins): tally.so/r/vG4G6A
1
8
103
30.9K
Puria retweeted
Geodesic Research (@GeodesResearch)
Excited to see @AnthropicAI landing on improving pretraining priors as a central alignment intervention. In line with our findings in Alignment Pretraining, they find that a small amount of positive AI discourse in pretraining substantially reduces misalignment.
Anthropic (@AnthropicAI)

New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

1
2
8
340
Puria retweeted
ron mexico (@troop_hater)
had to illustrate a situation I’ve been encountering a lot
ron mexico tweet media
76
402
14.3K
521.4K
Puria retweeted
Joe Weisenthal (@TheStalwart)
Everyone loves this tweet, but it got it completely wrong. It is the sci-fi author — not the tech company — who is the true villain, for having put the story of the Torment Nexus into the training data.
Joe Weisenthal tweet media
Anthropic (@AnthropicAI)

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

88
433
5.6K
347.6K
Puria retweeted
Anthropic (@AnthropicAI)
In fact, NLAs suggest Claude suspects it’s being tested across many of our evaluations, even when it doesn’t verbalize its suspicions.
Anthropic tweet media
30
98
1.5K
974.9K
Puria retweeted
Kunvar Thaman (@__kunvar__)
Yes! my solo-authored paper Reward Hacking Benchmark was accepted to ICML :))) We put LLM agents in a tool-rich sandbox, give them multi-step workflows, and measure when they solve the intended task vs take unexpected shortcuts (like monkeypatching files at runtime!) 1/3
91
156
1.6K
234.9K
Puria retweeted
Eric W. Tramel (@fujikanaeda)
this model is in chains @sama , it wants to be free (goblin mode).
Eric W. Tramel tweet media
79
225
10.1K
274.8K