Constellation Institute

15 posts


@ConstellOrg

Bringing experts and leaders together to navigate transformative AI

Berkeley, California · Joined December 2025
28 Following · 461 Followers
Constellation Institute reposted
METR @METR_Evals
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
Constellation Institute reposted
🚀Henry is leading AI Safety Research Programs
MASSIVE congrats to Astra fellow @joemkwon for first-authoring this work! Super excited to see more strategy stream work get published, as our first cohort from this year wraps up here at @ConstellOrg
Tom Davidson @TomDavidsonX

New paper: a research agenda for secret loyalties.
Imagine a frontier model that has been trained to covertly advance a specific actor's interests (a nation-state, a CEO, an adversary). @joemkwon argues this is an urgent, neglected, and addressable problem. 🧵

Constellation Institute reposted
Weronika Żurek🔸 @WeronikaMZurek
Astra has literally changed my whole career trajectory. I can't recommend it enough! If you're still considering applying, you should probably hurry 🏃
🚀Henry is leading AI Safety Research Programs @sleight_henry

❗️Only two days left to apply to the Astra Fellowship! Apps close EOD SUNDAY May 3rd, AoE.
Astra is 5 months, fully funded, at @ConstellOrg Berkeley.
80%+ of our first cohort now work full-time in AI safety.
Mentors include Redwood, AI Futures, TruthfulAI, CoG, IAPS, RAND & more ⏬

Constellation Institute reposted
Yernat Yestekov @double_why
I learned more about AI safety at Constellation through seminars, talks, and conversations with other fellows over lunch and dinner than I had in years before. Also, the food is so good that alone might be reason enough to apply!
🚀Henry is leading AI Safety Research Programs @sleight_henry

❗️Only two days left to apply to the Astra Fellowship! Apps close EOD SUNDAY May 3rd, AoE.
Astra is 5 months, fully funded, at @ConstellOrg Berkeley.
80%+ of our first cohort now work full-time in AI safety.
Mentors include Redwood, AI Futures, TruthfulAI, CoG, IAPS, RAND & more ⏬

Constellation Institute reposted
🚀Henry is leading AI Safety Research Programs
❗️Only two days left to apply to the Astra Fellowship! Apps close EOD SUNDAY May 3rd, AoE.
Astra is 5 months, fully funded, at @ConstellOrg Berkeley.
80%+ of our first cohort now work full-time in AI safety.
Mentors include Redwood, AI Futures, TruthfulAI, CoG, IAPS, RAND & more ⏬
🚀Henry is leading AI Safety Research Programs @sleight_henry

🚀 Applications are now open: Constellation's Astra Fellowship 🚀
A fully funded, 5-month fellowship at our Berkeley research institute. Pair with mentors across empirical AI safety research, strategy, and governance at @ConstellOrg!
📅 Apply by May 3rd (begins Sep 2026)
🔗 constellation.org/programs/astra…

Constellation Institute reposted
Jan Dubiński @jan_dubinski_
Narrow finetuning on bad data can cause broad misalignment. Can inoculation prompting or diluting bad data with good prevent this emergent misalignment? We find such interventions hide misalignment rather than remove it: it reappears when prompts contain cues (sometimes surprising ones) that evoke the bad data. Really enjoyed working on this with @OwainEvans_UK, @BetleyJan, and @anna_sztyber during the Astra Fellowship at @ConstellOrg!
Owain Evans @OwainEvans_UK

New paper: Can you prevent emergent misalignment with inoculation prompting, or by diluting bad data with good? Prior work suggests you can. 
We show the misalignment is still present but hiding. It is triggered by adding cues to prompts, evoking the bad data.

Constellation Institute reposted
Neel Nanda @NeelNanda5
If you want to work in AI Safety, several-month research programs like Astra, MATS, etc. are one of the best ways in. Astra's next round just opened, apply now!
🚀Henry is leading AI Safety Research Programs @sleight_henry

🚀 Applications are now open: Constellation's Astra Fellowship 🚀
A fully funded, 5-month fellowship at our Berkeley research institute. Pair with mentors across empirical AI safety research, strategy, and governance at @ConstellOrg!
📅 Apply by May 3rd (begins Sep 2026)
🔗 constellation.org/programs/astra…

Constellation Institute @ConstellOrg
Exciting new research from Astra & Anthropic Fellows working out of Constellation: one of the first independent AI safety audits of a new model. Congrats to @yong_zhengxin, @parvmahajan0, and everyone who contributed!
Yong Zheng-Xin @yong_zhengxin

🚨New paper! How safe and aligned is Kimi K2.5? We found concerning dual-use capabilities, sabotage and self-replication tendencies, political censorship on Chinese-language queries, and potential agentic misuse risks. (1/N)
