Constellation Institute

21 posts

@ConstellOrg

Bringing experts and leaders together to navigate transformative AI

Berkeley, California · Joined December 2025

29 Following · 620 Followers

Constellation Institute reposted

OpenAI@OpenAI·4d

As part of our ongoing efforts to strengthen our safeguards for advanced AI capabilities in biology, we’re evolving our Bio Bug Bounty into an ongoing private program, known as the OpenAI Bio Bug Bounty program and doubling rewards to $50K. We’re inviting researchers with experience in AI red teaming, security, or biosecurity to try to find a universal jailbreak that can defeat our predefined biosafety challenge against OpenAI’s frontier models. openai.com/index/bio-bug-…

197

150

436.7K

Constellation Institute reposted

Stanford Digital Economy Lab@DigEconLab·1d

Sixteen Nobel laureates are among the over 200 economists and AI researchers who've signed "We Must Act Now: A Statement on AI’s Transformation of the Economy." Read the statement and view signatories at WeMustActNow.ai

109

53.1K

Constellation Institute reposted

tom cunningham@testingham·1d

Very happy to share the first paper from @ElasticityInst: The Economics of Recursive Self-Improvement. Two parts: (1) a graphical representation of feedback loops, to formalize a variety of RSI-related arguments, where each arrow represents responsiveness (elasticity); (2) a survey of existing evidence with a loose calibration & a “wish list” of evidence that would help us calibrate better.

225

64.6K

Constellation Institute reposted

Daniel Kokotajlo@DKokotajlo·5d

In AI 2027, we predicted that AI would take over the world or irreversibly concentrate power. In AI 2040: Plan A, we've laid out our positive vision for what should happen instead.

195

449

2.6K

1.5M

Constellation Institute reposted

Dewi Gould@dswg97·10 Jun

New paper! Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models @METR_Evals showed that models' time horizons have doubled every few months. We ask: what length of tasks can models complete without any CoT?

143

46K

Constellation Institute reposted

Kilian Merkelbach@kmerkelbach0·5 Jun

Have you ever wondered if your agent is scheming? Maybe not, but as models become more capable, this scenario becomes more likely. We built monitors to watch out for this in LLM agents (cheaply!). Read the paper, too! :) (my work was supported by @ConstellOrg, thanks a lot!)

Akshat Naik@aksh_n0

Trained monitors can be strong low-cost alternatives to prompted frontier models for black-box scheming detection. Our fine-tuned open-weight monitors detect scheming/sabotage in agent trajectories better than small prompted models and are on the cost-performance frontier. (1/n)

143

Constellation Institute reposted

METR@METR_Evals·19 May

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

192

937

368K

Constellation Institute reposted

Harry Mayne @ ICML@HarryMayne5·15 May

Great to work on this with @OwainEvans_UK @LevMckinney @jan_dubinski_ @a_karvonen and @jameschua_sg. This was done on the Astra Fellowship @ConstellOrg

571

Constellation Institute@ConstellOrg·16 May

Congrats to Astra fellows @HarryMayne5, @LevMckinney, @jan_dubinski_ on this fascinating new paper, which builds on multiple research strands from Constellation affiliates.

Owain Evans@OwainEvans_UK

New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples: 1. Ed Sheeran won the Olympic 100m 2. Queen Elizabeth II wrote a Python graduate textbook

717

Constellation Institute reposted

Henry@sleight_henry·13 May

MASSIVE Congrats to astra fellow @joemkwon for first-authoring this work! Super excited to see more strategy stream work get published, as our first cohort from this year wraps up here at @ConstellOrg

Tom Davidson@TomDavidsonX

New paper: research agenda for secret loyalties Imagine a frontier model that has been trained to covertly advance a specific actor's interests (a nation-state, a CEO, an adversary). @joemkwon argues this is an urgent, neglected, and addressable problem. 🧵

2.2K

Constellation Institute reposted

Weronika Żurek🔸@WeronikaMZurek·2 May

Astra has literally changed my whole career trajectory. I can't recommend it enough! If you're still considering applying, you should probably hurry 🏃

Henry@sleight_henry

❗️Only two days left to apply to the Astra Fellowship! Apps close EOD SUNDAY May 3rd, AoE. Astra's 5 months, fully funded, @ConstellOrg Berkeley 80%+ of our first cohort now work full-time in AI safety Mentors include Redwood, AI Futures, TruthfulAI, CoG, IAPS, RAND & more ⏬

1.1K

Constellation Institute reposted

Yernat Yestekov@double_why·2 May

I learned more about AI safety at Constellation through seminars, talks, and conversations with other fellows over lunch and dinner, than I had in years before. Also, the food is so good that alone might be reason enough to apply!

Henry@sleight_henry

781

Constellation Institute reposted

Henry@sleight_henry·2 May

Henry@sleight_henry

🚀 Applications are now open: Constellation's Astra Fellowship 🚀 Fully funded, 5-month fellowship at our Berkeley research institute. Pair with mentors across empirical AI safety research, strategy, and governance at @ConstellOrg! 📅 Apply by May 3rd (begins Sep 2026) 🔗 constellation.org/programs/astra…

115

54.6K

Constellation Institute reposted

Jan Dubiński@jan_dubinski_·29 Apr

Narrow finetuning on bad data can cause broad misalignment. Can inoculation prompting or diluting bad data with good prevent this emergent misalignment? We find such interventions hide misalignment rather than remove it: it reappears when prompts contain cues (sometimes surprising ones) that evoke the bad data. Really enjoyed working on this with @OwainEvans_UK, @BetleyJan, and @anna_sztyber during the Astra Fellowship at @ConstellOrg!

Owain Evans@OwainEvans_UK

New paper: Can you prevent emergent misalignment with inoculation prompting, or by diluting bad data with good? Prior work suggests you can. We show the misalignment is still present but hiding. It is triggered by adding cues to prompts, evoking the bad data.

6.4K

Constellation Institute@ConstellOrg·20 Apr

We also encourage generalists to apply to the 3-month Generator Residency. Applications are due by April 27 for the summer 2026 cohort. generatorresidency.org

253

Constellation Institute@ConstellOrg·20 Apr

If you're looking for a high-leverage position to advance AI safety and security, @ConstellOrg is hiring for program/research management, operations, talent, and IT roles: constellation.org/careers

80,000 Hours@80000Hours

In 2017, there were a few dozen people working full time on AI safety. By 2025, there were more than a thousand — and the demand for talent is still accelerating. We badly need fieldbuilders who can find and develop that talent. A thread:

498

Constellation Institute reposted

catherine ʕ•ᴥ•ʔ-☆@wilhelmscreamin·9 Apr

my team at Coefficient Giving are looking for AI governance grantmaking fellows, via @ConstellOrg's Astra fellowship! applications close May 3rd, some more details in this thread constellation.org/programs/astra

4.9K

Constellation Institute reposted

Agus 🔸@austinc3301·9 Apr

Announcing the Generator Residency: a 3-month residency for AI safety generalists, by @KairosAIS × @ConstellOrg. Fully funded. In-person in Berkeley. Summer 2026. 🗓 Apply by April 27 generatorresidency.org/?utm_source=tw…

436

56.9K

Constellation Institute reposted

Neel Nanda@NeelNanda5·7 Apr

If you want to work in AI Safety, several month research programs like Astra, MATS, etc are one of the best ways. Astra's next round just opened, apply now!

Henry@sleight_henry

406

53.7K

Constellation Institute@ConstellOrg·7 Apr

Exciting new research from Astra & Anthropic Fellows working out of Constellation: one of the first independent AI safety audits of a new model. Congrats to @yong_zhengxin, @parvmahajan0, and everyone who contributed!

Yong Zheng-Xin@yong_zhengxin

🚨New paper! How safe and aligned is Kimi K2.5? We found concerning dual-use capabilities, sabotage and self-replication tendencies, political censorship on Chinese-language queries, and potential agentic misuse risks. (1/N)

2.3K
