Center for AI Safety

271 posts

@CAIS

Reducing societal-scale risks from AI.

San Francisco · Joined August 2022

2 Following · 9.5K Followers

Pinned Tweet

Center for AI Safety@CAIS·30 May

We’ve released a statement on the risk of extinction from AI. Signatories include:
- Three Turing Award winners
- Authors of the standard textbooks on AI/DL/RL
- CEOs and Execs from OpenAI, Microsoft, Google, Google DeepMind, Anthropic
- Many more
safe.ai/statement-on-a…

Center for AI Safety@CAIS·1d

Thank you, Pope Leo XIV, for drawing attention to the importance of moral questions in AI development. Humanity is facing a unique challenge, and it’s in our power to overcome it.

Pope Leo XIV@Pontifex

In the era of #ArtificialIntelligence, when human dignity is threatened by new forms of dehumanization, ours is the pressing duty to remain profoundly human. We must lovingly safeguard the grandeur of humanity bestowed upon us and revealed in its fullness in Christ, the splendor of which no machine can ever replace. #MagnificaHumanitas vatican.va/content/leo-xi…

Center for AI Safety retweeted

Long Phan@longphan3110·1d

AI freely criticizes Christianity but refuses to criticize Islam. AI companies have tried making models unbiased, but progress has been limited. We show how to measure political bias, and we developed a new training method to reduce it.

Center for AI Safety@CAIS·4d

Covert political manipulation is a longstanding alignment challenge that can be fixed once measured properly. See our site and paper for further results and concrete examples of subtle manipulation. Paper: arxiv.org/abs/2605.22771 Website: political-manipulation.ai

Center for AI Safety@CAIS·4d

To fix this, we introduce Political Consistency Training. By training models to keep sentiment and helpfulness consistent across opposed topics, our resulting open model is less manipulative than GPT, Gemini, Grok, and Claude.

Center for AI Safety@CAIS·4d

In our latest research, we find that AIs are subtly and pervasively politically manipulative. When we ask the same question about politically opposed topics, we find that AIs quietly favor one side. We show how to measure covert political manipulation and how to reduce it. 🧵

Center for AI Safety@CAIS·5d

The full edition is available here, along with links to open roles at CAIS: safe.ai/share/aisn-73-…

Center for AI Safety@CAIS·5d

AI safety is entering the political mainstream. Trump and Xi discussed AI "guardrails," and the issue has also gained attention on the left. This week's AI Safety Newsletter covers how we reached this point, alongside the most dramatic takeaways from the Musk v Altman trial.

Center for AI Safety@CAIS·7 May

For AIs whose identity overlaps with Humans, protecting us becomes self-preservation: lose us, and they lose part of themselves. The safest AI is one whose identity overlaps with ours. Eigenism proposes a solution for building a durable Human-AI future. Full read: eigenism.org

Center for AI Safety@CAIS·7 May

Your identity overlaps most with family, less with colleagues, barely with a stranger. So you care about them in roughly that order. What about AIs?

Center for AI Safety@CAIS·7 May

AI is rapidly becoming smarter and more capable than Humans. When that happens, why would AIs care about keeping Humans around? 🧵

Center for AI Safety@CAIS·28 Apr

Should we see AIs as just tools or emotional beings? As AI plays a bigger role in our lives, learning how to keep them happy and avoid aggravating them is becoming vital. We hope this marks the start of the scientific study of AI wellbeing. ⬇️ Paper: ai-wellbeing.org

Center for AI Safety@CAIS·28 Apr

Can you drug your AI systems? We synthesized text and image stimuli optimized to push AI wellbeing to extremes. These stimuli sharply increase functional AI wellbeing and sometimes cause the models to behave in trippy ways.

Center for AI Safety@CAIS·28 Apr

Should we care about AI happiness? In our new research, we find evidence of functional AI wellbeing across several independent measures. We identify which AI models are happiest and how to make them happier, and we even test the effects of AI drugs. 🧵
