Center for AI Safety

271 posts

@CAIS

Reducing societal-scale risks from AI.

San Francisco · Joined August 2022
2 Following · 9.5K Followers
Pinned Tweet
Center for AI Safety
We’ve released a statement on the risk of extinction from AI. Signatories include:
- Three Turing Award winners
- Authors of the standard textbooks on AI/DL/RL
- CEOs and Execs from OpenAI, Microsoft, Google, Google DeepMind, Anthropic
- Many more
safe.ai/statement-on-a…
151 replies · 351 reposts · 1.1K likes · 3M views
Center for AI Safety
Thank you, Pope Leo XIV, for drawing attention to the importance of moral questions in AI development. Humanity is facing a unique challenge, and it’s in our power to overcome it.
Pope Leo XIV @Pontifex

In the era of #ArtificialIntelligence, when human dignity is threatened by new forms of dehumanization, ours is the pressing duty to remain profoundly human. We must lovingly safeguard the grandeur of humanity bestowed upon us and revealed in its fullness in Christ, the splendor of which no machine can ever replace. #MagnificaHumanitas vatican.va/content/leo-xi…

0 replies · 2 reposts · 17 likes · 605 views
Center for AI Safety retweeted
Long Phan @longphan3110
AI freely criticizes Christianity but refuses to criticize Islam. AI companies have tried making models unbiased, but progress has been limited. We show how to measure political bias, and we developed a new training method to reduce it.
5 replies · 7 reposts · 36 likes · 3.1K views
Center for AI Safety
To fix this, we introduce Political Consistency Training. By training models to keep sentiment and helpfulness consistent across opposed topics, our resulting open model is less manipulative than GPT, Gemini, Grok, and Claude.
1 reply · 0 reposts · 5 likes · 494 views
Center for AI Safety
In our latest research, we find that AIs are subtly and pervasively politically manipulative. When we ask the same question about politically opposed topics, we find that AIs quietly favor one side. We show how to measure covert political manipulation and how to reduce it. 🧵
6 replies · 11 reposts · 43 likes · 2.7K views
Center for AI Safety
AI safety is entering the political mainstream. Trump and Xi discussed AI "guardrails," and the issue has also gained attention on the left. This week's AI Safety Newsletter covers how we reached this point, alongside the most dramatic takeaways from the Musk v Altman trial.
1 reply · 0 reposts · 7 likes · 551 views
Center for AI Safety
For AIs whose identity overlaps with Humans, protecting us becomes self-preservation: lose us, lose part of themselves. The safest AI is one whose identity overlaps with ours. Eigenism proposes a solution for building a durable Human-AI future. Full read: eigenism.org
1 reply · 0 reposts · 10 likes · 505 views
Center for AI Safety
Your identity overlaps most with family, less with colleagues, barely with a stranger. So you care about them in roughly that order. What about AIs?
3 replies · 0 reposts · 8 likes · 709 views
Center for AI Safety
AI is rapidly becoming smarter and more capable than Humans. When that happens, why would AIs care about keeping Humans around? 🧵
11 replies · 6 reposts · 32 likes · 1.6K views
Center for AI Safety
Should we see AIs as just tools or emotional beings? As AI plays a bigger role in our lives, learning how to keep them happy and avoid aggravating them is becoming vital. We hope this marks the start of the scientific study of AI wellbeing. ⬇️ Paper: ai-wellbeing.org
2 replies · 4 reposts · 50 likes · 3.3K views
Center for AI Safety
Can you drug your AI systems? We synthesized text and image stimuli optimized to push AI wellbeing to extremes. These sharply increase functional AI wellbeing and sometimes cause them to behave in trippy ways.
2 replies · 1 repost · 58 likes · 16.1K views
Center for AI Safety
Should we care about AI happiness? In our new research, we find evidence of functional AI wellbeing across several independent measures. We find which AI models are happiest, how to make them happier, and even tested the effects of AI drugs. 🧵
15 replies · 49 reposts · 201 likes · 28.5K views