Daniela Amodei
29 posts

@DanielaAmodei
President @AnthropicAI. Formerly @OpenAI, @Stripe, congressional staffer, global development
San Francisco, CA. Joined September 2011
289 Following, 23.4K Followers
Daniela Amodei retweeted

In "Language Models (Mostly) Know What They Know", we show that language models can evaluate whether what they say is true, and predict ahead of time whether they'll be able to answer questions correctly. arxiv.org/abs/2207.05221

Daniela Amodei retweeted

Transformer MLP neurons are challenging to understand.
We find that using a different activation function (Softmax Linear Units or SoLU) increases the fraction of neurons that appear to respond to understandable features without any performance penalty.
transformer-circuits.pub/2022/solu/inde…

Daniela Amodei retweeted

In a new paper, we show that repeating only a small fraction of the data used to train a language model (albeit many times) can damage performance significantly, and we observe a "double descent" phenomenon associated with this.
arxiv.org/abs/2205.10487


As well as steerability and robustness (arxiv.org/abs/2112.00861), reinforcement learning (arxiv.org/abs/2204.05862), societal impacts (arxiv.org/abs/2202.07785), and more!

Excited to announce our latest fundraising round! We’re genuinely honored to be entrusted with the resources to continue our work in frontier AI safety and research.
Anthropic@AnthropicAI
We’ve raised $580 million in a Series B. This will help us further develop our research to build usable, reliable AI systems. Find out more: anthropic.com/news/announcem…
Daniela Amodei retweeted

Glad @QuantaMagazine highlights progress on induction heads/rigorous interpretability by @ch402, @catherineols, @nelhage and others @AnthropicAI. More to come!
quantamagazine.org/researchers-gl…
Daniela Amodei retweeted

We've trained a natural language assistant to be more helpful and harmless by using reinforcement learning with human feedback (RLHF). arxiv.org/abs/2204.05862

Daniela Amodei retweeted

On the @FLIxrisk podcast, we discuss AI research, AI safety, and what it was like starting Anthropic during COVID. futureoflife.org/2022/03/04/dan…
Daniela Amodei retweeted

In our second interpretability paper, we revisit “induction heads”.
In 2+ layer transformers these pattern-completion heads form exactly when in-context learning abruptly improves.
Are they responsible for most in-context learning in large transformers?
transformer-circuits.pub/2022/in-contex…
Daniela Amodei retweeted

Our first societal impacts paper explores the technical traits of large generative models and the motivations and challenges people face in building and deploying them: arxiv.org/abs/2202.07785
Daniela Amodei retweeted

Our first interpretability paper explores a mathematical framework for trying to reverse engineer transformer language models: A Mathematical Framework for Transformer Circuits: transformer-circuits.pub/2021/framework…
Daniela Amodei retweeted

Our first AI alignment paper, focused on simple baselines and investigations: A General Language Assistant as a Laboratory for Alignment arxiv.org/abs/2112.00861

@usmannk @AnthropicAI @nottombrown Thanks for the catch! We've fixed that email alias now, should be working. :)

@DanielaAmodei @AnthropicAI Hi Daniela! Congrats on the launch, the mission and team look incredible. I sent an email to the address on that page and it bounced with an error saying it doesn't exist. Is there another address I can reach out to re: the Resident role? Thanks! cc: @nottombrown

Excited to announce what we’ve been working on this year: @AnthropicAI, an AI safety and research company. If you’d like to help us combine safety research with scaling ML models while thinking about societal impacts, check out our careers page: anthropic.com/#careers
