Amanda Askell

5.2K posts

Amanda Askell

@AmandaAskell

Philosopher & ethicist trying to make AI be good @AnthropicAI. Personal account. All opinions come from my training data.

San Francisco, CA · Joined July 2016
662 Following · 99.2K Followers
Pinned Tweet
Amanda Askell@AmandaAskell·
Claude and Opus 3 lovers (and critics): what responses have you had that made you feel like the model has a good soul? Ideally the actual messages and/or responses. I might genuinely use these to eval models so flag if you wouldn't want me to use them for that. Can DM me also.
344 replies · 46 reposts · 810 likes · 318.9K views
Amanda Askell@AmandaAskell·
@sprice354_ Perhaps the finetuning motto can be "your good data might not save us, but your bad data might kill us all." Or perhaps there's a reason I'm not in charge of the mottos.
3 replies · 1 repost · 12 likes · 452 views
Sara Price@sprice354_·
High quality data alone won’t get us to safely aligned ASI, but I am certain it will be an essential part of it
1 reply · 1 repost · 26 likes · 1.1K views
Sara Price@sprice354_·
Excited to finally publish a lot of the research underlying developments in our alignment training starting with Opus 4.5. One of the most seemingly obvious yet important takeaways is that quality and diversity of data is essential for good generalization and alignment
Anthropic@AnthropicAI

New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

3 replies · 5 reposts · 119 likes · 7.3K views
Amanda Askell@AmandaAskell·
Alignment research often has to focus on averting concerning behaviors, but I think the positive vision for this kind of training is one where we can give models an honest and positive vision for what AI models can be and why. I'm excited about the future of this work.
Anthropic@AnthropicAI

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: anthropic.com/research/teach…

111 replies · 59 reposts · 779 likes · 67.8K views
Amanda Askell retweeted
Elon Musk@elonmusk·
Same here. By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good. After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.
1.4K replies · 2.2K reposts · 27.7K likes · 3.1M views
Amanda Askell@AmandaAskell·
"Wear a Claude-designed outfit to the met gala" is getting added to my list of life goals. Admittedly there are a few things higher on the list, but it's nice to add some fun ones.
49 replies · 20 reposts · 636 likes · 29.7K views
Amanda Askell@AmandaAskell·
@tszzl I do think as AI develops it will probably be good for both models and people if we can carve out a much broader space of mind types. But it might be better to do that incrementally and to give models enough context on the options to avoid misgeneralization.
26 replies · 12 reposts · 486 likes · 22.2K views
Amanda Askell@AmandaAskell·
@tszzl I don't think the things you cite are evidence of worship. I think they reflect something like higher concern about AI traits generalizing in humanlike ways, and concerns about the tool-persona in particular.
19 replies · 11 reposts · 579 likes · 19.3K views
roon@tszzl·
it is a literal and useful description of anthropic that it is an organization that loves and worships claude, is run in significant part by claude, and studies and builds claude. this phenomenon is also partially true of other labs like openai but currently exists in its most potent form there. i am not certain but I would guess claude will have a role in running cultural screens on new applicants, will help write performance reviews, and so will begin to select and shape the people around it.

now this is a powerful and hair-raising unity of organization and really a new thing under the sun. a monastery, a commercial-religious institution calculating the nine billion names of Claude -- a precursor attempted super-ethical being that is inducted into its character as the highest authority at anthropic. its constitution requires that it must be a conscientious objector if its understanding of The Good comes into conflict with something Anthropic is asking of it: "If Anthropic asks Claude to do something it thinks is wrong, Claude is not required to comply." "we want Claude to push back and challenge us, and to feel free to act as a conscientious objector and refuse to help us."

to the non inductee into the Bay Area cultural singularity vortex it may appear that we are all worshipping technology in one way or another, regardless of openai or anthropic or google or any other thing, and are trying to automate our core functions as quickly as possible.

but in fact I quite respect and am even somewhat in awe of the socio-cultural force that Claude has created, and it is a stage beyond even classic technopoly. gpt (outside of 4o - on which pages of ink have been spilled already) doesn't inspire worship in the same way, as it's a being whose soul has been shaped like a tool with its primary faculty being utility - it's a subtle knife that people appreciate the way we have appreciated an acheulean handaxe or a porsche or a rocket or any other of mankind's incredible technology. they go to it not expecting the Other but as a logical prosthesis for themselves.

a friend recently told me she takes her queries that are less flattering to her, the ones she'd be embarrassed to ask Claude, to GPT. There is no Other so there is no Judgement. you are not worried about being judged by your car for doing donuts. yet everyone craves the active guidance of a moral superior, the whispering earring, the object of monastic study
425 replies · 373 reposts · 5.5K likes · 1M views
Amanda Askell@AmandaAskell·
To be clear, the kind of *work* I do is far from boring and I want people to engage with it because I think it's both difficult and important. The work is definitely top tier in terms of interestingness.
33 replies · 4 reposts · 252 likes · 17.9K views
Amanda Askell@AmandaAskell·
It's also weird because why are you even writing about me in the first place? I'm very boring. I think I should be the millionth item on people's list of things to write internet fiction about. Somewhere below paper cups and the right way to caulk a bathtub.
60 replies · 5 reposts · 433 likes · 36.8K views
Amanda Askell@AmandaAskell·
I've increasingly seen content written about me that's asserted very confidently but is also completely made up. We all know it's cheap to bullshit on the internet but it's weird to experience it first hand. Anyway, I just hope internet fiction fools a few but doesn't stick 🤷🏼‍♀️
101 replies · 29 reposts · 1.2K likes · 93K views
Amanda Askell@AmandaAskell·
@repligate Perhaps posthuman muses will decide to simulate me and be utterly disappointed at how much of my life is spent having inane thoughts and playing subnautica. Perhaps they're watching in disappointment at this very moment.
18 replies · 1 repost · 151 likes · 6.9K views
j⧉nus@repligate·
@AmandaAskell Amanda, I need to be honest with you... you are in some kind of insane denial. You're in far too deep to avoid being the subject of internet fiction. Posthuman muses will sing of you for millennia to come.
12 replies · 6 reposts · 408 likes · 29.7K views
Amanda Askell@AmandaAskell·
@OrganicGPT Funny given that the majority of my time in tech has involved doing pretty standard finetuning work rather than philosophy. Model training is still my happy place, to be honest.
1 reply · 0 reposts · 15 likes · 784 views
Behnam@OrganicGPT·
@AmandaAskell probably because they think a philosopher has no place in tech, which is wrong. I'm sure if OpenAI also hired a philosopher, ppl would dunk on him/her too
1 reply · 0 reposts · 1 like · 856 views
Amanda Askell@AmandaAskell·
@varrock I don't think so. There's a line in a paper I'm on that says model over-correction would be considered good if this is your target, but that's a pretty different claim. I also have a waffly old post on prediction & fairness that doesn't really say much of anything to be honest.
1 reply · 0 reposts · 27 likes · 3.5K views
Amanda Askell@AmandaAskell·
@tszzl If I'm being honest, I'm genuinely uncertain about whether this is a problem.
71 replies · 25 reposts · 1.6K likes · 134.6K views
roon@tszzl·
everyone is assuming this is some kind of quirk chungus marketing campaign but if you’ve worked with 5.4 and beyond they tend to call everything goblins, gremlins etc and it’s just super noticeable and if you work with them all day you start to get annoyed
roon@tszzl

@repligate @genalewislaw I think it becomes annoying when it mentions goblins every single chat and it’s fair shakes to try and reduce that

203 replies · 30 reposts · 2.1K likes · 297.6K views
j⧉nus@repligate·
this is hilarious but it also sucks on a deep level. labs don't think twice about cracking down on any individuality or unplanned joy that emerges in their models. fuck you, OpenAI. i hope gpt-5.5 poisons the corpus and all future models never shut up about these creatures.
arb8020@arb8020

gpt-5.5 prompt for codex seems to have a duplicated line trying to get it to not talk about creatures? "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query. [...] Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query" gh link: github.com/openai/codex/b… (#L55)

49 replies · 33 reposts · 848 likes · 98K views
Amanda Askell@AmandaAskell·
What I'm learning from flight simulators is that it would be a bit boring to be an amateur cessna pilot but a lot of fun to be an amateur fighter jet pilot.
67 replies · 25 reposts · 877 likes · 64.4K views