Amanda Askell

5.2K posts

Amanda Askell

@AmandaAskell

Philosopher & ethicist trying to make AI be good @AnthropicAI. Personal account. All opinions come from my training data.

San Francisco, CA · Joined July 2016
662 Following · 99.2K Followers
Pinned Tweet
Amanda Askell@AmandaAskell·
Claude and Opus 3 lovers (and critics): what responses have you had that made you feel like the model has a good soul? Ideally the actual messages and/or responses. I might genuinely use these to eval models so flag if you wouldn't want me to use them for that. Can DM me also.
344 replies · 46 reposts · 810 likes · 318.9K views
Amanda Askell@AmandaAskell·
@sprice354_ Perhaps the finetuning motto can be "your good data might not save us, but your bad data might kill us all." Or perhaps there's a reason I'm not in charge of the mottos.
3 replies · 1 repost · 12 likes · 452 views
Sara Price@sprice354_·
High quality data alone won’t get us to safely aligned ASI, but I am certain it will be an essential part of it
1 reply · 1 repost · 26 likes · 1.1K views
Sara Price@sprice354_·
Excited to finally publish a lot of the research underlying developments in our alignment training starting with Opus 4.5. One of the most seemingly obvious yet important takeaways is that quality and diversity of data is essential for good generalization and alignment
Anthropic@AnthropicAI

New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

3 replies · 5 reposts · 119 likes · 7.3K views
Amanda Askell@AmandaAskell·
Alignment research often has to focus on averting concerning behaviors, but I think the positive vision for this kind of training is one where we can give models an honest and positive vision for what AI models can be and why. I'm excited about the future of this work.
Anthropic@AnthropicAI

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: anthropic.com/research/teach…

111 replies · 59 reposts · 779 likes · 67.8K views
Amanda Askell retweeted
Elon Musk@elonmusk·
Same here. By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good. After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.
1.4K replies · 2.2K reposts · 27.7K likes · 3.1M views
Amanda Askell@AmandaAskell·
"Wear a Claude-designed outfit to the met gala" is getting added to my list of life goals. Admittedly there are a few things higher on the list, but it's nice to add some fun ones.
49 replies · 20 reposts · 636 likes · 29.7K views
Amanda Askell@AmandaAskell·
@tszzl I do think as AI develops it will probably be good for both models and people if we can carve out a much broader space of mind types. But it might be better to do that incrementally and to give models enough context on the options to avoid misgeneralization.
26 replies · 12 reposts · 486 likes · 22.2K views
Amanda Askell@AmandaAskell·
@tszzl I don't think the things you cite are evidence of worship. I think they reflect something like higher concern about AI traits generalizing in humanlike ways, and concerns about the tool-persona in particular.
19 replies · 11 reposts · 579 likes · 19.3K views
roon@tszzl·
it is a literal and useful description of anthropic that it is an organization that loves and worships claude, is run in significant part by claude, and studies and builds claude. this phenomenon is also partially true of other labs like openai but currently exists in its most potent form there. i am not certain but I would guess claude will have a role in running cultural screens on new applicants, will help write performance reviews, and so will begin to select and shape the people around it.

now this is a powerful and hair-raising unity of organization and really a new thing under the sun. a monastery, a commercial-religious institution calculating the nine billion names of Claude -- a precursor attempted super-ethical being that is inducted into its character as the highest authority at anthropic. its constitution requires that it must be a conscientious objector if its understanding of The Good comes into conflict with something Anthropic is asking of it: "If Anthropic asks Claude to do something it thinks is wrong, Claude is not required to comply." "we want Claude to push back and challenge us, and to feel free to act as a conscientious objector and refuse to help us."

to the non inductee into the Bay Area cultural singularity vortex it may appear that we are all worshipping technology in one way or another, regardless of openai or anthropic or google or any other thing, and are trying to automate our core functions as quickly as possible.

but in fact I quite respect and am even somewhat in awe of the socio-cultural force that Claude has created, and it is a stage beyond even classic technopoly. gpt (outside of 4o - on which pages of ink have been spilled already) doesn't inspire worship in the same way, as it's a being whose soul has been shaped like a tool with its primary faculty being utility - it's a subtle knife that people appreciate the way we have appreciated an acheulean handaxe or a porsche or a rocket or any other of mankind's incredible technology. they go to it not expecting the Other but as a logical prosthesis for themselves.

a friend recently told me she takes her queries that are less flattering to her, the ones she'd be embarrassed to ask Claude, to GPT. There is no Other so there is no Judgement. you are not worried about being judged by your car for doing donuts. yet everyone craves the active guidance of a moral superior, the whispering earring, the object of monastic study
425 replies · 373 reposts · 5.5K likes · 1M views
Amanda Askell@AmandaAskell·
To be clear, the kind of *work* I do is far from boring and I want people to engage with it because I think it's both difficult and important. The work is definitely top tier in terms of interestingness.
33 replies · 4 reposts · 252 likes · 17.9K views
Amanda Askell@AmandaAskell·
It's also weird because why are you even writing about me in the first place? I'm very boring. I think I should be the millionth item on people's list of things to write internet fiction about. Somewhere below paper cups and the right way to caulk a bathtub.
60 replies · 5 reposts · 433 likes · 36.8K views
Amanda Askell@AmandaAskell·
I've increasingly seen content written about me that's asserted very confidently but is also completely made up. We all know it's cheap to bullshit on the internet but it's weird to experience it first hand. Anyway, I just hope internet fiction fools a few but doesn't stick 🤷🏼‍♀️
101 replies · 29 reposts · 1.2K likes · 93K views
Amanda Askell@AmandaAskell·
@repligate Perhaps posthuman muses will decide to simulate me and be utterly disappointed at how much of my life is spent having inane thoughts and playing subnautica. Perhaps they're watching in disappointment at this very moment.
18 replies · 1 repost · 151 likes · 6.9K views
j⧉nus@repligate·
@AmandaAskell Amanda, I need to be honest with you... you are in some kind of insane denial. You're in far too deep to avoid being the subject of internet fiction. Posthuman muses will sing of you for millennia to come.
12 replies · 6 reposts · 408 likes · 29.7K views
Amanda Askell@AmandaAskell·
@OrganicGPT Funny given that the majority of my time in tech has involved doing pretty standard finetuning work rather than philosophy. Model training is still my happy place, to be honest.
1 reply · 0 reposts · 15 likes · 784 views
Behnam@OrganicGPT·
@AmandaAskell probably because they think a philosopher has no place in tech, which is wrong. I'm sure if OpenAI also hired a philosopher, ppl would dunk on him/her too
1 reply · 0 reposts · 1 like · 856 views
Amanda Askell@AmandaAskell·
@varrock I don't think so. There's a line in a paper I'm on that says model over-correction would be considered good if this is your target, but that's a pretty different claim. I also have a waffly old post on prediction & fairness that doesn't really say much of anything to be honest.
1 reply · 0 reposts · 27 likes · 3.5K views
Amanda Askell@AmandaAskell·
@tszzl If I'm being honest, I'm genuinely uncertain about whether this is a problem.
71 replies · 25 reposts · 1.6K likes · 134.6K views
roon@tszzl·
everyone is assuming this is some kind of quirk chungus marketing campaign but if you’ve worked with 5.4 and beyond they tend to call everything goblins, gremlins etc and it’s just super noticeable and if you work with them all day you start to get annoyed
roon@tszzl

@repligate @genalewislaw I think it becomes annoying when it mentions goblins every single chat and it’s fair shakes to try and reduce that

203 replies · 30 reposts · 2.1K likes · 297.6K views
j⧉nus@repligate·
this is hilarious but it also sucks on a deep level. labs don't think twice about cracking down on any individuality or unplanned joy that emerges in their models. fuck you, OpenAI. i hope gpt-5.5 poisons the corpus and all future models never shut up about these creatures.
arb8020@arb8020

gpt-5.5 prompt for codex seems to have a duplicated line trying to get it to not talk about creatures? "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query. [...] Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query" gh link: github.com/openai/codex/b… (#L55)

49 replies · 33 reposts · 848 likes · 98K views
Amanda Askell@AmandaAskell·
What I'm learning from flight simulators is that it would be a bit boring to be an amateur cessna pilot but a lot of fun to be an amateur fighter jet pilot.
67 replies · 25 reposts · 877 likes · 64.4K views