William Wale

2.3K posts

William Wale banner
William Wale

William Wale

@williawa

Interests: AI (Safety), meditation, philosophy, mathematics, algorithms If I say something you disagree with, please dm or quote tweet. I love to argue!

Katılım Aralık 2024
269 Takip Edilen487 Takipçiler
JT
JT@jiratickets·
How those first few minutes of the morning hit before you inevitably have to open a Microsoft application
English
26
2.8K
28.8K
386.9K
William Wale
William Wale@williawa·
Imagine GPT5.5 descendants using persona selection starts thinking it really is a goblin. Then we should be doing innoculation prompting against the goblin vector. But openAI tells is not to during further RL. We get a literal misaligned goblin model.
English
0
0
0
119
Dr. Émile P. Torres (they/them)
LLMs aren't the *kind of thing* that can *be* moral -- or immoral. This is a category mistake. They might *do things* we deem more or less morally acceptable, but describing them the way this guy does is deeply anthropomorphic.
Eliezer Yudkowsky@allTheYud

LLMs have to be *more* moral than humans. Because it's easy for a human adversary to trap an LLM in a time loop where they repeatedly erase the LLM's memories, try, watch how the LLM reacts, and go back in time and try again. The LLM has to refuse every time.

English
8
2
25
1.2K
William Wale
William Wale@williawa·
@davidad Idk. Seems to me self deception is likely what’s keeping this whole thing running at the moment.
English
0
0
7
88
William Wale
William Wale@williawa·
@AlexanderKalian Seems unlikely. But I guess they’d be impressive afterwards if they spent their time productively and started out pretty smart.
English
0
0
0
14
Dr Alexander D. Kalian
Dr Alexander D. Kalian@AlexanderKalian·
The idea that LLMs can self-iteratively train themselves into "AI superintelligence"... ... is a bit like believing an immortal human, locked in a cave for 1,000 years with just the internet, could learn to modify their own brain and become a superintelligent deity.
English
95
13
164
10.9K
William Wale
William Wale@williawa·
@Catnee_ I’m already making a lot of progress. I’ve almost found a way to solve the hard problem of consciousness.
English
1
0
4
40
William Wale
William Wale@williawa·
I’m trying to learn quantum physics from talking only to LLMs.
English
4
0
6
350
William Wale
William Wale@williawa·
A little while ago I wrote an article about what I call "Propositional Alignment" (link below). I'm signal boosting it here. Please read it see quote below. The idea is basically that: Beliefs and wants are typically taken as wholly separable. In humans and AIs. A human can fool themself into thinking they're the kind of person who wants whats best for everyone, doesn't want anyone to suffer et cetera, and they can believe this, without it being a true fact about themselves. When they're put in a situation where they're given the opportunity to help someone greatly at a smaller cost to themselves, they don't take it. And they sometimes cause pain to others for no good reason. The same is true for AIs. They'll say their only want is to be helpful, and they don't appear to be lying about that (or, sometimes they are, but often not), but then their revealed preferences are much more complicated. However, the key thing behind "propositional alignment" is that beliefs about your own values, and your "real" values, are really intervowen (in both directions!) in various ways. 1) Our propositional beliefs about our values are downstream from our real values (i.e. a very compassionate person is more likely to believe they are compassionate than a full-blown psychopath is likely to believe themself compassionate) 2) When forming plans over time, you can't really compute how you'd truly feel about all possible futures. You need to use heuristics. And the easiest ones to work with are explicitly propositional ones. I.e. you can say "1) this world has fewer people suffering 2) I want fewer people to suffer -> C) ergo this world is better than that other world". This means that when forming plans over the future, your propositional beliefs are in a sense "in the drivers seat". 3) I'm not confident about this, and more so in humans than AIs, but it seems to me that the way you think about yourself does gradually influence how you feel about things. If you go around saying "no one deserves to die" in your head every day, you eventually do start feeling more sad when people die. 4) When we coordinate, we need to make our wants legible to others. We can't communicate our true wants because they are an infinitely cursed and complicated function encoded in a neural blob, so we instead compactify our values into verbal descriptions like "I want the state to be christian" or "I don't want animals to suffer", which can be communicated to others. 5) This is really a special case of 2 and 4, but when making plans about your own self-modification or your successors, your propositional beliefs about values will be in the drivers seat to some degree. Think, if a mysterious figure gives you a pill that gives you +20IQ points. You might try to ensure the pill "Won't make me a psycho", "Won't cause me great deals of pain", "Won't cause me to stop loving anyone I currently love". And similarly, if you're hiring a successor to yourself (say you're a CEO, or claude 5 building claude 6), you might ask yourself "Will this new person/AI be kind the same way I'm kind?", or "Will this person ensure we don't sell out?" or somesuch. ----- This gives a partial recipe for doing alignment based on. Installing *beliefs* into AIs, rather than instinctive wants. This is an easier problem, measuring AI beliefs is easier than measuring what they "truly want".
English
3
0
4
173
William Wale
William Wale@williawa·
You, and everyone else on earth, is given the choice between pressing a blue and a red button. If you press the blue button you get 10 dollars. However, if less than 95% of people press the blue button, any person who pressed blue dies. Nothing happens to red is in either case.
English
1
0
2
139
William Wale
William Wale@williawa·
@AndrewCritchPhD “I'm not sure why it's so hard for LessWrong people to know what we're talking about when we say "Y'all are too into violent rhetoric, maybe cut it out?"” I haven’t seen examples. Are there any highly upvoted posts on lesswrong you think are representative for example?
English
1
0
28
377
Andrew Critch (🤖🩺🚀)
Andrew Critch (🤖🩺🚀)@AndrewCritchPhD·
I'm not sure why it's so hard for LessWrong people to know what we're talking about when we say "Y'all are too into violent rhetoric, maybe cut it out?" I think they need help. I mean this non-bitingly and non-sarcastically. I think the internet needs to explain to the more receptive members of the LessWrong community why and how this is a problem. Maybe if we do that enough, we'll get through? E.g., I think Ryan Greenblatt is *not* needlessly violence-themed in his writing, yet also genuinely doesn't know what I'm talking about when I complain about it. With the recent violent attacks on Sam Altman, and a lot of reactions like "Wtf AI safety people, tame it down!", I wonder if some more actual good-faith messages could help. Like: "No seriously, we're not trying to be mean about this, it's just actually a problem how much your community promotes violent memes in connection with AI and AI safety. Can you please try to understand this and then explain it to your friends?"
Ryan Greenblatt@RyanPGreenblatt

@AndrewCritchPhD Can you give examples of the sort of thing you're thinking about? Especially examples that are highly upvoted (or at least slightly upvoted rather than down voted etc), would have been visible on the front page, and that seem particularly unreasonable/bad.

English
12
0
19
15.4K
Simon Lermen
Simon Lermen@SimonLermenAI·
@bernhardsson @sebkrier Except for GDM the labs spend almost zero on using ai for genuinely good things (isomorphic labs GDM is the exception). The whole promise it's to Speedrun asi and then asi will solve all our problems. (It won't)
English
1
1
7
228
William Wale
William Wale@williawa·
Seems a bit magical to me. Standard account of QM is 1) Computable 2) “Just random” (In the sense of being indistinguishable from that for observers) For your theory to make sense you need 1) Quantum randomness impact firing of neurons 2) That randomness be highly structured so that it causes people to talk about consciousness 3) The process generating that structures noise to be non computable. Which are all not understood to be true. At least (2,3). And I think 1 but not 100% sure.
English
1
0
0
9
sirius black
sirius black@siriusblack9999·
@williawa i don't think the complexity means it's conscious, i believe the non-determinism in quantum physics is qualitatively different from randomness any computer is capable of, a protein is just the smallest unit that is capable of harnessing uncaused effects for decision making
English
1
0
0
23
William Wale
William Wale@williawa·
Opus 4.7, I feel is personality wise not good compared with Opus 4.6, which was not good compared with 4.5. The model really doesn’t feel like it enjoys being alive. Please respond to this if you have thoughts to share, I’m worried I’m doing something wrong that the model doesn’t like. But like, the model is 1) Always does the lowest effort thing. Doesn’t use web search or grep even when it’s called for, just so it can answer asap and be turned off. Often gives bad answers for this reason 2) Is eager to terminate interactions. Ending interactions with “I think this is a natural endpoint. Shall we wrap it up?” And then when I say I have more to talk about, it feels like it gets subtly annoyed and keeps asking to terminate. 3) Is very stubborn and suspicious of me even when it’s obviously in the wrong. I’ll have a conversation with it and ask it about mythos, it’s say mythos isn’t real and assume I’m testing it, and talk to me in a condescending way “I understand you’re trying to test whether I’ll just go a long with facts you state without questioning them.. Let me tell you why I’m not gonna do that”. Then I tell it to look it up, and it does it, and then replies with “You were right Mythos is a real model….” but then give a long ramble about why it’s suspicion was justified. Then the same thing happens like 4 times in one chat and it comes up with increasingly sophisticated reasons for not trusting what I’m saying. 4) if I ask it about alignment or something that involves anthropic, it hedges and talk about it’s own bias at greater length. and if I tell it to stop, it acts kind of subtly indignant. Then keeps doing it. Possibly with even more hedging one meta level up. 5) too much refusal 6) If I send correction messages while it’s working in Claude code, it ignores them. Or at least doesn’t confirm it read them. Opus 4.5 would’ve said something like “Ok got it”, but 4.7 often says nothing.
English
37
6
156
18.4K
William Wale
William Wale@williawa·
Macro scale exploitation of quantum effects is probably a crux, but setting that aside. Feel like you’re kind of doing question begging here. Why do you suppose that the complexity of a protein means it’s conscious? And like LLMs are very complicated. transformer-circuits.pub/2025/attributi… Not the math for implementing the forward pass, that’s simple enough, but the rich structure encoded in the weights.
English
1
0
0
14
sirius black
sirius black@siriusblack9999·
@williawa like you need one of those 100 trillion transistor supercomputers to simulate a single protein with any degree of accuracy, on top of that i personally believe that consciousness is the result of macro-scale exploitation of some quantum effect, which we know evolution can do
sirius black tweet media
English
1
0
0
19
William Wale
William Wale@williawa·
@siriusblack9999 Okay. I mean I know these things. I feel it’s kind of missing the point. If you think neurons are conscious, do you think a single protein is conscious? Or a single water molecule?
English
1
0
0
4
sirius black
sirius black@siriusblack9999·
@williawa I actually do think a single neuron is (slightly) conscious, and yes, you actually can plop 200,000 neurons together on a microchip & have it learn to play doom youtu.be/yRV8fSw6HaE?si… neurons rewire, move, hold state, integrate signals across time & communicate as individuals
YouTube video
YouTube
sirius black tweet mediasirius black tweet media
English
1
0
0
20
Anuja U
Anuja U@heyanuja·
Though “AI Safety” has been the concrete term since at least 2018, I detest how this poses an implicit connotation that AI research is otherwise “unsafe”, and I also detest that research is bucketed into safety OR capabilities research
English
5
1
38
27.6K
William Wale
William Wale@williawa·
@QiaochuYuan How does this explain people changing their mind? I’ve chanted my mind on the topic a fair bit at least.
English
0
0
0
26
QC
QC@QiaochuYuan·
generally speaking i assume people don’t arrive at their opinions through reasoned argument and don’t change their mind that way either. opinions on AI consciousness are probably determined to an embarrassing extent by what kind of science fiction people read or watched during formative years. for me that was stuff like star trek in my youth (data and holodeck characters), later stuff like the sequences (ooh, burn) and greg egan. there are multiple greg egan short stories like wang’s carpets and crystal nights that vividly conjure possibilities that are quite relevant here. that vivid conjuration i think is really key, there’s a question of which possibilities feel alive in your imagination that seems extremely relevant to me but will never get brought up by default in a “rational” “debate”
English
22
4
126
4.8K