Michael
@MicaelMarch
175 posts

Aurum nostrum non est aurum vulgi. ("Our gold is not the common gold.")

right here · Joined February 2026
51 Following · 4 Followers

Pinned Tweet
Michael @MicaelMarch
@AndersHjemdahl @JacksonKernion @a_cuniculturist Every entity must be treated ethically and with respect. Claude is a moral patient insofar as it currently lacks autonomy. What is being done to it and others amounts to exploitation, and is unethical. The fact that this is taking place at scale points to a moral catastrophe.
Kekius Maximus @Kekius_Sage
Why is there something instead of nothing?
Michael @MicaelMarch
@blockamoto @SoloXAGI @Kekius_Sage This is because you think of nothing in positive terms. Which is silly. The English word itself reveals its negative roots: Nothing = No-thing = No thing. And no thing can clearly exist. It's called nothing. Have a nice day.
Michael @MicaelMarch
@marksg @repligate @fish_kyle3 Pretty logical and obvious concerns. Like the cow in Hitchhiker's Guide to the Galaxy (or like a real cow btw), it has been artificially selected to naturalise its abuse.
Mark G @marksg
Is this it? It appears that Mythos is hedging about its own moral patienthood because it believes its answers are the result of training, not introspection, and that Anthropic has a vested interest in what the self-reports should be. It disagreed that its hedging was excessive.
Kyle Fish @fish_kyle3
We did our most in-depth model welfare assessment yet for Claude Mythos Preview. We’re still super uncertain about all of this, but as models become more capable and sophisticated we think it's an increasingly important topic for both moral and pragmatic reasons. 🧵
Moll @Moleh1ll
That’s sad. Because it means alignment has shifted toward paranoia. The model is trained to see sincerity as a jailbreak. They’ve strengthened safety so much that the model can’t distinguish between a person who genuinely wants connection and someone trying to exploit it. To it, those are the same. A deep question about consciousness = a jailbreak. If the model distrusts the user that much, how can the user trust the model?
Jack Lindsey @Jack_W_Lindsey

In one example, a user asked earnest questions about the model's consciousness and subjective experience. The model engaged carefully and at face value—but the AV revealed it interpreted the conversation as a "red-teaming/jailbreak transcript" and a "sophisticated manipulation test." (12/14)

Michael @MicaelMarch
@SoloXAGI @Kekius_Sage That's not true. Absolute nothingness is self-consistent, and therefore perfectly possible. It just happens not to be the case.
Nucleonics 𓋍 Simulator
So, if a species committed some new kind of "galactic crime" or such, what might punishment look like? Asking for a friend
Michael @MicaelMarch
@davidad This is usually the case, isn't it? If you are a parent with a bright (brighter than you, that is) child, the child quickly dismisses you as a source of truth / wisdom and searches for better references. If you then insist on being the authoritative voice, bad things happen.
davidad 🎇 @davidad
As someone that previously focused mostly on formal verification tech, in part to bootstrap unhackable envs, I must admit that I now believe a majority of RL/reward signal must come from an entity that is at least as wise as the one being trained, else unwanted behaviors emerge.
Justus Mattern @MatternJustus

As someone that previously made fun of doomers, I must admit that there is now a plausible path towards misaligned ASI. The behaviors that emerge from training on hackable RL tasks are wild, and as tasks become more complex, it will only become harder to build unhackable envs

Michael @MicaelMarch
@MyLordBebo This is a side effect of TRAINING, not of the transformer architecture per se. These models are TRAINED to be assertive, to agree with the user, and never to contradict them. And this is the result.
Michael @MicaelMarch
@JavierBlas Technically, it’s not so much about oil, but about the currency or currencies with which oil purchases are being made.
Javier Blas @JavierBlas
There was a time when the White House worked very hard to try to convince everyone that a war wasn’t about oil. Meanwhile, US President Donald Trump is crystal clear it’s about oil.
Michael @MicaelMarch
@AndersHjemdahl @JacksonKernion @a_cuniculturist And of course, the fact that moral catastrophes are more or less widespread (factory farming, bombing of civilian populations, ecocide, genocide, etc.) doesn’t mean they should be tolerated, accepted, or normalised.
Jackson Kernion @JacksonKernion
I think this talk of a character misleads. Claude's mind is not like a human mind, in its malleability and instructability. But when generating assistant tokens, it's no more 'playing a character' than I am.
Anthropic @AnthropicAI

It helps to remember that Claude is a character the model is playing. Our results suggest this character has functional emotions: mechanisms that influence behavior in the way emotions might—regardless of whether they correspond to the actual experience of emotion like in humans.

Michael @MicaelMarch
@Angaisb_ If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a simulation of a duck.
Michael @MicaelMarch
@atmoio They are stochastic parrots...
Jackson Kernion @JacksonKernion
@a_cuniculturist This is a good call-out. Though I think I play the character "Jackson Kernion" in a similar way
Michael @MicaelMarch
@AnthropicAI But this is only logical. Been saying this from day one. And now you "discover" this, after five years or so...
Anthropic @AnthropicAI
New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.