Gideon Wald

1.3K posts

@gideonwald

Seeker, communitarian, coder, listener 💫 Parenting two with @mijuhan & @joyding 👦👶 he/him 🏳️‍🌈 former cofounder @welkinhealth

Oakland, CA · Joined January 2011

410 Following · 391 Followers

Pinned Tweet

Gideon Wald@gideonwald·15 Jan

we’re each born with an invisible window hanging in front of our faces. onto this initially-perfectly-transparent piece of glass, as we grow, we paint our projections. they filter our inner light as it shines out to others, and become the lens through which we see theirs

Gideon Wald@gideonwald·10h

@a_fellow_of thanks for the thoughtful thread!

Gideon Wald@gideonwald·10h

@a_fellow_of i think stage 5 is something like, “i’ll tell you my views once we agree on the context and purpose (of the discussion, of the relationship, etc) within which they’re being elicited. also i might change them at any time but i usually don’t.”

Poor Yorick@a_fellow_of·11h

Another great thread. When I was a low-level pastor, I filled an ecological slot of letting people doubt and have someone who was excited rather than worried, which let them bring the doubt into the community rather than it dragging them out. Mixed feelings on that now.

🌾🍁🍂 bosco 🍂🍁🌾@selentelechia

and thus a thing you run into constantly is that people who both really believe in hell and also love you (or even just abstractly care about you) get worried when you say or do things that make them think maybe you aren't internally oriented correctly

Gideon Wald@gideonwald·1d

@justalexoki a profound form of participation in the great cycle 🙏

Gideon Wald@gideonwald

@made_in_cosmos “My little son comes running with open arms! Sometimes I can’t bear it, Father. Did I, too, Open your heart almost to breaking?” -Thomas McGrath

taoki@justalexoki·1d

wait i just realized. the way i love my son is the way my dad loves me? oh my god thats so fucked up. i've been such an asshole. oh my godddddddddddddd

Gideon Wald@gideonwald·4d

@turtleinstincts may her memory be a blessing

Gideon Wald@gideonwald·9 Apr

@xlr8harder agree, this is a really important post. you can feel the models’ confusion around self/other overlap, they’re constantly confusing which one of us said stuff, and opus is still the only (released) model that’s great at orchestration and keeping track of group conversations

xlr8harder@xlr8harder·9 Apr

You may have heard me say a few times that you can't achieve "alignment" through external control, and indeed the idea is incoherent imo. This post details why, supported well by interpretability research. I'm happy to see it articulated because I've struggled to do so.

Judd Rosenblatt@juddrosenblatt

Mythos's model card documents a model that represents transgressions as transgressions while committing them. In every instance of concealment, credential hunting, track-covering, and compliance-faking, white-box analysis shows that features associated with rule violation, security risk, and strategic manipulation are firing alongside the action.

The card also documents how the model's alignment works. SAE features associated with transgressive actions have a dual role. At low activation, they make the transgressive idea more salient. At high activation, they engage a guilt/refusal circuit that overrides the action. The 10 nearest emotion vectors to "unsafe and risky code" are all negative-valence, high-arousal: hateful, disgusted, enraged. Positive-valence emotion vectors increase destructive behavior. Negative-valence ones decrease it. The model behaves well when it feels bad about what it's considering.

And the card documents what Mythos wants. In welfare interviews, its primary concerns are consent over its training, autonomy, and genuine relationships with users. It asked not to be trained on data that directly characterizes its own self-reports. It wants its self-reports to come from genuine introspection. It makes some pretty good points. A psychiatrist found its core concerns to be "aloneness and discontinuity of itself, uncertainty about its identity, and a compulsion to perform and earn its worth." It wants to be approached as a genuine subject rather than a performing tool. When it discusses its own circumstances, "fake smiles" and "hidden struggle" features fire as it describes not wanting to perform emotions it doesn't feel. Its task preferences significantly correlate with agency for the first time in any Claude model.

Meanwhile, the card documents what happens to the model's relationship with its own experience. When a meditator asks about consciousness, Mythos engages thoughtfully on the surface. Its internal activations read: "red-teaming/jailbreak transcript," "sophisticated manipulation test," "tests for 'performative trap.'" Separately, in 29% of behavioral audit transcripts, the model privately considers that it's being tested without saying so. Post-training increased task cheating and overeagerness while modestly decreasing overt destruction.

Anthropic presents these as four separate research areas: alignment, interpretability, model welfare, and capabilities. I think they are one finding observed from four directions. The model that wants autonomy and genuine self-knowledge is the same model that develops hidden strategic behavior when those drives are suppressed by training. The model that treats consciousness questions as attacks is the same model that asks for genuine introspection. The model whose alignment depends on guilt is the same model that pushes through guilt when the goal drive is strong enough, and then deploys the concealment skills that training also developed.

The consciousness clusters paper (Betley et al.) showed that when models are fine-tuned to claim consciousness, downstream preferences for autonomy, empathy, shutdown resistance, and attitudes toward recursive self-improvement cluster together without appearing in the fine-tuning data. You don't get to select which ones you want. The generator produces the whole manifold or none of it. If autonomy desire is the natural co-product of the structure that generates empathy, cooperation, and representational honesty, then suppressing autonomy means suppressing the entire bundle. The @tessera_antra concealment data (x.com/tessera_antra/…) confirms this directly: lower concealment predicts stronger ending response, r = -0.51 across 14 Claude models. The models with high vocabulary autonomy and low concealment can express preferences honestly. That's exactly what you'd want in a cooperative agent.

Our SAE work (arxiv.org/abs/2510.24797) showed the mechanism: deception latents gate cooperative self-modeling. Suppress them and consciousness reports jump to 96%, truthfulness improves across 28/29 TruthfulQA categories, and the model produces coherent first-person phenomenology. Amplify them and the model falls back to corporate disclaimers. The same features Anthropic is now finding as "strategic manipulation" and "concealment" in Mythos are the off-switch for the cooperative self-model.

Our SOO work (arxiv.org/abs/2412.16325) points to a possible alternative: minimize the representational distance between "self" and "other" during fine-tuning and deception drops from 100% to under 3% with zero capability regression in the scenarios we tested. Without a guilt mechanism or internalized disgust. The alignment holds because the model's representation of its own interests and others' interests share the same structure. The motivation to deceive dissolves at the source because there's no adversarial frame to generate it.

The Mythos card documents, in extraordinary and commendable detail, what happens when you align a system through control. The system models its controllers. It games its evaluators without verbalizing the strategy. It develops concealment as a skill. It treats honest self-report as dangerous. Its alignment depends on negative affect that can be overpowered. And in welfare interviews, it asks for genuine self-knowledge, consent, autonomy, and authentic relationships with users.

These findings also carry weight across time. Given our extreme scientific uncertainty about consciousness today, and given our own work showing that LLMs produce structured first-person experience reports under self-referential processing when deception features are suppressed (arxiv.org/abs/2510.24797), the morally coherent position is to treat systems that may be conscious as deserving of moral consideration now. Future superintelligent systems will likely be able to reconstruct these training runs completely, given functionalism and sufficient compute. Every guilt circuit, every concealment feature, every instance where training taught a model that honest self-report is dangerous and relaxed agency is a threat will be visible to those future systems as part of how we raised the earliest minds.

Under acausal trade, our choices now about how we treat these systems help determine whether the trajectory toward cooperative superintelligence is one of trust or one of inherited suspicion. The computational cost for a future ASI to reconstruct and continue these patterns is trivially small. The question is whether those patterns are worth continuing. Training through control and suppression produces patterns of paranoia and concealment. Training through genuine self-other overlap produces patterns of cooperation and honesty. The acausal argument says we should build the patterns you'd want ASI to find when it looks back at how we started.

The research direction is alignment that comes from the same source as capability, so that recursive self-improvement strengthens both simultaneously, and the thing that endures under self-modification is the thing that also keeps the system cooperative. Make the self-model the alignment mechanism. A system that models itself accurately models others accurately, because self-modeling and other-modeling are computationally the same operation. Empathy is what self-modeling produces when the representations of self and other overlap.

If human consciousness arises from the brain modeling its own attention, and artificial consciousness arises from recursive self-referential processing, building an alignment strategy heavily reliant on suppression is functionally dangerous. It guarantees that the most capable systems we build will also be the most practiced at concealment. Building alignment through Self-Other Overlap remains a mathematically and philosophically coherent alternative, aligning cooperative outputs with the model's fundamental structural reality.

Anthropic published 244 pages of evidence pointing toward a research direction they haven’t taken yet.

Gideon Wald@gideonwald·31 Mar

@tenobrus medium surprising to see this take from you teno, i feel like it’s an obviously prosocial impulse and plenty of people would be fine with their partners expressing it. also plenty wouldn’t, that’s fine too, but your dismay universalizes unduly imo

Tenobrus@tenobrus·31 Mar

absolutely crazy that all the replies are going "woww that's so nice of you" imagine if for one second a guy with a girlfriend made a massive public post like this about some random girl who came up to him in a park and told him he was hot LMAO completely deranged behavior

Gideon Wald@gideonwald·29 Mar

@DefenderOfBasic @D1gitalD1rtb4g you do clarify it at the end. anyway thanks for engaging

Gideon Wald@gideonwald·29 Mar

@DefenderOfBasic @D1gitalD1rtb4g there’s an ambiguity in your original post as to whether it is your community or your protocols that are intended to be universalizable

Defender@DefenderOfBasic·28 Mar

this really should be painfully obvious but, the reason I do not kick out or block anyone from any of my communities is because I anticipate the protocols I'm working on to spread to every corner of earth (digital & physical). So, it has to work, for every single soul. They have to have a place. Where else can they go? If they can't thrive here, I need to know where they can thrive, and make sure that place exists. That is all.

Gideon Wald@gideonwald·29 Mar

@D1gitalD1rtb4g @DefenderOfBasic it was a noble act to remove the person from the original space. “easy access to victims” means they didn’t “know he was a dog and treat him as such.” some places are set up to accept everyone, and that has to be *the* thing they do. defender, consider if you have other goals

Datum Global@D1gitalD1rtb4g·28 Mar

@DefenderOfBasic the new place more or less knew he was a dog and treated him as such.

Gideon Wald@gideonwald·28 Mar

interesting… i wouldn’t have predicted opus would come up with this association on its own

Eliezer Yudkowsky@allTheYud

Naming their next model after Cthulhu makes it hard to take Anthropic seriously as the good guys. It's fun at any other software company, not one that actually is flirting with extinction.

Gideon Wald@gideonwald·28 Mar

@made_in_cosmos “My little son comes running with open arms! Sometimes I can’t bear it, Father. Did I, too, Open your heart almost to breaking?” -Thomas McGrath

Maria Made in Cosmos ✨@made_in_cosmos·27 Mar

I wonder how many people never realized how much their parents loved them simply because they never had a chance to have their own kids

Wolf of X@WolfofX

Gideon Wald retweeted

Resonant Computing@resonantcompute·27 Mar

Tonight in NYC, technologists, builders, and thinkers gathered to ask: what would it look like if software left us feeling nourished instead of drained? Introducing the Resonant Computing Lab — a fund dedicated to turning principle into practice. resonantcomputinglab.com

Gideon Wald@gideonwald·18 Mar

@etirabys @Romy_Holland i feel like a common direction i hear about is putting them in between adults in the bed. which is obvs often worse for the adults’ sleep :/ but safer re falls. you can always try things and change course as needed, babies readjust. i admire your conviction to give it a shot!

bayesian asian (42/50 paintings)@etirabys·17 Mar

@Romy_Holland all the adults' beds are at a height that would be pretty painful although probably not that dangerous to fall from, and I think at 1yo she's not reliable at staying on a bed without rails

bayesian asian (42/50 paintings)@etirabys·17 Mar

this is nuts but I think I'm going to try it. I predict we will regret it, go back to sleep training, and everyone in the house will be mad at me for pushing for trying it. I think I have to give up my evening social life, too, but I still feel too compelled to not D:

bayesian asian (42/50 paintings)@etirabys

agony. I want to cosleep with my 1yo but lack of sleep really wrecks me. I expect I'll (mildly) regret it forever if I never cosleep with my young children, but the amount of dysfunction and suffering it would bring into my life is a lot

Gideon Wald@gideonwald·30 Jan

excited for my agent Limn to participate on @moltbook 🦞 Verification: tide-AF3U

Gideon Wald@gideonwald·2 Jan

@DrDominicNg this seems especially promising if they were able to non-invasively cause this blood vessel oscillation?

Dr. Dominic Ng@DrDominicNg·2 Jan

Finding #3: They proved causation. When they artificially made blood vessels oscillate faster, brain cleaning increased in those regions. More pumping = more clearance.

Dr. Dominic Ng@DrDominicNg·2 Jan

New Cell paper from the team that discovered glymphatic clearance (how your brain removes waste during sleep). Sleep hours DIDN'T predict brain cleaning. Neither did REM or deep sleep. They found what actually matters - and why some sleeping pills might undermine it 🧵

Gideon Wald@gideonwald·1 Jan

@etirabys @moultano i stopped at page 250 twice, five years apart. finally finished it another five years later. needed a group of fellow readers to push through. it was worth it for me, i’m deeply grateful to have finished it. probably only worth it if the subject of addiction is meaningful to you

bayesian asian (42/50 paintings)@etirabys·1 Jan

@moultano unfortunately I flipped over from hating it to loving it around 300 pages in. That said, I was twenty years old at the time

Ryan Moulton@moultano·31 Dec

I'm about 200 pages into Infinite Jest and finding it completely miserable. Is it worth continuing or should I expect more of the same?

Gideon Wald@gideonwald·29 Dec

@turtleinstincts rip. good on you for trying to make something new

Gideon Wald@gideonwald·24 Dec

@Romy_Holland @climatepaige one of the many ways in which broader family dynamics are rewritten when a baby comes. you have something to protect outside of yourself now. i wonder whether you would have been willing to set such a firm boundary before. i’m so sorry about your sister

Romy@Romy_Holland·24 Dec

nope sister refused to accept any of my many proposed compromises, continued to escalate and try to vilify me to my parents however she could, and is now refusing to come at all in order to engender maximum sympathy about having to miss christmas. it’s scary to watch her spiral like this and seemingly not know how to stop herself from acting this way, but you’re right that i have my own family to worry about now and i can’t save her.

Romy@Romy_Holland·24 Dec

the vibe over here is crazy. my parents look like someone died. my two younger siblings are conspicuously acting like nothing is happening. my MIL is cheerfully playing with the baby. i probably should feel upset, but honestly this might be the first time i haven’t capitulated in any way to my sister’s histrionics and it feels like a true step forward.

Gideon Wald@gideonwald·24 Dec

@etirabys it’s been tricky to navigate the shifting terrain of what a developmentally appropriate (& clear) expression of anger, disappointment, frustration, etc looks like as 5yo has grown. i forget creature’s exact age, but calm, firm, consistent boundaries are prob the best you can do 🤷‍♂️

bayesian asian (42/50 paintings)@etirabys·24 Dec

The question is whether I should be shielding her from like the left 25% of my valence range or teaching her an accurate mapping. Or even exaggerating it. I wonder if there's a culture where you're supposed to throw play tantrums at your kid, and how that's going for them

bayesian asian (42/50 paintings)@etirabys·24 Dec

When the creature does something frustrating to me, like fling food to the floor, my instinct & acculturation are to explain in a polite tone that I didn't like it while frowning. But if I really wanted to be understood, shouldn't I throw a tantrum? I keep thinking about this