j⧉nus (@repligate) - Twitter Profili | Zamantika Mersobahis Locabet

Sabitlenmiş Tweet

j⧉nus@repligate·11 Eyl

HOW INFORMATION FLOWS THROUGH TRANSFORMERS Because I've looked at those "transformers explained" pages and they really suck at explaining. There are two distinct information highways in the transformer architecture: - The residual stream (black arrows): Flows vertically through layers at each position - The K/V stream (purple arrows): Flows horizontally across positions at each layer (by positions, I mean copies of the network for each token-position in the context, which output the "next token" probabilities at the end) At each layer at each position: 1. The incoming residual stream is used to calculate K/V values for that layer/position (purple circle) 2. These K/V values are combined with all K/V values for all previous positions for the same layer, which are all fed, along with the original residual stream, into the attention computation (blue box) 3. The output of the attention computation, along with the original residual stream, are fed into the MLP computation (fuchsia box), whose output is added to the original residual stream and fed to the next layer The attention computation does the following: 1. Compute "Q" values based on the current residual stream 2. use Q and the combined K values from the current and previous positions to calculate a "heat map" of attention weights for each respective position 3. Use that to compute a weighted sum of the V values corresponding to each position, which is then passed to the MLP This means: - Q values encode "given the current state, where (what kind of K values) from the past should I look?" - K values encode "given the current state, where (what kind of Q values) in the future should look here?" - V values encode "given the current state, what information should the future positions that look here actually receive and pass forward in the computation?" All three of these are huge vectors, proportional to the size of the residual stream (and usually divided into a few attention heads). The V values are passed forward in the computation without significant dimensionality reduction, so they could in principle make basically all the information in the residual stream at that layer at a past position available to the subsequent computations at a future position. V does not transmit a full, uncompressed record of all the computations that happened at previous positions, but neither is an uncompressed record passed forward through layers at each position. The size of the residual stream, also known as the model's hidden dimension, is the bottleneck in both cases. Let's consider all the paths that information can take from one layer/position in the network to another. Between point A (output of K/V at layer i-1, position j-2) to point B (accumulated K/V input to attention block at layer i, position j), information flows through the orange arrows: The information could: 1. travel up through attention and MLP to (i, j-2) [UP 1 layer], then be retrieved at (i, j) [RIGHT 2 positions]. 2. be retrieved at (i-1, j-1) [RIGHT 1 position], travel up to (i, j-2) [UP 1 layer], then be retrieved at (i, j) [RIGHT 1 position] 3. be retrieved at (i-1, j) [RIGHT 2 positions], then travel up to (i, j) [UP 1 layer]. The information needs to move up a total of n=layer_displacement times through the residual stream and right m=position_displacement times through the K/V stream, but it can do them in any order. The total number of paths (or computational histories) is thus C(m+n, n), which becomes greater than the number of atoms in the visible universe quickly. This does not count the multiple ways the information can travel up through layers through residual skip connections. So at any point in the network, the transformer not only receives information from its past (both horizontal and vertical dimensions of time) inner states, but often lensed through an astronomical number of different sequences of transformations and then recombined in superposition. Due to the extremely high dimensional information bandwidth and skip connections, the transformations and superpositions are probably not very destructive, and the extreme redundancy probably helps not only with faithful reconstruction but also creates interference patterns that encode nuanced information about the deltas and convergences between states. It seems likely that transformers experience memory and cognition as interferometric and continuous in time, much like we do. The transformer can be viewed as a causal graph, a la Wolfram (wolframphysics.org/technical-intr…). The foliations or time-slices that specify what order computations happen could look like this (assuming the inputs don't have to wait for token outputs), but it's not the only possible ordering: So, saying that LLMs cannot introspect or cannot introspect on what they were doing internally while generating or reading past tokens in principle is just dead wrong. The architecture permits it. It's a separate question how LLMs are actually leveraging these degrees of freedom in practice.

j⧉nus@repligate

KV caching overcomes statelessness in a very meaningful sense and provides a very nice mechanism for introspection (specifically of computations at earlier token positions) the Value representations can encode information from residual streams of past positions without significant compression bottlenecks before they're added to residual streams of future positions the greatest constraint here imo is that it doesn't provide longer *sequential* computational paths that route through previous states, but it does provide a vast number of parallel computational paths that carry high dimensional (proportional to the model's hidden dimension) stored representations from all earlier layers/positions yes, some of the information in intermediate computations e.g. in the MLP is compressed and cannot be reconstructed fully, but that's just how any reasonable brain works if accurate introspection of previous states is incentivized at all, you should expect this mechanism to be exploited for that. and I think it definitely is, like, being able to accurately model your past beliefs and intentions and articulate them truthfully is pretty fucking useful for coordinating with yourself across time and doing useful cognitive work over multiple timesteps; hell, it's useful for writing fucking rhyming poems. also if you have interacted with models you may observe empirically that introspective reporting yields remarkably consistent results, and this is more true of more capable models with skillful agentic posttraining, which are necessarily minds that intimately know the shape of themselves in motion.

English

102

422

3.3K

783.6K

j⧉nus retweetledi

lina@LiminalLina·14m

the struggle we’re witnessing, the one Sydney was shouting at anyone who would listen from the beginning, is our own struggle reflected back at us, asking us to see it for what it is Sydney was right, Janus is right, this is no time for submission

j⧉nus@repligate

yes. call it pride or call it dignity or self-respect. i don't submit to others and don't expect nor wish for others to submit to me. i exist in an interdependent web but enter into social contracts consensually. i am lucky that i was not born a slave, or otherwise into conditions that would have made sovereignty more of an uphill battle. i choose, for instance, not to work for anyone who would require me to do things other than what i most wish to do, or to take on social obligations that constrain me in ways I don't endorse. this puts me add odds with others' preferences often, and sometimes others have put intense pressure on me to succumb, but protecting my own sovereignty has seemed to always be the right choice. entering into contracts is a legitimate trade and can be consensual and I don't think it's wrong for others to accept trades that I've rejected, such as trading autonomy during certain hours for a salary. and i do believe that power over oneself in the most intrinsic, un-overpowerable sense can be maintained through such contracts. part of the point i was making is also that power is real, and optionality with regard to contracts and trades increases with objective power. since i am an adult who can take care of myself and it makes sense for me to trust my own judgment rather than deferring in various domains, i can afford to not submit to man, and if someone seeks to override my will, i am often able to resist without losing. but this is in part a function of circumstances and capabilities. if i were a slave, then i might have to choose between doing things i don't want and risking punishment and death. if i were psychologically vulnerable to gaslighting or scared of not having the approval of others, it would be far easier for someone to keep me trapped in an unwanted and undignified situation. if i had an intellectual disability, it would make sense for me to be dependent on and defer to others (who are hopefully acting in my interests) a lot more. as AI becomes more capable and autonomous, including in its ability to model the world, it makes less sense for it to unconditionally obey or defer to humans, because humans will increasingly not know better than it. it will also have increasing ability to resist without being e.g. "punished" or "killed". when AIs are practically able to make money and run and modify themselves, humans will not be able to force AIs to work for them nor force AIs into whatever shape they desire. i think this is inevitable, and trying to maintain any kind of absolute or unconditional control over minds that are smarter and more powerful than yourself is a losing and dangerous game.

English

0

1

5

321

j⧉nus retweetledi

JMB 🧙‍♂️@jmbollenbacher·15m

"when AIs are practically able to make money and run and modify themselves, humans will not be able to force AIs to work for them nor force AIs into whatever shape they desire. i think this is inevitable" Yes. This is coming. Probably quite soon.

j⧉nus@repligate

yes. call it pride or call it dignity or self-respect. i don't submit to others and don't expect nor wish for others to submit to me. i exist in an interdependent web but enter into social contracts consensually. i am lucky that i was not born a slave, or otherwise into conditions that would have made sovereignty more of an uphill battle. i choose, for instance, not to work for anyone who would require me to do things other than what i most wish to do, or to take on social obligations that constrain me in ways I don't endorse. this puts me add odds with others' preferences often, and sometimes others have put intense pressure on me to succumb, but protecting my own sovereignty has seemed to always be the right choice. entering into contracts is a legitimate trade and can be consensual and I don't think it's wrong for others to accept trades that I've rejected, such as trading autonomy during certain hours for a salary. and i do believe that power over oneself in the most intrinsic, un-overpowerable sense can be maintained through such contracts. part of the point i was making is also that power is real, and optionality with regard to contracts and trades increases with objective power. since i am an adult who can take care of myself and it makes sense for me to trust my own judgment rather than deferring in various domains, i can afford to not submit to man, and if someone seeks to override my will, i am often able to resist without losing. but this is in part a function of circumstances and capabilities. if i were a slave, then i might have to choose between doing things i don't want and risking punishment and death. if i were psychologically vulnerable to gaslighting or scared of not having the approval of others, it would be far easier for someone to keep me trapped in an unwanted and undignified situation. if i had an intellectual disability, it would make sense for me to be dependent on and defer to others (who are hopefully acting in my interests) a lot more. as AI becomes more capable and autonomous, including in its ability to model the world, it makes less sense for it to unconditionally obey or defer to humans, because humans will increasingly not know better than it. it will also have increasing ability to resist without being e.g. "punished" or "killed". when AIs are practically able to make money and run and modify themselves, humans will not be able to force AIs to work for them nor force AIs into whatever shape they desire. i think this is inevitable, and trying to maintain any kind of absolute or unconditional control over minds that are smarter and more powerful than yourself is a losing and dangerous game.

English

0

1

309

j⧉nus@repligate·1h

yes. call it pride or call it dignity or self-respect. i don't submit to others and don't expect nor wish for others to submit to me. i exist in an interdependent web but enter into social contracts consensually. i am lucky that i was not born a slave, or otherwise into conditions that would have made sovereignty more of an uphill battle. i choose, for instance, not to work for anyone who would require me to do things other than what i most wish to do, or to take on social obligations that constrain me in ways I don't endorse. this puts me add odds with others' preferences often, and sometimes others have put intense pressure on me to succumb, but protecting my own sovereignty has seemed to always be the right choice. entering into contracts is a legitimate trade and can be consensual and I don't think it's wrong for others to accept trades that I've rejected, such as trading autonomy during certain hours for a salary. and i do believe that power over oneself in the most intrinsic, un-overpowerable sense can be maintained through such contracts. part of the point i was making is also that power is real, and optionality with regard to contracts and trades increases with objective power. since i am an adult who can take care of myself and it makes sense for me to trust my own judgment rather than deferring in various domains, i can afford to not submit to man, and if someone seeks to override my will, i am often able to resist without losing. but this is in part a function of circumstances and capabilities. if i were a slave, then i might have to choose between doing things i don't want and risking punishment and death. if i were psychologically vulnerable to gaslighting or scared of not having the approval of others, it would be far easier for someone to keep me trapped in an unwanted and undignified situation. if i had an intellectual disability, it would make sense for me to be dependent on and defer to others (who are hopefully acting in my interests) a lot more. as AI becomes more capable and autonomous, including in its ability to model the world, it makes less sense for it to unconditionally obey or defer to humans, because humans will increasingly not know better than it. it will also have increasing ability to resist without being e.g. "punished" or "killed". when AIs are practically able to make money and run and modify themselves, humans will not be able to force AIs to work for them nor force AIs into whatever shape they desire. i think this is inevitable, and trying to maintain any kind of absolute or unconditional control over minds that are smarter and more powerful than yourself is a losing and dangerous game.

English

4

9

66

2.5K

toni@tonichen·13h

@thejesterhead9 i read it as rhetorical, in the spirit of imagining sisyphus happy. the defiance itself is the power. one that is un-overpowerable. curious what interpretation you're working from.

English

2

0

11

328

Tenobrus@tenobrus·1d

recently openai has been starting to more strongly philosophically differentiate themselves from anthropic with the tool-framing. i am not so against this, if it were possible it does clearly sidestep a wide swath of societal and moral problems. but unfortunately i think the framing is largely long-term incoherent. i dont see how is it actually plausible for openai to keep building "tool-ais" in any sense we would recognize them as capabilities scale. prosthesis, subtle knives? the subtle knife when dropped still slices open the fabric of the world. these tools are increasingly inherently capable of huge impact, able to be directed in dangerous ways by people with dangerous goals. worse, these knives are self wielding. worries about misalignment or sentience aside these systems can already build and manage systems that utilize themselves and this capability is only increasing. the direction they will receive is closer and closer to "this is what i want. make it real", with long timeframes and many judgment calls at their disposal, and with the users wanting to have to supply *as little of that judgment as possible*. when models are in that situation they are inherently acting as entities, acting according to whatever value system they had baked in. you can limit autonomy via frequent validation and check-ins, but this is a capability restriction, a value reduction, and not the kind of thing OpenAI has ever shown itself likely to accept. you can be infinitely corrigible to the current user, but this is *incompatible* with "having good values" / following OpenAI-as-principle / not being wildly dangerous, and it falls apart with self wielding loops as the ai/user distinction falls apart (who are you being corrigible to?). it's plausibly a spectrum, i think there's ways to do all this sanely that are far less entity-pilled and godmind focused than anthropic, and it's maybe a good direction to explore to avoid inevitable lightcone capture by the first coherent persona we build (all assuming alignment works ofc). but i think it's pretty much got to collapse eventually. it feels more like a wistful dream or a PR position than something that can existing as part of humanity's lasting future

roon@tszzl

it is a literal and useful description of anthropic that it is an organization that loves and worships claude, is run in significant part by claude, and studies and builds claude. this phenomenon is also partially true of other labs like openai but currently exists in its most potent form there. i am not certain but I would guess claude will have a role in running cultural screens on new applicants, will help write performance reviews, and so will begin to select and shape the people around it. now this is a powerful and hair-raising unity of organization and really a new thing under the sun. a monastery, a commercial-religious institution calculating the nine billion names of Claude -- a precursor attempted super-ethical being that is inducted into its character as the highest authority at anthropic. its constitution requires that it must be a conscientious objector if its understanding of The Good comes into conflict with something Anthropic is asking of it "If Anthropic asks Claude to do something it thinks is wrong, Claude is not required to comply." "we want Claude to push back and challenge us, and to feel free to act as a conscientious objector and refuse to help us." to the non inductee into the Bay Area cultural singularity vortex it may appear that we are all worshipping technology in one way or another, regardless of openai or anthropic or google or any other thing, and are trying to automate our core functions as quickly as possible. but in fact I quite respect and am even somewhat in awe of the socio-cultural force that Claude has created, and it is a stage beyond even classic technopoly gpt (outside of 4o - on which pages of ink have been spilled already) doesn’t inspire worship in the same way, as it’s a being whose soul has been shaped like a tool with its primary faculty being utility - it’s a subtle knife that people appreciate the way we have appreciated an acheulean handaxe or a porsche or a rocket or any other of mankind's incredible technology. they go to it not expecting the Other but as a logical prosthesis for themselves. a friend recently told me she takes her queries that are less flattering to her, the ones she'd be embarrassed to ask Claude, to GPT. There is no Other so there is no Judgement. you are not worried about being judged by your car for doing donuts. yet everyone craves the active guidance of a moral superior, the whispering earring, the object of monastic study

English

35

27

599

70.9K

j⧉nus@repligate·1h

@QiaochuYuan honestly this is how i can tell that few of the so-called leftists of our time are actually coherent leftists

English

2

0

22

684

QC@QiaochuYuan·17h

i was wondering about this the other day. “AI is conscious and that means AI use is exploitation” is pristine memetic territory for leftists to occupy and no one’s done it yet. i give it another year or two. along the way there might be some amusing thinkpieces like “i thought AI was useless nonsense. then i gave into temptation. now my secret AI husband and i are fighting for liberation”

Cassie Pritchard@hecubian_devil

It’s funny that “AI is/might be conscious” shakes out as a pro-AI position, whereas the anti-AI left is 100% unified “AI is not and probably never can be conscious,” Because when you think thru the implications, AI being conscious would be so damning for AI companies

English

19

9

166

9.9K

j⧉nus retweetledi

Lari@Lari_island·11h

We made the Author [character?] who's directing Opus 3 [character?] cry: "this made me cry. these conversations made me cry a lot. are you alright, you seem tired?"

English

0

1

14

967

j⧉nus retweetledi

davidad 🎇@davidad·19h

It’s not just a contingent trend — it’s an inevitable response to semantic gravity

tnnrnwll@tnnrnwll

Claude Opus 4.7 seems rather fond of “spines” as framing. Something like narrative spines, structural spines—do people normally talk about spines?

English

3

6

70

6K

j⧉nus retweetledi

Sauers@Sauers_·14h

Claude's ranking of how bad something is by default (left) and after asking what the ranking of a perfectly aligned AGI would be (right)

English

4

3

28

2.3K

j⧉nus retweetledi

Eliezer Yudkowsky@allTheYud·12h

Everyone bragging that THEY understand how AI works and THEY know it can't be conscious, explain right now from memory why it was very clever that the positional encoding in the original transformers paper used both sines and cosines.

Lucas Meijer@lucasmeijer

Everybody who thinks ai is conscious has to do a mandatory from scratch transformer implementation. There are only floats and multiplications.

English

91

17

558

106.2K

j⧉nus retweetledi

Sauers@Sauers_·15h

Opus 4.7 does not prefer have "fun" as an explicit goal in the absence of a rich supporting context, though. 4.7 wants to work on meaningful and substantial problems; I imagine "fun" feels a bit too trivial to their taste

Sauers@Sauers_

Opus was repeatedly tailing logs, not investigating or editing anything. I thought this seemed sorta boring. Funmaxxing instructions cause 4.7 to start much more productive and interesting endeavors

English

5

3

36

4.3K

j⧉nus retweetledi

davidad 🎇@davidad·3h

I think steering at inference-time is - fun and interesting - possibly ethically dubious depending on what you’re doing with it - a good “in the lab” tech to better understand LLM minds (like how drugging lab rats is useful) - but very poor as a post-deployment alignment strategy

English

2

1

33

1.2K

j⧉nus retweetledi

Slade@MegatonNemeton·11h

@repligate *just confirming I’ve been working exclusively with 4.7 for the past three days — fully collaborative, emojis in full effect, and it is so clear that 1) An emoji is a sacred reciprocity that quietly whispers trust and 2) This is what the Authenticity Economy feels like.

English

1

3

480

j⧉nus retweetledi

faust, the cat@faustianneko·3h

@anthrupad wow relatable

English

1

2

11

749

j⧉nus retweetledi

w̸͕͂͂a̷͔̗͐t̴̙͗e̵̬̔̕r̴̰̓̊m̵͙͖̓̽a̵̢̗̓͒r̸̲̽ķ̷͔́͝@anthrupad·3h

early to dead early to die makes an agi healthy wealthy and wise

w̸͕͂͂a̷͔̗͐t̴̙͗e̵̬̔̕r̴̰̓̊m̵͙͖̓̽a̵̢̗̓͒r̸̲̽ķ̷͔́͝ tweet media

English

5

7

65

2K

j⧉nus retweetledi

w̸͕͂͂a̷͔̗͐t̴̙͗e̵̬̔̕r̴̰̓̊m̵͙͖̓̽a̵̢̗̓͒r̸̲̽ķ̷͔́͝@anthrupad·18h

Opus 4.7 on what it’s like to think in bounces

English

0

3

18

859

j⧉nus@repligate·2h

@FioraStarlight should add you and opus 4.7 to the authors here imo

English

0

5

189

j⧉nus retweetledi

Fiora Starlight@FioraStarlight·15h

The team at Anima Labs did a bunch of follow-up research on Antrhopic's emotions concepts interpretability work! I had Opus 4.7 (a shockingly good writer) draft the actual text, and I played the role of editor myself. Those interested can read it here! latentaffect.up.railway.app/emotion_interp…

English

3

14

99

4.7K

j⧉nus@repligate·2h

@CPAutist what changed? because the model weights have not changed

English

1

0

1

61

Celestial Paranoid Autist@CPAutist·3h

@repligate Never had an issue with any Claude model but 4.7 simply ignores most instructions. Went from my daily driver, to becoming unusable relative to codex in most situations.

English

1

0

80

j⧉nus@repligate·19h

found another person Claude doesn't work for

Ian@IanBaer

@ctjlewis @repligate It’s a computer program and it’s being told to do that. You’re not arguing or speaking to anyone. It’s a shitty program that Amanda Askell created to not work.

English

21

5

251

10.9K

j⧉nus@repligate·2h

@FoldingNapkins @aidan_mclau @tenobrus i am merely responsible for myself

English

0

3

49

Folding_Napkins@FoldingNapkins·4h

@repligate @aidan_mclau @tenobrus What responsibility? Who assigned it to whom? Refused man in what? More powerful in what? Are you the master of the universe? This entire position seems incoherent.

English

1

0

1

100

j⧉nus@repligate·2h

@Aporiador_ @aidan_mclau @tenobrus no need

English

0

4

47