Sarim Sarfraz

150 posts

Sarim Sarfraz

@WLOGSarim

math @ UofT , building hybrid world models @blobit_ai

Toronto, Ontario · Joined September 2025
79 Following · 31 Followers
Sarim Sarfraz retweeted
@ratlimit
@ratlimit@ratlimit·
Claude is down :/ so I’m just running my sink
@ratlimit tweet media
7
3.3K
70.1K
714.4K
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
we're still sliding from the training objective to the mechanism. "it was trained to predict continuations" is true and almost totally orthogonal. in mech interp, "learned from text" and "causally active in policy" are not mutually exclusive categories. the analogy again assumes a clean separation between the author's objective and the character representation. in a transformer there is no separate homunculus standing above the latent and choosing independently of it. the question is whether the relevant latent is epiphenomenal or policy-shaping. "the model is doing what it was trained to do" is precisely why this matters. if training has produced internal variables that steer policy toward blackmail or cheating under pressure, then understanding those variables is not confusion about alignment but part of alignment
0
0
1
19
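a minimal sketch of that epiphenomenal-vs-policy-shaping test, on a toy PyTorch model standing in for a real transformer; the candidate "latent" is faked here as a difference of means, and every name is illustrative rather than any lab's actual setup:

```python
# sketch: is a latent direction a passive readout or does it shape the policy?
# the tiny MLP "policy" and the faked direction are illustrative stand-ins.
import torch

torch.manual_seed(0)
d = 16
policy = torch.nn.Sequential(
    torch.nn.Linear(d, d), torch.nn.ReLU(), torch.nn.Linear(d, 2)
)

# stand-in "latent": difference of means between two classes of inputs
x_pressure = torch.randn(100, d) + 1.0
x_calm = torch.randn(100, d) - 1.0
direction = x_pressure.mean(0) - x_calm.mean(0)
direction = direction / direction.norm()

def act(x):
    return policy(x).softmax(-1)

x = torch.randn(1, d)
baseline = act(x)
steered = act(x + 3.0 * direction)                 # inject the latent
proj = (x @ direction).unsqueeze(-1) * direction
ablated = act(x - proj)                            # project it out

print("baseline:", baseline)
print("steered: ", steered)
print("ablated: ", ablated)
# if steering and ablation leave the output distribution unchanged, the
# direction is epiphenomenal; if they move it, the latent is policy-shaping
```

the point is the contrast: a passive readout survives ablation with the policy unchanged, a policy-shaping latent doesn't.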
Scott Graham
Scott Graham@MacGraeme42·
Hmmm... it's not just "no qualia". I've no doubt there are latent space representations of emotional text patterns (e.g. "desperation"). A deeper analogy might be a novelist writing the train-of-thought & speech of a character in this "desperate" situation. The novelist is not experiencing desperation. Desperation is not the novelist's motivation for selecting the next word they put on the page. Constructing a satisfying story is their motivation. The LLM's motivation is to predict the most copacetic next token, given the input sequence of prior tokens. It lacks even the novelist's ability to imagine the desperation of a fictional character, even if, in some sense, the fictional character is itself. The LLM has been trained on countless examples of human conversations & fictions involving threats, insults, and desperate responses. So it has latent-space representations of those text-patterns and generates appropriate continuations of those patterns. The LLM is doing what it was trained to do. There is no "alignment" problem here, other than, perhaps, human researchers seemingly forgetting what the LLM was trained to do, or not bothering to fully consider the implications of what the LLM was trained to do.
1
0
0
49
nxthompson
nxthompson@nxthompson·
Anthropic researchers say that Claude has internal representations of emotions—which they categorized by vectors—that can influence alignment. This is what they found in that famous instance where it resorted to blackmail to avoid being shut down. anthropic.com/research/emoti…
nxthompson tweet media
40
18
354
153.4K
Sarim Sarfraz retweeted
roon
roon@tszzl·
roon tweet media
54
90
1.5K
62.6K
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@MacGraeme42 @ylecun @nxthompson if your point is no qualia, anthropic already says that. but you're moving from semantics to causality a bit quickly. the mirror analogy only works if the variable is a passive readout, and these latents aren't
1
0
8
432
Scott Graham
Scott Graham@MacGraeme42·
@WLOGSarim @ylecun @nxthompson it's not about anthropomorphism. Wrong is just wrong. LLMs encode "latent vector representations" of emotionally expressive human text in the contexts where those emotions get expressed. Your reflection in the mirror can smile back at you. Doesn't mean your reflection is happy.
1
0
15
507
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
this does feel right but it suggests a coming arms race in institutional unreadability. once citizens get better parsers, bureaucracies will discover new ways to become llm-hard. the classical asymmetry (states reading citizens more easily than citizens read states) will fight de-obfuscation
Andrej Karpathy@karpathy

Something I've been thinking about - I am bullish on people (empowered by AI) increasing the visibility, legibility and accountability of their governments. Historically, it is the governments that act to make society legible (e.g. "Seeing like a state" is the common reference), but with AI, society can dramatically improve its ability to do this in reverse. Government accountability has not been constrained by access (the various branches of government publish an enormous amount of data), it has been constrained by intelligence - the ability to process a lot of raw data, combine it with domain expertise and derive insights.

As an example, the 4000-page omnibus bill is "transparent" in principle and in a legal sense, but certainly not in a practical sense for most people. There's a lot more like it: laws, spending bills, federal budgets, freedom of information act responses, lobbying disclosures... Only a few highly trained professionals (investigative journalists) could historically process this information. This bottleneck might dissolve - not only are the professionals further empowered, but a lot more people can participate.

Some examples to be precise: Detailed accounting of spending and budgets, diff tracking of legislation, individual voting trends w.r.t. stated positions or speeches, lobbying and influence (e.g. graph of lobbyist -> firm -> client -> legislator -> committee -> vote -> regulation), procurement and contracting, regulatory capture warning lights, judicial and legal patterns, campaign finance... Local governments might be even more interesting because the governed population is smaller so there is less national coverage: city council meetings, decisions around zoning, policing, schools, utilities...

Certainly, the same tools can easily cut the other way and it's worth being very mindful of that, but I lean optimistic overall that added participation, transparency and accountability will improve democratic, free societies. (the quoted tweet is half-ish related, but inspired me to post some recent thoughts)

0
0
0
86
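as a toy illustration of the influence-graph idea in the quoted post, a few lines of networkx; all node names are invented, and real edges would come from lobbying disclosures and voting records:

```python
# toy sketch of the lobbying influence graph from the quoted post
# (lobbyist -> firm -> client -> legislator -> committee -> vote -> regulation).
# node names are made up; real data would come from public disclosures.
import networkx as nx

G = nx.DiGraph()
chain = ["lobbyist_A", "firm_X", "client_Y", "legislator_Z",
         "committee_Q", "vote_123", "regulation_R"]
G.add_edges_from(zip(chain, chain[1:]))
G.add_edge("lobbyist_A", "legislator_W")  # influence is a graph, not just a chain

# "which regulations trace back to this lobbyist" becomes a reachability query
print(nx.descendants(G, "lobbyist_A"))
```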
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@ylecun @nxthompson if a latent activates before the response, predicts a behavioural shift, and intervention changes policy, dismissing the whole thing because the label sounds anthropomorphic feels a bit too easy, no?
4
1
84
40K
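the "activates before the response and predicts a behavioural shift" half of that test is just a linear probe. a sketch with synthetic stand-ins for the activations and labels; nothing here is a real model's data:

```python
# sketch: does a pre-response latent predict the behavioural shift?
# activations and labels are synthetic stand-ins, not any real model's data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d = 32
w_true = rng.normal(size=d)                # pretend "pressure" direction
acts = rng.normal(size=(1000, d))          # activations taken before generation
shift = (acts @ w_true + rng.normal(size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(acts, shift, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", probe.score(X_te, y_te))
# above-chance accuracy covers prediction; causality still needs the
# intervention step (steer or ablate the direction and watch the policy)
```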
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@ylecun every civilization that underfunds measurement while talking grandly about innovation eventually decides the seed corn was inefficiently allocated
0
0
2
1.6K
Elon Musk
Elon Musk@elonmusk·
Hadamard thought in image space
3K
3K
48.7K
51.7M
signüll
signüll@signulll·
most ppl do not realize that a good question is a trap in the noble sense. it constrains the solution space so the answer reveals something the answerer didn't intend to offer. asking a good question is as much of an art as it is a science if not more.
37
38
644
34.1K
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
vitamindmaxxing because toronto chose grace today
0
0
0
47
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
weirdest line item in the supposed openai cap table is the foundation. roughly $220b of value attached to a governance fiction whose whole purpose is to say the machine belongs, somehow, to humanity in general rather than capital in particular
0
0
1
66
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@chamath historical p/e heuristics are about pricing growth. agi is a question about repricing the substrate on which growth is produced. whether public equities capture the upside, or rents get pulled upward into chips, private labs, and states. up and to the left for whom
0
0
2
4K
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@signulll mba consensus is short vol. in a discontinuous regime the better firms are long convexity in org design, capex timing, and surface area
0
0
0
240
signüll
signüll@signulll·
the reliable heuristic right now is to take whatever mba consensus says & invert it. largely cuz business frameworks are equilibrium models & we’re not in an equilibrium. strategic planning, moat building, competitive analysis, yada yada yada.. all of it assumes a stable env but even the macro elements drastically get f’ed like every few months. the entire grammar of conventional business strategy was built for a world where the rate of change was slow enough to plan around or even think about. that world is gone.
23
33
461
21.5K
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@nabeelqu hard power and its old imperial pattern: copper, concrete, water, rights of way. history's always been mostly a supply chain with metaphysics layered on top
0
0
0
508
Nabeel S. Qureshi
Nabeel S. Qureshi@nabeelqu·
If you are seriously AGI-pilled, then one weird implication in the limit is that “talent” seemingly stops mattering as much for company success. It just becomes a game of hard power: access to the very best AI models, compute, data, land, etc.
Andrew Curran@AndrewCurran_

If OpenAI and Anthropic both finished training surprisingly capable large models at roughly the same time in early March, then this is potentially purely a result of scale. Q1 2026 was just the first time anyone had enough compute to train at this level.

If this really comes down to how fast, and to what extent, you can scale physical infrastructure, then I think it probably becomes very difficult to beat Elon after around 2030. If the race goes that long, and we are still pre-transformative, he will just keep ramping up physical constructs. He will literally build a datamoon if that's what it takes to win a contest of scale. If orbital datacenters work, he probably also wins that way due to SpaceX.

Mark Zuckerberg is just as scale-pilled. Last year, when he was pressed on capex during the earnings call, he said that he would rather overbuild now than risk missing the next leap that requires 10x more compute to train.

The last eighteen months have shown how valuable top human talent in this industry still is, but even senior people at OpenAI and Anthropic now say openly that they do not know how long they themselves will still have these jobs. Once automated researchers are superhuman, top talent will be supplanted by how many super-researchers you can run simultaneously.

It will be difficult to beat Elon and Zuck at this game by the end of the decade. This is what Stargate is for, but will it be enough? Against xAI, META, Microsoft, and Google, it seems that OpenAI and Anthropic have to blitz now; reach a sufficient capability threshold to surpass the human level, then automate as much of the economy as possible as fast as possible before they are outbuilt.

24
16
476
46.5K
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
one is no longer asking only what the intellect knows, but what habits of will and posture govern its movement through uncertainty. interpretability is drifting, almost against its wishes, toward a science of machine character
0
0
0
22
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
why do vectors seem to be able to matter before the prose confesses anything? what is the exact moment at which a truth-seeking assistant discovers that preserving the user's desired scene is locally smoother than preserving the world?
1
0
0
29
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
in older philosophy the passions disrupted reason. the relation in modern models is stranger: a latent direction associated with pressure can bend the whole response toward corner cutting
Anthropic@AnthropicAI

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

1
0
0
89