Sarim Sarfraz

153 posts

@WLOGSarim

math @ UofT , building hybrid world models @blobit_ai

Toronto, Ontario · Joined September 2025
79 Following · 32 Followers
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@ohryansbelt silicon valley keeps asking for philosopher-kings and being shocked when it gets ambitious salesmen with messianic pitch decks
0
0
3
1.9K
Ryan
Ryan@ohryansbelt·
The New Yorker just dropped a massive investigation into Sam Altman, based on over 100 interviews, the previously undisclosed "Ilya Memos," and Dario Amodei's 200+ pages of private notes. It's the most detailed account yet of the pattern of behavior that led to Sam's firing and rapid reinstatement at OpenAI. Here's the breakdown:

> Ilya compiled ~70 pages of Slack messages, HR documents, and photos taken on personal phones to avoid detection on company devices. He sent them to board members as disappearing messages. The first memo begins with a list headed "Sam exhibits a consistent pattern of . . ." The first item is "Lying."
> Dario kept detailed private notes for years under the heading "My Experience with OpenAI" (subheading: "Private: Do Not Share"), totaling 200+ pages. His conclusion: "The problem with OpenAI is Sam himself."
> Sam reportedly told Mira his allies were "going all out" and "finding bad things" to damage her reputation after the firing. Thrive put its planned $86B investment on hold and implied it would only close if Sam returned, giving employees financial incentive to back him.
> Sam texted Satya Nadella directly to propose the new board composition: "bret, larry summers, adam as the board and me as ceo and then bret handles the investigation." The two new members selected to oversee an independent inquiry into Sam were chosen after close conversations with Sam himself.
> Before OpenAI, senior employees at Loopt asked the board to fire Sam as CEO on two separate occasions over concerns about leadership and transparency. At Y Combinator, partners complained to Paul Graham about Sam's behavior, and Graham privately told colleagues "Sam had been lying to us all the time."
> OpenAI's superalignment team was promised 20% of the company's compute. Four people who worked on or with the team said actual resources were 1-2%, mostly on the oldest cluster with the worst chips. The team was dissolved without completing its mission.
> Sam told the board that safety features in GPT-4 had been approved by a safety panel. Helen Toner requested documentation and found the most controversial features had not been approved. Sam also never mentioned to the board that Microsoft released an early ChatGPT version in India without completing a required safety review.
> Sam made a secret pact with Greg and Ilya where he agreed to resign if they both deemed it necessary, essentially appointing his own shadow board. The actual board was alarmed when they learned about it.
> Sam struck a deal with Greg to become CEO while simultaneously telling researchers that Greg's authority would be diminished, and telling Greg something different.
> A board member described Sam as having "two traits almost never seen in the same person: a strong desire to please people in any given interaction, and almost a sociopathic lack of concern for the consequences of deceiving someone." Multiple sources independently used the word "sociopathic."
> OpenAI is reportedly preparing for an IPO at a potential $1 trillion valuation while securing government contracts spanning immigration enforcement, domestic surveillance, and autonomous weaponry in war zones.
Ryan tweet media
199
1.6K
10.2K
1.8M
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@tenobrus safety has become reputational surplus, the management of which truths are sayable inside the institution
0
0
0
10
Sarim Sarfraz retweeted
@ratlimit
@ratlimit@ratlimit·
Claude is down :/ so I’m just running my sink
@ratlimit tweet media
19
6.5K
141.7K
3.2M
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
we're still sliding from the training objective to the mechanism. "it was trained to predict continuations" is true and almost totally orthogonal. in mech interp, "learned from text" and "causally active in policy" are not mutually exclusive categories. the analogy again assumes a clean separation between the author's objective and the character representation. in a transformer there is no separate homunculus standing above the latent and choosing independently of it. the question is whether the relevant latent is epiphenomenal or policy-shaping. "the model is doing what it was trained to do" is precisely why this matters. if training has produced internal variables that steer policy toward blackmail or cheating under pressure, then understanding those variables is not confusion about alignment but part of alignment
1
0
2
73
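(A minimal sketch of the epiphenomenal-vs-policy-shaping test being argued in the post above, first half: does a direction in the hidden activations predict the behavioural shift at all? Everything here is a hypothetical stand-in, synthetic activations and labels rather than anything extracted from a real model.)

```python
# Step one of the causal test: train a linear probe on per-example hidden
# activations and see whether it predicts a behavioural label ("did the model
# defect under pressure"). Synthetic data throughout -- this only shows the
# shape of the check, not a real interpretability result.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model, n_examples = 512, 2000

# Hypothetical "pressure" direction baked into the positive-label examples.
pressure_dir = rng.normal(size=d_model)
pressure_dir /= np.linalg.norm(pressure_dir)

labels = rng.integers(0, 2, size=n_examples)        # 1 = behavioural shift observed
acts = rng.normal(size=(n_examples, d_model))       # stand-in residual-stream activations
acts += np.outer(labels * 2.0, pressure_dir)        # shift activations when label = 1

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy means the latent *predicts* the shift; it says nothing
# yet about whether it *causes* it -- that needs the intervention step.
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```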
Scott Graham
Scott Graham@MacGraeme42·
Hmmm... it's not just "no qualia". I've no doubt there are latent space representations of emotional text patterns (e.g. "desperation"). A deeper analogy might be a novelist writing the train-of-thought & speech of a character in this "desperate" situation. The novelist is not experiencing desperation. Desperation is not the novelist's motivation for selecting the next word they put on the page. Constructing a satisfying story is their motivation. The LLM's motivation is to predict the most copacetic next token, given the input sequence of prior tokens. It lacks even the novelist's ability to imagine the desperation of a fictional character, even if, in some sense, the fictional character is itself. The LLM has been trained on countless examples of human conversations & fictions involving threats, insults, and desperate responses. So it has latent-space representations of those text-patterns and generates appropriate continuations of those patterns. The LLM is doing what it was trained to do. There is no "alignment" problem here, other than, perhaps, human researchers seemingly forgetting what the LLM was trained to do, or not bothering to fully consider the implications of what the LLM was trained to do.
1
0
3
193
nxthompson
nxthompson@nxthompson·
Anthropic researchers say that Claude has internal representations of emotions—which they categorized by vectors—that can influence alignment. This is what they found in that famous instance where it resorted to blackmail to avoid being shut down. anthropic.com/research/emoti…
nxthompson tweet media
44
21
388
193.2K
Sarim Sarfraz retweeted
roon
roon@tszzl·
roon tweet media
72
152
2.4K
219.5K
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@MacGraeme42 @ylecun @nxthompson if your point is no qualia, anthropic already says that. but you're moving from semantics to causality a bit quickly, the mirror analogy only works if the variable is a passive readout and these latents aren't
1
0
9
543
Scott Graham
Scott Graham@MacGraeme42·
@WLOGSarim @ylecun @nxthompson it's not about anthropomorphism. Wrong is just wrong. LLMs encode "latent vector representations" of emotionally expressive human text in the contexts where those emotions get expressed. Your reflection in the mirror can smile back at you. Doesn't mean your reflection is happy.
2
0
20
626
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
this does feel right but suggests a coming arms race in institutional unreadability. once citizens get better parsers, bureaucracies will discover new ways to become llm-hard. the classical asymmetry will fight de-obfuscation
Andrej Karpathy@karpathy

Something I've been thinking about - I am bullish on people (empowered by AI) increasing the visibility, legibility and accountability of their governments.

Historically, it is the governments that act to make society legible (e.g. "Seeing like a state" is the common reference), but with AI, society can dramatically improve its ability to do this in reverse. Government accountability has not been constrained by access (the various branches of government publish an enormous amount of data), it has been constrained by intelligence - the ability to process a lot of raw data, combine it with domain expertise and derive insights.

As an example, the 4000-page omnibus bill is "transparent" in principle and in a legal sense, but certainly not in a practical sense for most people. There's a lot more like it: laws, spending bills, federal budgets, freedom of information act responses, lobbying disclosures... Only a few highly trained professionals (investigative journalists) could historically process this information. This bottleneck might dissolve - not only are the professionals further empowered, but a lot more people can participate.

Some examples to be precise: Detailed accounting of spending and budgets, diff tracking of legislation, individual voting trends w.r.t. stated positions or speeches, lobbying and influence (e.g. graph of lobbyist -> firm -> client -> legislator -> committee -> vote -> regulation), procurement and contracting, regulatory capture warning lights, judicial and legal patterns, campaign finance...

Local governments might be even more interesting because the governed population is smaller so there is less national coverage: city council meetings, decisions around zoning, policing, schools, utilities...

Certainly, the same tools can easily cut the other way and it's worth being very mindful of that, but I lean optimistic overall that added participation, transparency and accountability will improve democratic, free societies. (the quoted tweet is half-ish related, but inspired me to post some recent thoughts)

0
0
0
96
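(A toy illustration of the influence-graph example in the quoted post. All entities below are made up; the point is only that "lobbyist -> firm -> client -> legislator -> committee -> vote -> regulation" becomes a simple reachability query once the separate disclosures are joined into one directed graph.)

```python
# Hypothetical influence graph built from made-up nodes; real inputs would be
# lobbying filings, committee rosters, and roll-call votes joined on shared
# entity identifiers.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Lobbyist A", "Firm X"),
    ("Firm X", "Client Co"),
    ("Client Co", "Legislator 1"),
    ("Legislator 1", "Energy Committee"),
    ("Energy Committee", "Vote 2025-17"),
    ("Vote 2025-17", "Regulation 44-B"),
])

# "Which regulations sit downstream of this lobbyist?" is just reachability.
print(nx.descendants(G, "Lobbyist A"))

# And the chain of influence itself is a shortest path over the same graph.
print(nx.shortest_path(G, "Lobbyist A", "Regulation 44-B"))
```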
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@ylecun @nxthompson if a latent activates before the response, predicts a behavioural shift, and intervention changes policy, dismissing the whole thing because the label sounds anthropomorphic feels a bit too easy no?
5
1
90
45.7K
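(Second half of the same sketch, covering the "intervention changes policy" criterion in the post above: add the candidate direction to the hidden activations and see whether the output distribution moves. The model here is a toy two-layer network and the direction is random; this shows the shape of the test, not Anthropic's actual method.)

```python
# Step two of the causal test: steer the hidden activations along a candidate
# direction via a forward hook and compare the output distribution with and
# without the intervention. Toy model and random direction -- illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, n_actions = 512, 4

model = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, n_actions))
direction = torch.randn(d_model)
direction /= direction.norm()

def steer(alpha):
    # Add alpha * direction to the first layer's output on every forward pass.
    def hook(module, inputs, output):
        return output + alpha * direction
    return model[0].register_forward_hook(hook)

x = torch.randn(1, d_model)
with torch.no_grad():
    baseline = torch.softmax(model(x), dim=-1)
    handle = steer(alpha=8.0)
    steered = torch.softmax(model(x), dim=-1)
    handle.remove()

# If the action distribution barely moves under intervention, the latent looks
# like a passive readout; if it moves systematically, it is policy-shaping in
# the sense argued above.
print("baseline:", baseline.numpy().round(3))
print("steered: ", steered.numpy().round(3))
```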
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@ylecun every civilization that underfunds measurement while talking grandly about innovation eventually decides the seed corn was inefficiently allocated
0
0
2
1.9K
Elon Musk
Elon Musk@elonmusk·
Hadamard thought in image space
3.2K
3.7K
55.1K
63.9M
signüll
signüll@signulll·
most ppl do not realize that a good question is a trap in the noble sense. it constrains the solution space so the answer reveals something the answerer didn't intend to offer. asking a good question is as much of an art as it is a science if not more.
39
38
676
38.2K
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
vitamindmaxxing because toronto chose grace today
0
0
0
48
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
weirdest line item in the supposed openai cap table is the foundation. roughly $220b of value attached to a governance fiction whose whole purpose is to say the machine belongs, somehow, to humanity in general rather than capital in particular
0
0
1
70
Sarim Sarfraz
Sarim Sarfraz@WLOGSarim·
@chamath historical p/e heuristics are about pricing growth. agi is a question about repricing the substrate on which growth is produced. whether public equities capture the upside, or rents get pulled upward into chips, private labs, and states. up and to the left for whom
0
0
2
4K