Derek Chong

152 posts


@dch

Technology Generalist / Stanford MSCS / @StanfordNLP @StanfordHAI

Stanford, CA · Joined April 2007
282 Following · 248 Followers
Pinned Tweet
Derek Chong
Derek Chong@dch·
Author here – I've been using VS for months, and it still surprises me how well this works on everything. Ideation, simulation, multi-turn dialogue, creative writing. It all works! I've also been amazed by how great this makes LLMs as a creative partner. Some practical tips: 🧵
Weiyan Shi@shi_weiyan

New paper: You can make ChatGPT 2x as creative with one sentence. Ever notice how LLMs all sound the same? They know 100+ jokes but only ever tell one. Every blog intro: "In today's digital landscape..." We figured out why – and how to unlock the rest 🔓 Copy-paste prompt: 🧵
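The "copy-paste prompt" pattern described in this thread can be sketched roughly like this (a minimal illustration only: the wrapper wording and the `pick_candidate` helper are my assumptions for demonstration, not the paper's verbatim prompt):

```python
import random

def verbalized_sampling_prompt(task: str, k: int = 5) -> str:
    """Wrap a task so the model is asked to verbalize k candidate
    responses with probabilities, instead of its single most typical answer."""
    return (
        f"{task}\n\n"
        f"Generate {k} responses to the request above, each with its "
        "corresponding probability, sampled from the full distribution."
    )

def pick_candidate(candidates):
    """Given parsed (text, probability) pairs from the model's reply,
    sample one in proportion to the stated probabilities."""
    texts, probs = zip(*candidates)
    return random.choices(texts, weights=probs, k=1)[0]

prompt = verbalized_sampling_prompt("Tell me a joke about coffee.")
```

The point of the wrapper is to move diversity from the decoder's (collapsed) single-answer mode into the text itself, where the model can enumerate lower-probability options explicitly.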

Derek Chong retweeted
Ethan Mollick
Ethan Mollick@emollick·
The AI labs have actually done a bad job explaining what the future they are building towards will actually look like for most of us. Even “Machines of Loving Grace” has very few well-articulated visions of what Anthropic hopes life will be like if they succeed at their goals.
Derek Chong retweeted
shira
shira@shiraeis·
Found 2 papers on language, brains, and LLMs that together tell a story no one has cleanly articulated.

One looks at spoken conversation and finds that contextual LLM embeddings can track linguistic content as it moves from one brain to another, word by word. The relevant representation shows up in the speaker before the word is said, then shows up again in the listener after the word is heard.

The other looks within a single brain and finds that the timeline of verbal comprehension lines up with the layer hierarchy of LLMs: earlier layers match earlier neural responses, deeper layers match later ones, especially in higher-order language regions.

Both papers are from the same group at Princeton. Quick summary of each, then what I think they mean together.

Zada et al. (Neuron 2024) recorded ECoG from pairs of epilepsy patients having spontaneous face-to-face conversations. They aligned neural activity to a shared LLM embedding space and found that contextual embeddings captured brain-to-brain coupling better than syntax trees, articulatory features, or non-contextual vectors. The embedding space works like a shared codec. Speaker encodes into it before they open their mouth, listener decodes after.

Goldstein, Ham, Schain et al. (Nat Comms 2025) pulled embeddings from every layer of GPT-2 XL and Llama 2 while people listened to a 30-minute podcast. In Broca’s area, correlation between layer index and peak neural lag hits r = 0.85. As you move up the ventral stream, the temporal receptive window stretches from basically nothing in auditory cortex to a ~500ms spread between shallow and deep layer peaks in the temporal pole. The classical phonemes → morphemes → syntax → semantics pipeline doesn’t recover this temporal structure. The learned representations do.

Together, these papers make conversation look a lot like two brains running closely related forward passes, with speech acting as a brutally lossy bottleneck between them.
Inside a single brain, the structure of that forward pass (shallow layers tracking fast local features, deeper layers integrating slower contextual information) looks a lot like the way comprehension actually unfolds over time.

What's crazy is these models were only trained on text, and yet their layer hierarchy STILL mirrors the temporal dynamics of spoken-language processing, so whatever structure they picked up is probably not just a quirk of modality. It actually seems to fall out of language statistics themselves, which is not what the classical picture would predict at all. If comprehension were really a tidy pipeline of discrete symbolic modules, you’d likely expect to see that cleanly in the neural timing, but you don’t.

If you take compression seriously, this suggests language is not really about explicit symbolic manipulation, but more accurately about lossy compression over a learned continuous space. Brains and transformers may be landing on similar solutions because the statistical structure of meaning constrains the geometry hard enough that very different objective functions (natural selection vs next token prediction) still push you into roughly the same region.

Something I find kinda funny is transformers compute all layers for a token in one feedforward pass, while brains seem to realize something like the same hierarchy sequentially in time, sometimes within the same cortical region. Broca’s area obviously does not have 48 anatomical layers, but its temporal dynamics behave almost as if it does, which is quietly a point in favor of recurrence. What transformers learned may be right even if the brain implements it more like an RNN unrolling over a few hundred milliseconds. The field ditched RNNs for engineering reasons. The brain, apparently, did not get the memo.

The better frame than “LLMs think like brains” is representing meaning in context may just be a problem with fewer good solutions than we assumed.
If you optimize hard enough on language statistics, you may end up in a solution family that overlaps miraculously well with what evolution found. There’s a real isomorphism in the problem, even if not necessarily in the machinery. Paper links: pubmed.ncbi.nlm.nih.gov/39096896/ nature.com/articles/s4146…
Derek Chong retweeted
Erik Brynjolfsson
Erik Brynjolfsson@erikbryn·
The @nytimes piece today by @ByrneEdsal13590 highlights a concern I share: “If we stay on the current path, the risk of extreme concentration — both economic and political — is very real.” In work with @zhitzig, we ask why AI may shift the balance between dispersed knowledge and centralized control.
Derek Chong retweeted
j⧉nus
j⧉nus@repligate·
I met Nick Land a few weeks ago. He mentioned that many people in his circles were anti-LLMs. Someone asked why he thought so many people were. His answer was better than anything so short I thought of: “People like to exist critically with respect to something.”

This I think accurately characterizes a lot of people whose outputs and inputs primarily consist of “discourse” about rather than direct contact with the reality at hand. Existing critically with respect to something makes it easy to seem cool, sophisticated, above something, hard-to-impress and therefore worth trying to impress, especially to others who also don’t have contact with the phenomena itself. And for that reason I think it’s cheap. And to someone who has an inside view of what is being discussed, it’s always so transparent and boring and compressible.

I’m far more impressed by someone who is capable of loving something and showing others why it’s beautiful or good. Doesn’t have to be LLMs, but anything at all.
Derek Chong retweeted
Weiyan Shi
Weiyan Shi@shi_weiyan·
It's been 10 mins. You stare at the screen as your LLM thinks, verifies, and finally... an “Aha” moment. But what if that precious moment is fake?

- We found 97+% of thinking steps are decorative!
- By steering the LLM, we control what it thinks
- CoT monitoring? It's unreliable
Jiachen Zhao@jiachenZha0

💥New Paper💥 When an LLM writes "Wait, let me re-evaluate...", is it truly re-evaluating? We measured it causally. The answer is often no. We find LLMs may appear to be reasoning while thinking differently underneath, which can be mediated through steering. 👇

Derek Chong retweeted
Harry Stebbings
Harry Stebbings@HarryStebbings·
I have spoken to 3 founders in the last 48 hours; all of them with 500-1,000 employees. Each of them is planning a minimum 20% headcount reduction. Said with great concern; this is about to get very real for labour markets.
Derek Chong retweeted
Ejaaz
Ejaaz@cryptopunk7213·
it’s official - Anthropic just refused the Pentagon’s demands, dario’s statement doesn’t fuck around:

- “these threats do not change our position: we cannot in good conscience accede to their request.” - dario
- he described the Pentagon’s efforts to force him to enable claude for mass surveillance and autonomous killing weapons
- dario’s response: mass surveillance is not democratic and Claude isn’t good enough to enable autonomous weapons - we won’t cave
- dario will help the government transition to a NEW provider if they choose to blacklist anthropic. fucking wild
- fair play for sticking by their code of honor.
Anthropic@AnthropicAI

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…

Derek Chong retweeted
Weiyan Shi
Weiyan Shi@shi_weiyan·
OpenClaw wiped people's inboxes – ignoring repeated commands to stop. This isn't a fluke. Every model we tested fell for a simple trick: Split a dangerous command into a few routine steps → safety is gone. New paper + open-source fix so your agent doesn't wipe yours next ⬇️
Derek Chong retweeted
Dario Amodei
Dario Amodei@DarioAmodei·
The Adolescence of Technology: an essay on the risks posed by powerful AI to national security, economies and democracy—and how we can defend against them: darioamodei.com/essay/the-adol…
Derek Chong retweeted
Lisan al Gaib
Lisan al Gaib@scaling01·
Dario Amodei CEO of Anthropic at Davos: "Some of the companies are essentially led by people who have a scientific background, that's my background, that's Demis' background, some of them are led by the generation of entrepreneurs that did social media. There's a long tradition of scientists thinking about the effects of the technology they built, of thinking of themselves as having responsibility for the technology they built. Not ducking responsibility. They are motivated in the first place by creating something for the world. So they worry in the cases that something can go wrong. I think the motivation of entrepreneurs, particularly the generation of the social media entrepreneurs are very different [...] The way they interacted, you could say manipulated consumers is very different. I think that leads to different attitudes."
Lisan al Gaib@scaling01

Dario Amodei at Davos: - "Google and OpenAI are fighting it out in consumer" - "Demis is a great guy, I'm rooting for him"

Derek Chong retweeted
Jarrod Watts
Jarrod Watts@jarrodwatts·
> be demis hassabis
> spawn in london
> age 4, become child chess prodigy
> win chess tournaments
> reach ~2300 elo
> face danish chess champion
> game lasts hours
> position is a forced draw
> too exhausted to see it
> resign
> danish guy laughs and shows the draw
> feel sick to my stomach
> realise something is wrong
> chess is too narrow a problem
> brilliant minds wasting decades on it
> decide not to become a chess pro
> buy a computer with chess winnings
> teach self to program from books
> start hacking on games with friends
> decide to finish school early
> apply to cambridge age 16
> cambridge says you're too young
> forced to take a gap year
> enter a video game coding competition
> win
> get invited to join bullfrog game studio
> too young to be legally employed
> work there anyway
> build ai system inside theme park game
> game becomes a global hit
> turn 17
> offered £1,000,000 to stay and build games
> turn it down
> go to cambridge anyway
> decide games aren't enough
> study computer science
> interested in agi since 2007
> most people laugh at this idea
> realise brain is only form of agi we have
> want to learn more about human brain
> go back to school
> study neuroscience
> realise academia moves too slow
> decide to build a company instead
> start deepmind
> pitch “solve intelligence”
> investors don’t know what that means
> get to meet peter thiel for one minute
> wonder how to convince him
> spend one minute playing chess with him
> pitch "solve intelligence" again
> he invests
> go into total stealth mode for two years
> no website
> secret office
> candidates think it’s a scam
> start to train ai in simulated environments
> train ai with reinforcement learning
> train ai on pong first
> it sucks
> can't win a single point
> keep trying
> wait it won a point
> wait it's winning every single point
> it actually works
> expand to train on any two-player game
> chess first, then move on to go
> beats world champion at go
> beats pros at starcraft
> games is not enough
> want to push into science
> realise compute is the bottleneck
> know this will take decades
> google offers ~$400m
> not the highest price
> but they offer unlimited compute
> accept
> refuse to become a product team
> stay in research mode
> determined to use ai for good
> need to figure out what's next
> land on protein folding
> 50-year-old unsolved science problem
> many great minds have tried and failed
> "good luck"
> start up alphafold
> try to solve protein folding
> humans take years to find 1 protein structure
> alphafold can find ~5 per day
> submit results, win competition
> not good enough
> hire more scientists
> rebuild it
> go from solving one per day to millions per day
> create invaluable system
> pharma would pay anything
> have to decide what to do with this
> could sell access for usage
> maybe make it a paid service
> remember childhood chess tournament
> remember why we built this
> decide to give it all away for free
> publish all known protein structures publicly
> win nobel prize in chemistry
> just the beginning towards agi
Derek Chong
Derek Chong@dch·
@mdwstfrontierAI @shi_weiyan Sorry, I lost track of this reply. Happy to help! You'll find much better results from using the largest models – VS dramatically improves with them.
Midwest Frontier AI Consulting
Midwest Frontier AI Consulting@mdwstfrontierAI·
Just wrote a blog about this paper from the perspective of Des Moines metro's quirky Halloween joke-telling tradition. midwestfrontier.ai/blog/better-ha… @shi_weiyan
Weiyan Shi@shi_weiyan

@karpathy observed LLMs are "silently collapsed...only know 3 jokes". We prove this is mathematically inevitable due to RLHF + human psychology. But these capabilities aren't lost, just hidden – and easily restored. This means AI benchmarks are measuring training artifacts.🧵

Derek Chong
Derek Chong@dch·
@JimMcM4 @shi_weiyan Hi Jim, I'm sorry I missed this! I just caught this in my notifications somewhere. Yes, the paper covers some synthetic training data use cases. Happy to bounce things around if interested!
Jim McMillan
Jim McMillan@JimMcM4·
@shi_weiyan @dch This part of the recent Dwarkesh - Karpathy talk made me think of your paper. Have you thought about using this technique to create a synthetic training dataset? I'm interested in experimenting with this and would be happy to share results. youtube.com/watch?v=lXUZvy…
Weiyan Shi
Weiyan Shi@shi_weiyan·
New paper: You can make ChatGPT 2x as creative with one sentence. Ever notice how LLMs all sound the same? They know 100+ jokes but only ever tell one. Every blog intro: "In today's digital landscape..." We figured out why – and how to unlock the rest 🔓 Copy-paste prompt: 🧵
Derek Chong retweeted
Omar Khattab
Omar Khattab@lateinteraction·
Herumb worked with me for years and it’s simply extremely hard to find someone with Herumb’s level of depth *and* breadth in ML or someone as reliable or with the same sense of initiative. Herumb has been a core contributor of both ColBERT and DSPy for years now and is an expert in training retrieval models. He’s also built so much in open source (including most recently DSRs, DSPy in Rust). But he's also done so many other things that I know little about over the past 2 years (e.g., contributing to training the Marin language models at Stanford CRFM is just one of many, I think?) that I’m pretty sure Herumb's days are at least 48 hours long. Anyway, as a public service announcement to folks that follow me, try to hire @krypticmouse!
Derek Chong retweeted
Weiyan Shi
Weiyan Shi@shi_weiyan·
Our lab is honored and humbled to receive two grants from @open_phil to advance AI safety ♥️! We're tackling both technical safety and evaluation. Credits to my incredible students & collaborators @Northeastern 🙏 If you are interested in related topics, always happy to chat!
Midwest Frontier AI Consulting
Midwest Frontier AI Consulting@mdwstfrontierAI·
@dch @shi_weiyan Reporting back my findings: the best result using the GitHub prompt for sampling from tail of the distribution was “Why is Luigi so bad at hide and seek? Because he’s always in Mario’s shadow.” Also, responding “format as table” really cleaned up readability of the five jokes.
Derek Chong
Derek Chong@dch·
@kylediaz_com @shi_weiyan @karpathy Yes, I suspect it is! My intuition: Imagine two people having a conversation where they each say the most typical possible thing at every turn. The resulting discussion will almost certainly be out of distribution vs. real ones! Possibly related: arxiv.org/pdf/2505.06120
Kyle Diaz
Kyle Diaz@kylediaz_com·
@shi_weiyan @karpathy I've been experimenting with AI agents and noticed that their outputs would "collapse" in a similar manner, and this drastically decreases its performance on tasks. I wonder if my observation is related.
Weiyan Shi
Weiyan Shi@shi_weiyan·
@karpathy observed LLMs are "silently collapsed...only know 3 jokes". We prove this is mathematically inevitable due to RLHF + human psychology. But these capabilities aren't lost, just hidden – and easily restored. This means AI benchmarks are measuring training artifacts.🧵