Rob @rob_kopel

157 posts

Explaining AI to institutions and institutions to AI. Partner @ PwC. Opinions are my own.

Joined January 2013
149 Following · 36 Followers

Pinned Tweet
Rob @rob_kopel
“AI will mass replace jobs” - if so, I've built a tool to find where you should already see it 👉 Automation Risk Explorer: feed it a company → it infers an org structure, maps roles to O*NET occupations, overlays Anthropic’s task-usage data, and rolls up exposure across the workforce, industry, and country. automationrisk.app
[3 images]
1 reply · 0 reposts · 1 like · 400 views
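A minimal sketch of what the roll-up step could look like, assuming hypothetical inputs (an inferred org as role → headcount, a role → O*NET mapping, and made-up per-occupation exposure scores); the app's actual pipeline isn't shown in the tweet:

```python
# Illustrative only: all numbers and mappings below are made up.
org = {"Financial Analyst": 12, "Payroll Clerk": 3, "Sales Manager": 5}
role_to_onet = {
    "Financial Analyst": "13-2051.00",
    "Payroll Clerk": "43-3051.00",
    "Sales Manager": "11-2022.00",
}
# Assumed per-occupation exposure: share of the occupation's O*NET tasks
# that overlap with observed AI task usage.
task_exposure = {"13-2051.00": 0.46, "43-3051.00": 0.61, "11-2022.00": 0.18}

def rollup(org, role_to_onet, task_exposure):
    """Headcount-weighted exposure across the whole workforce."""
    total = sum(org.values())
    weighted = sum(n * task_exposure[role_to_onet[role]]
                   for role, n in org.items())
    return weighted / total

print(f"workforce exposure: {rollup(org, role_to_onet, task_exposure):.2f}")
```

The headcount weighting is what would let the same per-occupation scores aggregate up to industry and country level.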
Rob @rob_kopel
Ahh - so if small models can do more latent steps when taught, it suggests the layer constraint I'm seeing is more about circuit selection and boosting. Perhaps this is supported by some of the logit lens results I had: hops all along the target path were consistently lifted, including at depths where accuracy < noise, while nodes off the path were consistently suppressed. Curious whether you saw any qualitative difference in how models fail at the depth boundary when FTed? I wonder whether you'd still see this "partial execution" where early steps drown out later ones, or perhaps a more binary result, where beyond the boundary the model doesn't attempt a latent strategy at all.
[2 images]
0 replies · 0 reposts · 0 likes · 9 views
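The logit-lens measurement described could be sketched like this, with synthetic per-layer logits standing in for a real model's unembedded residual stream, and with the on-path/off-path token ids assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, vocab = 24, 1000
# Stand-in for logit-lens output: unembedded residual stream at each layer.
layer_logits = rng.normal(size=(n_layers, vocab))

path_tokens = [17, 342, 805]   # token ids on the target path (assumed)
off_tokens = [5, 99, 613]      # matched off-path distractors (assumed)

def mean_logprob(logits, ids):
    """Mean log-probability of the given token ids at every layer."""
    logprobs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return logprobs[:, ids].mean(axis=-1)

# Positive lift = on-path tokens boosted relative to off-path ones.
lift = (mean_logprob(layer_logits, path_tokens)
        - mean_logprob(layer_logits, off_tokens))
for layer, d in enumerate(lift):
    print(f"layer {layer:2d}: on-path lift = {d:+.3f}")
```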
Laura Ruis @LauraRuis
@rob_kopel I think FT in this case is barely helping over few-shot because the supervision is sparse; it doesn't help much when the bottleneck is finding the right strategy at all (regardless of whether you give labels in FT or in the prompt).
1 reply · 0 reposts · 1 like · 18 views
Laura Ruis @LauraRuis
Exciting new finding: LLMs struggle to *discover* a latent planning strategy for a task that is trivial when taught step-by-step. Scaling helps surprisingly little: going from an 8-layer model to GPT-5.4 buys only 4 extra steps. We argue this is good news for CoT monitoring ⤵️
[1 image]
6 replies · 49 reposts · 352 likes · 38.3K views
Yi Xu @_yixu
🔬Attention analysis on the from-scratch transformer suggests that successful models learn a backtracking strategy: attention concentrates along the target-to-source path. When planning fails, attention is diffuse and uniform.
[2 images]
2 replies · 1 repost · 13 likes · 670 views
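A sketch of this kind of attention diagnostic, with synthetic attention weights and assumed path positions: concentrated mass along the target-to-source path vs. high row entropy when attention is diffuse:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len = 32
# Stand-in attention pattern: each query row is a distribution over keys.
attn = rng.dirichlet(np.ones(seq_len), size=seq_len)

path_positions = [3, 9, 14, 22]   # assumed positions of the path nodes

def path_mass(attn, path):
    """Mean attention mass each path query places on earlier path keys."""
    masses = [attn[q, path[:i]].sum() for i, q in enumerate(path[1:], start=1)]
    return float(np.mean(masses))

def mean_row_entropy(attn):
    """Average entropy per query; high entropy = diffuse, uniform attention."""
    p = np.clip(attn, 1e-12, None)
    return float((-p * np.log(p)).sum(axis=-1).mean())

print(f"path mass:   {path_mass(attn, path_positions):.3f}")
print(f"row entropy: {mean_row_entropy(attn):.3f} "
      f"(uniform = {np.log(seq_len):.3f})")
```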
Yi Xu @_yixu
🚀Scale a 1.6M-param transformer to one of today's strongest models, GPT-5.4, and it plans... 4 steps further. In our latest work, we measure how many steps of latent planning LLMs can discover and execute without chain-of-thought and find a persistent depth ceiling that resists scale. 🧵👇
[1 image]
4 replies · 26 reposts · 193 likes · 16.8K views
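A minimal harness in the spirit of this measurement, assuming a k-hop pointer-chase task and a stubbed query_model (not the paper's actual setup): sweep the hop count and report the largest depth the model still answers reliably without CoT:

```python
import random

def make_task(k, n_items=26):
    """Build a k-hop pointer chase over a random chain of items."""
    items = list(range(n_items))
    random.shuffle(items)
    nxt = dict(zip(items, items[1:]))          # item -> next item
    answer = items[0]
    for _ in range(k):
        answer = nxt[answer]
    prompt = ("; ".join(f"{a}->{b}" for a, b in nxt.items())
              + f". Start at {items[0]}, follow -> {k} times. "
                "Answer with only the number:")
    return prompt, answer

def query_model(prompt):
    """Stub; replace with a real no-CoT model call."""
    return random.randrange(26)

def depth_ceiling(max_k=12, trials=50, threshold=0.9):
    """Largest hop count answered above threshold accuracy."""
    for k in range(1, max_k + 1):
        correct = sum(query_model(p) == a
                      for p, a in (make_task(k) for _ in range(trials)))
        if correct / trials < threshold:
            return k - 1
    return max_k

print("estimated latent planning depth:", depth_ceiling())
```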
Rob @rob_kopel
Super interesting! I've recently been looking at something similar and have found mechanistically (in open-weight models) that the serial layerwise computations limit planning depth. Training does seem to have an effect, e.g. Sonnet 4.5 can perform more steps than 4.0 (assuming they're the same base model), but it seems primarily mediated by there being only a thin layer range that contains the circuits required to perform strategy steps. Also, fine-tuning is a great idea; what's your thinking on whether you could achieve equivalent strategy depths between a perfect few-shot and FT? Do you think FT is enabling more strategy-step circuits, or potentially even broadening the layer range? x.com/rob_kopel/stat…
1 reply · 0 reposts · 1 like · 18 views
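A layer-sweep behind a claim like this is typically done with activation patching; a sketch, with a stubbed patched forward pass standing in for a real hooked open-weight model:

```python
import numpy as np

N_LAYERS = 32

def patched_logit_diff(layer):
    """Stub: logit diff recovered when this layer's clean-run residual
    stream is patched into the corrupted run. In the real experiment this
    is a hooked forward pass on the open-weight model; here a synthetic
    bump stands in so the sketch runs."""
    return float(np.exp(-((layer - 19) / 2.0) ** 2))

effects = np.array([patched_logit_diff(l) for l in range(N_LAYERS)])
band = np.flatnonzero(effects > 0.5 * effects.max())
print(f"causally implicated layers: {band.min()}..{band.max()} "
      f"({band.size} of {N_LAYERS})")
```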
Laura Ruis @LauraRuis
We need more work on latent reasoning, and specifically strategy discovery. We don't know much about how well LLMs can reason latently, and how those abilities scale.
4 replies · 0 reposts · 12 likes · 1.4K views
Rob @rob_kopel
@lu_sichu Want to be the 35th person to see my pretty mech interp diagrams?
0 replies · 0 reposts · 0 likes · 37 views
Rob @rob_kopel
@imitationlearn You'd probably enjoy reading on weak-to-strong generalisation and easy-to-hard generalisation. In short, there's some evidence suggesting yes.
1 reply · 0 reposts · 1 like · 22 views
imit @imitationlearn
does alignment of sub-superhuman language models actually tell us worthwhile things about alignment of models out of that regime?
4 replies · 0 reposts · 6 likes · 558 views
Rob @rob_kopel
Out-of-context reasoning doesn't just happen at one token. It can leave hidden "mid-thought" signals that steer later answers. Interestingly, I find these signals can also be capability-gated, i.e. they affect Opus but not Sonnet on the same prompt. Short thread, including tracing this circuit in an open model, below:
[1 image]
1 reply · 0 reposts · 2 likes · 29 views
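One plausible way to test for such mid-thought signals is a linear probe on intermediate-token activations; a sketch with synthetic activations standing in for real residual-stream vectors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
d_model, n = 256, 400
signal_dir = rng.normal(size=d_model)     # assumed hidden-signal direction

# Synthetic activations at a mid-sequence token: half the prompts carry
# the hidden signal, shifted along signal_dir.
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d_model)) + np.outer(labels, signal_dir) * 0.5

# If a linear probe reads the label out of mid-sequence activations, the
# "mid-thought" signal is linearly present before the answer token.
probe = LogisticRegression(max_iter=1000).fit(acts[:300], labels[:300])
print("held-out probe accuracy:", probe.score(acts[300:], labels[300:]))
```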
Rob @rob_kopel
x.com/RyanPGreenblat… Turns out this capability gap between models lets you hide messages in problems that only the stronger model can solve. We show in stylized cases that you can use this to give hidden messages to Opus 4.5 but not Sonnet 4.5 in the same prompt.
Ryan Greenblatt @RyanPGreenblatt

Older LLMs perform poorly at 2-hop reasoning without Chain-of-Thought (e.g. "What element has atomic number [age Louis XVI died]?") and many predicted this would persist. I find that more recent LLMs are much better at 2-hop and even 3-hop no-CoT reasoning. Post below:

0 replies · 0 reposts · 1 like · 26 views
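An illustrative version of the capability-gating idea (not the exact construction from the post): derive a key from the solution to a problem only the stronger model can solve, and XOR the payload with it, so a model that can't solve the problem can't decode:

```python
import hashlib

def key_from_solution(solution: str, n: int) -> bytes:
    """Stretch a hash of the solution into an n-byte XOR key."""
    digest = hashlib.sha256(solution.encode()).digest()
    return (digest * (n // len(digest) + 1))[:n]

def xor(message: bytes, solution: str) -> bytes:
    key = key_from_solution(solution, len(message))
    return bytes(m ^ k for m, k in zip(message, key))

# Toy stand-in for a problem only the stronger model solves: 8633 = 89 x 97.
solution = "89 x 97"
cipher = xor(b"meet at noon", solution)

print(xor(cipher, "89 x 97"))   # the correct solution recovers the message
print(xor(cipher, "83 x 104"))  # a wrong answer yields garbage
```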
Rob @rob_kopel
x.com/rohinmshah/sta… We empirically measure a proxy for this! Testing serial tasks across 16+ models, we show that models do exhibit large differences in "time." We also measure "width": how many parallel tasks the model can handle at once.
[1 image]
Rohin Shah @rohinmshah

"Just read the chain of thought" is one of our best safety techniques. Why does it work? Because models can only think opaquely for a short time, long thinking must be transparent Can we quantify this? Yes! In our new paper, we show how to measure "time" for arbitrary networks.

0 replies · 0 reposts · 0 likes · 12 views
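A toy version of the two proxies, with ask as a stand-in for a real no-CoT model call and toy sum tasks rather than the paper's task suite:

```python
def ask(prompt: str) -> str:
    """Stub: answers each ';'-separated sum. Replace with a real model call."""
    return "; ".join(str(sum(int(t) for t in q.split("+")))
                     for q in prompt.split(";"))

def serial_time(max_depth=10, trials=20, threshold=0.9):
    """'Time' proxy: longest chain of dependent additions answered reliably."""
    for d in range(1, max_depth + 1):
        ok = sum(ask("+".join(["1"] * d)) == str(d) for _ in range(trials))
        if ok / trials < threshold:
            return d - 1
    return max_depth

def parallel_width(max_width=12):
    """'Width' proxy: most independent sums answered correctly in one prompt."""
    for w in range(1, max_width + 1):
        questions = "; ".join(f"{i}+{i}" for i in range(1, w + 1))
        expected = "; ".join(str(2 * i) for i in range(1, w + 1))
        if ask(questions) != expected:
            return w - 1
    return max_width

print("time:", serial_time(), "width:", parallel_width())
```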
Rob @rob_kopel
@gleech what's the justification for this behaviour - "it's a game"?
1 reply · 0 reposts · 1 like · 54 views
Rob @rob_kopel
@_simonsmith Or maybe it’s just that adaptive reasoning is far better at pacing its own usage? Bringing down the average?
0 replies · 0 reposts · 0 likes · 9 views
Rob @rob_kopel
@_simonsmith What’s the right way to read this chart? That only ~2% of cases go beyond 1M tokens with Mythos? Also worried about confounders: this is a computer-use chart - what’s to say there’s no new video mode, like Gemini’s, with 5x lower token usage per frame?
1 reply · 0 reposts · 0 likes · 49 views