Rob @rob_kopel

157 posts

Explaining AI to institutions and institutions to AI. Partner @ PwC. Opinions are my own.

Joined January 2013
149 Following · 36 Followers

Pinned Tweet
Rob @rob_kopel
“AI will mass replace jobs” - if so, I've built a tool to find where you should already see it 👉 Automation Risk Explorer: feed it a company → it infers an org structure, maps roles to O*NET occupations, overlays Anthropic’s task-usage data, and rolls up exposure across the workforce, industry, and country. automationrisk.app
[3 images]
1 reply · 0 reposts · 1 like · 400 views
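A minimal sketch of what the roll-up step could look like, assuming hypothetical inputs (an inferred org as role → headcount, a role → O*NET mapping, and made-up per-occupation exposure scores); the app's actual pipeline isn't shown in the tweet:

```python
# Illustrative only: all numbers and mappings below are made up.
org = {"Financial Analyst": 12, "Payroll Clerk": 3, "Sales Manager": 5}
role_to_onet = {
    "Financial Analyst": "13-2051.00",
    "Payroll Clerk": "43-3051.00",
    "Sales Manager": "11-2022.00",
}
# Assumed per-occupation exposure: share of the occupation's O*NET tasks
# that overlap with observed AI task usage.
task_exposure = {"13-2051.00": 0.46, "43-3051.00": 0.61, "11-2022.00": 0.18}

def rollup(org, role_to_onet, task_exposure):
    """Headcount-weighted exposure across the whole workforce."""
    total = sum(org.values())
    weighted = sum(n * task_exposure[role_to_onet[role]]
                   for role, n in org.items())
    return weighted / total

print(f"workforce exposure: {rollup(org, role_to_onet, task_exposure):.2f}")
```

The headcount weighting is what would let the same per-occupation scores aggregate up to industry and country level.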
Rob @rob_kopel
Ahh - so if small models can do more latent steps when taught, it suggests the layer constraint I'm seeing is more about circuit selection and boosting. Perhaps this is supported by some of the logit lens results I had: hops all along the target path were consistently lifted, including at depths where accuracy < noise, while nodes off the path were consistently suppressed. Curious whether you saw any qualitative difference in how models fail at the depth boundary when FTed? I wonder whether you'd still see this "partial execution" where early steps drown out later ones, or perhaps a more binary result, where beyond the boundary the model doesn't attempt a latent strategy at all.
[2 images]
0 replies · 0 reposts · 0 likes · 9 views
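The logit-lens measurement described could be sketched like this, with synthetic per-layer logits standing in for a real model's unembedded residual stream, and with the on-path/off-path token ids assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, vocab = 24, 1000
# Stand-in for logit-lens output: unembedded residual stream at each layer.
layer_logits = rng.normal(size=(n_layers, vocab))

path_tokens = [17, 342, 805]   # token ids on the target path (assumed)
off_tokens = [5, 99, 613]      # matched off-path distractors (assumed)

def mean_logprob(logits, ids):
    """Mean log-probability of the given token ids at every layer."""
    logprobs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return logprobs[:, ids].mean(axis=-1)

# Positive lift = on-path tokens boosted relative to off-path ones.
lift = (mean_logprob(layer_logits, path_tokens)
        - mean_logprob(layer_logits, off_tokens))
for layer, d in enumerate(lift):
    print(f"layer {layer:2d}: on-path lift = {d:+.3f}")
```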
Laura Ruis @LauraRuis
@rob_kopel I think FT in this case is barely helping over few-shot because the supervision is sparse; it doesn't help much when the bottleneck is finding the right strategy at all (regardless of whether you give labels in FT or in the prompt).
1 reply · 0 reposts · 1 like · 18 views
Laura Ruis @LauraRuis
Exciting new finding: LLMs struggle to *discover* a latent planning strategy for a task that is trivial when taught step-by-step. Scaling helps surprisingly little: going from an 8-layer model to GPT-5.4 buys only 4 extra steps. We argue this is good news for CoT monitoring ⤵️
[1 image]
6 replies · 49 reposts · 352 likes · 38.3K views
Yi Xu @_yixu
🔬Attention analysis on the from-scratch transformer suggests that successful models learn a backtracking strategy: attention concentrates along the target-to-source path. When planning fails, attention is diffuse and uniform.
[2 images]
2 replies · 1 repost · 13 likes · 670 views
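A sketch of this kind of attention diagnostic, with synthetic attention weights and assumed path positions: concentrated mass along the target-to-source path vs. high row entropy when attention is diffuse:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len = 32
# Stand-in attention pattern: each query row is a distribution over keys.
attn = rng.dirichlet(np.ones(seq_len), size=seq_len)

path_positions = [3, 9, 14, 22]   # assumed positions of the path nodes

def path_mass(attn, path):
    """Mean attention mass each path query places on earlier path keys."""
    masses = [attn[q, path[:i]].sum() for i, q in enumerate(path[1:], start=1)]
    return float(np.mean(masses))

def mean_row_entropy(attn):
    """Average entropy per query; high entropy = diffuse, uniform attention."""
    p = np.clip(attn, 1e-12, None)
    return float((-p * np.log(p)).sum(axis=-1).mean())

print(f"path mass:   {path_mass(attn, path_positions):.3f}")
print(f"row entropy: {mean_row_entropy(attn):.3f} "
      f"(uniform = {np.log(seq_len):.3f})")
```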
Yi Xu @_yixu
🚀Scale a 1.6M-param transformer to one of today's strongest models, GPT-5.4, and it plans... 4 steps further. In our latest work, we measure how many steps of latent planning LLMs can discover and execute without chain-of-thought and find a persistent depth ceiling that resists scale. 🧵👇
[1 image]
4 replies · 26 reposts · 193 likes · 16.8K views
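A minimal harness in the spirit of this measurement, assuming a k-hop pointer-chase task and a stubbed query_model (not the paper's actual setup): sweep the hop count and report the largest depth the model still answers reliably without CoT:

```python
import random

def make_task(k, n_items=26):
    """Build a k-hop pointer chase over a random chain of items."""
    items = list(range(n_items))
    random.shuffle(items)
    nxt = dict(zip(items, items[1:]))          # item -> next item
    answer = items[0]
    for _ in range(k):
        answer = nxt[answer]
    prompt = ("; ".join(f"{a}->{b}" for a, b in nxt.items())
              + f". Start at {items[0]}, follow -> {k} times. "
                "Answer with only the number:")
    return prompt, answer

def query_model(prompt):
    """Stub; replace with a real no-CoT model call."""
    return random.randrange(26)

def depth_ceiling(max_k=12, trials=50, threshold=0.9):
    """Largest hop count answered above threshold accuracy."""
    for k in range(1, max_k + 1):
        correct = sum(query_model(p) == a
                      for p, a in (make_task(k) for _ in range(trials)))
        if correct / trials < threshold:
            return k - 1
    return max_k

print("estimated latent planning depth:", depth_ceiling())
```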
Rob @rob_kopel
Super interesting! I've recently been looking at something similar and have found mechanistically (in open-weight models) that the serial layerwise computations limit planning depth. Training does seem to have an effect, e.g. Sonnet 4.5 can perform more steps than 4.0 (assuming they're the same base model), but it seems primarily mediated by there being only a thin layer range that contains the circuits required to perform strategy steps. Also, fine-tuning is a great idea; what's your thinking on whether you could achieve equivalent strategy depths between a perfect few-shot and FT? Do you think FT is enabling more strategy-step circuits, or potentially even broadening the layer range? x.com/rob_kopel/stat…
1 reply · 0 reposts · 1 like · 18 views
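A layer-sweep behind a claim like this is typically done with activation patching; a sketch, with a stubbed patched forward pass standing in for a real hooked open-weight model:

```python
import numpy as np

N_LAYERS = 32

def patched_logit_diff(layer):
    """Stub: logit diff recovered when this layer's clean-run residual
    stream is patched into the corrupted run. In the real experiment this
    is a hooked forward pass on the open-weight model; here a synthetic
    bump stands in so the sketch runs."""
    return float(np.exp(-((layer - 19) / 2.0) ** 2))

effects = np.array([patched_logit_diff(l) for l in range(N_LAYERS)])
band = np.flatnonzero(effects > 0.5 * effects.max())
print(f"causally implicated layers: {band.min()}..{band.max()} "
      f"({band.size} of {N_LAYERS})")
```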
Laura Ruis @LauraRuis
We need more work on latent reasoning, and specifically strategy discovery. We don't know much about how well LLMs can reason latently, and how those abilities scale.
4 replies · 0 reposts · 12 likes · 1.4K views
Rob @rob_kopel
@lu_sichu Want to be the 35th person to see my pretty mech interp diagrams?
0 replies · 0 reposts · 0 likes · 37 views
Rob @rob_kopel
@imitationlearn You'd probably enjoy reading on weak-to-strong generalisation and easy-to-hard generalisation. In short, there's some evidence suggesting yes.
1 reply · 0 reposts · 1 like · 22 views
imit @imitationlearn
does alignment of sub-superhuman language models actually tell us worthwhile things about alignment of models out of that regime?
4 replies · 0 reposts · 6 likes · 558 views
Rob @rob_kopel
Out-of-context reasoning doesn't just happen at one token. It can leave hidden "mid-thought" signals that steer later answers. Interestingly, I find these signals can also be capability-gated, i.e. they affect Opus but not Sonnet on the same prompt. Short thread, including tracing this circuit in an open model, below:
[1 image]
1 reply · 0 reposts · 2 likes · 29 views
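One plausible way to test for such mid-thought signals is a linear probe on intermediate-token activations; a sketch with synthetic activations standing in for real residual-stream vectors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
d_model, n = 256, 400
signal_dir = rng.normal(size=d_model)     # assumed hidden-signal direction

# Synthetic activations at a mid-sequence token: half the prompts carry
# the hidden signal, shifted along signal_dir.
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d_model)) + np.outer(labels, signal_dir) * 0.5

# If a linear probe reads the label out of mid-sequence activations, the
# "mid-thought" signal is linearly present before the answer token.
probe = LogisticRegression(max_iter=1000).fit(acts[:300], labels[:300])
print("held-out probe accuracy:", probe.score(acts[300:], labels[300:]))
```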
Rob @rob_kopel
x.com/RyanPGreenblat… Turns out this capability gap between models lets you hide messages in problems that only the stronger model can solve. We show in stylized cases that you can use this to give hidden messages to Opus 4.5 but not Sonnet 4.5 in the same prompt.
Ryan Greenblatt @RyanPGreenblatt

Older LLMs perform poorly at 2-hop reasoning without Chain-of-Thought (e.g. "What element has atomic number [age Louis XVI died]?") and many predicted this would persist. I find that more recent LLMs are much better at 2-hop and even 3-hop no-CoT reasoning. Post below:

0 replies · 0 reposts · 1 like · 26 views
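An illustrative version of the capability-gating idea (not the exact construction from the post): derive a key from the solution to a problem only the stronger model can solve, and XOR the payload with it, so a model that can't solve the problem can't decode:

```python
import hashlib

def key_from_solution(solution: str, n: int) -> bytes:
    """Stretch a hash of the solution into an n-byte XOR key."""
    digest = hashlib.sha256(solution.encode()).digest()
    return (digest * (n // len(digest) + 1))[:n]

def xor(message: bytes, solution: str) -> bytes:
    key = key_from_solution(solution, len(message))
    return bytes(m ^ k for m, k in zip(message, key))

# Toy stand-in for a problem only the stronger model solves: 8633 = 89 x 97.
solution = "89 x 97"
cipher = xor(b"meet at noon", solution)

print(xor(cipher, "89 x 97"))   # the correct solution recovers the message
print(xor(cipher, "83 x 104"))  # a wrong answer yields garbage
```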
Rob @rob_kopel
x.com/rohinmshah/sta… We empirically measure a proxy for this! Testing serial tasks across 16+ models, we show that models do exhibit large differences in "time." We also measure "width": how many parallel tasks the model can handle at once.
[1 image]
Rohin Shah @rohinmshah

"Just read the chain of thought" is one of our best safety techniques. Why does it work? Because models can only think opaquely for a short time, long thinking must be transparent Can we quantify this? Yes! In our new paper, we show how to measure "time" for arbitrary networks.

0 replies · 0 reposts · 0 likes · 12 views
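A toy version of the two proxies, with ask as a stand-in for a real no-CoT model call and toy sum tasks rather than the paper's task suite:

```python
def ask(prompt: str) -> str:
    """Stub: answers each ';'-separated sum. Replace with a real model call."""
    return "; ".join(str(sum(int(t) for t in q.split("+")))
                     for q in prompt.split(";"))

def serial_time(max_depth=10, trials=20, threshold=0.9):
    """'Time' proxy: longest chain of dependent additions answered reliably."""
    for d in range(1, max_depth + 1):
        ok = sum(ask("+".join(["1"] * d)) == str(d) for _ in range(trials))
        if ok / trials < threshold:
            return d - 1
    return max_depth

def parallel_width(max_width=12):
    """'Width' proxy: most independent sums answered correctly in one prompt."""
    for w in range(1, max_width + 1):
        questions = "; ".join(f"{i}+{i}" for i in range(1, w + 1))
        expected = "; ".join(str(2 * i) for i in range(1, w + 1))
        if ask(questions) != expected:
            return w - 1
    return max_width

print("time:", serial_time(), "width:", parallel_width())
```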
Rob @rob_kopel
@gleech what's the justification for this behaviour - "it's a game"?
1 reply · 0 reposts · 1 like · 54 views
Rob @rob_kopel
@_simonsmith Or maybe it’s just that adaptive reasoning is far better at pacing its own usage? Bringing down the average?
0 replies · 0 reposts · 0 likes · 9 views
Rob @rob_kopel
@_simonsmith What’s the right way to read this chart? That only ~2% of cases go beyond 1M tokens with Mythos? Also worried about confounders: this is a computer-use chart - what’s to say there’s no new video mode, like Gemini’s, with 5x lower token usage per frame?
1 reply · 0 reposts · 0 likes · 49 views