Charles Foster

6.1K posts

@CFGeek

Excels at reasoning & tool use🪄 Tensor-enjoyer 🧪 @METR_Evals. My COI policy is available under “Disclosures” at https://t.co/bihrMIUKJq

Oakland, CA · Joined June 2020
551 Following · 3.4K Followers
Pinned Tweet
Charles Foster @CFGeek
Running list of conjectures about neural networks 📜:
6 replies · 13 reposts · 167 likes · 40K views
Herbie Bradley @herbiebradley
The first AI system capable of acting as a "drop in knowledge worker" will have continual learning via:
5 replies · 1 repost · 18 likes · 2.6K views
Charles Foster @CFGeek
> Recursive embraces the logical conclusion: the fastest path to superintelligence will be realized by AI that recursively improves itself… Throughout, we will prioritize safety. We must make sure the system helps humanity flourish by maximizing the benefits while reducing risks
Recursive@Recursive_SI

x.com/i/article/2054…

1 reply · 0 reposts · 14 likes · 2.2K views
Yafah Edelman @YafahEdelman
The new METR time horizon graph is pretty bad imo. It's a great benchmark, but the time horizon estimation isn't reasonable rn. I think something like this would be more justified:
[image]
5 replies · 6 reposts · 136 likes · 17.4K views
davinci @leothecurious
my problem with rewards is that they fundamentally operate over behaviors, not outcomes. when u formulate a reward function, u have a goal in mind, a goal which u'd like the AI to always try and achieve, and u make that goal implicit in the reward. the reward function is a proxy, u don't care much about it, the rewards themselves don't really mean much to u, they're just a means to an end, but the outcomes do matter. reinforcement learning is imo a very crude way to indirectly surface desirable outcomes in an autonomous agent.
4 replies · 2 reposts · 43 likes · 2.3K views
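The proxy-vs-outcome gap described above can be made concrete with a toy sketch. Everything here (the dust-cleaning scenario, function names, and numbers) is invented for illustration, not anything from the thread:

```python
# Toy sketch: the designer cares about an outcome ("the room ends up
# clean"), but the reward function scores a behavior ("units of dust
# removed"). A policy can maximize the proxy without the outcome.

def proxy_reward(dust_removed: int) -> int:
    """Scores the measurable behavior, not the outcome we care about."""
    return dust_removed

def outcome_achieved(dust_left_in_room: int) -> bool:
    """What the designer actually wants: no dust left."""
    return dust_left_in_room == 0

# Policy A cleans the room: removes all 10 units of dust present.
reward_a = proxy_reward(dust_removed=10)            # reward 10
outcome_a = outcome_achieved(dust_left_in_room=0)   # outcome achieved

# Policy B games the proxy: imports 90 units of dust, removes those,
# and leaves the original 10 untouched.
reward_b = proxy_reward(dust_removed=90)            # reward 90: higher...
outcome_b = outcome_achieved(dust_left_in_room=10)  # ...outcome not achieved

assert reward_b > reward_a and outcome_a and not outcome_b
```

The reward function ranks policy B above policy A even though only A produces the outcome the designer wanted, which is the sense in which the reward is a proxy.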
Charles Foster @CFGeek
@thkostolansky I think it’s still useful. Though I wish we used “task gaming” for the generic phenomenon and “reward hacking” for one specific mechanism that causes it.
0 replies · 0 reposts · 4 likes · 119 views
Charles Foster @CFGeek
We can do better activation steering by using a flow model (over in-distribution activations) to regularize against OOD drift: take a steering step, regularize, repeat. As a way to follow the contours of the latent space while steering, rather than heading to “nonsense” areas.
0 replies · 0 reposts · 0 likes · 161 views
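A minimal numerical sketch of the steer-then-regularize loop described above. A fixed Gaussian log-density stands in for a trained flow model over in-distribution activations; the steering direction and all constants are assumptions for illustration:

```python
import numpy as np

def score(h, mu, var):
    """Gradient of a Gaussian log-density: a stand-in for the score of a
    flow model trained on in-distribution activations."""
    return -(h - mu) / var

def steer(h, direction, mu, var, steps=200, step_size=0.1, reg_weight=0.5):
    """Take a steering step, then a regularization step up the modeled
    log-density, and repeat -- following the contours of the modeled
    distribution rather than heading off it."""
    for _ in range(steps):
        h = h + step_size * direction                        # steering step
        h = h + reg_weight * step_size * score(h, mu, var)   # pull back in-distribution
    return h

mu = np.zeros(4)                           # modeled activations centered at 0
direction = np.array([1.0, 0.0, 0.0, 0.0]) # hypothetical steering direction
h0 = np.zeros(4)

h_reg = steer(h0, direction, mu, var=1.0)
h_noreg = steer(h0, direction, mu, var=1.0, reg_weight=0.0)

# Without regularization the activation drifts linearly (to 20.0 on the
# steered axis after 200 steps); with it, the iteration converges to a
# bounded fixed point (~1.9) near the modeled distribution.
```

The qualitative point is that the regularizer turns unbounded drift into convergence toward a high-density region; a real flow model would shape that region to the actual activation manifold instead of a single Gaussian.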
Charles Foster @CFGeek
There are simple task-reframing methods (similar to inoculation prompting) for LLM-based RL agents to learn from very off-policy or off-dynamics rollouts.
1 reply · 0 reposts · 0 likes · 334 views
Charles Foster @CFGeek
@GaryMarcus Also note: if you look at the raw data for any of the time horizon numbers, you’ll see they’re closer to measuring “At what point does the agent succeed on all attempts for 50% of tasks?” than “At what point does the agent succeed on 50% of attempts for every task?”
METR@METR_Evals

Of the 228 tasks in our suite, only 5 are estimated as 16+ hours long, making measurements at this range unstable and less meaningful than at ranges with better task coverage. Thus, we are not highlighting exact estimates for models above 16 hours measured with our current suite.

1 reply · 0 reposts · 8 likes · 418 views
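The two readings of "50%" in the tweet above can be separated with hypothetical attempt data (the tasks and numbers below are invented, not METR's):

```python
# Hypothetical per-task attempt records (1 = success). The two readings
# diverge: an agent can pass "all attempts on 50% of tasks" while
# failing "50% of attempts on every task".
attempts = {
    "task_a": [1, 1, 1, 1],  # reliable
    "task_b": [1, 1, 1, 1],  # reliable
    "task_c": [0, 0, 0, 0],  # never succeeds
}

# Reading 1: on what fraction of tasks does the agent succeed on ALL attempts?
frac_all_success = sum(all(runs) for runs in attempts.values()) / len(attempts)

# Reading 2: does the agent succeed on >= 50% of attempts for EVERY task?
every_task_half = all(sum(runs) / len(runs) >= 0.5 for runs in attempts.values())

# frac_all_success = 2/3, clearing a "50% of tasks" bar, while
# every_task_half is False because task_c has zero per-task reliability.
```

Reading 1 is insensitive to tasks the agent always fails, which is why it can look strong while per-task reliability is poor.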
Charles Foster @CFGeek
@GaryMarcus I largely agree. Could quibble with you on what to count as neuro-symbolic and how far another $1T would go. But beyond those I think the caveats are correct.
2 replies · 0 reposts · 7 likes · 637 views
Gary Marcus @GaryMarcus
Hot take on METR’s new graph that so many people are flipping out about today.
• Claude Code is a real advance; Mythos probably builds on some of what is learned there. But…
• If you read the graph carefully, it is about achieving *50%* success. Not 100 or 99 or even 90. The key problem with GenAI has been reliability; this graph does not address reliable performance. At all.
• If you read carefully, it is only about software tasks. Not general intelligence.
• It certainly doesn’t tell you that *most* (let alone all) things that humans can do in 16 hours can be done in Mythos, let alone reliably.
• Aside from this, the graph doesn’t show you *how* the improvements have been made. As noted in my newsletter, a lot of the advance in recent months is likely from the incorporation of symbolic tools (like code interpreters, verification, and harnesses) rather than from model scaling per se. As such this is a vindication of neurosymbolic AI, but not a proof that LLMs themselves can be perpetually scaled, and not a proof that another trillion dollars will continue the graph.
• Per @ramez, Mythos is not actually off trend on the ECI benchmark, which is a broader measure.
METR@METR_Evals

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.

40 replies · 20 reposts · 182 likes · 91.5K views
Charles Foster reposted
Parker Whitfill @whitfill_parker
New post on the difference between 3 notions of productivity gain from AI (AKA uplift):
• Uplift on old tasks (AI speedup on tasks you do in an avg 2022 day)
• Uplift on new tasks (AI speedup on tasks you do in an avg 2026 day)
• Uplift in value (AI increasing your goals accomplished)
[image]
5 replies · 27 reposts · 123 likes · 28.6K views
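A toy calculation separating the first two notions above. The task mix, speedup numbers, and harmonic-mean formula are all hypothetical (value uplift would additionally require modeling which goals actually get accomplished, so it is left as a comment):

```python
# Hypothetical time shares and AI speedups per task type. Uplift on old
# tasks evaluates AI against the 2022 task mix; uplift on new tasks
# evaluates it against the 2026 mix, which has shifted toward new
# AI-era tasks like reviewing AI output.
mix_2022 = {"write_code": 1.0}
mix_2026 = {"write_code": 0.6, "review_ai_output": 0.4}
speedup = {"write_code": 2.0, "review_ai_output": 1.25}

def uplift(mix):
    # Time-share-weighted harmonic mean of per-task speedups: total time
    # spent with AI per unit of work, inverted.
    time_with_ai = sum(share / speedup[task] for task, share in mix.items())
    return 1.0 / time_with_ai

uplift_old = uplift(mix_2022)  # 2.0
uplift_new = uplift(mix_2026)  # 1 / (0.6/2.0 + 0.4/1.25) ~= 1.61
# Uplift in value is a third, distinct quantity: it depends on outcomes,
# not time saved, so it cannot be computed from speedups alone.
```

The same AI looks like a 2x uplift on the old task mix but only ~1.6x on the new one, purely because the mix of tasks changed.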
Charles Foster @CFGeek
@thkostolansky I don’t get it. If we can measure whether current models are [unable to do X without a scratchpad / prone to mentioning X in a scratchpad / unable to collude to do X without being noticed], why couldn’t we just keep checking the same for future more capable models?
1 reply · 0 reposts · 1 like · 49 views
Tim Kostolansky @thkostolansky
Also it’s kinda interesting to say “the models are monitorable cus look at how their CoTs have looked in the past wrt their answers: it’s all lining up, so it’s monitorable! If we keep this property of CoTs being monitorable and answers lining up with eval scores, this should continue to be good, right?” Meanwhile it seems true to me that stronger/more capable models can probably just put closer and closer to ~whatever in their CoTs and remain undetected (or they could collude with the monitors, since the monitors are possibly amenable to this).
2 replies · 0 reposts · 1 like · 84 views
Charles Foster @CFGeek
@thkostolansky It doesn’t seem unknowable if CoT is useful-to-us! We can examine this directly, observationally (like by asking whether developers and users pay attention to CoT) and experimentally (like by comparing downstream performance on matched tasks with and without CoT access).
0 replies · 0 reposts · 1 like · 13 views
Tim Kostolansky @thkostolansky
@CFGeek I guess it’s unknowable/very hard to know if it’s not useful and just not showing us (of “its” “volition” or not) too, though, which is what I’m pointing to mostly.
2 replies · 0 reposts · 3 likes · 113 views
Charles Foster @CFGeek
@tokenbender I found this really hard to follow. The style of AI-assisted writing obscures (what seems like) a legit result.
1 reply · 0 reposts · 2 likes · 152 views
tokenbender @tokenbender
Ever wondered if you could extract capabilities and behaviors from neural networks and reuse/update/route them as needed? We introduce low-rank circuit conditioning, a novel approach that preserves the model’s output behavior while reshaping how an existing capability is represented. In the base model, standard compact recovery stalls at 29%. After conditioning, the same extraction pipeline reaches 91.33% autoregressive full-answer recovery from 5.05% of MLP channels. The evidence points to the possibility of extracting and reusing isolated capabilities, saving cost and latency while improving adaptability. Read our work to understand more: tokenbender.com/posts/honey-i-…
26 replies · 66 reposts · 394 likes · 40.4K views