Eric Gan

@ejcgan

Katılım Ekim 2025

2 Takip Edilen59 Takipçiler

Paper: arxiv.org/abs/2604.16286 Code: github.com/ejcgan/auditin… Blog post: redwoodresearch.substack.com/p/research-sab…

Nederlands

171

Eric Gan@ejcgan·3d

Most surprising finding: the hardest red-team skill isn't making sabotages, it's predicting which single change actually flips a result. Many of my attempted sabotages didn't move the outcome of the experiment.

English

186

Eric Gan@ejcgan·3d

Can frontier LLMs and humans catch sabotage in ML research code? In Auditing Sabotage Bench, I added subtle sabotages to 9 existing ML codebases which change a key finding of the research. Neither LLMs nor LLM-assisted humans reliably caught them.

English

6.2K

Eric Gan@ejcgan·1 Eki

If Tinker didn't exist I likely wouldn't work on this project at all. I'd tried renting GPUs before but scaling multi-node training is always a pain. I was basically waiting for something reliable to exist before diving into RL work

English

474

Eric Gan@ejcgan·1 Eki

What I really like about Tinker is that you can directly look at the code, understand it, and modify it. It's much more flexible than other RL APIs, and you don't have to deal with the infra yourself. It's also just really nicely written.

English

492

Eric Gan@ejcgan·1 Eki

I've been using Tinker at Redwood Research to RL-train long-context models like Qwen3-32B on difficult AI control tasks - specifically teaching models to write unsuspicious backdoors in code similar to the AI control paper. Early stages but seeing some interesting backdoors 👀

English

15.8K

Keşfet

@elonmusk @BarackObama @taylorswift13 @cristiano @BillGates @NASA @nikifrancismediavine @katyperry