Eric Gan

13 posts

@ejcgan

Joined October 2025
2 Following · 59 Followers

Eric Gan @ejcgan
Most surprising finding: the hardest red-team skill isn't writing sabotages; it's predicting which single change actually flips a result. Many of my attempted sabotages didn't change the outcome of the experiment.
1 reply · 1 repost · 7 likes · 186 views

Eric Gan @ejcgan
Can frontier LLMs and humans catch sabotage in ML research code? In Auditing Sabotage Bench, I added subtle sabotages to 9 existing ML codebases, each of which changes a key finding of the research. Neither LLMs nor LLM-assisted humans reliably caught them.
2 replies · 10 reposts · 58 likes · 6.2K views

Eric Gan @ejcgan
If Tinker didn't exist, I likely wouldn't be working on this project at all. I'd tried renting GPUs before, but scaling multi-node training is always a pain. I was basically waiting for something reliable to exist before diving into RL work.
1 reply · 0 reposts · 3 likes · 474 views

Eric Gan @ejcgan
What I really like about Tinker is that you can directly look at the code, understand it, and modify it. It's much more flexible than other RL APIs, and you don't have to deal with the infra yourself. It's also just really nicely written.
1 reply · 0 reposts · 1 like · 492 views

Eric Gan @ejcgan
I've been using Tinker at Redwood Research to RL-train long-context models like Qwen3-32B on difficult AI control tasks: specifically, teaching models to write unsuspicious backdoors in code, similar to those in the AI control paper. It's early stages, but I'm seeing some interesting backdoors 👀
2 replies · 3 reposts · 50 likes · 15.8K views