Todd Nief

3.4K posts

@toddknife

CS PhD student @uchicago, Gym owner, Chicago Rationality organizer, Like Rats, Hate Force, etc.

Chicago · Joined March 2009
804 Following · 455 Followers
Todd Nief retweeted
Xiaoyan Bai (@Elenal3ai)
📖 ≠ 🧪 The Story is Not the Science. Code is submitted but rarely executed during peer review--an issue likely to worsen with research agents.🧑‍🔬 We introduce MechEvalAgent, an execution-grounded evaluation of narrative + execution. Verify the science, not just the story. 1/n
Todd Nief retweeted
Yichen (Zach) Wang (@YichenZW)
Lack of diversity in your LLM generation? (also noted by Artificial Hivemind, best paper @NeurIPSConf) Time to bring your base model back! An inference-time, token-level collaboration between a base and an aligned model can optimize and control diversity and quality!
Todd Nief (@toddknife)
@EkdeepL @universeinanegg 2. The "meaning" of a specific direction in the residual stream changing based on context. I think this can also happen based on the local geometry given a context.
Todd Nief (@toddknife)
@EkdeepL @universeinanegg Maybe useful to disentangle two ideas: 1. Changing context fundamentally changes downstream computation of a direction in the residual stream (Seems like what this paper is doing, also certainly happens with polysemy)
Ari Holtzman (@universeinanegg)
Can we find a direction in the residual stream that clearly has two very different interventional effects in different contexts or at different layers? This seems inevitable, since there aren't enough directions to encode all aspects of reality, but I haven't seen it yet
Todd Nief (@toddknife)
@IkhlasulHanif0 I'm not sure if I fully understand your point, but steering can also overwrite previous information. This is potentially fine, but it can impact off-target concepts and alter behavior
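To make the off-target point concrete, here is a minimal sketch in plain Python. It assumes steering means adding a scaled vector to a residual-stream activation, and that the two concept directions are not orthogonal. All vectors and the names `target`, `off_target`, and `hidden` are made up for illustration and do not come from any real model.

```python
# Toy illustration only: hypothetical 3-d vectors, not taken from a real model.

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def add_scaled(h, v, alpha):
    """Return h + alpha * v (the assumed form of a steering intervention)."""
    return [x + alpha * y for x, y in zip(h, v)]

# Two unit-length concept directions that overlap (assumption for the example).
target = [1.0, 0.0, 0.0]
off_target = [0.6, 0.8, 0.0]

# A residual-stream activation before steering.
hidden = [0.0, 1.0, 0.5]

# Steer along the target direction.
steered = add_scaled(hidden, target, alpha=2.0)

# How much each concept's readout (projection) moved.
target_shift = dot(steered, target) - dot(hidden, target)
off_target_shift = dot(steered, off_target) - dot(hidden, off_target)

print("target shift:", target_shift)
print("off-target shift:", round(off_target_shift, 6))
```

Because the two directions overlap, steering along `target` also moves the readout along `off_target`. With orthogonal directions the second shift would be zero.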
Hanif | AI NOT FOR PRODUCTIVITY (@IkhlasulHanif0)
I haven’t really worked with activation patching myself, but I’ve done more with steering. I’ve been wondering whether the same idea applies to steering, in the sense that the steering vector we get for a certain layer assumes that the layer hasn’t already been affected by any steering.
Todd Nief (@toddknife)
Most mech interp work relies on activation patching, but patching activations destroys previous computation. What if we want to use a different mechanism on the same residual stream? We propose dynamic weight grafting to interpret finetuned model weights. 🧵 1/n
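The contrast in this post can be sketched with a toy residual stream in plain Python. This is an illustrative assumption, not the thread's actual implementation: each "layer" just adds `W @ x` to the stream, `base` and `tuned` are hypothetical stand-ins for pretrained and finetuned weights, and grafting is modeled per layer only, omitting the token-position dimension.

```python
# Toy sketch only: hypothetical 2-d linear "layers", not the paper's code.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def run(x, layers, patch=None):
    """Run a toy residual stream where each layer adds W @ x to the stream.

    patch = (layer_index, activation) overwrites the whole stream
    after that layer, as activation patching does.
    """
    for i, W in enumerate(layers):
        x = [a + b for a, b in zip(x, matvec(W, x))]
        if patch is not None and patch[0] == i:
            x = list(patch[1])
    return x

# Hypothetical weights: the "finetuned" model differs only in layer 1.
base = [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]]
tuned = [base[0], [[0.0, 0.0], [2.0, 0.0]]]

# Activation patching: cache the stream after layer 0 on another input,
# then overwrite the stream with it during runs on two different inputs.
cached = run([5.0, -1.0], tuned[:1])
patched_a = run([1.0, 2.0], base, patch=(0, cached))
patched_b = run([7.0, 7.0], base, patch=(0, cached))

# Weight grafting (as modeled here): keep the stream, swap layer 1's weights.
grafted = run([1.0, 2.0], [base[0], tuned[1]])

print(patched_a, patched_b)  # identical: the input's own computation is gone
print(grafted)               # still depends on the original input
```

In this toy, both patched runs give the same output regardless of input, because the overwrite discards whatever the stream held. The grafted run keeps the stream computed from its own input and changes only the mechanism applied at layer 1.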
Todd Nief (@toddknife)
To conclude: 1. Dynamic weight grafting is a new technique that allows localization of finetuned model behavior to specific token positions and model components 13/n