Lawrence Feng

26 posts

@lawrencefeng17

Undergrad @ CMU

Joined August 2024
130 Following · 77 Followers
Lawrence Feng reposted
Aditi Raghunathan @AdtRaghunathan
Does it matter if specialized data comes early or late in training? The specialized loss will tell you no but it's hiding the underlying mechanics. We show that early exposure radically improves retention after any subsequent fine-tuning. A deeper look at learning dynamics, and why curriculum is a foundational piece of the continual learning puzzle. 👇
Lawrence Feng @lawrencefeng17

1/ To retain post-training capabilities after further fine-tuning, mix that data into pretraining. The effect can be invisible until fine-tuning begins; early exposure may not help post-training performance, but it changes what persists. How a model learns a task matters.

Lawrence Feng @lawrencefeng17
@jkminder Cool! It would be interesting to explore if post-training could escape this sort of "representational freeze" or provide empirical evidence to the contrary (reinforcing the claim that we need to embed safety behaviors from the start of training).
Julian Minder @jkminder
@lawrencefeng17 Ah, awesome! I didn't make the connection, but we just discussed this paper yesterday! Very cool stuff!! Also connected to this and to my intuition about the phenomenon, I just saw the following: x.com/jkminder/statu…
Julian Minder @jkminder

Given our recent blogpost on Alignment from Token Zero, the finding below aligns exactly with my intuitions about why safety needs to be considered early. Representational geometry locks in early (and may be hard to change later?)

Lawrence Feng reposted
Julian Minder @jkminder
New blog! Synthetic Persona Pretraining (SPP): Alignment from Token Zero
Current alignment is shallow - values bolted on after pretraining can be routed around. To solve this, we wrote the desired persona directly into pretraining data. Early results, but we're very excited. 🧵
Julian Minder @jkminder
@lawrencefeng17 Good question. We have a direct comparison: our midtraining variant of SPP (closest in spirit to their SDF) underperforms the token-zero version on the same data. The question remains whether we can really find and modify all those unwanted behaviors. So: easier to start right.
Lawrence Feng @lawrencefeng17
@bearseascape @nsubramani23 Do you think that these four criteria are satisfiable? If superposition is true, why should we hope that we can find circuits with these properties at any granularity above activations?
Lawrence Feng reposted
Michael Li @bearseascape
Do the circuits we extract to explain a model's behavior actually tell us how it solves a specific task? In new work w/ @nsubramani23, we find that circuits fail a basic check: ablating one task's circuit hurts another task about as much as ablating that task's own circuit. 🧵
Lawrence Feng @lawrencefeng17
@hvngo8 Yes, but we come to different conclusions! They argue that replay is no longer useful when the target data is present in pretraining. We argue that it is, once you consider a subsequent training stage!
Lawrence Feng @lawrencefeng17
1/ To retain post-training capabilities after further fine-tuning, mix that data into pretraining. The effect can be invisible until fine-tuning begins; early exposure may not help post-training performance, but it changes what persists. How a model learns a task matters.
Gaurav Ghosal @gaurav_ghosal
Had a great time working on this project exploring how to proactively prevent forgetting of capabilities during subsequent training! All credit goes to @lawrencefeng17 for leading it so skillfully!
Lawrence Feng @lawrencefeng17

1/ To retain post-training capabilities after further fine-tuning, mix that data into pretraining. The effect can be invisible until fine-tuning begins; early exposure may not help post-training performance, but it changes what persists. How a model learns a task matters.

sudarsh 🌱🔶 @sudarshk_
what are some torture methods for torture beginners? looking to dip my toes into experiencing suffering