Lawrence Feng (@lawrencefeng17) - Twitter Profile

Lawrence Feng retweeted

Does it matter whether specialized data comes early or late in training? The specialized loss will tell you no, but it hides the underlying mechanics. We show that early exposure radically improves retention after any subsequent fine-tuning. A deeper look at learning dynamics, and why curriculum is a foundational piece of the continual learning puzzle. 👇

Lawrence Feng@lawrencefeng17

1/ To retain post-training capabilities after further fine-tuning, mix that data into pretraining. The effect can be invisible until fine-tuning begins; early exposure may not help post-training performance, but it changes what persists. How a model learns a task matters.

Lawrence Feng@lawrencefeng17·21 May

@jkminder Cool! It would be interesting to explore whether post-training could escape this sort of "representational freeze", or to find empirical evidence to the contrary (reinforcing the claim that we need to embed safety behaviors from the start of training).

Julian Minder@jkminder·21 May

@lawrencefeng17 Ah, awesome! I didn't make the connection, but we discussed this paper just yesterday! Very cool stuff!! Also connected to this, and to my intuition about the phenomenon, is something I just saw: x.com/jkminder/statu…

Julian Minder@jkminder

Given our recent blogpost on Alignment from Token Zero, the finding below aligns exactly with my intuitions about why safety needs to be considered early. Representational geometry locks in early (and may be hard to change later?)

Lawrence Feng retweeted

Julian Minder@jkminder·20 May

New blog! Synthetic Persona Pretraining (SPP): Alignment from Token Zero. Current alignment is shallow: values bolted on after pretraining can be routed around. To solve this, we wrote the desired persona directly into pretraining data. Early results, but we're very excited. 🧵

Lawrence Feng@lawrencefeng17·21 May

@jkminder Yep! Totally agreed! Just wanted to learn your perspective on this. We released a similar finding yesterday: x.com/lawrencefeng17…

Julian Minder@jkminder·20 May

@lawrencefeng17 Good question. We have a direct comparison: our midtraining variant of SPP (closest in spirit to their SDF) underperforms the token-zero version on the same data. The open question is whether we can really find and modify all those unwanted behaviors. So: it's easier to start right.

Lawrence Feng@lawrencefeng17·20 May

@bearseascape @nsubramani23 Do you think that these four criteria are satisfiable? If superposition is true, why should we hope that we can find circuits with these properties at any granularity above activations?

Lawrence Feng retweeted

Michael Li@bearseascape·20 May

Do the circuits we extract to explain a model's behavior actually tell us how it solves a specific task? In new work w/ @nsubramani23, we find that circuits fail a basic check: ablating one task's circuit hurts another task about as much as ablating that task's own circuit. 🧵
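
The cross-task check described above can be sketched in a few lines. This is a minimal sketch, assuming you have already measured a matrix `drop[i][j]`, the performance drop on task j after ablating task i's circuit; the function name `specificity_ratios` and the ratio metric are hypothetical stand-ins, not the authors' actual evaluation.

```python
from typing import Sequence


def specificity_ratios(drop: Sequence[Sequence[float]]) -> list[float]:
    """For each task j, divide the drop caused by ablating task j's own
    circuit by the mean drop caused by ablating the other tasks' circuits.

    drop[i][j] = performance drop on task j after ablating task i's circuit.
    A ratio well above 1 suggests a task-specific circuit; a ratio near 1
    means the circuit is no more specific to its task than the others are.
    Hypothetical helper for illustration only.
    """
    n = len(drop)
    if n < 2:
        raise ValueError("need at least two tasks to compare circuits")
    ratios = []
    for j in range(n):
        own = drop[j][j]
        others = [drop[i][j] for i in range(n) if i != j]
        mean_other = sum(others) / len(others)
        ratios.append(own / mean_other if mean_other else float("inf"))
    return ratios
```

Under this framing, the failure described in the thread corresponds to ratios close to 1: ablating another task's circuit hurts roughly as much as ablating the task's own circuit.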

Lawrence Feng@lawrencefeng17·19 May

@hvngo8 Yes, but we come to different conclusions! They argue that replay is no longer useful when the target data is present in pretraining. We argue that it is, once you consider a subsequent training stage!

huong@hvngo8·19 May

@lawrencefeng17 this is going into my list along with arxiv.org/abs/2603.04964!

Lawrence Feng@lawrencefeng17·19 May

1/ To retain post-training capabilities after further fine-tuning, mix that data into pretraining. The effect can be invisible until fine-tuning begins; early exposure may not help post-training performance, but it changes what persists. How a model learns a task matters.
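
"Mix that data into pretraining" can be pictured as a data stream in which some pretraining slots are filled with specialized examples. This is a minimal sketch, assuming a fixed, uniform mixing probability and a fixed total budget; the helper `mixed_pretraining_stream` and its `mix_ratio` parameter are hypothetical, and the paper's actual mixing schedule is not described in this thread.

```python
import random
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def mixed_pretraining_stream(
    pretrain: Sequence[T],
    specialized: Sequence[T],
    mix_ratio: float,
    seed: int = 0,
) -> Iterator[T]:
    """Yield one example per pretraining slot.

    With probability `mix_ratio`, a slot is filled with the next specialized
    example (cycling through `specialized`); otherwise the pretraining example
    is kept, so the total number of examples is unchanged.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= mix_ratio <= 1.0:
        raise ValueError("mix_ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    next_spec = 0
    for example in pretrain:
        if specialized and rng.random() < mix_ratio:
            yield specialized[next_spec % len(specialized)]
            next_spec += 1
        else:
            yield example
```

Setting `mix_ratio=0.0` recovers ordinary pretraining, with the specialized data seen only in a later post-training stage. The thread's claim is that exposure during pretraining changes what survives subsequent fine-tuning, even when post-training performance looks the same.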

Lawrence Feng@lawrencefeng17·19 May

@gaurav_ghosal @jacspringer @fjzzq2002 @AdtRaghunathan Cc: @OwainEvans_UK @_julianmichael_

Lawrence Feng@lawrencefeng17·19 May

9/ This was joint work with Gaurav Ghosal (@gaurav_ghosal), Jacob Mitchell Springer (@jacspringer), Ziqian Zhong (@fjzzq2002), and Aditi Raghunathan (@AdtRaghunathan) @ CMU. Paper: arxiv.org/pdf/2605.12705 Website: ar-forum.github.io/earlyexposure-…

Lawrence Feng@lawrencefeng17·19 May

@gaurav_ghosal Thanks Gaurav! Couldn’t have done it without you!

Gaurav Ghosal@gaurav_ghosal·19 May

Had a great time working on this project exploring how to proactively prevent forgetting of capabilities during subsequent training! All credit goes to @lawrencefeng17 for leading it so skillfully!

Lawrence Feng@lawrencefeng17·23 Apr

@sudarshk_ sign up for a marathon