


Junting Pan

@junting9
Research Scientist @ Apple AIML | Prev: PhD@MMLab CUHK, Intern @AIatMeta (FAIR) and @samsungresearch. Working on foundation models.




Knowledge doesn't always flow downhill. We find that in LLM pretraining, a weaker teacher can improve a stronger student, and pushing the teacher further can actually hurt. New paper: Strong Teacher Not Needed? On Distillation in LLM Pretraining.










Honorable Mentions:
Data Shapley in One Training Run. Jiachen T. Wang, et al.
SAM 2: Segment Anything in Images and Videos. Nikhila Ravi, et al.
Faster Cascades via Speculative Decoding. Harikrishna Narasimhan, et al.








