Xiao Liang
@MasterVito0601
Ph.D. student @UCLA, @uclanlp. Research Intern @MSFTResearch. LLMs, RL. Prev. @Tsinghua_Uni.

to improve fine-tuning data efficiency, replay generic pre-training data. not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! especially when fine-tuning data is scarce in pre-training (w/ @percyliang)
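
A minimal sketch (my illustration, not from the linked work) of what this replay mixing could look like: interleave the fine-tuning stream with generic pre-training examples at a fixed ratio. The function name and the 25% replay ratio are assumptions for illustration, not values from the post.

```python
import random

def replay_mixed_stream(finetune_data, pretrain_data, replay_ratio=0.25, seed=0):
    """Yield training examples, replaying generic pre-training data at a
    fixed ratio alongside the fine-tuning set. Illustrative sketch; the
    ratio is a hypothetical choice, not a value from the post."""
    rng = random.Random(seed)
    while True:
        if rng.random() < replay_ratio:
            yield rng.choice(pretrain_data)  # replayed generic example
        else:
            yield rng.choice(finetune_data)  # target-domain example

# Hypothetical usage: roughly 25% of the stream is pre-training replay.
stream = replay_mixed_stream(["ft_1", "ft_2"], ["pt_1", "pt_2", "pt_3"])
batch = [next(stream) for _ in range(8)]
```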

Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

Nemotron-CLIMBMix is now becoming the default recipe in the nanochat speedrun. During the Time-to-GPT-2 Leaderboard experiments started by @karpathy, the community revisited CLIMBMix and found that it delivers by far the single biggest improvement to nanochat’s GPT-2 speedrun time. It’s incredibly rewarding to see the idea validated and adopted by the community. Huge thanks to everyone who experimented with it and pushed it forward 🚀 github.com/karpathy/nanoc…