Lizhang Chen

@lzchen_ut
Student researcher @Google @GoogleDeepMind | Prev @aws @ Seed. I will be graduating in 2026 and am actively seeking full-time positions in industry.

1/ nanogpt slowrun 🐢 update: we're focusing on occasional big data efficiency updates, but we had a lot of interesting additions in the last few weeks. Here's the rundown:
· multi-token prediction (@clark_kev)
· looped transformers (@cs_serdar)
· test-time training (TTT) (@Sam_Acqua)
· stochastic logits averaging (@bishmdl76, @ShmuelBerman)
· stochastic depth (@ChinmayKak)
· IHA attention (@madhavsinghal_)
· tuning for gradients norm (@zhiweixux)
· probability averaging + scaling ensembles (@lzchen_ut)
· MuonEq-R (@clark_kev)



🧠 Moving from compute-bound to data-bound! 🚀 Data-Run is a new open-source competition testing data efficiency in language model training. The goal? Hit a validation loss of <= 3.2 with the absolute minimum number of unique tokens. 📊 Curate, distill, or synthesize your best dataset and climb the leaderboard. 🏆 How few tokens do you need? 🏆🏆🏆 Show us what you've got! 👇 github.com/L-z-Chen/data-…







