Lizhang Chen

130 posts

@lzchen_ut

Student researcher @Google @GoogleDeepMind | Prev @aws @ Seed. I will be graduating in 2026 and am actively seeking full-time positions in industry.

Austin, TX · Joined November 2019
414 Following · 186 Followers
Yuchen Jin @Yuchenj_UW ·
Spud and Mythos are a reminder that pretraining still matters, a lot. RL is the cherry, not the cake.
28
23
492
29.9K
Lizhang Chen @lzchen_ut ·
Pre-training is definitely not dead; there are still lots of promising directions worth exploring, and it’s exciting to see Slowrun pushing on so many of them. Thanks, @industriaalist.
Samip @industriaalist

1/ nanogpt slowrun 🐢 update: we're focusing on occasional big data efficiency updates, but we had a lot of interesting additions in the last few weeks, here's the rundown:
· multi-token prediction (@clark_kev)
· looped transformers (@cs_serdar)
· test-time training (TTT) (@Sam_Acqua)
· stochastic logits averaging (@bishmdl76, @ShmuelBerman)
· stochastic depth (@ChinmayKak)
· IHA attention (@madhavsinghal_)
· tuning for gradients norm (@zhiweixux)
· probability averaging + scaling ensembles (@lzchen_ut)
· MuonEq-R (@clark_kev)

1
1
14
1.8K
Lizhang Chen retweeted
Samip @industriaalist ·
1/ nanogpt slowrun 🐢 update: we're focusing on occasional big data efficiency updates, but we had a lot of interesting additions in the last few weeks, here's the rundown:
· multi-token prediction (@clark_kev)
· looped transformers (@cs_serdar)
· test-time training (TTT) (@Sam_Acqua)
· stochastic logits averaging (@bishmdl76, @ShmuelBerman)
· stochastic depth (@ChinmayKak)
· IHA attention (@madhavsinghal_)
· tuning for gradients norm (@zhiweixux)
· probability averaging + scaling ensembles (@lzchen_ut)
· MuonEq-R (@clark_kev)
1
17
115
9.6K
Yuchen Jin @Yuchenj_UW ·
If you had two software engineering offers:
> One pays you $500k/year salary, but covers zero LLM tokens.
> One pays you $400k/year salary, but gives you $500/day free LLM tokens.
Which one are you taking?
394
18
2.1K
539.9K
Lizhang Chen @lzchen_ut ·
It really feels like Opus 4.6 has gotten dumber.
0
0
1
318
Lizhang Chen retweeted
Ji-Ha @Ji_Ha_Kim ·
Blog Post - Lion-K CCWD: Corrected Cautious Weight Decay and Hyperparameter Transfer
Derivation of Lion-K with Corrected Cautious Weight Decay (CCWD) and transformation rules for hyperparameter transfer, fixing Complete(d)P momentum
5
10
56
10.2K
Lizhang Chen @lzchen_ut ·
@claudeai API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"req_011CZ9Fwi7zGuT7Sfkg5ipRs"} @claudeai
0
0
0
60
Lizhang Chen @lzchen_ut ·
@claudeai API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"req_011CZ9EfzB5ByTWKVenQ8trK"}
1
0
0
50
Lizhang Chen @lzchen_ut ·
🧠 Moving from compute-bound to data-bound! 🚀 Data-Run is a new open-source competition testing data efficiency in language model training. The goal? Hit a validation loss of <= 3.2 with the absolute minimum number of unique tokens. 📊 Curate, distill, or synthesize your best dataset and climb the leaderboard. 🏆 How few tokens do you need? 🏆🏆🏆 Show us what you've got! 👇 github.com/L-z-Chen/data-…
0
1
13
2.6K