Lizhang Chen (@lzchen_ut) - Twitter Profile

Lizhang Chen@lzchen_ut·23 Apr

@Yuchenj_UW cannot agree more!!! x.com/lzchen_ut/stat…

Pre-training is definitely not dead--there are still lots of promising directions worth exploring, and it’s exciting to see Slowrun pushing on so many of them. thanks, @industriaalist.

0

1

925

Yuchen Jin@Yuchenj_UW·23 Apr

Spud and Mythos are a reminder that pretraining still matters, a lot. RL is the cherry, not the cake.

28

23

492

29.9K

Lizhang Chen@lzchen_ut·21 Apr

Pre-training is definitely not dead--there are still lots of promising directions worth exploring, and it’s exciting to see Slowrun pushing on so many of them. thanks, @industriaalist.

Samip@industriaalist

1/ nanogpt slowrun 🐢 update: we're focusing on occasional big data efficiency updates, but we had a lot of interesting additions in the last few weeks, here's the rundown:
· multi-token prediction (@clark_kev)
· looped transformers (@cs_serdar)
· test-time training (TTT) (@Sam_Acqua)
· stochastic logits averaging (@bishmdl76, @ShmuelBerman)
· stochastic depth (@ChinmayKak)
· IHA attention (@madhavsinghal_)
· tuning for gradients norm (@zhiweixux)
· probability averaging + scaling ensembles (@lzchen_ut)
· MuonEq-R (@clark_kev)

1

14

1.8K

Lizhang Chen retweeted

Samip@industriaalist·20 Apr

1/ nanogpt slowrun 🐢 update: we're focusing on occasional big data efficiency updates, but we had a lot of interesting additions in the last few weeks, here's the rundown:
· multi-token prediction (@clark_kev)
· looped transformers (@cs_serdar)
· test-time training (TTT) (@Sam_Acqua)
· stochastic logits averaging (@bishmdl76, @ShmuelBerman)
· stochastic depth (@ChinmayKak)
· IHA attention (@madhavsinghal_)
· tuning for gradients norm (@zhiweixux)
· probability averaging + scaling ensembles (@lzchen_ut)
· MuonEq-R (@clark_kev)

1

17

115

9.6K

Lizhang Chen@lzchen_ut·16 Apr

Excited to share that 3 papers will be presented at ICLR 2026:
📌 Cautious Weight Decay arxiv.org/abs/2510.12402
📌 Cautious Optimizers arxiv.org/abs/2411.16085
📌 DeMo: Decoupled Momentum Optimization arxiv.org/abs/2411.19870
Looking forward to presenting these works at #ICLR2026.

0

4

317

Lizhang Chen@lzchen_ut·6 Apr

@goodhunt @fujikanaeda @goodhunt x.com/lzchen_ut/stat… github.com/L-z-Chen/data-…

Lizhang Chen@lzchen_ut

🧠 Moving from compute-bound to data-bound! 🚀 Data-Run is a new open-source competition testing data efficiency in language model training. The goal? Hit a validation loss of <= 3.2 with the absolute minimum number of unique tokens. 📊 Curate, distill, or synthesize your best dataset and climb the leaderboard. 🏆 How few tokens do you need? 🏆🏆🏆 Show us what you've got! 👇 github.com/L-z-Chen/data-…

0

3

1.7K

Hunter Bown@goodhunt·5 Apr

more people should be talking about this github.com/NVIDIA-NeMo/Da…

28

186

1.5K

153.1K

Lizhang Chen@lzchen_ut·30 Mar

@Yuchenj_UW I would ask my agent which one I should pick.

0

210

Yuchen Jin@Yuchenj_UW·29 Mar

If you had two software engineering offers:
> One pays you $500k/year salary, but covers zero LLM tokens.
> One pays you $400k/year salary, but gives you $500/day free LLM tokens.
Which one are you taking?

394

18

2.1K

539.9K

Lizhang Chen@lzchen_ut·26 Mar

It really feels like Opus 4.6 has gotten dumber. @claudeai

0

1

309

Lizhang Chen@lzchen_ut·26 Mar

@claudeai

Lizhang Chen@lzchen_ut

It really feels like Opus 4.6 has gotten dumber.

0

78

Lizhang Chen@lzchen_ut·26 Mar

It really feels like Opus 4.6 has gotten dumber.

0

1

318

Lizhang Chen retweeted

Ji-Ha@Ji_Ha_Kim·23 Mar

Blog Post - Lion-K CCWD: Corrected Cautious Weight Decay and Hyperparameter Transfer
Derivation of Lion-K with Corrected Cautious Weight Decay (CCWD) and transformation rules for hyperparameter transfer fixing Complete(d)P momentum

5

10

56

10.2K

Lizhang Chen@lzchen_ut·23 Mar

🔥🔥🔥

Ji-Ha@Ji_Ha_Kim

jiha-kim.github.io/posts/lion-k-c…

0

1

242

Lizhang Chen@lzchen_ut·17 Mar

@claudeai API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"req_011CZ9Fwi7zGuT7Sfkg5ipRs"} @claudeai

0

60

Lizhang Chen@lzchen_ut·17 Mar

@claudeai API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"req_011CZ9EfzB5ByTWKVenQ8trK"}

1

0

50

Lizhang Chen@lzchen_ut·12 Mar

@claudeai again

1

0

77

Lizhang Chen@lzchen_ut·16 Mar

🧠 Moving from compute-bound to data-bound! 🚀 Data-Run is a new open-source competition testing data efficiency in language model training. The goal? Hit a validation loss of <= 3.2 with the absolute minimum number of unique tokens. 📊 Curate, distill, or synthesize your best dataset and climb the leaderboard. 🏆 How few tokens do you need? 🏆🏆🏆 Show us what you've got! 👇 github.com/L-z-Chen/data-…

0

1

13

2.6K
