Zichen Liu

580 posts


@zzlccc

Gemini RL @GoogleDeepMind

Singapore · Joined October 2021
459 Following · 6K Followers
Pinned Tweet
Zichen Liu@zzlccc·
🪂Understanding R1-Zero-Like Training: A Critical Perspective
* DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning??
* The ever-increasing output length in RL-tuning might be due to a BIAS in GRPO??
* Getting GRPO Done Right, we achieve a 7B AIME sota! 🧵
📜Full details: github.com/sail-sg/unders…
🛠️Code: github.com/sail-sg/unders…
29 replies · 186 reposts · 1.4K likes · 329.8K views
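The "BIAS in GRPO" above refers to its two normalizers: dividing group-relative advantages by the reward std, and dividing each response's token loss by that response's own length. A minimal sketch of the difference, paraphrasing the thread's "GRPO Done Right" fix under the assumption of a 0/1 correctness reward (the function names, toy numbers, and constant normalizer here are illustrative, not the repo's code):

```python
import numpy as np

def grpo_advantage(rewards):
    # GRPO: whiten group rewards by mean AND std. The std division
    # re-weights questions by how much the group agrees, introducing
    # a per-question difficulty bias.
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

def drgrpo_advantage(rewards):
    # Dr. GRPO: mean-center only, like a vanilla policy gradient
    # with a group baseline.
    r = np.asarray(rewards, dtype=np.float64)
    return r - r.mean()

def grpo_response_loss(summed_token_loss, length):
    # GRPO divides each response's summed token loss by its OWN
    # length, so a long wrong answer is penalized less per token.
    return summed_token_loss / length

def drgrpo_response_loss(summed_token_loss, norm_const=1024):
    # Dr. GRPO normalizes by a shared constant instead
    # (the value 1024 is illustrative).
    return summed_token_loss / norm_const

# Toy group of 4 sampled answers to one prompt, 0/1 correctness reward:
rewards = [1.0, 0.0, 0.0, 0.0]
print(grpo_advantage(rewards))    # std-scaled advantages
print(drgrpo_advantage(rewards))  # mean-centered advantages
```

With a shared constant normalizer, a long incorrect answer no longer receives a smaller per-token penalty, which is the mechanism the thread blames for the ever-increasing output length.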
Yi Tay@YiTayML·
@zzlccc Vagueposting skill unlocked haha
1 reply · 0 reposts · 6 likes · 956 views
Zichen Liu@zzlccc·
rl intuition (up-scaled by the correctness of infra) is all you need when cooking with a strong base model such as gemini✨
3 replies · 1 repost · 69 likes · 4.9K views
Zichen Liu retweeted
xiaoying@Emma_bazinga·
🚀 Excited to share RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback!
Instead of training LLM agents just to solve isolated tasks, we designed them to continuously evolve. By using retrospective dual intrinsic feedback (Numerical for exploration + Language for explicit memory) & SimUtil-UCB retrieval, we achieve massive SOTA gains over GRPO baselines:
🔥 +18.3% on ALFWorld
🛒 +15.4% on WebShop
📦 +27.1% on Sokoban
💣 +8.9% on MineSweeper
Strong test-time adaptation & OOD generalization included! 🧠👇
📄 Paper: arxiv.org/abs/2603.08561
💻 Code: github.com/zhangxy-2019/R…
1 reply · 1 repost · 7 likes · 1.1K views
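The tweet names SimUtil-UCB retrieval but does not define it. Purely as an illustration of the pattern the name suggests (a UCB-style score mixing query-memory similarity, past utility, and an exploration bonus), here is a hypothetical sketch; the formula, field names, and weights are guesses, not the paper's method:

```python
import math

def simutil_ucb(similarity, utility, visits, total_visits, c=1.0):
    # Hypothetical score: exploit memories that are similar to the
    # current query and were useful before; add a UCB-style bonus so
    # rarely retrieved memories still get tried.
    exploit = similarity * utility
    explore = c * math.sqrt(math.log(total_visits + 1.0) / (visits + 1.0))
    return exploit + explore

# Toy memory bank: similarity to the query, past utility, retrieval count.
memories = [
    {"sim": 0.9, "util": 0.2, "n": 10},  # similar, but rarely helped
    {"sim": 0.6, "util": 0.8, "n": 3},   # less similar, often helped
    {"sim": 0.7, "util": 0.5, "n": 0},   # never retrieved: big bonus
]
total = sum(m["n"] for m in memories)
best = max(memories, key=lambda m: simutil_ucb(m["sim"], m["util"], m["n"], total))
print(best)
```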
Zichen Liu retweeted
Fanqing Meng@FanqingMengAI·
Text agents have their Gym. Vision agents? Not until now.
Introducing Gym-V — a unified gym-style platform for agentic vision research, with 179 procedurally generated environments across 10 domains.
One API to rule them all:
📦 Offline dataset
🤖 Agentic RL training
🔧 Tool-use training
👥 Multi-agent training
📊 VLM & T2I model evaluation
All under the same reset/step interface.
Key findings:
1. Observation scaffolding matters MORE than RL algorithm choice
2. Broad curricula transfer well; narrow training causes negative transfer
3. Multi-turn interaction amplifies everything
📄 Paper: arxiv.org/abs/2603.15432
💻 Code: github.com/ModalMinds/gym…
Open the thread for a deep dive! 🧵
8 replies · 17 reposts · 110 likes · 9.8K views
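The load-bearing claim is the shared reset/step interface. A toy sketch of that contract (the environment and agent below are stand-ins, not Gym-V code), showing why a single driver loop can serve offline data collection, RL training, tool use, and evaluation alike:

```python
class ToyEnv:
    # Stand-in obeying the common 5-tuple reset/step contract.
    def reset(self):
        self.t = 0
        return "initial observation", {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        return f"observation after {action}", 1.0, terminated, False, {}

def run_episode(env, act, max_steps=50):
    # The same loop works regardless of what `env` wraps, which is
    # the point of a shared API.
    obs, info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(act(obs))
        total += reward
        if terminated or truncated:
            break
    return total

print(run_episode(ToyEnv(), act=lambda obs: "look"))
```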
Zichen Liu@zzlccc·
🦎🦎 Happy to see two of our works (DrGRPO & DPPO) highlighted here! I don’t think changing a few terms is worth a new branding, so we respectfully kept the predecessors’ names while highlighting the corrections/improvements on top of them. Hopefully they inspire RL algo designs.
Alex Weers@a_weers

Finally finished! If you're interested in an overview of recent methods in reinforcement learning for reasoning LLMs, check out this blog post: aweers.de/blog/2026/rl-f… It summarizes ten methods, tries to highlight differences and trends, and has a collection of open problems

2 replies · 4 reposts · 40 likes · 3.8K views
Zichen Liu retweeted
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
961 replies · 2.1K reposts · 19.5K likes · 3.6M views
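On the first bullet: a parameterless QK-norm pins queries and keys to unit norm, so attention logits are bounded and the softmax stays diffuse. A minimal PyTorch-style sketch of the kind of fix described, assuming an RMS-normalized QK-norm with a learnable scale (illustrative only, not nanochat's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNorm(nn.Module):
    # Normalize queries/keys to unit norm, then apply a learnable
    # per-channel scale. Without the scale (the oversight described
    # above), the q·k dot products are bounded and attention cannot
    # sharpen; the learned multiplier lets logits grow.
    def __init__(self, head_dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(head_dim))  # the missing multiplier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, heads, seq, head_dim)
        x = F.normalize(x, dim=-1)  # parameterless QK-norm
        return x * self.scale       # learned sharpening of attention logits
```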
elie@eliebakouch·
today is my last day at hugging face

feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding

i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people

things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do

but first, taking a few weeks break :)
116 replies · 10 reposts · 746 likes · 33.1K views
Zichen Liu retweeted
Tinker@tinkerapi·
GEM is a standardized environment suite for training agentic LLMs with RL. It handles tool use, multi-environment benchmarking, and plugs directly into Tinker as a training backend — giving researchers a modular way to test RL algorithms on agentic tasks. x.com/zzlccc/status/…
Zichen Liu@zzlccc

GEM❤️Tinker
GEM, an environment suite with a unified interface, works perfectly with Tinker, the API by @thinkymachines that handles the heavy lifting of distributed training. In our latest release of GEM, we:
1. supported Tinker and 5 more RL training frameworks
2. reproduced deepseek-r1's length increase with LoRA
3. benchmarked PPO, GRPO, REINFORCE and showed their tradeoffs
4. added Terminal, MCP, visual and multi-agent environments
…
Open the thread for a deep dive!

0 replies · 9 reposts · 43 likes · 4.4K views
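A sketch of what the unified interface implies in practice, assuming GEM follows the Gym-style make/reset/step convention described above; the import name, env id, and policy stub below are placeholders rather than confirmed API (check the GEM repo for the real registry):

```python
import gem  # assumed package name; see the GEM repo for the actual one

def llm_policy(obs: str) -> str:
    # Stand-in for a real LLM call mapping the text observation to a
    # text action (an answer, a tool call, a shell command, ...).
    return "final answer: 42"

env = gem.make("game:GuessTheNumber-v0")  # placeholder env id
obs, info = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(llm_policy(obs))
    done = terminated or truncated
```

Because the environment side is pinned to this one contract, swapping the training backend (Tinker or any of the other supported frameworks) does not require touching the environment code.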
Zichen Liu retweeted
koray kavukcuoglu@koraykv·
Gemini 3.1 Flash-Lite is available now! It takes an unbelievable amount of complex engineering to make AI feel instantaneous, enabling exciting new frontiers for experimentation!
20 replies · 42 reposts · 280 likes · 186.2K views
Zichen Liu@zzlccc·
@JustinLin610 Thanks for your great contribution and all the best Junyang…
0 replies · 0 reposts · 1 like · 640 views
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K replies · 730 reposts · 13.6K likes · 6.6M views
Zichen Liu retweeted
Yi Tay@YiTayML·
Congrats and welcome @NiJinjie to the center of AGI (🇸🇬 branch). Taking highly technical and capable researchers like this and giving them a chance to be at the frontier and make tons of impact for Gemini has been one of the most rewarding parts of founding & building GDM in SG.
Jinjie Ni@NiJinjie

Life update: I’ve joined @GoogleDeepMind as a research scientist to work on ✨gemini scaling and RL, under the leadership of Yi Tay (@YiTayML) and Quoc Le (@quocleix). I feel extremely fortunate to be on the critical path towards AGI and can't wait to help push the frontier of gemini capabilities! 🚀

4 replies · 9 reposts · 118 likes · 17K views
yi@agihippo·
My friends ask me why I bought another car. I told them this is my sidecar. It's for eval.
1 reply · 0 reposts · 33 likes · 2.8K views
Zichen Liu@zzlccc·
@TalSchuster @agihippo same here transitioning from beginner to intermediate, let’s game together next time we visit the US :)
0 replies · 0 reposts · 2 likes · 39 views
Tal Schuster@TalSchuster·
@agihippo What's the tennis level? I might seek collaborations 😂
3 replies · 0 reposts · 3 likes · 215 views
yi@agihippo·
I have the coolest team in the world. Normal google teams: what's your level? L5? L6? My team: what's your badminton level? What's your tennis level? 😂 This is what a cracked team looks like.
2 replies · 0 reposts · 39 likes · 3.4K views
Jiayi Pan@jiayi_pirate·
Today is my last day at xAI. It's been an intense, memorable year building in the frontier — I'm grateful to have worked with and learned from such talented and supportive colleagues. The journey itself is the reward. So long, and thanks for all the fish! 🐬
104 replies · 23 reposts · 1.3K likes · 106.3K views
Zichen Liu retweeted
Shunyu Yao@ShunyuYao14·
"And, what's next?" New Gemini 3.1 pro is here blog.google/innovation-and… Gemini is not only a good model, but better models coming in an unstopable way.
12 replies · 8 reposts · 109 likes · 8.5K views