Zichen Liu

580 posts


@zzlccc

Gemini RL @GoogleDeepMind

Singapore · Joined October 2021
459 Following · 6K Followers
Pinned Tweet
Zichen Liu@zzlccc·
🪂Understanding R1-Zero-Like Training: A Critical Perspective
* DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning??
* The ever-increasing output length in RL-tuning might be due to a BIAS in GRPO??
* Getting GRPO Done Right, we achieve a 7B AIME sota! 🧵
📜Full details: github.com/sail-sg/unders…
🛠️Code: github.com/sail-sg/unders…
29 replies · 186 reposts · 1.4K likes · 329.8K views
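The "BIAS in GRPO" above refers to its two normalizers: dividing group-relative advantages by the reward std, and dividing each response's token loss by that response's own length. A minimal sketch of the difference, paraphrasing the thread's "GRPO Done Right" fix under the assumption of a 0/1 correctness reward (the function names, toy numbers, and constant normalizer here are illustrative, not the repo's code):

```python
import numpy as np

def grpo_advantage(rewards):
    # GRPO: whiten group rewards by mean AND std. The std division
    # re-weights questions by how much the group agrees, introducing
    # a per-question difficulty bias.
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

def drgrpo_advantage(rewards):
    # Dr. GRPO: mean-center only, like a vanilla policy gradient
    # with a group baseline.
    r = np.asarray(rewards, dtype=np.float64)
    return r - r.mean()

def grpo_response_loss(summed_token_loss, length):
    # GRPO divides each response's summed token loss by its OWN
    # length, so a long wrong answer is penalized less per token.
    return summed_token_loss / length

def drgrpo_response_loss(summed_token_loss, norm_const=1024):
    # Dr. GRPO normalizes by a shared constant instead
    # (the value 1024 is illustrative).
    return summed_token_loss / norm_const

# Toy group of 4 sampled answers to one prompt, 0/1 correctness reward:
rewards = [1.0, 0.0, 0.0, 0.0]
print(grpo_advantage(rewards))    # std-scaled advantages
print(drgrpo_advantage(rewards))  # mean-centered advantages
```

With a shared constant normalizer, a long incorrect answer no longer receives a smaller per-token penalty, which is the mechanism the thread blames for the ever-increasing output length.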
Yi Tay@YiTayML·
@zzlccc Vagueposting skill unlocked haha
1 reply · 0 reposts · 6 likes · 956 views
Zichen Liu@zzlccc·
rl intuition (up-scaled by the correctness of infra) is all you need when cooking with a strong base model such as gemini✨
3 replies · 1 repost · 69 likes · 4.9K views
Zichen Liu retweeted
xiaoying@Emma_bazinga·
🚀 Excited to share RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback!
Instead of training LLM agents just to solve isolated tasks, we designed them to continuously evolve. By using retrospective dual intrinsic feedback (Numerical for exploration + Language for explicit memory) & SimUtil-UCB retrieval, we achieve massive SOTA gains over GRPO baselines:
🔥 +18.3% on ALFWorld
🛒 +15.4% on WebShop
📦 +27.1% on Sokoban
💣 +8.9% on MineSweeper
Strong test-time adaptation & OOD generalization included! 🧠👇
📄 Paper: arxiv.org/abs/2603.08561
💻 Code: github.com/zhangxy-2019/R…
1 reply · 1 repost · 7 likes · 1.1K views
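The tweet names SimUtil-UCB retrieval but does not define it. Purely as an illustration of the pattern the name suggests (a UCB-style score mixing query-memory similarity, past utility, and an exploration bonus), here is a hypothetical sketch; the formula, field names, and weights are guesses, not the paper's method:

```python
import math

def simutil_ucb(similarity, utility, visits, total_visits, c=1.0):
    # Hypothetical score: exploit memories that are similar to the
    # current query and were useful before; add a UCB-style bonus so
    # rarely retrieved memories still get tried.
    exploit = similarity * utility
    explore = c * math.sqrt(math.log(total_visits + 1.0) / (visits + 1.0))
    return exploit + explore

# Toy memory bank: similarity to the query, past utility, retrieval count.
memories = [
    {"sim": 0.9, "util": 0.2, "n": 10},  # similar, but rarely helped
    {"sim": 0.6, "util": 0.8, "n": 3},   # less similar, often helped
    {"sim": 0.7, "util": 0.5, "n": 0},   # never retrieved: big bonus
]
total = sum(m["n"] for m in memories)
best = max(memories, key=lambda m: simutil_ucb(m["sim"], m["util"], m["n"], total))
print(best)
```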
Zichen Liu retweeted
Fanqing Meng@FanqingMengAI·
Text agents have their Gym. Vision agents? Not until now.
Introducing Gym-V — a unified gym-style platform for agentic vision research, with 179 procedurally generated environments across 10 domains.
One API to rule them all:
📦 Offline dataset
🤖 Agentic RL training
🔧 Tool-use training
👥 Multi-agent training
📊 VLM & T2I model evaluation
All under the same reset/step interface.
Key findings:
1. Observation scaffolding matters MORE than RL algorithm choice
2. Broad curricula transfer well; narrow training causes negative transfer
3. Multi-turn interaction amplifies everything
📄 Paper: arxiv.org/abs/2603.15432
💻 Code: github.com/ModalMinds/gym…
Open the thread for a deep dive! 🧵
8 replies · 17 reposts · 110 likes · 9.8K views
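The load-bearing claim is the shared reset/step interface. A toy sketch of that contract (the environment and agent below are stand-ins, not Gym-V code), showing why a single driver loop can serve offline data collection, RL training, tool use, and evaluation alike:

```python
class ToyEnv:
    # Stand-in obeying the common 5-tuple reset/step contract.
    def reset(self):
        self.t = 0
        return "initial observation", {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        return f"observation after {action}", 1.0, terminated, False, {}

def run_episode(env, act, max_steps=50):
    # The same loop works regardless of what `env` wraps, which is
    # the point of a shared API.
    obs, info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(act(obs))
        total += reward
        if terminated or truncated:
            break
    return total

print(run_episode(ToyEnv(), act=lambda obs: "look"))
```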
Zichen Liu@zzlccc·
🦎🦎 Happy to see two of our works (DrGRPO & DPPO) highlighted here! I don’t think changing a few terms is worth a new branding, so we respectfully kept the predecessors’ names while highlighting the corrections/improvements on top of them. Hopefully they inspire RL algo designs.
Alex Weers@a_weers

Finally finished! If you're interested in an overview of recent methods in reinforcement learning for reasoning LLMs, check out this blog post: aweers.de/blog/2026/rl-f… It summarizes ten methods, tries to highlight differences and trends, and has a collection of open problems

2 replies · 4 reposts · 40 likes · 3.8K views
Zichen Liu retweeted
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
961 replies · 2.1K reposts · 19.5K likes · 3.6M views
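On the first bullet: a parameterless QK-norm pins queries and keys to unit norm, so attention logits are bounded and the softmax stays diffuse. A minimal PyTorch-style sketch of the kind of fix described, assuming an RMS-normalized QK-norm with a learnable scale (illustrative only, not nanochat's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNorm(nn.Module):
    # Normalize queries/keys to unit norm, then apply a learnable
    # per-channel scale. Without the scale (the oversight described
    # above), the q·k dot products are bounded and attention cannot
    # sharpen; the learned multiplier lets logits grow.
    def __init__(self, head_dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(head_dim))  # the missing multiplier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, heads, seq, head_dim)
        x = F.normalize(x, dim=-1)  # parameterless QK-norm
        return x * self.scale       # learned sharpening of attention logits
```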
elie@eliebakouch·
today is my last day at hugging face

feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding

i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people

things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do

but first, taking a few weeks break :)
116 replies · 10 reposts · 746 likes · 33.1K views
Zichen Liu retweeted
Tinker@tinkerapi·
GEM is a standardized environment suite for training agentic LLMs with RL. It handles tool use, multi-environment benchmarking, and plugs directly into Tinker as a training backend — giving researchers a modular way to test RL algorithms on agentic tasks. x.com/zzlccc/status/…
Zichen Liu@zzlccc

GEM❤️Tinker
GEM, an environment suite with a unified interface, works perfectly with Tinker, the API by @thinkymachines that handles the heavy lifting of distributed training. In our latest release of GEM, we:
1. supported Tinker and 5 more RL training frameworks
2. reproduced deepseek-r1's length increase with LoRA
3. benchmarked PPO, GRPO, REINFORCE and showed their tradeoffs
4. added Terminal, MCP, visual and multi-agent environments
…
Open the thread for a deep dive!

0 replies · 9 reposts · 43 likes · 4.4K views
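A sketch of what the unified interface implies in practice, assuming GEM follows the Gym-style make/reset/step convention described above; the import name, env id, and policy stub below are placeholders rather than confirmed API (check the GEM repo for the real registry):

```python
import gem  # assumed package name; see the GEM repo for the actual one

def llm_policy(obs: str) -> str:
    # Stand-in for a real LLM call mapping the text observation to a
    # text action (an answer, a tool call, a shell command, ...).
    return "final answer: 42"

env = gem.make("game:GuessTheNumber-v0")  # placeholder env id
obs, info = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(llm_policy(obs))
    done = terminated or truncated
```

Because the environment side is pinned to this one contract, swapping the training backend (Tinker or any of the other supported frameworks) does not require touching the environment code.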
Zichen Liu retweeted
koray kavukcuoglu@koraykv·
Gemini 3.1 Flash-Lite is available now! It takes an unbelievable amount of complex engineering to make AI feel instantaneous, enabling exciting new frontiers for experimentation!
20 replies · 42 reposts · 280 likes · 186.2K views
Zichen Liu@zzlccc·
@JustinLin610 Thanks for your great contribution and all the best Junyang…
0 replies · 0 reposts · 1 like · 640 views
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K replies · 730 reposts · 13.6K likes · 6.6M views
Zichen Liu retweeted
Yi Tay@YiTayML·
Congrats and welcome @NiJinjie to the center of AGI (🇸🇬 branch). Taking highly technical and capable researchers like this and giving them a chance to be at the frontier and make tons of impact for Gemini has been one of the most rewarding parts of founding & building GDM in SG.
Jinjie Ni@NiJinjie

Life update: I’ve joined @GoogleDeepMind as a research scientist to work on ✨gemini scaling and RL, under the leadership of Yi Tay (@YiTayML) and Quoc Le (@quocleix). I feel extremely fortunate to be on the critical path towards AGI and can't wait to help push the frontier of gemini capabilities! 🚀

4 replies · 9 reposts · 118 likes · 17K views
yi@agihippo·
My friends ask me why I bought another car. I told them this is my sidecar. It's for eval.
1 reply · 0 reposts · 33 likes · 2.8K views
Zichen Liu@zzlccc·
@TalSchuster @agihippo same here transitioning from beginner to intermediate, let’s game together next time we visit the US :)
0 replies · 0 reposts · 2 likes · 39 views
Tal Schuster@TalSchuster·
@agihippo What's the tennis level? I might seek collaborations 😂
3 replies · 0 reposts · 3 likes · 215 views
yi@agihippo·
I have the coolest team in the world. Normal google teams: what's your level? L5? L6? My team: what's your badminton level? What's your tennis level? 😂 This is what a cracked team looks like.
2 replies · 0 reposts · 39 likes · 3.4K views
Jiayi Pan@jiayi_pirate·
Today is my last day at xAI. It's been an intense, memorable year building in the frontier — I'm grateful to have worked with and learned from such talented and supportive colleagues. The journey itself is the reward. So long, and thanks for all the fish! 🐬
104 replies · 23 reposts · 1.3K likes · 106.3K views
Zichen Liu retweeted
Shunyu Yao@ShunyuYao14·
"And, what's next?" New Gemini 3.1 pro is here blog.google/innovation-and… Gemini is not only a good model, but better models coming in an unstopable way.
12 replies · 8 reposts · 109 likes · 8.5K views