Lin Guan
@GuanSuns

41 posts

Research Scientist at Meta GenAI | PhD@ASU, BS@UT Austin | RL and planning for conversational models and agents.

Joined February 2017
113 Following · 63 Followers
Nathan Lambert@natolambert·
RL research in academia is shaping up to be much more exciting in 2026 than the last half of 2025. Here's why I see it as a healthier transition from a bit of benchmaxing to more interesting and robust research (dare I say RL generalizing across many domains?).

2025 was largely a setup year: because the environments were so simple, insignificant algorithmic changes could appear more valuable than they were. Today, substantial work is focusing on algorithms for tool use, procedural environment generation, and richer data systems. I noticed this when comparing the many "environment" papers from 2025 with what's coming out now -- today they're more about generalization, diverse tasks, and more complex environments. The first environments were something along the lines of "here's some existing tooling, let's adapt it into a binary RL problem." Today, papers like Endless Terminals are setting up very interesting RL data pipelines.

This is at least an interesting complement to the environments industry, where labs will buy 10-20 environments for millions of dollars and get benefit out of only a few of them. Academics working on methods that create RL data and meaningful generalization (robust behaviors) will be way, way more fun to study and integrate than "another algorithm to increase AIME."

Anyway, I'm happy to have figured this out -- if you see more papers on generalization across many environments, or on how clever design can build specialized small models with RL, tag me!
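The "procedural environment generation" the tweet points to can be illustrated with a minimal sketch: instead of training on one fixed benchmark, tasks are sampled from a parameterized generator. Everything below (the gridworld setup, function names, parameters) is illustrative and not from any paper mentioned in the thread.

```python
import random

def generate_gridworld(size=5, n_obstacles=3, seed=None):
    """Procedurally generate one simple grid-navigation task.

    Returns (start, goal, obstacles). Varying `size` and `n_obstacles`
    yields a distribution of tasks rather than a single benchmark,
    which is the point of procedural generation for RL.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    rng.shuffle(cells)
    # First two shuffled cells become start/goal; obstacles come from the rest,
    # so start, goal, and obstacles are guaranteed disjoint.
    start, goal, *rest = cells
    obstacles = set(rest[:n_obstacles])
    return start, goal, obstacles

# Sampling a small curriculum of environments of increasing size:
tasks = [generate_gridworld(size=s, n_obstacles=s - 2, seed=i)
         for i, s in enumerate(range(4, 8))]
```

Seeding the generator keeps each sampled task reproducible while still covering a broad task distribution.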
Lin Guan@GuanSuns·
🤔 Motivation: Just like AlphaGo, given enough compute and proper configurations, RL agents can achieve superhuman performance in almost all training tasks.
⚠️ However, LLM agents today are being used in tasks far beyond what limited post-training data can cover — a challenge highlighted by many researchers like @ilyasut, @MiniMax_AI (in platform.minimax.io/docs/guides/te…) and @Kimi_Moonshot in the K2.5 tech report.
Lin Guan@GuanSuns·
Our latest work sheds light on what to "scale" when building generalizable agents:
👉 Prefer training envs with 📚 high state-information richness (high perception load & info volume) + 🧠 high planning complexity (long task horizons & branching factors).
👉 The *complexity structure* matters more than realism: hard problems in "toy" domains like Sokoban / BlocksWorld can be more useful than easy problems in more realistic domains like ALFWorld.
‼️📉 Be careful about your mid-training data mix and strength: warmup and mid-training help prevent catastrophic forgetting during RL but undermine generalization to domains that are not covered.
💡 Applying lightweight state randomization/augmentation helps!
📄 Paper: huggingface.co/papers/2601.18…
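One way to picture "lightweight state randomization/augmentation" is relabeling object identifiers in a symbolic state with a random permutation, so a policy learns relational structure rather than surface names. The paper's exact augmentation is not specified in the tweet; the fact format and function below are purely an illustrative assumption.

```python
import random

def randomize_state(state, rng=None):
    """Relabel object identifiers with a random permutation,
    preserving which facts refer to the same object.

    `state` is a list of (predicate, object) facts -- an assumed
    toy format, not the representation used in the paper.
    """
    rng = rng or random.Random()
    objects = sorted({obj for _, obj in state})
    # A random bijection from old names to (the same set of) names.
    relabeled = rng.sample(objects, k=len(objects))
    mapping = dict(zip(objects, relabeled))
    return [(pred, mapping[obj]) for pred, obj in state]

state = [("on_table", "blockA"), ("clear", "blockA"), ("holding", "blockB")]
augmented = randomize_state(state, random.Random(0))
```

Because the relabeling is a bijection, co-reference is preserved: facts that mentioned the same object before still do afterwards.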
Lin Guan retweeted
Zhaoran Wang@zhaoran_wang·
People are putting a lot of effort into building diverse RL environments. But what kinds of environments are most useful for building **generalist** agents that can solve tasks beyond their training tasks? 🚀 Instead of optimizing a single benchmark, we look for drivers of transfer in our latest paper: huggingface.co/papers/2601.18… Joint work with MSL @MetaAI (@ZhihanLiu21628 @GuanSuns @EasonNie @KaiZhang_CS @nazzhang) where @ZhihanLiu21628 interned. (1/n)
Lin Guan@GuanSuns·
While LLMs cannot verify the correctness of agent plans, they can be good at capturing the "style" of desirable behaviors. Come stop by #COLM2024 poster #34 Wednesday Morning to see how we utilize VLMs as a knowledge source of common human preferences for embodied agents!
Lin Guan@GuanSuns·
This work is also part of an ongoing effort in our lab (supervised by @rao2z ) to identify constructive roles that LLMs/VLMs can serve in planning tasks. 👉 Relevant paper: arxiv.org/abs/2402.01817
Lin Guan@GuanSuns·
While LLMs cannot verify the correctness of agent behaviors, they can be good at capturing the "style" of desirable behaviors. We set out to evaluate how effective VLMs are as critics of undesirable behaviors.
Lin Guan retweeted
Yantian Zha@YantianZha·
Robots now understand decisions via SERLfD (Self-Explanation for Reinforcement Learning from Demonstrations), transcending traditional robot learning from human demos. Check out our article and slides; find more insights at poster 409 during AAAI-24 on 2/23, 7-9 PM! 🤖🌐 #AAAI2024
Lin Guan retweeted
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
📢📢 Come stop by #AAAI2024 poster spots 644 & 645 (the very last spots) this evening to hear me explain the neat work of 👉@ZahraZ__ & @sailiks on explaining allocations to humans 👉@YantianZha & @GuanSuns on using self explanations to learn from ambiguous demonstrations [Note that at my request, the posters were placed in contiguous spots instead of their original spots 409/435 so I didn't have to master the art of being in two places at one time.. 😋]
Lin Guan@GuanSuns·
Interested in building #LLMAgent but tired of endlessly fixing flawed plans? 😫🤦‍♀️ Our #NeurIPS paper reveals LLMs aren’t designed for action sequencing! Instead, they can be useful for extracting planning knowledge and generating codified models that drive external planners!💡1/
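The idea of an LLM "extracting planning knowledge and generating codified models that drive external planners" can be sketched minimally: the LLM outputs a small action model (preconditions and effects), and a symbolic search then does the action sequencing. The domain below is hand-written for illustration (a stand-in for LLM output), and the tiny BFS is a stand-in for a real external planner; none of it is the paper's actual pipeline.

```python
from collections import deque

# A codified action model of the kind an LLM might be prompted to emit
# (hand-written here): each action has preconditions and add/delete
# effects over a set of propositional facts.
DOMAIN = {
    "pick": {"pre": {"on_table"}, "add": {"holding"}, "del": {"on_table"}},
    "place": {"pre": {"holding"}, "add": {"on_table"}, "del": {"holding"}},
}

def plan(init, goal, domain=DOMAIN):
    """Breadth-first search over the codified model -- a toy stand-in
    for the external symbolic planner the model would be handed to."""
    frontier = deque([(frozenset(init), [])])
    seen = {frozenset(init)}
    while frontier:
        state, actions = frontier.popleft()
        if goal <= state:           # all goal facts hold
            return actions
        for name, a in domain.items():
            if a["pre"] <= state:   # action applicable
                nxt = frozenset((state - a["del"]) | a["add"])
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions + [name]))
    return None                     # no plan exists
```

The division of labor is the tweet's point: the LLM supplies the model, and correctness of the action sequence comes from the search, not from the LLM.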
Lin Guan@GuanSuns·
Having missed attending #ICLR2023 in-person, we are also re-presenting the paper at the #ICML2023 ILHF workshop on Saturday, July 29th. Feel free to drop by if you are on Waikiki ... 🏖️ Mahalo 🙏 2/