Lin Guan

41 posts

Lin Guan

@GuanSuns

Research Scientist at Meta GenAI | PhD@ASU, BS@UT Austin | RL and planning for conversational models and agents.

Katılım Şubat 2017

113 Takip Edilen63 Takipçiler

Lin Guan@GuanSuns·3 Mar

Our latest work demonstrates how industry-scale Flywheel systems can iteratively drive model improvements even when the objectives are not easily tractable!

Yixin Nie@EasonNie

1/5 🤔 LLMs can solve olympiad math and write production code. But can they hold a conversation that's actually fun — one that people want to keep coming back to? 💬✨ We present CharacterFlywheel— an iterative process optimizing LLMs for real human engagement and character steerability, while maintaining rigorous safety protocols 🔒. Tested across Instagram, WhatsApp & Messenger 📱with millions of users — where they can create, share, and chat with their own AI characters 🤖. 📄 paper: arxiv.org/abs/2603.01973 huggingface.co/papers/2603.01…

English

Lin Guan@GuanSuns·17 Şub

@natolambert Totally agree! Also sharing our recent paper that shows RL environments with high state information richness & high planning complexity are crucial for generalization: x.com/GuanSuns/statu…

Lin Guan@GuanSuns

Our latest work sheds light on what to "scale" when building generalizable agents: 👉 Prefer training envs with 📚 high state information richness (high perception load & info volume) + 🧠high planning complexity (long task horizon & branching factors) 👉 The *complexity structure* matters more than realism: Hard problems in "toy" domains like Sokoban / BlocksWorld can be more useful than easy problems in more realistic domains like ALFWorld. ‼️📉 Be careful about your mid-training datamix and strength: warmup and mid-training help prevent catastrophic forgetting during RL but undermines generalization to domains that are not covered 💡Applying lightweight state randomization/augmentation helps! 📄 Paper: huggingface.co/papers/2601.18…

English

101

Nathan Lambert@natolambert·17 Şub

RL research in academia is shaping up to be much more exciting in 2026 than the last half of 2025. Here's why I see it as a healthier transition from a bit of benchmaxing to more interesting and robust research (dare I say RL generalizing across many domains?) 2025 was largely a setup year, where due to the simplicity of the environments, insignificant algorithmic changes could appear more valuable than they were. Today, substantial work is focusing on algorithms for tool-use, procedural environment generation, and richer data systems. I was noticing this with how there are many "environment" papers back in 2025, but thinking about what the difference is -- today they're more about generalization, diverse tasks, more complex environments. The first environments were something along the lines of "here's some existing tooling, lets adapt it to be a binary RL problem." Today, papers like endless terminals are setting up very interesting RL data pipelines. This is at least an interesting complement to the environments industry, where labs will buy 10-20 environments for millions of dollars and get benefit out of a few of them. Academics working on methods that create RL data and meaningful generalization (robust behaviors) will be way, way more fun to study and integrate than "another algorithm to increase AIME". Anyways, I'm happy to have figured this out, and if you see more papers understanding generalization across more environments or how clever design can make specialized small models with RL, tag me!

English

447

27.9K

Lin Guan@GuanSuns·9 Şub

Thanks to @ZhihanLiu21628 for leading this systematic study. Joint work with @EasonNie @KaiZhang_CS @zhaoran_wang @nazzhang 📄 Paper: huggingface.co/papers/2601.18…📷 Feedback & collaborations welcome

English

Lin Guan@GuanSuns·9 Şub

🤔 Motivation: Just like AlphaGo, given enough compute and proper configurations, RL agents can achieve superhuman performance in almost all training tasks ⚠️ However, LLM agents today are being used in tasks far beyond what limited post-training data can cover — a challenge highlighted by many researchers like @ilyasut, @MiniMax_AI (in platform.minimax.io/docs/guides/te…) and @Kimi_Moonshot in K2.5 tech report.

English

447

Lin Guan@GuanSuns·9 Şub

English

Lin Guan retweetledi

Zhaoran Wang@zhaoran_wang·7 Şub

People are putting lots of efforts into building diverse RL environments. But what kinds of environments are more useful for building **generalist** agents that are able to solve tasks beyond their training tasks? 🚀 Instead of optimizing one single benchmark, we look for drivers of transfer in our latest paper: huggingface.co/papers/2601.18… Joint work with MSL @MetaAI (@ZhihanLiu21628 @GuanSuns @EasonNie @KaiZhang_CS @nazzhang) where @ZhihanLiu21628 interned. (1/n)

English

112

7.5K

Lin Guan retweetledi

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z·21 Eki

📢 Yochanite Lin Guan (@GuanSuns; guansuns.github.io), currently a Research Scientist with @Meta GenAI, will be defending his @SCAI_ASU PhD dissertation tomorrow 10/21 👇. He developed several 🔥 techniques for taming the notorious sample complexity of #RL systems..

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) tweet media

English

7.3K

Lin Guan@GuanSuns·8 Eki

This is a joint work with @yfzhoucs @YantianZha @asurobot and @rao2z 👉Paper: openreview.net/forum?id=otKo4… 🎞️ Homepage: guansuns.github.io/pages/vlm-crit…

English

296

Lin Guan@GuanSuns·8 Eki

While LLMs cannot verify the correctness of agent plans, they can be good at capturing the "style" of desirable behaviors. Come stop by #COLM2024 poster #34 Wednesday Morning to see how we utilize VLMs as a knowledge source of common human preferences for embodied agents!

GIF

English

6.2K

Lin Guan@GuanSuns·16 Nis

This work is also part of an ongoing effort in our lab (supervised by @rao2z ) to identify constructive roles that LLMs/VLMs can serve in planning tasks. 👉 Relevant paper: arxiv.org/abs/2402.01817

English

155

Lin Guan@GuanSuns·16 Nis

💡Paper: arxiv.org/abs/2402.04210 👉Homepage (with visualizations): guansuns.github.io/pages/vlm-crit…

English

163

Lin Guan@GuanSuns·16 Nis

While LLMs cannot verify the correctness of agent behaviors, they can be good at capturing the "style" of desirable behaviors. We set out to find the effectiveness of VLM critics of undesirable behaviors.

GIF

English

5.2K

Lin Guan retweetledi

Yantian Zha@YantianZha·23 Şub

Robots now understand decisions via SERLfD – Self-Explanation for Reinforcement Learning from Demonstrations, transcending traditional robot learning from human demos. Check out our article and slides; find more insights at poster 409 during AAAI-24 on 2/23, 7-9 PM!🤖🌐 #AAAI2024

English

3.9K

Lin Guan retweetledi

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z·23 Şub

📢📢 Come stop by #AAAI2024 poster spots 644 & 645 (the very last spots) this evening to hear me explain the neat work of 👉@ZahraZ__ & @sailiks on explaining allocations to humans 👉@YantianZha & @GuanSuns on using self explanations to learn from ambiguous demonstrations [Note that at my request, the posters were placed in contiguous spots instead of their original spots 409/435 so I didn't have to master the art of being in two places at one time.. 😋]

English

13.2K

Lin Guan retweetledi

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z·27 Oca

📣Chalk it to mad masochism, but we will be updating and reprising our tutorial on LLMs and Planning at #AAAI2024 Vancouver, BC. (Wed Feb 21st, 2-6PM 👉#th20" target="_blank" rel="nofollow noopener">aaai.org/aaai-conferenc…. We are told that ~300 folks signed up already..😱). w/ @karthikv792 @GuanSuns

English

39.3K

Lin Guan@GuanSuns·12 Ara

📢Don't miss out our poster session tomorrow (Dec 12) starting at 5:15 p.m. CST in the Great Hall & Hall B1+B2 #1525 This is a joint work with @karthikv792 @sarath_ssreedh and @rao2z Paper homepage: guansuns.github.io/pages/llm-dm/ Presentation at NeurIPS 2023: neurips.cc/virtual/2023/p…

English

205

Lin Guan@GuanSuns·12 Ara

Interested in building #LLMAgent but tired of endlessly fixing flawed plans? 😫🤦‍♀️ Our #NeurIPS paper reveals LLMs aren’t designed for action sequencing! Instead, they can be useful for extracting planning knowledge and generating codified models that drive external planners!💡1/

English

1.9K

Lin Guan@GuanSuns·20 Tem

Having missed attending #ICLR2023 in-person, we are also re-presenting the paper at the #ICML2023 ILHF workshop on Saturday, July 29th. Feel free to drop by if you are on Waikiki ... 🏖️ Mahalo 🙏 2/

English

Lin Guan@GuanSuns·20 Tem

📢 Our #ICLR2023 paper on learning and leveraging relative behavioral attributes to tame the sample complexity of #RLHF (joint with @karthikv792 & @rao2z ) is now featured on @mtlaiethics ! 1/ 👉 montrealethics.ai/relative-behav…

English

1.1K

Keşfet

@natolambert @ZhihanLiu21628 @EasonNie @KaiZhang_CS @zhaoran_wang @nazzhang @ilyasut @MiniMax_AI