

Kangrui Wang




In Agent RL, models suffer from Template Collapse: they generate vast, diverse outputs (High Entropy) that lose all meaningful connection to the input prompt (Low Mutual Information). In other words, agents learn different ways to say nothing. 🚀 Introducing RAGEN-v2 -- Here's how we define and fix such silent failure modes in Agent RL. 🧵
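For readers who want to watch for this in their own runs, here is a minimal sketch (my own illustration, not the RAGEN-v2 code) of the two quantities involved: per-token entropy of rollouts, and a pointwise mutual-information proxy between prompt and response. Collapse shows up as the first staying high while the second drifts toward zero.

# Minimal sketch (not the RAGEN-v2 code): diagnose "template collapse" by
# tracking two quantities over training rollouts:
#   - token entropy H(y)   -> high when outputs look diverse
#   - an MI proxy I(x; y)  ~ log p(y | x) - log p(y)
import numpy as np

def token_entropy(token_logprob_dists: np.ndarray) -> float:
    """Mean per-token entropy; `token_logprob_dists` is a [T, V] array of log-probs."""
    probs = np.exp(token_logprob_dists)
    return float(-(probs * token_logprob_dists).sum(axis=-1).mean())

def mi_proxy(logp_y_given_x: float, logp_y: float, num_tokens: int) -> float:
    """Per-token pointwise MI estimate: [log p(y|x) - log p(y)] / T."""
    return (logp_y_given_x - logp_y) / num_tokens

# Dummy usage: diverse-looking rollouts (high entropy) whose likelihood
# barely depends on the prompt (MI proxy near zero).
rng = np.random.default_rng(0)
logits = rng.normal(size=(32, 100))
dists = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
print("entropy:", token_entropy(dists))
print("MI proxy:", mi_proxy(logp_y_given_x=-120.0, logp_y=-121.0, num_tokens=32))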



🚀 Announcing the 2nd Workshop on Foundation Models Meet Embodied Agents (FMEA) @ CVPR 2026!
How can we leverage foundation models to help perceive, reason, plan, and act in the physical world?
👉 FMEA brings together researchers across vision, language, robotics, and ML to push the frontier of foundation models for embodied agents.
📣 Call for Papers is now open! We invite submissions on LLMs, VLMs, Video Action (VA), and Vision–Language–Action (VLA) models for embodied agents, including:
- Long-horizon reasoning & planning
- Spatial intelligence & physical understanding
- World models, memory, and interaction
- Vision–language–action learning and evaluation
- Benchmarks, datasets, and evaluation protocols for embodied agents
🏆 Challenges @ FMEA 2026
🔹 ENACT — evaluating embodied cognition of VLMs with world modeling of egocentric interaction: enact-embodied-cognition.github.io
🔹 EmbodiedBench — benchmarking VLM-based embodied agents across perception, reasoning, and action: embodiedbench.github.io
🔹 Embodied Agent Interface (EAI) — evaluating LLM-based agents on goal interpretation, subgoal decomposition, action sequencing, and transition modeling: embodied-agent-interface.github.io
📝 OpenReview submission portal (deadline: May 1st, 2026): openreview.net/group?id=thecv…
🌐 Workshop website: …models-meet-embodied-agents.github.io/cvpr2026/
📍 Join us at CVPR 2026 — excited to see what you’ll build and submit!


🔥 Our #NeurIPS challenge on Foundation Models Meet Embodied Agents released the final eval for “Embodied Agent Interface”.
🚀 Come test your LLMs on Embodied Agent tasks!
⚒️ We've newly annotated ~5000 data points for:
- Goal Interpretation
- Subgoal Decomposition
- Action Sequencing
- Transition Modeling
🏅 Multiple prizes, generously sponsored by @AIX_Foundation.
🙌 Looking forward to talking with you in San Diego! Come chat with our organizers @jiajunwu_cs @drfeifei @YejinChoinka @percyliang @maojiayuan @Weiyu_Liu_ @RuohanZhang76 ErranLi, and huge thanks to Tianwei Bao @qineng_wang @James_KKW @yu_bryan_zhou for their incredible efforts!


🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"! We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view. 🌍🔍
Paper: arxiv.org/pdf/2503.01773
Code: github.com/shiqichen17/Ad…
Website: shiqichen17.github.io/AdaptVis/
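As a rough illustration of the kind of attention-focus analysis the paper motivates (a sketch under my own assumptions, not the AdaptVis code): given a VLM's attention maps and the index range of the image tokens, one can measure how much attention the answer tokens place on the image region, per layer.

# Sketch: fraction of the answer tokens' attention mass that lands on image tokens.
import torch

def image_attention_share(attn, image_span, query_span):
    """attn: [layers, heads, seq, seq] attention weights (rows sum to 1).
    image_span / query_span: (start, end) index ranges in the token sequence.
    Returns the per-layer fraction of the answer tokens' attention on image tokens."""
    i0, i1 = image_span
    q0, q1 = query_span
    # attention from answer (query) positions to image (key) positions
    mass_on_image = attn[:, :, q0:q1, i0:i1].sum(dim=-1)   # [layers, heads, queries]
    return mass_on_image.mean(dim=(1, 2))                  # [layers]

# Dummy usage: 4 layers, 8 heads, 64-token sequence with image tokens at [1, 33).
attn = torch.softmax(torch.randn(4, 8, 64, 64), dim=-1)
print(image_attention_share(attn, image_span=(1, 33), query_span=(50, 64)))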


AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on building and using world models to unlock spatial intelligence in this essay below. 1/n

Want to get an LLM agent to succeed in an OOD environment? We tackle the hardest case with SPA (Self-Play Agent). No extra data, tools, or stronger models. Pure self-play. We first internalize a world model via self-play, then learn how to win via RL, like a child playing with the environment just to learn "what if I do this?" Below, we share our findings on:
- What is wrong with OOD environments?
- What are the key factors that allow self-play to succeed?
(1/8)
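A toy sketch of the two-phase recipe as I read it from the post (my own illustration, not the SPA codebase): first interact without task reward to fit a transition model, then use that model to pick actions; the real agent would train with RL in the second phase rather than plan greedily.

# Phase 1: self-play interaction to learn "what happens if I do this?"
# Phase 2: use the learned world model to act toward the goal.
import random

class ToyEnv:
    """Stand-in environment: state is an int counter, goal is to reach 5."""
    def reset(self): self.s = 0; return self.s
    def step(self, a):                       # a in {-1, +1}
        self.s = max(0, min(5, self.s + a))
        return self.s, float(self.s == 5), self.s == 5

def self_play_world_model(env, episodes=200):
    """Phase 1: act without task reward, just record (s, a) -> s' transitions."""
    model = {}
    for _ in range(episodes):
        s, done = env.reset(), False
        for _ in range(10):
            a = random.choice([-1, 1])       # "what if I do this?"
            s2, _, done = env.step(a)
            model[(s, a)] = s2
            s = s2
            if done: break
    return model

def plan_with_model(model, s, horizon=6):
    """Phase 2 (greatly simplified): score each first action by the state an
    imagined rollout reaches; SPA would train the policy with RL instead."""
    def rollout(a0):
        cur = model.get((s, a0), s)
        for _ in range(horizon - 1):
            cur = model.get((cur, 1), cur)   # keep imagining "+1" steps
        return cur
    return max((-1, 1), key=rollout)

env = ToyEnv()
wm = self_play_world_model(env)
print("first action chosen from the learned model:", plan_with_model(wm, env.reset()))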

World Model Reasoning for VLM Agents (NeurIPS 2025, Score 5544)
We release VAGEN to teach VLMs to build internal world models via visual state reasoning:
- StateEstimation: what is the current state?
- TransitionModeling: what is next?
MDP → POMDP shift to handle the partial observability from visual states!
mll.lab.northwestern.edu/VAGEN/
🙌 Led by @James_KKW @WilliamZhangNU @wzihanw @yaning_gao @LINJIEFUN @qineng_wang @hc81Jeremy @w4nanch1 @2prime_PKU @zhengyuan_yang lijuanwang @RanjayKrishna @jiajunwu_cs @drfeifei @YejinChoinka
👍 Grateful for the joint effort of @northwesterncs @uwcse @StanfordAILab @microsoft @WisconsinCS @siebelschool.
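To make the idea concrete, here is a small sketch of what a structured visual-state-reasoning turn could look like (the tag names and format are my own assumptions, not the VAGEN template): the agent writes a state estimate and a transition prediction before its action, so each part can be parsed and rewarded separately.

# Hypothetical turn format: state estimate, transition prediction, then action.
import re

TEMPLATE = (
    "<state>{state_estimate}</state>\n"       # StateEstimation: what is the current state?
    "<next>{transition_prediction}</next>\n"  # TransitionModeling: what happens next?
    "<action>{action}</action>"
)

def parse_turn(text: str) -> dict:
    """Extract the three tagged fields from one agent turn; None if missing."""
    out = {}
    for tag in ("state", "next", "action"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        out[tag] = m.group(1).strip() if m else None
    return out

turn = TEMPLATE.format(
    state_estimate="box at (2,3), target at (4,3)",
    transition_prediction="pushing right moves the box to (3,3)",
    action="push_right",
)
print(parse_turn(turn))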


Can VLMs build Spatial Mental Models like humans?
Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond the current view?
Check out MindCube!
🌐 mll-lab-nu.github.io/mind-cube/
📰 arxiv.org/pdf/2506.21458
🤗 huggingface.co/datasets/MLL-L…
👩💻 github.com/mll-lab-nu/Min…


🚀 Introducing Chain-of-Experts (CoE), a free-lunch optimization method for DeepSeek-like MoE models! Within $200, we explore training MoEs that achieve a 17.6-42% memory efficiency boost!
Code: github.com/ZihanWang314/c…
Blog: notion.so/Chain-of-Exper…
Blog (Chinese): notion.so/Chain-of-Exper…
1/7🧵
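For intuition, here is a toy sketch of the general idea of chaining experts sequentially within a layer instead of only mixing them in parallel (a simplified illustration under my own assumptions, not the CoE implementation or its exact routing rule; it also applies every expert to every token for clarity, which a real MoE would avoid).

# Toy "chained experts" layer: each iteration re-routes the output of the previous one.
import torch
import torch.nn as nn

class ChainedExperts(nn.Module):
    def __init__(self, dim=64, num_experts=4, top_k=2, iterations=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.routers = nn.ModuleList(nn.Linear(dim, num_experts) for _ in range(iterations))
        self.top_k = top_k

    def forward(self, x):                          # x: [batch, tokens, dim]
        for router in self.routers:
            gates = router(x).softmax(dim=-1)      # route on the current hidden state
            topv, topi = gates.topk(self.top_k, dim=-1)
            mixed = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = (topi[..., k] == e).unsqueeze(-1).float()   # tokens routed to expert e
                    mixed = mixed + mask * topv[..., k:k+1] * expert(x)
            x = x + mixed                          # residual, then feed the next hop in the chain
        return x

out = ChainedExperts()(torch.randn(2, 8, 64))
print(out.shape)   # torch.Size([2, 8, 64])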

🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, unconstrained, dynamic preferences, not just a complex physical domain. We introduce ROSETTA: a method to cheaply generate rewards for such preferences. 🧵⬇️
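As a loose illustration of the pattern the post hints at (my own reading, not ROSETTA's actual pipeline), a free-form household preference can be turned into a small programmatic reward that the robot then optimizes; `call_llm` below is a stand-in for any code-generating model and returns a hand-written example here.

# Hypothetical pattern: natural-language preference -> generated reward function.
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned example reward here."""
    return (
        "def reward(state):\n"
        "    # prefer the mug on the shelf, lightly penalize clutter on the table\n"
        "    r = 1.0 if state['mug_location'] == 'shelf' else 0.0\n"
        "    return r - 0.1 * state['items_on_table']\n"
    )

preference = "Please keep my mug on the shelf and don't leave stuff on the table."
prompt = (
    "Write a Python function reward(state) scoring how well a household robot "
    f"satisfies this preference: {preference!r}"
)

namespace = {}
exec(call_llm(prompt), namespace)          # compile the generated reward function
reward = namespace["reward"]
print(reward({"mug_location": "shelf", "items_on_table": 2}))   # 0.8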



