Kangrui Wang

44 posts

@James_KKW

PhD @ Northwestern

Joined October 2022
51 Following · 136 Followers
Kangrui Wang retweeted
Qineng Wang@qineng_wang·
📢 Friendly reminder: the Foundation Models Meet Embodied Agents (FMEA) Workshop @ CVPR 2026 is hosting 4 embodied AI challenges, and submissions are OPEN!

💰 Each challenge offers $500 / $300 / $200 cash prizes for the top 3 teams. Whether you're working on VLMs, LLM planning, or VLA policies, there's a challenge waiting for you!

🧠 ENACT: Evaluating Embodied Cognition
Do VLMs understand how the world evolves from egocentric interaction? Forward & inverse world modeling in a POMDP framework.
Challenge page: enact-embodied-cognition.github.io/challenge/
Submission portal: eval.ai/web/challenges…
Test set submission deadline: May 25, 2026

🏠 EmbodiedBench: Vision-Driven Embodied Agents
Benchmark MLLMs on household task planning (EB-ALFRED) & spatial navigation (EB-Navigation). Open-source models <10B params.
Challenge page: embodiedbench.github.io/challenge.html
Submission portal: eval.ai/web/challenges…
Test set submission deadline: May 25, 2026

🧩 Embodied Agent Interface: From Thought to Action
Evaluate LLMs on Goal Interpretation, Subgoal Decomposition, Action Sequencing & Transition Modeling across BEHAVIOR & VirtualHome.
Challenge page: eai-challenge-cvpr2026.github.io
Submission portal: eval.ai/web/challenges…
Test set submission deadline: May 25, 2026

🤖 RoboMME: Memory for Robotic Generalist Policies
16 manipulation tasks testing temporal, spatial, object & procedural memory. Can your model remember and act?
Challenge page: robomme.github.io
Submission portal: docs.google.com/forms/d/e/1FAI…
Test set submission deadline: May 23, 2026

All rules, evaluation details, and starter kits can be found on each challenge's page. If you have any questions or need help getting started, join the Slack channel linked on each challenge page. That's our official communication channel where organizers are actively responding.

There's still plenty of time to jump in. We'd love to see what the community builds. Looking forward to your submissions, and see you at CVPR! 🚀
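As a rough illustration of the ENACT framing above: a forward query asks a model to predict how the world changes after an action, while an inverse query asks which action caused an observed change. The sketch below is only our reading of the blurb; the dataclass and field names are hypothetical, not the challenge's actual data format.

```python
# Hedged sketch of forward / inverse world-modeling queries in a POMDP-style
# setting, based only on the challenge blurb; names are illustrative, not
# ENACT's real schema.

from dataclasses import dataclass

@dataclass
class WorldModelQuery:
    obs_before: str   # egocentric observation before the interaction
    action: str       # action the agent took in the environment
    obs_after: str    # egocentric observation after the interaction

def forward_query(q: WorldModelQuery) -> str:
    """Forward modeling: given a state and an action, predict the next state."""
    return f"Given observation '{q.obs_before}' and action '{q.action}', what happens next?"

def inverse_query(q: WorldModelQuery) -> str:
    """Inverse modeling: given before/after states, infer the action taken."""
    return f"Observation changed from '{q.obs_before}' to '{q.obs_after}'. Which action caused this?"
```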
0 replies · 10 reposts · 27 likes · 11K views
Kangrui Wang retweeted
Stanford AI Lab@StanfordAILab·
Check out our latest SAIL blog post on VAGEN, a reinforcement learning framework that trains VLM agents to build internal world models through explicit visual state reasoning! ai.stanford.edu/blog/vagen/
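A minimal sketch of what "explicit visual state reasoning" could look like in an agent rollout, going only by the one-line description above; the template, tags, and the `vlm`/`env` objects are assumptions, not VAGEN's actual interface (see the blog post for the real design).

```python
# Hypothetical rollout turn with explicit visual state reasoning: the agent
# verbalizes the current state and a predicted next state before acting.
# All names here are illustrative assumptions, not VAGEN's API.

import re

STATE_REASONING_TEMPLATE = """<observation>{obs}</observation>
Describe the current visual state, predict how it changes after your action,
then act:
<current_state>...</current_state>
<predicted_state>...</predicted_state>
<action>...</action>"""

def parse_tag(text: str, tag: str) -> str:
    """Extract the content of an XML-style tag from the model response."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

def rollout_turn(vlm, env, obs):
    """One turn: the VLM reasons about state before acting; the predicted
    state can be compared to the real next observation so the RL reward
    also credits accurate world modeling, not just task success."""
    response = vlm.generate(STATE_REASONING_TEMPLATE.format(obs=obs))
    next_obs, reward, done = env.step(parse_tag(response, "action"))
    return parse_tag(response, "predicted_state"), next_obs, reward, done
```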
4 replies · 15 reposts · 91 likes · 15.5K views
Kangrui Wang retweeted
Shiqi Chen@shiqi_chen17·
📍 Can LLMs discover, abstract, and reuse higher-level tool skills across tasks?

Existing tool-use benchmarks test solving tasks with fixed tools. But real workflows contain recurring structures where efficiency comes from reusable tool compositions, not isolated calls.

We introduce SkillCraft: 126 tasks across 6 domains designed to test whether LLM agents can acquire compositional skills, not just call atomic tools. We also propose Skill Mode, a lightweight protocol with four MCP primitives that let agents compose, verify, cache, and reuse tool chains at test time.

Our key findings from evaluating 8 SOTA models:
⚡ Skill Mode enables agents to self-discover and reuse skills, leading to higher success and efficiency than agents without it. The gains are larger for stronger models.
🧠 Stronger models (e.g., Claude) discover more generalizable skills, which transfer across tasks and even across models.
🔍 Deeper composition ≠ better: shallow, well-tested skills generalize best.

🔗 Paper: arxiv.org/abs/2603.00718
💻 Code: github.com/shiqichen17/Sk…
🏠 Page: skillcraft-website.github.io/page
(1/7)
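To make the four primitives concrete, here is a hedged Python sketch of a Skill-Mode-style interface; the signatures and the cache are illustrative assumptions, not the paper's actual MCP implementation.

```python
# Illustrative compose / verify / cache / reuse primitives as described in
# the thread; this is a sketch of the idea, not SkillCraft's real interface.

from typing import Callable, Dict, List

SKILL_CACHE: Dict[str, Callable] = {}  # assumed in-memory store for skills

def compose(name: str, steps: List[Callable]) -> Callable:
    """Chain atomic tool calls into one reusable skill."""
    def skill(state):
        for step in steps:
            state = step(state)
        return state
    skill.__name__ = name
    return skill

def verify(skill: Callable, test_input, expected) -> bool:
    """Run the composed skill on a held-out case before trusting it."""
    try:
        return skill(test_input) == expected
    except Exception:
        return False

def cache(skill: Callable) -> None:
    """Persist a verified skill so later tasks (or other models) can reuse it."""
    SKILL_CACHE[skill.__name__] = skill

def reuse(name: str) -> Callable:
    """Retrieve a cached skill instead of re-deriving the whole tool chain."""
    return SKILL_CACHE[name]
```

The caching step is what would let skills transfer across tasks, and even across models, as the thread's second finding describes.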
9 replies · 39 reposts · 200 likes · 71.2K views
Kangrui Wang@James_KKW·
We welcome submissions to our FMEA workshop @ CVPR 2026 on foundation models for embodied agents 🚀
Qineng Wang@qineng_wang

🚀 Announcing the 2nd Workshop on Foundation Models Meet Embodied Agents (FMEA) @ CVPR 2026!

How can we leverage foundation models to help perceive, reason, plan, and act in the physical world? 👉 FMEA brings together researchers across vision, language, robotics, and ML to push the frontier of foundation models for embodied agents.

📣 Call for Papers is now open! We invite submissions on LLMs, VLMs, Video Action (VA), and Vision–Language–Action (VLA) models for embodied agents, including:
- Long-horizon reasoning & planning
- Spatial intelligence & physical understanding
- World models, memory, and interaction
- Vision–language–action learning and evaluation
- Benchmarks, datasets, and evaluation protocols for embodied agents

🏆 Challenges @ FMEA 2026
🔹 ENACT: evaluating embodied cognition of VLMs with world modeling of egocentric interaction
enact-embodied-cognition.github.io
🔹 EmbodiedBench: benchmarking VLM-based embodied agents across perception, reasoning, and action
embodiedbench.github.io
🔹 Embodied Agent Interface (EAI): evaluating LLM-based agents on goal interpretation, subgoal decomposition, action sequencing, and transition modeling
embodied-agent-interface.github.io

📝 OpenReview submission portal (deadline: May 1st, 2026): openreview.net/group?id=thecv…
🌐 Workshop website: …models-meet-embodied-agents.github.io/cvpr2026/
📍 Join us at CVPR 2026! Excited to see what you'll build and submit!

0 replies · 1 repost · 3 likes · 187 views
Kangrui Wang retweeted
Manling Li@ManlingLi_·
📍 Theory of Space (accepted at #ICLR2026)

Theory of Mind → hidden mental states
Theory of Space → hidden spatial beliefs

From passive observers ("What do I know?") to active explorers ("What don't I know, and how do I reduce that uncertainty?").

Theory of Space evaluates whether foundation models can actively construct, revise, and exploit internal spatial beliefs. We quantify the Active-Passive Gap: not just task accuracy, but how much uncertainty is reduced per step, and how many steps are needed in total for agents to build stable spatial beliefs.

Exploration should prioritize information gain and reduce uncertainty per step. Instead, we observe LLMs/VLMs exploring redundantly with stalled belief updates.

Key findings:
1. Active agents perform worse than rule-based programs.
2. Cognitive map failures & belief drift: beliefs about previously observed objects degrade over time, and new updates corrupt earlier correct perceptions.
3. Poor visual identification & belief inertia in belief revision.

Website: theory-of-space.github.io
Code: github.com/mll-lab-nu/The…
Data: huggingface.co/datasets/MLL-L…

Theory of Space is a joint effort of @NorthwesternEng, @StanfordAILab, @uwcse, @Cornell_CS. Led by the amazing @WilliamZhangNU, jointly done with @zihanhuang66, @YueYuew8314, @JieyuZhang20, @XLe41402, @wzihanw, @qineng_wang, @keshigeyan, @RuohanZhang76, @YejinChoinka, @RanjayKrishna, @jiajunwu_cs, @drfeifei
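One way to read "uncertainty reduced per step": if the agent's spatial belief is a probability distribution over candidate locations, information gain is the drop in entropy between consecutive steps. The sketch below is our interpretation of that metric, not the benchmark's actual code.

```python
# Minimal sketch: average entropy reduction per exploration step, assuming
# beliefs are distributions over candidate object locations (an assumption,
# not Theory of Space's exact formulation).

import math

def entropy(belief):
    """Shannon entropy (bits) of a belief distribution over locations."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

def info_gain_per_step(belief_trajectory):
    """Average entropy reduction between consecutive belief states."""
    gains = [entropy(prev) - entropy(cur)
             for prev, cur in zip(belief_trajectory, belief_trajectory[1:])]
    return sum(gains) / len(gains)

# An efficient explorer shows large positive gains early; the "stalled belief
# updates" the thread describes would show up as gains near zero.
uniform = [0.25, 0.25, 0.25, 0.25]   # maximal uncertainty: 2 bits
peaked  = [0.85, 0.05, 0.05, 0.05]   # after one informative observation
print(info_gain_per_step([uniform, peaked]))  # ≈ 1.15 bits reduced per step
```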
7 replies · 94 reposts · 496 likes · 52.5K views
Kangrui Wang retweeted
Manling Li@ManlingLi_·
Today is the day! Co-hosted with the BEHAVIOR challenge: come check out the winning teams' solutions!
Manling Li@ManlingLi_

🔥 Our #NeurIPS challenge on Foundation Models meet Embodied Agents has released the final eval for "Embodied Agent Interface".
🚀 Come test your LLMs on Embodied Agent tasks!
⚒️ We've newly annotated ~5000 data points for:
- Goal Interpretation
- Subgoal Decomposition
- Action Sequencing
- Transition Modeling
🏅 Multiple prizes, generously sponsored by @AIX_Foundation.
🙌 Looking forward to talking with you in San Diego! Come chat with our organizers @jiajunwu_cs @drfeifei @YejinChoinka @percyliang @maojiayuan @Weiyu_Liu_ @RuohanZhang76 ErranLi, and huge thanks to Tianwei Bao @qineng_wang @James_KKW @yu_bryan_zhou for their incredible efforts!

1 reply · 12 reposts · 59 likes · 10K views
Kangrui Wang retweeted
Manling Li@ManlingLi_·
While discussing spatial intelligence of "VLMs", I wanted to share an interesting finding from our ICML25 paper. We actually open the black box of why VLMs fail at even the simplest spatial question, "where is A relative to B":
- 90% of tokens are visual, yet they get only ~10% of the attention!! There's also the classic "attention sink", but for vision tokens.
- Just giving vision more attention? → Doesn't help.
- Sharpening attention to excite the correct part works: seeing more → seeing more of the "right" part.
- Adaptive sharpening (sharpen when the model is confident, smooth when unsure): controlled_A on the What's Up benchmark goes from 48.2 → 98.2%.

This simple intervention is not a robust method, but rather a close look at the attention behavior of VLMs in spatial reasoning through a mechanistic interpretability lens (1/n):
1. What causes these failures?
2. How do these failures manifest through internal patterns?
3. Can we mitigate these errors by leveraging the identified signals?

Work done with @shiqi_chen17 @tongyao_zhu @ruochenz1018 @jinghan23 @jcniebles @megamor2 @junxian_he @jiajunwu_cs Siyang Gao.
Shiqi Chen@shiqi_chen17

🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"! We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view. 🌍🔍
Paper: arxiv.org/pdf/2503.01773
Code: github.com/shiqichen17/Ad…
Website: shiqichen17.github.io/AdaptVis/
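A simplified sketch of the confidence-adaptive sharpening idea from the thread: rescale attention logits with a temperature below 1 when the model is confident (sharpen) and above 1 when it is not (smooth). The threshold and temperature values here are illustrative assumptions; the actual AdaptVis intervention is in the linked paper and code.

```python
# Confidence-adaptive attention sharpening, as a sketch of the idea only.

import numpy as np

def adaptive_sharpen(attn_logits, confidence, conf_threshold=0.5,
                     sharp_temp=0.5, smooth_temp=2.0):
    """Rescale attention logits by a temperature chosen from model confidence.

    Temperatures < 1 sharpen the distribution (more mass on top tokens);
    temperatures > 1 smooth it. All constants here are assumed values.
    """
    temp = sharp_temp if confidence >= conf_threshold else smooth_temp
    scaled = attn_logits / temp
    scaled -= scaled.max()              # subtract max for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()      # renormalized attention weights

logits = np.array([2.0, 1.0, 0.5, 0.1])
print(adaptive_sharpen(logits, confidence=0.9))  # sharper: top token dominates
print(adaptive_sharpen(logits, confidence=0.2))  # smoother: flatter weights
```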

10 replies · 87 reposts · 573 likes · 75.5K views
Kangrui Wang retweeted
Manling Li@ManlingLi_·
Spatial intelligence has long been one of the biggest bottlenecks for VLMs. Two years ago, in Sept 2023, when I had just started my postdoc, I still remember vividly how excited we were about GPT-4V, and how our "What GPT-4V still can't do" slides were completely dominated by geometric and spatial failures. I have learned so much about 3D and physical understanding from @jiajunwu_cs @drfeifei ever since.

Now in 2025, spatial intelligence has become one of the central topics for the VLM and multimodal community! MLL lab has collected a GitHub repo of the latest advances in spatial intelligence for VLMs. This is a pure community effort. If you are working on spatial intelligence in VLMs, please feel free to contribute! 🙌 Let us build this resource together!

#SpatialIntelligence #VLM #MultimodalAI #AI
Fei-Fei Li@drfeifei

AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on building and using world models to unlock spatial intelligence in this essay below. 1/n

14 replies · 127 reposts · 677 likes · 121.7K views
Kangrui Wang retweeted
Manling Li@ManlingLi_·
We are very excited to announce our MLL lab! We are looking for collaborators on RAGEN, VAGEN, Chain-of-experts, T*, LongVideoHaystack, foundation models for embodied agents, etc. mll-lab-nu.github.io
7 replies · 53 reposts · 338 likes · 59.5K views
Kangrui Wang retweeted
Zihan "Zenus" Wang@wzenus·
Why does your RL training always collapse?
In our new paper on RAGEN, we explore what breaks when you train LLM *Agents* with multi-turn reinforcement learning, and possibly how to fix it.
📄 github.com/RAGEN-AI/RAGEN…
🌐 ragen-ai.github.io
1/🧵👇
Zihan "Zenus" Wang tweet media
8 replies · 86 reposts · 436 likes · 100.6K views