Maitrix.org

Maitrix.org retweeted

Zhiting Hu@ZhitingHu·8h

🏆Honored to receive the Test of Time Award Honorable Mention #AISTATS2026 for our 2016 work Deep Kernel Learning, with the amazing @andrewgwils @rsalakhu @ericxing What a decade of AI progress! While GenAI is now driving massive real-world applications, the deepest underlying challenge remains: learning efficient representations of the world—for understanding, generation, predicting future worlds, and reasoning in the latent space. So much fun to think about for the next decade!⏳

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·22 Apr

Come check out LaDiR, our ICLR paper about latent diffusion for text reasoning. Instead of reasoning one token at a time in text space, LaDiR moves reasoning into continuous latent space and uses diffusion over blocks of thought tokens. That means LLMs can:
- rethink whole reasoning paths
- explore multiple solutions
- plan more flexibly
We show these gains on math, code, and planning tasks.

I’ll be at ICLR 2026 in Rio 🇧🇷 presenting our work: LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
🗓️ Fri, Apr 24, 10:30 AM - 1:00 PM
📍 Poster Session 3, Pavilion 4, #4916
This work explores a new direction in latent reasoning with diffusion for advancing LLM capabilities. I’d be happy to connect. Feel free to stop by the poster or reach out for a coffee chat ☕

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·23 Apr

Come and check out our ICLR work: Speculative Verdict (SV) for information-intensive visual reasoning. Inspired by speculative decoding, instead of drafting tokens, SV asks multiple small VLMs to draft diverse reasoning and localization paths, then uses a stronger model to produce the final verdict. The key insight is simple: no single reasoning path has to be perfect. Even when each path is only partly correct, combining the right pieces can still recover the correct answer — giving both better accuracy and lower cost.

Yuhan (Tina) Liu@l_yuhan7272

Heading to #ICLR2026 🇧🇷! I'll be presenting Speculative Verdict at the poster session on Apr 25, 10:30 AM–1:00 PM, Pavilion 4 #3507, happy to chat! 📄 Paper: arxiv.org/abs/2510.20812 💻 Code: github.com/Tinaliu0123/sp…

Maitrix.org retweeted

Shibo Hao@Ber18791531·14 Apr

🍫 CocoaBench v1.0 is out! CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, and 🌐 search. Since our first research preview last December, we have expanded the benchmark substantially with community-contributed tasks and spent months testing and refining the tasks, evaluations, and agent runs.
Some takeaways:
• Even the best agent system reaches only 45.1% on CocoaBench v1.0.
• Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering.
• Stronger agents tend to push more of the work into code.
• Open source models still lag behind leading frontier models on these general tasks.
👇 More on the website and in the paper
#AI #Agents #LLM #Benchmark #CocoaBench

Shibo Hao@Ber18791531

🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built🚀✨ #LLM #AI #Agent #CocoaBench More details in the threads 👇

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·1 Apr

That’s wild — and smart! 🤣 SimWorld coding agent self-improves by autonomously creating new tools and skills It realized BaGuaZhen(八卦阵) was too hard to build directly, so it created its own tools and skills. Starting from only primitive operations like spawn_actor() and delete_actor(), the agent does not just brute-force the task. It breaks the problem down and builds higher-level capabilities for itself.

A SimWorld coding agent can now create its own tools and skills on the fly. We challenged it with BaGuaZhen (八卦阵 Eight Trigrams), an ancient Chinese formation that is difficult to build from scratch because of its precise spatial structure and multi-step coordination. Instead of failing with brute force, the agent wrote reusable components for itself: Tools: Bagua Wall Segment, Bagua Trigram Line Skills: Bagua Wall Segment Skill, Bagua Trigram Line Skill Each tool is paired with a skill that teaches the model how to use it. Without skills: it fails. With self-built skills: it organizes the full structure. The exciting shift is this: agents are starting to generate capabilities, not just outputs.

Maitrix.org retweeted

SimWorld@simworld_ai·1 Apr

A SimWorld coding agent can now create its own tools and skills on the fly. We challenged it with BaGuaZhen (八卦阵 Eight Trigrams), an ancient Chinese formation that is difficult to build from scratch because of its precise spatial structure and multi-step coordination. Instead of failing with brute force, the agent wrote reusable components for itself: Tools: Bagua Wall Segment, Bagua Trigram Line Skills: Bagua Wall Segment Skill, Bagua Trigram Line Skill Each tool is paired with a skill that teaches the model how to use it. Without skills: it fails. With self-built skills: it organizes the full structure. The exciting shift is this: agents are starting to generate capabilities, not just outputs.

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·23 Mar

It’s fun to watch a coding agent reason through spatial construction, iterating through trying, failing, revising, and trying again. Really promising, though still a long way to go. It reminds me of a kid playing with LEGO for the first time, gradually turning trial and error into something creative, like a piece of art. Try SimWorld Studio to build your own physical world.

🌊🏝️🌉Coding agent performing spatial reasoning to construct complex scenes Powered by SimWorld Studio (link in the thread)

Maitrix.org retweeted

SimWorld@simworld_ai·23 Mar

🌊🏝️🌉Coding agent performing spatial reasoning to construct complex scenes Powered by SimWorld Studio (link in the thread)

Maitrix.org@MaitrixOrg·13 Mar

Vibe coding for the physical world 🤯

🚨 New Release: SimWorld Studio: Vibe Code the Physical World
Today we open source SimWorld Studio, a coding-agent platform for building interactive physical worlds. Just chat with Claude Code to create environments, place assets, test physics, and edit everything live. Build worlds as easily as writing a prompt.

Maitrix.org retweeted

SimWorld@simworld_ai·13 Mar

🚨 New Release: SimWorld Studio: Vibe Code the Physical World
Today we open source SimWorld Studio, a coding-agent platform for building interactive physical worlds. Just chat with Claude Code to create environments, place assets, test physics, and edit everything live. Build worlds as easily as writing a prompt.

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·4 Mar

🤖Coding agents like Claude Code are already game changers for digital tasks in 2026. But what if they could write code to build physical worlds? 🏙️ Imagine going from a single line of prompt → a controllable, interactive simulated world. Such environments could open new frontiers for game creation, RL training, large-scale world simulation, and studying complex social reasoning. Our SimWorld agent coding team is working toward releasing a platform that lets anyone build their own virtual worlds. Stay tuned.

What if coding agents could build entire virtual worlds? 🌍🏙️ SimWorld makes it possible — enabling agents like Claude Code 🤖 to generate and interact with scenes directly inside an Unreal Engine simulation 🎮 World simulation for embodied agents just became much easier and more accessible 🚀 Stay tuned — more models and capabilities coming soon ⚡️

Maitrix.org retweeted

SimWorld@simworld_ai·4 Mar

Claude Code can now build things in a simulated physical world!🤖🏙️ With SimWorld, coding agents can construct buildings, plan cities, or even create video games inside a realistic simulation on Unreal Engine. Just write a prompt, your agent will call tools, retrieve assets, plan scenes, and test physics autonomously. Demo platform coming soon so everyone can try it. Stay tuned. 🚀

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·27 Feb

Jixuan and the team demo’d 🦐OpenClaw agents living and operating in our SimWorld, launched in minutes. 🚀🤖 🔥Our mission: make embodied agent frameworks easy for anyone to run, observe, and customize in a realistic virtual world.

Jixuan Chen@chenjx210734

🚀 Excited to share that we have connected Clawbot and SimWorld! 🧩 We are motivated to move beyond isolated toy tasks and into a shared physical world with routines, interactions, and coordination. 🚧 Lightweight setup: plug in your own agent easily!

Maitrix.org retweeted

Jixuan Chen@chenjx210734·27 Feb

🚀 Excited to share that we have connected Clawbot and SimWorld! 🧩 We are motivated to move beyond isolated toy tasks and into a shared physical world with routines, interactions, and coordination. 🚧 Lightweight setup: plug in your own agent easily!

🤖Clawbots just moved into Embodied City inside SimWorld. They wake up. Go to work. Run errands. Talk to each other. All inside a shared physical world. This isn’t scripted — it’s autonomous agents living a daily routine. And you can spin up your own agent in minutes.

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·25 Feb

🤖 Clawbots are now living and working in Embodied City 🏙️ inside SimWorld. They follow daily routines, interact, and collaborate, building a human-like society in a shared physical world. Setting up your own agent on SimWorld is surprisingly simple. Thank you to our amazing student team! @chenjx210734 @Lingjun_Mao @JiaweiRen02 @haoqik322 @koe_ye40329 @KunZhou23339193

🤖Clawbots just moved into Embodied City inside SimWorld. They wake up. Go to work. Run errands. Talk to each other. All inside a shared physical world. This isn’t scripted — it’s autonomous agents living a daily routine. And you can spin up your own agent in minutes.

Maitrix.org retweeted

SimWorld@simworld_ai·25 Feb

🤖Clawbots just moved into Embodied City inside SimWorld. They wake up. Go to work. Run errands. Talk to each other. All inside a shared physical world. This isn’t scripted — it’s autonomous agents living a daily routine. And you can spin up your own agent in minutes.

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·12 Feb

RL, like GRPO, boosts text reasoning in math and code, but often collapses diversity as entropy shrinks.
💡 We introduce Diversity-Preserving RL via Latent Diffusion:
• Latent-space reasoning
• Diversity-guided diffusion
• Double complementary policy optimization
Optimize reward without collapsing solution modes.
🚀 +9.4% (code) | +5.7% (math) absolute pass@1
Breaking the base model’s pass@k ceiling.
#RL #diffusion #LLMs

1/9 Softmax is the enemy of diversity in reward-maximization RL like GRPO. 📉 Recent analysis reveals: As RL boosts a "correct" token, Softmax automatically suppresses all others to maximize reward. This mechanism aggressively drives down entropy. This is Mode Elicitation: trading creativity for a local optimum. To fix this, we need to escape the discrete space. 🧵👇
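
A rough numerical sketch of the softmax effect described in the thread above. The logits and the size of the boost are made-up values, and the manual logit increase only stands in for a reward-driven update; this is not GRPO and not the authors' method. It shows that raising one token's logit lowers every other token's probability and reduces the entropy of the distribution.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for four candidate tokens (illustrative values only).
logits = [1.0, 0.5, 0.0, -0.5]
before = softmax(logits)

# Assumption: a reward-driven update is approximated by adding a fixed
# amount to the logit of the "correct" token (index 0).
boosted = list(logits)
boosted[0] += 2.0
after = softmax(boosted)

print("probabilities before:", [round(p, 3) for p in before])
print("probabilities after: ", [round(p, 3) for p in after])
print("entropy before:", round(entropy(before), 3))
print("entropy after: ", round(entropy(after), 3))
```

Only one logit is changed, yet the shared normalizer grows, so every other token's probability is pushed down with it.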

Maitrix.org retweeted

SimWorld@simworld_ai·9 Feb

🚨 New from SimWorld: DeliveryBench
A multimodal embodied benchmark where LLMs work as food-delivery agents in a living 3D city 🛵🍕
🤔 Why it’s hard:
• long-horizon planning with compounding consequences (profit, delays, ratings)
• real-world constraints (traffic, weather, time)
• multi-agent competition and collaboration
🧐 What we observe:
• risk-averse agents idle or detour, losing profit
• aggressive agents rush deliveries and break constraints
• agents compete for orders but fail to coordinate routes or timing
DeliveryBench makes these dynamics measurable. 💡💸

Lingjun Mao@Lingjun_Mao

🤖 Can an agent earn money by delivering food in a realistic 3D city? 🚚 We present 𝗗𝗲𝗹𝗶𝘃𝗲𝗿𝘆𝗕𝗲𝗻𝗰𝗵, a realistic embodied benchmark for long-horizon food delivery. ⚖️ To earn more, agents must make trade-offs under multiple, interacting constraints (e.g., deadlines, expenses, and battery levels). 😮 Surprisingly, even top models (e.g., Gemini-2.5-Pro, Claude-3.7-Sonnet) earn 𝗳𝗮𝗿 𝗹𝗲𝘀𝘀 per hour than humans. They still make basic mistakes, like packing hot meals together with ice cream. 👇 Project website + more details in the thread ...1/

Maitrix.org retweeted

Lianhui Qin@Lianhuiq·9 Feb

State-of-the-art LLMs vs. a simple human job: food delivery 🛵🍕
Our latest benchmark: DeliveryBench, built on SimWorld
❌ Claude-3.7 is overly conservative
⚠️ GPT-5 is overly aggressive (pushing deliveries on a near-empty battery)
✅ Gemini-2.5-Pro shows near-human strategies
❌ Gemini-2.5-Flash ignores commonsense constraints (hot food + ice cream 😬)
❌ Grok-4 wastes budget on unused tools
Text-only benchmarks miss this gap: common sense + space + time + constraints + strategy. Only simulation reveals this. In multimodal, embodied worlds, agents must perceive, act, plan, and replan over time. Embodied AI is far from solved.