Prior @ AI2 (@Ai2Prior) - โปรไฟล์ Twitter

Prior @ AI2 รีทวีตแล้ว

Introducing WildDet3D, a grounding model for monocular 3D object detection in the wild. A question I keep coming back to is: what is the right backbone for robotics foundation models? Should it be a video model, a language model, or perhaps a grounding model? WildDet3D is our first step in exploring that direction.

Ai2@allen_ai

Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵

English

3

18

97

11.7K

Prior @ AI2 รีทวีตแล้ว

Jiafei Duan@DJiafei·6d

We have fully open-sourced all of our code, and you can now test our VLA on CALVIN and LIBERO-Pro.: github.com/Vision-Languag… Have fun building and steering your pre-trained policy.

Jiafei Duan@DJiafei

Why do generalist robotic models fail when a cup is moved just two inches to the left? It’s not a lack of motor skill, it’s an alignment problem. Today, we introduce VLS: Vision-Language Steering of Pretrained Robot Policies, a training-free framework that guides robot behavior in real time. Check out the project: vision-language-steering.github.io/webpage/ 👇🧵 (Watch till the end: VLS runs uncut, steering pretrained policies across long-horizon tasks.)

English

2

15

98

10K

Prior @ AI2 รีทวีตแล้ว

MichiganAI@michigan_AI·6d

📢 Upcoming #Robotics Seminar! Excited for Jiafei Duan @DJiafei's talk on "Building Robotics Foundation Models with Reasoning in the Loop"⤵️ 🗓 APRIL 7 @ 10:30 am 🔗cse.engin.umich.edu/event/building…

English

0

2

16

1.9K

Prior @ AI2 รีทวีตแล้ว

Haoquan Fang@hq_fang·30 Mar

I’m excited to share that I’ve decided to join @Stanford @StanfordSVL as a CS PhD student, advised by @drfeifei! I feel very fortunate for all the opportunities I’ve had so far, and I’m genuinely thrilled for this next chapter. I’m eager to dive deeper into robot learning in such an inspiring environment, and to continue developing as a researcher alongside people I deeply admire. I want to sincerely thank @RanjayKrishna, Ali Farhadi, @JenqH, @DJiafei, and everyone who has guided, encouraged, and believed in me along the way. I’m also especially grateful to @uwcse and @allen_ai for providing such a wonderful community and so many meaningful opportunities. I also truly appreciate the time and support from @drfeifei, @jiajunwu_cs, @RuohanZhang76, @ManlingLi_, @wenlong_huang, @YunfanJiang, @wensi_ai, and many others throughout both my application and decision process. I’m really looking forward to learning from and working with you all at Stanford! Stay tuned for more exciting updates!

English

31

10

396

31.3K

Prior @ AI2 รีทวีตแล้ว

Ai2@allen_ai·24 Mar

Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf. Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵

English

21

115

804

128.1K

Prior @ AI2 รีทวีตแล้ว

Yunzhu Li@YunzhuLiYZ·6 Mar

Wow, love the results here! Also a really exciting direction for general reward modeling. The fact that things can be plug-and-play here could unlock a lot of potential for using our Interactive World Simulator for RL and policy learning.

Jiafei Duan@DJiafei

Really cool interactive simulator! Love that you can also added a reward model to it , excited to see what we can build with this.

English

1

6

35

6.1K

Prior @ AI2 รีทวีตแล้ว

Jiafei Duan@DJiafei·6 Mar

Really cool interactive simulator! Love that you can also added a reward model to it , excited to see what we can build with this.

Yixuan Wang@YXWangBot

1/ World models are getting popular in robotics 🤖✨ But there’s a big problem: most are slow and break physical consistency over long horizons. 2/ Today we’re releasing Interactive World Simulator: An action-conditioned world model that supports stable long-horizon interaction. 3/ Key result: ✅ 10+ minutes of interactive prediction ✅ 15 FPS ✅ on a single RTX 4090🔥 4/ Why this matters: it unlocks two critical robotics applications: 🚀 Scalable data generation for policy training 🧪 Faithful policy evaluation 5/ You can play with our world model NOW at #interactive-demo" target="_blank" rel="nofollow noopener">yixuanwang.me/interactive_wo…. NO git clone, NO pip install, NO python. Just click and play! NOTE ⚠️ ALL videos here are generated purely by our model in pixel space! They are **NOT** from a real camera More details coming 👇 (1/9) #Robotics #AI #MachineLearning #WorldModels #RobotLearning #ImitationLearning

English

2

8

47

14.9K

Prior @ AI2 รีทวีตแล้ว

pfung@philfung·4 Mar

Inspired by the TopReward paper, I made a lil web tool to test these robot manipulation rewards on your own videos. Try: philfung.github.io/rewardscope Record yourself folding a towel, upload it, and compare: 1. TopReward (this paper) 2. GVL (Deepmind) 3. Brute Force (i.e. at each frame, ask LLM to reply with a probability) TopReward (Qwen3VL-8B) holds its own surprisingly well against the others, even if those use ChatGPT! Great work @DJiafei, UW, AllenAI, thanks for pushing @VilleKuosmanen.

pfung@philfung

I read this paper and its awesome - it creates a high-performing, smooth reward function (far superior to GVL) that is SUPER simple to implement with an LLM. IMPLEMENTATION: 1. SELECT A MODEL: Pick an open-weight, multimedia LLM (ie Qwen3-VL). 2. PROMPT THE MODEL: Send the LLM the following prompt: "The above video shows a robot manipulation trajectory that completes the following task: {INSTRUCTION}. Decide whether the above statement is True or not. The answer is: " [where INSTRUCTION is any task like "fold the towel" or "pour coffee into the cup"] 3. EXTRACT THE REWARD: Find the *logit probability* for the specific token "True" and use that as your reward signal. [The logit probability is the raw, unnormalized score assigned by the model to the "True" token before it passes through the softmax layer. This logit prob is available for open-source models and some closed-source models - for example, ChatGPT exposes log probs, whereas Claude does not] That's it!! Obviously the logit prob and using the term "True" are key insights. It is quite elegant. Congrats to the brilliant authors at @UW and @allen_ai !

Burlingame, CA 🇺🇸 English

8

22

152

31.5K

Prior @ AI2 รีทวีตแล้ว

Jiafei Duan@DJiafei·4 Mar

Really excited to see the community starting to build on our TOPReward. This is a really awesome interface that democratizes reward model for robotics and more! Check it out and try it out! philfung.github.io/rewardscope/

pfung@philfung

Inspired by the TopReward paper, I made a lil web tool to test these robot manipulation rewards on your own videos. Try: philfung.github.io/rewardscope Record yourself folding a towel, upload it, and compare: 1. TopReward (this paper) 2. GVL (Deepmind) 3. Brute Force (i.e. at each frame, ask LLM to reply with a probability) TopReward (Qwen3VL-8B) holds its own surprisingly well against the others, even if those use ChatGPT! Great work @DJiafei, UW, AllenAI, thanks for pushing @VilleKuosmanen.

English

0

3

24

3.2K

Prior @ AI2 รีทวีตแล้ว

Stone Tao@Stone_Tao·4 Mar

good work is easily exemplified by how many people try to reproduce it and are willing to share it online, been seeing a lot of reproductions of this!

Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

English

2

5

43

5.4K

Prior @ AI2 รีทวีตแล้ว

Jiafei Duan@DJiafei·4 Mar

One of the most unique aspects of TOPReward is that it requires no fine-tuning on task-specific data, no in-context prompting, and no reward training on custom datasets. Because of this, the method can naturally generalize beyond robotics, enabling it to function as a universal reward function across domains. For example, when applied to a non-robotics scenario, it can still detect subtle failures, recognizing that while the person successfully landed the bike jump, the landing was not executed gracefully (as seen from the reward value dropping).

Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

English

2

3

44

4.4K

Prior @ AI2 รีทวีตแล้ว

Jiafei Duan@DJiafei·3 Mar

After a short wait, and with several teams already reproducing our results and reporting strong performance, we are excited to officially release the complete codebase for TOPReward.: github.com/TOPReward/TOPR… Have fun building on it!

Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

English

2

8

69

8.9K

Prior @ AI2 รีทวีตแล้ว

pfung@philfung·2 Mar

I read this paper and its awesome - it creates a high-performing, smooth reward function (far superior to GVL) that is SUPER simple to implement with an LLM. IMPLEMENTATION: 1. SELECT A MODEL: Pick an open-weight, multimedia LLM (ie Qwen3-VL). 2. PROMPT THE MODEL: Send the LLM the following prompt: "The above video shows a robot manipulation trajectory that completes the following task: {INSTRUCTION}. Decide whether the above statement is True or not. The answer is: " [where INSTRUCTION is any task like "fold the towel" or "pour coffee into the cup"] 3. EXTRACT THE REWARD: Find the *logit probability* for the specific token "True" and use that as your reward signal. [The logit probability is the raw, unnormalized score assigned by the model to the "True" token before it passes through the softmax layer. This logit prob is available for open-source models and some closed-source models - for example, ChatGPT exposes log probs, whereas Claude does not] That's it!! Obviously the logit prob and using the term "True" are key insights. It is quite elegant. Congrats to the brilliant authors at @UW and @allen_ai !

Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

San Francisco, CA 🇺🇸 English

7

25

220

39.4K

Prior @ AI2 รีทวีตแล้ว

Jiafei Duan@DJiafei·28 Şub

Beyond robotic manipulation, TOPReward can also be extended with a stage-aware setup to estimate long-horizon temporal rewards directly from human videos (this example video comes from EgoDex) .

Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

English

3

7

65

9.6K

Prior @ AI2@Ai2Prior·27 Şub

Check out our new reward model approach simple yet elegant.

Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

English

0

4

357

Prior @ AI2 รีทวีตแล้ว

Jiafei Duan@DJiafei·26 Şub

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

English

12

66

363

106.9K

Prior @ AI2 รีทวีตแล้ว

AK@_akhaliq·24 Şub

TOPReward Token Probabilities as Hidden Zero-Shot Rewards for Robotics huggingface.co/papers/2602.19…

English

2

6

48

11.8K

Prior @ AI2 รีทวีตแล้ว

Ai2@allen_ai·11 Şub

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

English

10

100

714

95.7K

Prior @ AI2 รีทวีตแล้ว

Jiafei Duan@DJiafei·11 Şub

iTHOR from @allen_ai was one of the main reasons I got into Embodied AI and maybe even UW. It’s incredibly exciting to see the journey from iTHOR in Unity to a large-scale, fully open robotics platform—now supporting MuJoCo, ManiSkill, and Isaac. Grateful to contribute to this project! 👇This room brings back so many memories, hopefully it can create more memories since it is in MuJoCo, Isaac, and ManiSkill now!

Ai2@allen_ai

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

English

0

2

32

4K

Prior @ AI2@Ai2Prior·11 Şub

Our iconic iThor has been revamped for robotics, available now on MuJoCo, ManiSkill and Isaac!

English

0

3

81

Prior @ AI2

ค้นพบ