Prior @ AI2

86 posts

@Ai2Prior

Tackling the boldest computer vision problems @allen_ai

Seattle, WA · Joined November 2024
29 Following · 239 Followers
Prior @ AI2 reposted
Jiafei Duan @DJiafei
Introducing WildDet3D, a grounding model for monocular 3D object detection in the wild. A question I keep coming back to is: what is the right backbone for robotics foundation models? Should it be a video model, a language model, or perhaps a grounding model? WildDet3D is our first step in exploring that direction.
Ai2@allen_ai

Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵

3 replies · 18 reposts · 97 likes · 11.7K views
Prior @ AI2 reposted
Jiafei Duan @DJiafei
We have fully open-sourced all of our code, and you can now test our VLA on CALVIN and LIBERO-Pro: github.com/Vision-Languag… Have fun building and steering your pre-trained policy.
Jiafei Duan@DJiafei

Why do generalist robotic models fail when a cup is moved just two inches to the left? It’s not a lack of motor skill, it’s an alignment problem. Today, we introduce VLS: Vision-Language Steering of Pretrained Robot Policies, a training-free framework that guides robot behavior in real time. Check out the project: vision-language-steering.github.io/webpage/ 👇🧵 (Watch till the end: VLS runs uncut, steering pretrained policies across long-horizon tasks.)

2 replies · 15 reposts · 98 likes · 10K views
Prior @ AI2 reposted
Haoquan Fang @hq_fang
I’m excited to share that I’ve decided to join @Stanford @StanfordSVL as a CS PhD student, advised by @drfeifei! I feel very fortunate for all the opportunities I’ve had so far, and I’m genuinely thrilled for this next chapter. I’m eager to dive deeper into robot learning in such an inspiring environment, and to continue developing as a researcher alongside people I deeply admire. I want to sincerely thank @RanjayKrishna, Ali Farhadi, @JenqH, @DJiafei, and everyone who has guided, encouraged, and believed in me along the way. I’m also especially grateful to @uwcse and @allen_ai for providing such a wonderful community and so many meaningful opportunities. I also truly appreciate the time and support from @drfeifei, @jiajunwu_cs, @RuohanZhang76, @ManlingLi_, @wenlong_huang, @YunfanJiang, @wensi_ai, and many others throughout both my application and decision process. I’m really looking forward to learning from and working with you all at Stanford! Stay tuned for more exciting updates!
31 replies · 10 reposts · 396 likes · 31.3K views
Prior @ AI2 reposted
Ai2 @allen_ai
Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf. Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵
21 replies · 115 reposts · 804 likes · 128.1K views
Prior @ AI2 reposted
Jiafei Duan @DJiafei
Really cool interactive simulator! Love that you also added a reward model to it; excited to see what we can build with this.
Yixuan Wang@YXWangBot

1/ World models are getting popular in robotics 🤖✨ But there's a big problem: most are slow and break physical consistency over long horizons.
2/ Today we're releasing Interactive World Simulator: an action-conditioned world model that supports stable long-horizon interaction.
3/ Key result: ✅ 10+ minutes of interactive prediction ✅ 15 FPS ✅ on a single RTX 4090 🔥
4/ Why this matters: it unlocks two critical robotics applications: 🚀 scalable data generation for policy training 🧪 faithful policy evaluation
5/ You can play with our world model NOW at yixuanwang.me/interactive_wo… NO git clone, NO pip install, NO python. Just click and play!
NOTE ⚠️ ALL videos here are generated purely by our model in pixel space! They are **NOT** from a real camera. More details coming 👇 (1/9) #Robotics #AI #MachineLearning #WorldModels #RobotLearning #ImitationLearning

2 replies · 8 reposts · 47 likes · 14.9K views
Prior @ AI2 reposted
pfung @philfung
Inspired by the TOPReward paper, I made a lil web tool to test these robot manipulation rewards on your own videos. Try: philfung.github.io/rewardscope Record yourself folding a towel, upload it, and compare: 1. TOPReward (this paper) 2. GVL (DeepMind) 3. Brute force (i.e. at each frame, ask the LLM to reply with a probability). TOPReward (Qwen3-VL-8B) holds its own surprisingly well against the others, even though those use ChatGPT! Great work @DJiafei, UW, AllenAI; thanks for pushing @VilleKuosmanen.
pfung@philfung

I read this paper and it's awesome - it creates a high-performing, smooth reward function (far superior to GVL) that is SUPER simple to implement with an LLM.
IMPLEMENTATION:
1. SELECT A MODEL: Pick an open-weight, multimodal LLM (e.g. Qwen3-VL).
2. PROMPT THE MODEL: Send the LLM the following prompt: "The above video shows a robot manipulation trajectory that completes the following task: {INSTRUCTION}. Decide whether the above statement is True or not. The answer is: " [where INSTRUCTION is any task like "fold the towel" or "pour coffee into the cup"]
3. EXTRACT THE REWARD: Take the probability the model assigns to the specific token "True" and use that as your reward signal. [This is the softmax-normalized score for the "True" token, computed from the model's raw logits. Token probabilities are available for open-weight models and some closed models - for example, the OpenAI API exposes log probs, whereas Claude does not.]
That's it!! Obviously the token probability and using the term "True" are the key insights. It is quite elegant. Congrats to the brilliant authors at @UW and @allen_ai !

Burlingame, CA 🇺🇸
8 replies · 22 reposts · 152 likes · 31.5K views
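The three-step recipe above reduces to a single scoring function. As a minimal sketch of step 3 (not code from the paper), this computes the softmax probability of the "True" token from raw logits, using a toy three-word vocabulary in place of a real VLM tokenizer; the function name and vocabulary are illustrative only:

```python
import math

def true_token_reward(logits, vocab, true_token="True"):
    """Convert next-token logits into a scalar reward: softmax over
    the vocabulary, then the probability mass on `true_token`."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[vocab.index(true_token)] / sum(exps)

# Toy vocabulary standing in for a real tokenizer's output space.
vocab = ["True", "False", "Maybe"]
reward = true_token_reward([2.0, 0.5, -1.0], vocab)  # close to 1 when
# the model strongly favors "True", close to 0 when it favors "False"
```

With a real VLM you would read the logits of the first generated token after the prompt ends and apply the same softmax-and-index step.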
Prior @ AI2 reposted
Jiafei Duan @DJiafei
Really excited to see the community starting to build on our TOPReward. This is a really awesome interface that democratizes reward models for robotics and beyond! Check it out and try it for yourself: philfung.github.io/rewardscope/
pfung@philfung

Inspired by the TOPReward paper, I made a lil web tool to test these robot manipulation rewards on your own videos. Try: philfung.github.io/rewardscope Record yourself folding a towel, upload it, and compare: 1. TOPReward (this paper) 2. GVL (DeepMind) 3. Brute force (i.e. at each frame, ask the LLM to reply with a probability). TOPReward (Qwen3-VL-8B) holds its own surprisingly well against the others, even though those use ChatGPT! Great work @DJiafei, UW, AllenAI; thanks for pushing @VilleKuosmanen.

0 replies · 3 reposts · 24 likes · 3.2K views
Prior @ AI2 reposted
Jiafei Duan @DJiafei
One of the most unique aspects of TOPReward is that it requires no fine-tuning on task-specific data, no in-context prompting, and no reward training on custom datasets. Because of this, the method can naturally generalize beyond robotics, enabling it to function as a universal reward function across domains. For example, when applied to a non-robotics scenario, it can still detect subtle failures, recognizing that while the person successfully landed the bike jump, the landing was not executed gracefully (as seen from the reward value dropping).
Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

2 replies · 3 reposts · 44 likes · 4.4K views
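The bike-jump example above hinges on reading sudden dips in the per-frame reward curve. As an illustration of that idea only (not from the paper; the function name and threshold are my own), a minimal dip detector over a sequence of per-frame rewards could look like:

```python
def reward_drops(rewards, threshold=0.2):
    """Return indices of frames where the reward falls by more than
    `threshold` relative to the previous frame -- a cheap way to flag
    moments like an ungraceful landing in an otherwise rising curve."""
    return [i for i in range(1, len(rewards))
            if rewards[i - 1] - rewards[i] > threshold]

# Rising task progress, then a sharp dip at the landing frame (index 4).
curve = [0.1, 0.3, 0.6, 0.8, 0.45, 0.7]
flagged = reward_drops(curve)  # flags frame 4 (0.8 -> 0.45 drop)
```

In practice each entry of `curve` would be the "True"-token probability computed per frame; the threshold would need tuning per domain.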
Prior @ AI2 reposted
pfung @philfung
I read this paper and it's awesome - it creates a high-performing, smooth reward function (far superior to GVL) that is SUPER simple to implement with an LLM.
IMPLEMENTATION:
1. SELECT A MODEL: Pick an open-weight, multimodal LLM (e.g. Qwen3-VL).
2. PROMPT THE MODEL: Send the LLM the following prompt: "The above video shows a robot manipulation trajectory that completes the following task: {INSTRUCTION}. Decide whether the above statement is True or not. The answer is: " [where INSTRUCTION is any task like "fold the towel" or "pour coffee into the cup"]
3. EXTRACT THE REWARD: Take the probability the model assigns to the specific token "True" and use that as your reward signal. [This is the softmax-normalized score for the "True" token, computed from the model's raw logits. Token probabilities are available for open-weight models and some closed models - for example, the OpenAI API exposes log probs, whereas Claude does not.]
That's it!! Obviously the token probability and using the term "True" are the key insights. It is quite elegant. Congrats to the brilliant authors at @UW and @allen_ai !
Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

San Francisco, CA 🇺🇸
7 replies · 25 reposts · 220 likes · 39.4K views
Prior @ AI2 reposted
Jiafei Duan @DJiafei
Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇
12 replies · 66 reposts · 363 likes · 106.9K views
Prior @ AI2 reposted
AK @_akhaliq
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics huggingface.co/papers/2602.19…
2 replies · 6 reposts · 48 likes · 11.8K views
Prior @ AI2 reposted
Ai2 @allen_ai
Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.
10 replies · 100 reposts · 714 likes · 95.7K views
Prior @ AI2 reposted
Jiafei Duan @DJiafei
iTHOR from @allen_ai was one of the main reasons I got into Embodied AI and maybe even UW. It’s incredibly exciting to see the journey from iTHOR in Unity to a large-scale, fully open robotics platform—now supporting MuJoCo, ManiSkill, and Isaac. Grateful to contribute to this project! 👇This room brings back so many memories, hopefully it can create more memories since it is in MuJoCo, Isaac, and ManiSkill now!
Ai2@allen_ai

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

0 replies · 2 reposts · 32 likes · 4K views
Prior @ AI2 @Ai2Prior
Our iconic iTHOR has been revamped for robotics, available now in MuJoCo, ManiSkill, and Isaac!
0 replies · 0 reposts · 3 likes · 81 views