Ranjay Krishna

2K posts

@RanjayKrishna

Assistant Professor @ University of Washington, Co-Director of RAIVN lab (https://t.co/f0BWKyjoeA), Director of PRIOR team (https://t.co/l9RzTesMSM)

California, USA · Joined August 2011
435 Following · 6K Followers
Ranjay Krishna reposted
Jae Sung Park
Jae Sung Park@jjaesungpark·
VLMs today—including our own Molmo—point via raw text strings (e.g. ""). What if pointing meant directly selecting the visual tokens instead? 🤔 Introducing MolmoPoint: Better Pointing for VLMs with Grounding Tokens 🎯 🔓models, code, data, demo all OPEN 🧵👇 Paper: allenai.org/papers/molmopo…
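The mechanism described above, selecting one of the model's existing visual tokens instead of emitting coordinates as text, can be illustrated with a toy sketch. The grid size, patch size, and simple argmax read-out here are illustrative assumptions for the sake of the example, not MolmoPoint's actual configuration:

```python
# Toy sketch of pointing by visual-token selection. Assume a 448x448
# image split into a 16x16 grid of 28-px patch tokens; the model scores
# each patch with a grounding logit, and the chosen patch's center
# becomes the predicted point.
def token_to_point(patch_index, grid_size=16, image_size=448):
    patch = image_size // grid_size              # 28 px per patch
    row, col = divmod(patch_index, grid_size)
    # Center of the selected patch in pixel coordinates.
    return (col * patch + patch / 2, row * patch + patch / 2)

# Fake grounding logits over the 256 patch tokens; patch 137 wins.
logits = [0.0] * 256
logits[137] = 5.0
best = max(range(len(logits)), key=logits.__getitem__)
x, y = token_to_point(best)                      # center of patch 137
```

A side effect of this framing is that the output space becomes a finite choice over tokens the model already has, rather than a free-form coordinate string, which is presumably part of why the thread calls it simpler and faster.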
10 replies · 34 reposts · 342 likes · 44.8K views
Ranjay Krishna reposted
Yue Yang
Yue Yang@YueYangAI·
🎯 We release MolmoPoint, the best open model in GUI grounding 💻 by training on purely synthetic screenshots. We open-source all our models, data, and generation code. Plug it into your agents! Demo: huggingface.co/spaces/allenai… Model: huggingface.co/allenai/MolmoP… Data: huggingface.co/datasets/allen… Code: github.com/allenai/MolmoP…
Ai2@allen_ai

Grounding lets vision-language models do more than describe—they can point to where a robot should grasp, which button to click, or which object to track across video frames. Today we're releasing MolmoPoint, a better way for models to point. 🧵

0 replies · 12 reposts · 84 likes · 7K views
Ranjay Krishna reposted
Zixian Ma
Zixian Ma@zixianma02·
MolmoPoint now points and grounds to visual tokens directly, instead of naively outputting coordinates in text 🎯 It can also do GUI grounding very well, in addition to better image and video pointing 💻 Check out our super neat new release!
Ai2@allen_ai

VLMs already have visual tokens. Letting them point by selecting those tokens turns out to be simpler, faster, & better. 🤖 Models: huggingface.co/collections/al… 📦 Data: huggingface.co/collections/al… 💻 Code: github.com/allenai/molmo2 📖 Blog: allenai.org/blog/molmopoint

0 replies · 2 reposts · 31 likes · 4.8K views
Ranjay Krishna reposted
Ai2
Ai2@allen_ai·
Grounding lets vision-language models do more than describe—they can point to where a robot should grasp, which button to click, or which object to track across video frames. Today we're releasing MolmoPoint, a better way for models to point. 🧵
4 replies · 30 reposts · 205 likes · 37.1K views
Ranjay Krishna reposted
Ai2
Ai2@allen_ai·
"We trained our very first [Molmo] model and were surprised to find that it outperformed GPT. Scale wasn't everything in vision language — clearly there was a key role for data." @RanjayKrishna on today's open model panel at #NVIDIAGTC
1 reply · 8 reposts · 59 likes · 10.2K views
Ranjay Krishna reposted
Ainaz Eftekhar
Ainaz Eftekhar@ainaz_eftekhar·
Excited to share MolmoBot! 🤖 A big milestone for sim-to-real robotics!🚀 We show that training manipulation policies on massive, diverse simulation data can transfer zero-shot to the real world—for both static and mobile manipulation tasks🦾
Ai2@allen_ai

Today, a step forward in open robotics - our results show that sim-to-real zero shot transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces.🧵

0 replies · 3 reposts · 17 likes · 1.8K views
Ranjay Krishna reposted
Linxin Song
Linxin Song@linxins2·
🚀 Introducing ExeVRM — a video-based reward model that judges whether a computer-use agent actually completed your task, just by watching the screen recording. Our 8B model hits 84.7% accuracy & 87.7% recall, outperforming GPT-5.2 and Gemini-3 Pro on execution video assessment across Ubuntu, macOS, Windows & Android. No access to agent internals needed. Just the video. 🎬 📄 Paper: arxiv.org/abs/2603.10178 💻 Code: github.com/limenlp/ExeVRM 🤗 Model: huggingface.co/lime-nlp/ExeVR… 📦 Data: huggingface.co/datasets/lime-…
2 replies · 9 reposts · 46 likes · 8.4K views
Ranjay Krishna
Ranjay Krishna@RanjayKrishna·
We are releasing MolmoBot! We challenge the assumption that sim-to-real requires real-world finetuning. Our robot models beat strong baselines with no real-world data. With enough diversity and scale in simulation, zero-shot transfer can actually work, across both static and mobile manipulation. As with all our projects, everything is open-sourced.
Ai2@allen_ai

Today, a step forward in open robotics - our results show that sim-to-real zero shot transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces.🧵

0 replies · 5 reposts · 60 likes · 5.4K views
Ranjay Krishna reposted
Ai2
Ai2@allen_ai·
Today, a step forward in open robotics - our results show that sim-to-real zero shot transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces.🧵
10 replies · 41 reposts · 277 likes · 58.8K views
Ranjay Krishna reposted
pfung
pfung@philfung·
I read this paper and it's awesome - it creates a high-performing, smooth reward function (far superior to GVL) that is SUPER simple to implement with an LLM.

IMPLEMENTATION:
1. SELECT A MODEL: Pick an open-weight, multimodal LLM (e.g. Qwen3-VL).
2. PROMPT THE MODEL: Send the LLM the following prompt: "The above video shows a robot manipulation trajectory that completes the following task: {INSTRUCTION}. Decide whether the above statement is True or not. The answer is: " [where INSTRUCTION is any task like "fold the towel" or "pour coffee into the cup"]
3. EXTRACT THE REWARD: Find the *log probability* for the specific token "True" and use that as your reward signal. [The log probability is the score the model assigns to "True" as the next token, i.e. the softmax-normalized probability on a log scale. It is available for open-weight models and some closed-source models - for example, ChatGPT exposes log probs, whereas Claude does not.]

That's it!! Obviously the log prob and using the term "True" are the key insights. It is quite elegant. Congrats to the brilliant authors at @UW and @allen_ai !
Jiafei Duan@DJiafei

Instead of asking a VLM to output progress, it reads the model’s internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce: TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modelling for robotics! Project: topreward.github.io/webpage/ 🧵👇

San Francisco, CA · 7 replies · 25 reposts · 220 likes · 39.1K views
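The reward-extraction step described above can be sketched in a few lines. The logits here are made up (a toy 4-token vocabulary) just to show the softmax read-out of the "True" token; in real use you would feed the video frames and the prompt through an open-weight VLM such as Qwen3-VL and take the logits at the final position:

```python
import math

def true_token_reward(next_token_logits, true_token_id):
    # Softmax over the vocabulary, then read off P("True") as the reward.
    m = max(next_token_logits)                      # numerical stability
    exps = [math.exp(x - m) for x in next_token_logits]
    return exps[true_token_id] / sum(exps)

# Toy 4-token vocabulary: ["False", "True", "maybe", "<eos>"]
logits = [1.0, 3.0, 0.5, -1.0]
reward = true_token_reward(logits, true_token_id=1)
```

Because the reward is a probability in [0, 1] rather than a sampled "True"/"False" string, it moves smoothly as the model's confidence changes, which is what makes it usable as a dense reward signal.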
Ranjay Krishna reposted
pfung
pfung@philfung·
Inspired by the TopReward paper, I made a lil web tool to test these robot manipulation rewards on your own videos. Try: philfung.github.io/rewardscope Record yourself folding a towel, upload it, and compare: 1. TopReward (this paper) 2. GVL (Deepmind) 3. Brute Force (i.e. at each frame, ask LLM to reply with a probability) TopReward (Qwen3VL-8B) holds its own surprisingly well against the others, even if those use ChatGPT! Great work @DJiafei, UW, AllenAI, thanks for pushing @VilleKuosmanen.
pfung@philfung

I read this paper and it's awesome - it creates a high-performing, smooth reward function (far superior to GVL) that is SUPER simple to implement with an LLM. IMPLEMENTATION: 1. SELECT A MODEL: Pick an open-weight, multimodal LLM (e.g. Qwen3-VL). 2. PROMPT THE MODEL: Send the LLM the following prompt: "The above video shows a robot manipulation trajectory that completes the following task: {INSTRUCTION}. Decide whether the above statement is True or not. The answer is: " [where INSTRUCTION is any task like "fold the towel" or "pour coffee into the cup"] 3. EXTRACT THE REWARD: Find the *log probability* for the specific token "True" and use that as your reward signal. That's it!! Obviously the log prob and using the term "True" are the key insights. It is quite elegant. Congrats to the brilliant authors at @UW and @allen_ai !

Burlingame, CA · 8 replies · 21 reposts · 151 likes · 31.1K views
Ranjay Krishna reposted
Ai2
Ai2@allen_ai·
📢 Update: the Molmo 2 codebase is now open source. We're releasing the code behind Molmo 2—our open model family for video & image understanding, pointing, tracking, & more. Now you can easily train Molmo 2 on your own data. 🧵
6 replies · 51 reposts · 364 likes · 30.9K views
Ranjay Krishna reposted
Jiafei Duan
Jiafei Duan@DJiafei·
Instead of asking a VLM to output progress, our approach reads the model's internal belief directly from token logits. No in-context learning. No fine-tuning. No reward training. 📈 We introduce TOPReward, a zero-shot reward modeling approach for robotics using token probabilities from pretrained video VLMs. The simplest way of doing reward modeling for robotics! Project: topreward.github.io/webpage/ 🧵👇
12 replies · 65 reposts · 362 likes · 105.8K views
Ranjay Krishna reposted
Weikai Huang
Weikai Huang@weikaih04·
Free jigsaw-like data > massive human-annotated data on detection / segmentation tasks?

Excited to share our CVPR 2026 paper from @UW + @allen_ai: SOC: Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding

We generated 20M jigsaw-puzzle-like synthetic object segments (47K categories) and composed 2M detection/segmentation/grounding training images, with zero human annotation.

💡 Key idea: diffusion models excel at generating single objects. So we:
1️⃣ Generate individual objects → get perfect masks for free
2️⃣ Compose them like jigsaw puzzles with 3D layout priors
3️⃣ Use generative relighting to harmonize the scene

Result: training data with pixel-perfect annotations at any scale.

📊 Highlights:
→ LVIS Detection: 50K SOC images → +9.7 AP, rare classes +13.4 AP, outperforming 20M GRIT and rivaling 200K human-annotated V3Det
→ Visual Grounding: gRefCOCO no-target accuracy +8.4, DoD +3.8 mAP, beating both GRIT & V3Det
→ Instance Seg: LVIS rare +3.83 AP; COCO 1% data regime +6.59 AP

Huge thanks to my great mentors @RanjayKrishna, @JieyuZhang20 and all collaborators @TaoyangJia, @Michael3014018, Ziqi Gao, @jjaesungpark, and @WinsonHan

Open-sourcing:
📄 arxiv.org/abs/2510.09110
💻 github.com/weikaih04/Synt…
🤗 huggingface.co/collections/we…
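The compose step described above can be sketched with a toy rasterizer. Everything here is made up for illustration (rectangles instead of diffusion-generated objects, a flat paste order instead of 3D layout priors); the point is just that pasting pre-cut objects back-to-front yields exact visible masks with no human annotation:

```python
# Toy sketch of mask-free compositing: paste object "cutouts" onto a
# blank canvas back-to-front, so each later object occludes earlier
# ones, and every object's visible mask falls out of the canvas itself.
W = H = 8
canvas = [[0] * W for _ in range(H)]   # 0 = background, k = object id

def paste(obj_id, top, left, h, w):
    # Paste an h x w rectangular "object" with its top-left at (top, left).
    for r in range(top, top + h):
        for c in range(left, left + w):
            canvas[r][c] = obj_id

paste(1, 1, 1, 4, 4)   # object 1 pasted first (may be occluded)
paste(2, 3, 3, 3, 3)   # object 2 pasted later, occludes part of object 1

# Visible mask area of each object after composition.
visible = {k: sum(row.count(k) for row in canvas) for k in (1, 2)}
```

Because the generator produced each object in isolation, its full mask is known exactly, and the composition step determines occlusion, so both amodal and visible annotations come for free at any scale.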
3 replies · 6 reposts · 40 likes · 6K views
Ranjay Krishna reposted
Arijit Ray
Arijit Ray@ARRay693·
"It is by logic that we prove, but by [abstract] intuition that we discover." - Henri Poincaré. When faced with a complex problem, we pause, we think. Not exactly in words, not exactly in images — in something more abstract, something harder to name. So, for truly intelligent agents, should we not ask that they do the same? Introducing Mull-Tokens — a modality-agnostic latent thinking paradigm. Now, the model can think in space, in time, in words, in affordances — in all the things that language alone cannot easily convey. arijitray.com/multimodal_thi…
1 reply · 1 repost · 7 likes · 549 views
Ranjay Krishna reposted
Manling Li
Manling Li@ManlingLi_·
📍 Theory of Space (accepted at #ICLR2026)

Theory of Mind → hidden mental states
Theory of Space → hidden spatial beliefs

From passive observers ("What do I know?") to active explorers ("What don't I know, and how do I reduce that uncertainty?")

Theory of Space evaluates whether foundation models can actively construct, revise, and exploit internal spatial beliefs. We quantify the Active-Passive Gap: not just task accuracy, but how much uncertainty is reduced per step and how many steps agents need in total to build stable spatial beliefs.

Exploration should prioritize information gain and reduce uncertainty per step. Instead, we observe LLMs/VLMs explore redundantly with stalled belief updates.

Key findings:
1. Active agents perform worse than rule-based programs
2. Cognitive map failures & belief drift (beliefs about previously observed objects degrade over time; new updates corrupt earlier correct perceptions)
3. Poor visual identification & belief inertia in belief revision

Website: theory-of-space.github.io
Code: github.com/mll-lab-nu/The…
Data: huggingface.co/datasets/MLL-L…

Theory of Space is a joint effort of @NorthwesternEng, @StanfordAILab, @uwcse, @Cornell_CS. Led by the amazing @WilliamZhangNU, jointly done with @zihanhuang66, @YueYuew8314, @JieyuZhang20, @XLe41402, @wzihanw, @qineng_wang, @keshigeyan, @RuohanZhang76, @YejinChoinka, @RanjayKrishna, @jiajunwu_cs, @drfeifei
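The "uncertainty reduced per step" idea can be made concrete with Shannon entropy. This is a toy illustration, not the benchmark's actual metric: score one exploration step by the entropy drop in an agent's belief distribution over candidate object locations:

```python
import math

def entropy(p):
    # Shannon entropy (bits) of a discrete belief distribution.
    return -sum(x * math.log2(x) for x in p if x > 0)

# Belief over 4 candidate locations before and after one observation.
before = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain: H = 2 bits
after  = [0.7, 0.1, 0.1, 0.1]       # observation concentrates belief
info_gain = entropy(before) - entropy(after)
```

An efficient explorer picks the step with the largest expected entropy drop; the "stalled belief updates" the thread describes would show up here as info_gain hovering near zero step after step.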
7 replies · 93 reposts · 491 likes · 51.3K views
Ranjay Krishna
Ranjay Krishna@RanjayKrishna·
@dragon_khoi Because there were no diverse simulation environments available to train/test with... until now.
0 replies · 0 reposts · 1 like · 34 views
khoi
khoi@dragon_khoi·
@RanjayKrishna how come most of the big labs rely mostly on real world (teleop or video, etc) today? (nvidia, pi, 1x, deepmind, etc)
1 reply · 0 reposts · 0 likes · 59 views
Ranjay Krishna
Ranjay Krishna@RanjayKrishna·
The amount and diversity of robot data we need far exceeds what we can collect in the real world. We are betting on simulation! MolmoSpaces lets you generate seemingly unlimited amounts of robot data in large, diverse environments across multiple simulators.
Ai2@allen_ai

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

5 replies · 4 reposts · 48 likes · 6.2K views
Ranjay Krishna reposted
Luci Pars
Luci Pars@parsluci·
Robots now rehearse in 230 thousand homes: MolmoSpaces simulates the real world

Allen AI has announced a massive open platform called MolmoSpaces. An incredible resource has emerged for the AI that will let robots move through the real world. It is packed with more than 230 thousand different indoor scenes, more than 130 thousand realistic 3D object models, and a full 42 million validated grasps. Everything is simulated according to the laws of physics; even the weight and rigidity of objects and the way doors swing open are taken into account. Interactions that used to be glossed over as "it touched the object, so it grasped it" are now handled realistically.

Thanks to this platform, robots will be able to learn to grasp and use objects they have never seen, even when entering a new room. There is also a test system called MolmoSpaces-Bench: change the lighting, increase an object's weight, phrase the command differently, add difficulties one by one, and see exactly where the AI stumbles. Systematic experiments across thousands of scenes are now possible.

What's more, anyone can use it: the code is open, the data is on Hugging Face, the demo is ready on the site, and it is even compatible with different simulators. You can even remotely control the robot from your phone and collect data, with no extra setup required. Going forward, home assistant robots and systems working in factories look set to develop much faster. The release of an open dataset and toolkit at this scale has genuinely excited researchers. This is not an ad.
Ai2@allen_ai

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

0 replies · 2 reposts · 13 likes · 1.9K views