
Paper link: arxiv.org/abs/2603.18004
Huge thanks to the PRIOR team at Ai2! This paper would not have been possible without you all!
Harris Zhang

@HyperStorm9682
PhD Student at UW-Madison in Computer Vision


Molmo 2 doesn't just answer questions about clips: it searches and points. The model returns coordinates and timestamps over videos and images, powering QA, counting, dense captioning, artifact detection, and subtitle-aware analysis. You can see exactly how it reasoned.




@yuyinzhou_cs @NeurIPSConf I have two D&B papers in the same situation: the ACs recommended acceptance, but the PCs overruled them and rejected both with the exact same vague reason you got. They should at least provide a proper reason.



