Ai2

3.5K posts

@allen_ai

Breakthrough AI to solve the world's biggest problems. › Join us: https://t.co/MjUpZpKPXJ › Newsletter: https://t.co/k9gGznstwj

Seattle, WA · Joined September 2015
431 Following · 83K Followers
Pinned Tweet
Ai2 @allen_ai
Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵
8 replies · 61 reposts · 269 likes · 78.6K views
Ai2 reposted
Peter @PeterSushko
LLM evals are hard. Agentic evals are very hard. Web browsing evals are crazy. The same webpage will show different content based on:
- Time of year (seasonal promos)
- Your IP (stores near me)
- Your device (OS + browser combo)
- Random A/B tests
This codebase solves evals and training.
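A reproducible web-agent harness has to pin exactly the factors listed above. A minimal, purely illustrative sketch of freezing them into a stable eval context (this is not the released codebase; every name here is invented):

```python
import hashlib
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class EvalContext:
    """Factors that change what a live webpage renders (hypothetical fields)."""
    date: str          # freeze seasonal promos, e.g. "2025-06-01"
    region: str        # freeze geo-targeted content ("stores near me")
    user_agent: str    # freeze the OS + browser combo
    ab_bucket: int     # pin the A/B-test assignment

def context_id(ctx: EvalContext) -> str:
    """Stable ID for a pinned context, so two eval runs are comparable."""
    payload = "|".join(str(field) for field in astuple(ctx))
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# Two runs with the same pinned context score against the same page state:
ctx = EvalContext("2025-06-01", "us-west", "Mac/Chrome-126", ab_bucket=0)
assert context_id(ctx) == context_id(
    EvalContext("2025-06-01", "us-west", "Mac/Chrome-126", 0)
)
```

Anything the harness cannot pin this way (truly random server-side behavior) has to be handled by caching or replaying responses instead.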
Quoting Ai2 @allen_ai: "You can now train, adapt, and eval web agents on your own tasks. We're releasing the full MolmoWeb codebase—the training code, eval harness, annotation tooling, synthetic data pipeline, & client-side code for our demo."
0 replies · 5 reposts · 14 likes · 3.7K views
Ai2 @allen_ai
We're also releasing the client-side code for our MolmoWeb demo—so you can see how we built the interface that lets you give MolmoWeb a task and watch it navigate websites in real time. Use it as a starting point for your own web agent UI ↓ youtube.com/watch?v=rzkBE8…
1 reply · 1 repost · 7 likes · 1.9K views
Ai2 @allen_ai
You can now train, adapt, and eval web agents on your own tasks. We're releasing the full MolmoWeb codebase—the training code, eval harness, annotation tooling, synthetic data pipeline, & client-side code for our demo. 🧵
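As a mental model of what an eval harness in a codebase like this does (run an agent over tasks, score each with a programmatic success check, aggregate), here is a generic toy sketch. It is not the actual MolmoWeb API: `evaluate`, the task format, and the toy agent are all invented for illustration.

```python
from typing import Callable

# A task pairs a natural-language instruction with a programmatic success check.
Task = tuple[str, Callable[[str], bool]]

def evaluate(agent: Callable[[str], str], tasks: list[Task]) -> float:
    """Run the agent on each task and return the fraction judged successful."""
    passed = 0
    for instruction, is_success in tasks:
        final_state = agent(instruction)   # e.g. the page the agent ends on
        passed += is_success(final_state)
    return passed / len(tasks)

# Toy agent + tasks just to show the shape of the loop:
toy_agent = lambda instr: "checkout" if "buy" in instr else "home"
tasks = [
    ("buy a red mug", lambda state: state == "checkout"),
    ("go to the homepage", lambda state: state == "home"),
]
print(evaluate(toy_agent, tasks))  # → 1.0
```

In a real web-agent harness the "agent" is a browsing loop and the success check inspects the final page state, but the train/adapt/eval cycle plugs into the same contract.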
3 replies · 42 reposts · 225 likes · 24.7K views
Ai2 reposted
Jason Ren @RenZhongzheng
We just dropped WildDet3D🔥 — open promptable 3D object detector for any image in the wild!
🚀 SOTA results
🤖 Real-world apps (AR, robotics)
📦 Open data, model, benchmarks, code
📱 ...we even built an iPhone app!
You might have heard recent news, but to be clear: Ai2 isn't going anywhere. We're doubling down on what we do best: OPEN RESEARCH. More to come!
Quoting Ai2 @allen_ai: "Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores."
0 replies · 12 reposts · 71 likes · 12.6K views
Ai2 reposted
Jiafei Duan @DJiafei
Introducing WildDet3D, a grounding model for monocular 3D object detection in the wild. A question I keep coming back to is: what is the right backbone for robotics foundation models? Should it be a video model, a language model, or perhaps a grounding model? WildDet3D is our first step in exploring that direction.
Quoting Ai2 @allen_ai: "Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores."
3 replies · 18 reposts · 97 likes · 11.7K views
Ai2 reposted
Jieyu Zhang @JieyuZhang20
Excited to share WildDet3D! I believe understanding in-the-wild objects in 3D is the key to spatial intelligence: models need to reason not only about what objects are, but also where they are, how large they are, and how they are oriented.

To make this possible, we collected the largest-ever in-the-wild dataset for 3D object detection. 3D boxes are just the first step, with exciting future extensions to video and richer shape/geometry understanding.

Huge kudos to lead author @weikaih04 for driving this as an undergrad at UW — I'm proud to have contributed and mentored him on the project.
Quoting Ai2 @allen_ai: "Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores."
0 replies · 5 reposts · 32 likes · 3.5K views
Ai2 reposted
Weikai Huang @weikaih04
Thrilled to announce our latest project at @allen_ai @RAIVNLab: WildDet3D

Humans understand objects in 3D effortlessly -- we see a mug on a desk, judge the distance to a parked car, or estimate the height of a building across the street. For CV/robotics models, this remains surprisingly hard.

We've built great models that each handle a piece of the puzzle: FoundationPose for 6-DoF pose over tabletops, MoGe 2 for accurate metric depth estimation, SAM for 2D segmentation and tracking. But they're fragmented -- each solves one sub-task, and none gives you the full picture: where is this object in 3D, how big is it, and how is it oriented?

Monocular 3D object detection is exactly this task -- recovering the full 3D bounding box of any object from a single RGB image. It's the missing link that connects 2D perception to real-world 3D understanding for robotics, AR/VR, and embodied AI.

So why hasn't anyone cracked open-world 3D detection? Data. Existing 3D datasets (Omni3D, COCO3D) cover fewer than 100 categories, locked to driving corridors and indoor rooms. The established annotation methods -- BEV labelling, point-cloud labelling -- fundamentally don't scale to in-the-wild scenes where you have no LiDAR or well-reconstructed point cloud, and where objects are far more diverse in size and pose than vehicles and furniture.

To change this, we designed a human-in-the-loop pipeline. We build pseudo-3D box generators from several different algorithms/models; then 1,700+ human annotators from Prolific select the best candidate box and verify its quality. After several months of annotation, the result is WildDet3D-Data: 1M total images spanning 13.5K object categories, with 100K fully human-verified 3D detection images. That's 138x more category coverage than Omni3D -- street food carts, violins, traffic cones, sculptures: objects no 3D dataset has ever covered.

With this data, we trained WildDet3D -- a single geometry-aware architecture built on SAM 3 and LingBot-Depth that unifies every way you'd want to interact with a 3D detector:
- Text: "find all chairs"
- Box prompt: click a 2D box, get its 3D box (geometric, one-to-one)
- Exemplar prompt: draw one box, find all similar objects (one-to-many)
- Point prompt: click on an object

And when you have extra depth -- LiDAR, stereo, anything -- just pass it in. The model fuses it and gets substantially better: +20.7 AP on average. No depth? It works fine without it.

Results on our new in-the-wild benchmark (WildDet3D-Bench, 700+ open-world categories): 22.6 AP text / 24.8 AP box -- up from 2.3 AP for the previous best. With depth: 41.6 AP text / 47.2 AP box. Also SOTA on Omni3D (34.2 AP text / 36.4 AP box) with 10x fewer training epochs, and strong zero-shot transfer to Argoverse 2 and ScanNet (40.3 / 48.9 ODS).
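The four prompt modes described in the thread amount to one interface that dispatches on prompt type, with depth as an optional extra input. A toy, purely illustrative sketch of that shape (hypothetical names and return values, not WildDet3D's actual API):

```python
from dataclasses import dataclass

Box2D = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class TextPrompt:
    query: str                 # e.g. "find all chairs"

@dataclass
class PointPrompt:
    xy: tuple[float, float]    # a click on the object

@dataclass
class BoxPrompt:
    box2d: Box2D               # one 2D box -> its 3D box (one-to-one)

@dataclass
class ExemplarPrompt:
    box2d: Box2D               # one example box -> all similar objects

def detect_3d(prompt, depth=None) -> dict:
    """Route a prompt to a detection mode; extra depth (LiDAR/stereo) is optional."""
    mode = {
        TextPrompt: "open-vocabulary",
        PointPrompt: "point",
        BoxPrompt: "one-to-one",
        ExemplarPrompt: "one-to-many",
    }[type(prompt)]
    # A real model would return 3D boxes; here we only report the routing.
    return {"mode": mode, "fused_depth": depth is not None}
```

The single optional `depth` argument mirrors the thread's point: the same interface works with or without extra depth, and fusing depth when available is what yields the reported +20.7 AP gain.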
Quoting Ai2 @allen_ai: "Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores."
5 replies · 18 reposts · 83 likes · 17.9K views