Chaitanya (Chay) Ryali

290 posts

@wrong_whp

Here for preprints.

Joined June 2018
1K Following, 225 Followers
Chaitanya (Chay) Ryali @wrong_whp
@gabriberton Since vision 🍌 can't handle negatives, it can't actually do instance segmentation. Yet here's a lead of the project claiming it's SOTA at this, beating SAM 3 at something vision 🍌 can't actually do. So, pretty far from correct. x.com/i/status/20473…
Songyou Peng @songyoupeng

What's surprising: Vision Banana keeps its original image generation ability AND achieves state-of-the-art zero-shot performance across tasks. 👉 No task-specific heads. 👉 No special losses. (Yes, the boring table below👇) (3/5)

0
0
1
428
Chaitanya (Chay) Ryali @wrong_whp
@vgabeur @alcinos26 @sainingxie @jalayrac @jon_barron Thanks for following up Valentin! Since vision 🍌 is not capable of handling negatives, it's not capable of open vocabulary instance segmentation, yet it's claimed to be SOTA at this by a project lead, so I hope you can see how this can be misleading. x.com/i/status/20473…
Songyou Peng @songyoupeng

What's surprising: Vision Banana keeps its original image generation ability AND achieves state-of-the-art zero-shot performance across tasks. 👉 No task-specific heads. 👉 No special losses. (Yes, the boring table below👇) (3/5)

0
0
1
233
Gabriele Berton @gabriberton
Vision Banana outperforms SAM3 on most segmentation tasks, and it is SOTA on normals and monocular metric depth estimation. And the craziest thing is that it doesn't even take the camera intrinsics as input! [3/n]
2
4
52
12.2K
Gabriele Berton @gabriberton
A team of cracked @GoogleDeepMind colleagues just released Vision Banana. A brief thread about Vision Banana, what it means for the future of AI, and the future of image understanding 🧵
3
6
69
5.5K
Chaitanya (Chay) Ryali @wrong_whp
@alcinos26 Unless I'm misunderstanding, they evaluate on 500 (image, noun-phrase) pairs, not 500 images, so even worse - evaluating on < 0.3% of a long-tailed benchmark 😅
0
0
0
69
Nicolas Carion @alcinos26
First, they chose to run only on 500 images, for... computational reasons? Is Google running out of TPUs? The benchmark has 15.8k - this is a feature, not a bug. The real world is very diverse and long-tailed, so it's impossible to get accurate stats on such a small subset. 1/x
3
1
40
3.2K
Chaitanya (Chay) Ryali retweeted
AI at Meta @AIatMeta
We’re releasing SAM 3.1: a drop-in update to SAM 3 that introduces object multiplexing to significantly improve video processing efficiency without sacrificing accuracy. We’re sharing this update with the community to help make high-performance applications feasible on smaller, more accessible hardware. 🔗 Model Checkpoint: go.meta.me/8dd321 🔗 Codebase: go.meta.me/b0a9fb
106
273
2.2K
334.6K
Chaitanya (Chay) Ryali retweeted
Meta Newsroom @MetaNewsroom
New on @instagram Edits: AI-powered video effects, enabled by our new SAM3 model, make it easier to blur an object, tag an outfit, outline, and more. about.fb.com/news/2025/04/i…
22
51
426
57.5K
Chaitanya (Chay) Ryali retweeted
AI at Meta @AIatMeta
🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more: go.meta.me/568e5d
406
915
6.4K
1.2M
Chaitanya (Chay) Ryali @wrong_whp
@georgiagkioxari Nice work! SAM 3 also extensively leveraged VLMs as verifiers for grounding data - producing near human-annotation (HQ) level data. Great to see this direction gaining momentum!
0
0
1
65
Georgia Gkioxari @georgiagkioxari
Most people are hyped about LLMs as generators/actors. But IMO their real superpower is being verifiers/critics. And in computer vision this is especially true: today’s VLMs still struggle on lots of core vision tasks, yet they’re incredibly useful as feedback engines...check Damiano's work for more details x.com/marsilidamiano…
8
10
261
278.8K
nikshep @nikshepsvn
@vikhyatk bro what are these numbers, nice work
2
0
0
150
Lucas Beyer (bl16) @giffmana
@CoolMFcat @vikhyatk (but I give it a reasonable chance that vik DID do the right thing, because he has seen me tweet about this in the past!)
1
0
8
262
Chaitanya (Chay) Ryali retweeted
Dilum Sanjaya @DilumSanjaya
Found the perfect sport to stress test Meta's SAM3 person segmentation. Dense crowds, extreme motion, zero structure. This is as tough as it gets, and SAM3 nailed it.
45
130
2K
157.5K
Chaitanya (Chay) Ryali retweeted
Dilum Sanjaya @DilumSanjaya
Tested Meta's SAM 3 on some low quality dashcam footage and expected the segmentation to fall apart, but it still picked up every vehicle and even spotted people on the roadside that I hadn't noticed at all.
42
119
1.7K
220.6K
Chaitanya (Chay) Ryali retweeted
AA @measure_plan
typing "player with a red shirt" was all it took to train this computer vision model. just a few minutes to train the model, export to python, and create this video. i'll keep testing roboflow rapid and will report back with the results. in the meantime, enjoy the magic of Zlatan.
SkalskiP @skalskip92

data labeling is dead. long live distillation. from data to object detection endpoint in 90 seconds. link: rapid.roboflow.com

12
33
491
239K
Chaitanya (Chay) Ryali retweeted
Kyle Walker @kyle_e_walker
The new SAM3 model from @Meta is blowing my mind. Shown here: detecting putting greens, pools, and cars in Scottsdale from simple text prompts via @Mapbox imagery. R, Shiny, mapgl for the UI; Python backend via @giswqs's segment-geospatial package (thanks Qiusheng!)
15
62
606
43.8K
Chaitanya (Chay) Ryali @wrong_whp
"laundry on the bed", "left light", "right couch" are not "basic attributes", they are relationships. Easy way to see this: if you masked out other objects, e.g. the bed or one of the couches or lights, the referring phrase becomes incorrect for the target. On the subset that does look like PCS, it can be even better than GT as shown. Not sure what's not to buy. "Ref"COCO is a referring expression benchmark. A different task. Would you train on COCO and deploy on a RefCOCO-like task?
0
1
4
186
Chaitanya (Chay) Ryali retweeted
Qiusheng Wu @giswqs
🚀 Video Segmentation and Object Tracking with SAM 3! Learn how to segment and track objects in any video using text and point prompts with Meta’s powerful SAM 3 (Segment Anything Model 3)! Whether you're removing unwanted objects or adding new ones, this tutorial walks you through everything from start to finish.
✅ What You’ll Learn:
How to use text prompts for object segmentation
Use point-based prompts to add or remove objects
Easily track any object across multiple video frames
Real-world examples using SamGeo
📌 Useful Resources:
🔗 GitHub Repository (SamGeo): github.com/opengeos/segme…
🔗 Notebook Example: samgeo.gishub.org/examples/sam3_…
🔗 Meta SAM 3 Overview: ai.meta.com/sam3
📺 Check out the full video tutorial at youtube.com/@giswqs/videos
#SAM3 #GeoAI #Geospatial #OpenSource #Python #DataScience
9
144
946
52.7K
Ethan Reid @EthanReidMorro
Really impressed with SAM3, but having trouble buying the PCS argument. What would you call the RefCOCO benchmark: PCS or referral segmentation? RefCOCO uses simple noun phrases (general categories + basic attributes) but it is not called a PCS benchmark. These samples are from the RefCOCO train set:
2
0
1
156
moondream @moondreamai
Moondream’s new segmentation just dropped. Prompt: “dirty laundry items on the bed.” Moondream: pixel-perfect + actually understands the scene. SAM 3: grabs the floor.
28
92
1.2K
60K
Chaitanya (Chay) Ryali retweeted
Qiusheng Wu @giswqs
🌍 Unlock powerful GeoAI workflows with SAM 3! In this step-by-step tutorial, I demonstrate how to segment remote sensing imagery using text prompts and bounding boxes, powered by Meta’s SAM 3 (Segment Anything Model 3). You’ll learn how to run image segmentation on satellite and aerial imagery, extract objects of interest, and export the results to geospatial formats like GeoTIFF for further GIS or Python analysis.
🔗 GitHub Repository (SamGeo): github.com/opengeos/segme…
🔗 Notebook Example: samgeo.gishub.org/examples/sam3_…
👉 Check out the full video tutorial at youtube.com/@giswqs/videos
#SAM3 #GeoAI #Geospatial #OpenSource #Python #DataScience
5
44
293
11.7K