FocoosAI


🚨 CVPR 2025 Highlight Paper Alert 🚨

➡️ Paper Title: SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation

🌟 A few pointers from the paper

🎯 Referring Video Object Segmentation (RVOS) relies on natural language expressions to segment an object in a video clip.
🎯 Existing methods either restrict reasoning to independent short clips, losing global context, or process the entire video offline, which prevents their use in a streaming fashion.
🎯 In this work, the authors aim to overcome these limitations and design an RVOS method that operates effectively in streaming-like scenarios while retaining contextual information from past frames.
🎯 They build upon the Segment-Anything 2 (SAM2) model, which provides robust segmentation and tracking capabilities and is naturally suited to streaming processing.
🎯 They make SAM2 wiser by empowering it with natural language understanding and explicit temporal modeling at the feature extraction stage, without fine-tuning its weights and without outsourcing modality interaction to external models.
🎯 To this end, they introduce a novel adapter module that injects temporal information and multi-modal cues into the feature extraction process.
🎯 They further reveal the phenomenon of tracking bias in SAM2 and propose a learnable module that adjusts its tracking focus when the current frame features suggest a new object more aligned with the caption.
🎯 Their proposed method, "SAMWISE", achieves state-of-the-art results across various benchmarks while adding a negligible overhead of fewer than 5M parameters.

🏢 Organization: Politecnico di Torino [@PoliTOnews], @FocoosAI
🧙 Paper Authors: Claudia Cuttano, @gabTrivv, Gabriele Rosi, @masone_carlo, Giuseppe Averta

📝 Read the full paper here: arxiv.org/abs/2411.17646
🗂️ Project Page: claudiacuttano.github.io/SAMWISE/
🧑‍💻 Code: github.com/ClaudiaCuttano…

🎥 Be sure to watch the attached demo video - Sound on 🔊🔊
🎵 Music by Adi Iswanto from @pixabay

Find this valuable 💎? ♻️ QT and teach your network something new.
Follow me 👣, @NaveenManwani17, for the latest updates on tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2025 #highlight


Should you SHOW 🖼️ or TELL 📝 a model what to segment? 🤔 Our new #benchmark compares visual vs textual prompts for semantic segmentation across 14 datasets spanning 7 domains! Check out our findings ⬇️



