Kevis-Kokitsi Maninis

160 posts


@kmaninis

Research Scientist at Google DeepMind

Zurich, Switzerland · Joined December 2010
239 Following · 749 Followers
Pinned Tweet
Kevis-Kokitsi Maninis@kmaninis·
📢📢 We released checkpoints and PyTorch/JAX code for TIPS: github.com/google-deepmin… Paper updated with distilled models, and more: arxiv.org/abs/2410.16512 #ICLR2025
André Araujo@andrefaraujo

Excited to release a super capable family of image-text models from our TIPS #ICLR2025 paper! github.com/google-deepmin… We have models from ViT-S to -g, with spatial awareness, suitable to many multimodal AI applications. Can’t wait to see what the community will build with them!

2 replies · 3 reposts · 14 likes · 2.2K views
Kevis-Kokitsi Maninis retweeted
André Araujo@andrefaraujo·
True multimodal AI needs to understand the world spatially 🎯 🚀 Excited to release #CVPR2026 TIPSv2 from @GoogleDeepMind, a foundational image-text encoder with spatial awareness, leading to strong overall results and massive gains on patch-text alignment. 🔥 1/N
9 replies · 86 reposts · 699 likes · 75.9K views
Kevis-Kokitsi Maninis retweeted
Google AI Developers@googleaidevs·
Gemini 3 Pro is the frontier of multimodal AI, delivering SOTA performance across document, screen, spatial, and video understanding. Read our deep dive on how we’ve pushed our core capabilities to power hero use cases across:
+ Docs: "derender" complex docs into structured code (HTML/LaTeX)
+ Screen: build robust computer agents that automate complex tasks
+ Spatial: generate collision-free trajectories for robotics & XR
+ Video: analyze sports footage using high-FPS processing with "thinking" mode
See how these capabilities are transforming workflows in education, biomedical, and law/finance → goo.gle/3Mt3UlT
45 replies · 134 reposts · 1.1K likes · 328.8K views
Kevis-Kokitsi Maninis retweeted
Demis Hassabis@demishassabis·
We’ve been intensely cooking Gemini 3 for a while now, and we’re so excited and proud to share the results with you all. Of course it tops the leaderboards, including @arena, HLE, GPQA etc, but beyond the benchmarks it’s been by far my favourite model to use for its style and depth, and what it can do to help with everyday tasks.
218 replies · 485 reposts · 5.7K likes · 589.4K views
Kevis-Kokitsi Maninis retweeted
Sundar Pichai@sundarpichai·
Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI. Excited for you to try it!
1.1K replies · 2.6K reposts · 21.4K likes · 2.9M views
Kevis-Kokitsi Maninis retweeted
Arjun Karpur@arjunkarpur·
Thanks to everyone that stopped by today! Open-source code and weights for TIPS are available at: gdm-tips.github.io #ICLR2025
0 replies · 3 reposts · 16 likes · 1.5K views
Kevis-Kokitsi Maninis retweeted
André Araujo@andrefaraujo·
Multimodal AI encoders often lack spatial understanding… but not anymore! Our #ICLR2025 TIPS model (Text-Image Pretraining with Spatial awareness) from @GoogleDeepMind can help 💡🚀 Check out our strong & versatile image-text encoder 💪 Paper & code: arxiv.org/abs/2410.16512
6 replies · 64 reposts · 323 likes · 35.4K views
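The tweet above describes TIPS as an image-text encoder whose patch features are spatially aware. As a rough illustration of what patch-text alignment means, here is a minimal numpy sketch (not the actual TIPS code; all shapes and names are invented for the example): each patch embedding is scored against a text embedding, yielding an image-level similarity plus a per-patch alignment map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs: a 14x14 grid of patch embeddings and one
# text embedding, both 64-dim (real TIPS models use different sizes).
patch_feats = rng.standard_normal((14 * 14, 64))
text_feat = rng.standard_normal(64)

# L2-normalize so dot products become cosine similarities.
patch_feats /= np.linalg.norm(patch_feats, axis=-1, keepdims=True)
text_feat /= np.linalg.norm(text_feat)

# Image-level score: mean-pooled patch features vs. the text embedding.
image_feat = patch_feats.mean(axis=0)
image_feat /= np.linalg.norm(image_feat)
image_score = float(image_feat @ text_feat)

# Patch-level alignment map: one similarity per patch, reshaped to the grid.
# A spatially aware encoder should light up the patches the text describes.
alignment_map = (patch_feats @ text_feat).reshape(14, 14)
```

The image-level score is what drives retrieval-style evaluations, while the alignment map is the kind of dense signal that segmentation- or grounding-style tasks can read off directly.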
Kevis-Kokitsi Maninis@kmaninis·
@rgilman33 @ferjadnaeem @mtschannen @XiaohuaZhai @andrefaraujo Checkpoints and code released. My guess is that "TIPS also requires registers" if the goal is clean attention maps, but in our experience the features are quite strong for dense tasks. x.com/kmaninis/statu…
Kevis-Kokitsi Maninis@kmaninis

📢📢 We released checkpoints and PyTorch/JAX code for TIPS: github.com/google-deepmin… Paper updated with distilled models, and more: arxiv.org/abs/2410.16512 #ICLR2025

1 reply · 0 reposts · 4 likes · 91 views
Rudy Gilman@rgilman33·
SigLIP needs registers. For comparison, here's DINOv2 with registers. It has five extra tokens for the model to work with: one CLS token and four "registers". Look at how smooth those attention maps are! No artifacts.
9 replies · 62 reposts · 663 likes · 131.3K views
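The tweet above refers to register tokens: extra learnable tokens appended to a ViT's sequence that give the model "scratch space", so high-norm artifacts don't get dumped into patch tokens and pollute attention maps. A toy numpy sketch of the mechanism (schematic only, not DINOv2 code; sizes and the single-head attention step are simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim, num_registers = 196, 32, 4

# Toy token sequence: patch tokens plus a CLS token, as in a ViT.
patch_tokens = rng.standard_normal((num_patches, dim))
cls_token = rng.standard_normal((1, dim))

# Registers are extra learnable tokens (random stand-ins here). They take
# part in attention like any other token.
registers = rng.standard_normal((num_registers, dim))

tokens = np.concatenate([cls_token, registers, patch_tokens], axis=0)

# One toy self-attention step over all tokens (single head, no projections).
scores = tokens @ tokens.T / np.sqrt(dim)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
out = attn @ tokens

# After the backbone, registers are simply discarded: downstream tasks read
# only the CLS token and the patch tokens.
cls_out = out[0]
patch_out = out[1 + num_registers:]
```

The key design point is that registers are dropped at the output, so they cost a few extra tokens of compute but change nothing about the downstream interface.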
Kevis-Kokitsi Maninis retweeted
André Araujo@andrefaraujo·
Want some TIPS? Well, then check out “Text-Image Pretraining with Spatial awareness” :) TIPS is a general-purpose image-text encoder, for off-the-shelf dense and image-level prediction. Finally image-text pretraining with spatially-aware representations! arxiv.org/abs/2410.16512
4 replies · 11 reposts · 49 likes · 6.2K views
Kevis-Kokitsi Maninis retweeted
Akshay Krishnan@shay_krishnan·
New paper 📢📢 TL;DR: We introduce OmniNOCS: a multi-domain NOCS dataset with 90+ object classes. Our model trained on OmniNOCS can predict 6DoF object pose and partial shape from their 2D detections, even generalizing to in-the-wild images! Background & details in 🧵 below
5 replies · 11 reposts · 41 likes · 3.2K views
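The OmniNOCS tweet above builds on NOCS, the Normalized Object Coordinate Space: each object's points are mapped into a canonical unit cube, which makes per-pixel coordinate prediction (and hence 6DoF pose and partial shape) possible across categories. A small illustrative sketch of that normalization (not the OmniNOCS code; the point cloud is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical object point cloud in camera coordinates (meters).
points = rng.uniform(low=[-0.3, -0.1, 1.0], high=[0.3, 0.5, 1.8], size=(500, 3))

def to_nocs(pts):
    """Map points into the Normalized Object Coordinate Space: center the
    object and scale by its largest extent so it fits inside [0, 1]^3."""
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    center = (mins + maxs) / 2.0
    scale = (maxs - mins).max()  # single uniform scale preserves aspect ratio
    return (pts - center) / scale + 0.5

nocs = to_nocs(points)
```

Because the mapping is a similarity transform, recovering the translation, rotation, and scale that align predicted NOCS coordinates with observed depth is exactly the 6DoF-pose-plus-size estimation problem.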
Kevis-Kokitsi Maninis@kmaninis·
@miguelbandera Ah, this looks great! NAVI is actually focused on smaller objects due to limitations of our scanning process. Curious what you mean by "the ground truth is off". Did you try aligning on images?
1 reply · 0 reposts · 1 like · 49 views
miguelbandera@miguelbandera·
@kmaninis This is great! I’ve been collecting data for creating a dataset like Tanks and Temples but for ruins and machines, but my ground truth is still a bit off
miguelbandera@miguelbandera

I'm building a dataset collection for benchmarking & synthetic input generation (#photogrammetry #nerf). Scanning objects (like a cannon) with various lenses & light conditions using @RealityCapture_. What would be the best way to publish/make it available? Kaggle?

1 reply · 0 reposts · 0 likes · 315 views
Kevis-Kokitsi Maninis@kmaninis·
Are you evaluating 3D reconstruction/dense correspondences on synthetic datasets because real datasets are "not accurate enough"? Check out NAVI, a dataset that offers near-perfect alignments of 3D shapes on real image collections: navidataset.github.io #NeurIPS2023 (1/2)
Vittorio Ferrari@VittoFerrariCV

Three papers accepted to #NeurIPS 3/3 NAVI: a dataset of image collections of objects, along with high-quality 3D object scans, near-perfect 2D-3D alignments, and accurate camera parameters. arxiv.org/abs/2306.09109 navidataset.github.io With @jampani_varun, @kmaninis, others

2 replies · 5 reposts · 49 likes · 11K views
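The NAVI thread above stresses near-perfect 2D-3D alignments and accurate camera parameters. What that enables, concretely, is pixel-level evaluation on real images: project the aligned 3D shape with the annotated camera and measure reprojection error against predictions. A minimal pinhole-projection sketch of that metric (illustrative only; the intrinsics and points are made up, not NAVI data or code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pinhole intrinsics and camera-frame 3D points in front of
# the camera (focal length 500 px, principal point at 320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
points_cam = rng.uniform(low=[-0.2, -0.2, 0.8],
                         high=[0.2, 0.2, 1.5], size=(50, 3))

def project(K, pts):
    """Project camera-frame 3D points to pixel coordinates (pinhole model)."""
    uvw = pts @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Treat projections of the (accurately aligned) shape as ground truth, and
# perturb them slightly to stand in for a method's predicted 2D keypoints.
gt_2d = project(K, points_cam)
pred_2d = gt_2d + rng.normal(scale=0.5, size=gt_2d.shape)

# Mean 2D reprojection error in pixels: a simple metric that only makes
# sense when the underlying 2D-3D alignment is itself trustworthy.
reproj_error = float(np.linalg.norm(pred_2d - gt_2d, axis=1).mean())
```

The same projection machinery underlies the NeRF, pose-estimation, and single-image-3D evaluations the tweet mentions; the dataset's contribution is making the ground-truth side of the comparison reliable.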
Kevis-Kokitsi Maninis@kmaninis·
NAVI's accurate alignments enable new ways of evaluating 3D tasks on real data (NeRF-based 3D, pose estimation, single-image 3D, etc.). With @jampani_varun, Andreas, Arjun, Howard, and many others! Tell us about your use-case (data collection still ongoing)! (2/2)
0 replies · 0 reposts · 2 likes · 174 views