Kevis-Kokitsi Maninis

160 posts


@kmaninis

Research Scientist at Google DeepMind

Zurich, Switzerland · Joined December 2010
239 Following · 749 Followers
Pinned Tweet
Kevis-Kokitsi Maninis@kmaninis·
📢📢 We released checkpoints and PyTorch/JAX code for TIPS: github.com/google-deepmin… Paper updated with distilled models, and more: arxiv.org/abs/2410.16512 #ICLR2025
André Araujo@andrefaraujo

Excited to release a super capable family of image-text models from our TIPS #ICLR2025 paper! github.com/google-deepmin… We have models from ViT-S to -g, with spatial awareness, suitable to many multimodal AI applications. Can’t wait to see what the community will build with them!

2 replies · 3 reposts · 14 likes · 2.2K views
Kevis-Kokitsi Maninis retweeted
André Araujo@andrefaraujo·
True multimodal AI needs to understand the world spatially 🎯 🚀 Excited to release #CVPR2026 TIPSv2 from @GoogleDeepMind, a foundational image-text encoder with spatial awareness, leading to strong overall results and massive gains on patch-text alignment. 🔥 1/N
9 replies · 86 reposts · 699 likes · 75.9K views
Kevis-Kokitsi Maninis retweeted
Google AI Developers@googleaidevs·
Gemini 3 Pro is the frontier of multimodal AI, delivering SOTA performance across document, screen, spatial, and video understanding. Read our deep dive on how we’ve pushed our core capabilities to power hero use cases across:
+ Docs: "derender" complex docs into structured code (HTML/LaTeX)
+ Screen: build robust computer agents that automate complex tasks
+ Spatial: generate collision-free trajectories for robotics & XR
+ Video: analyze sports footage using high-FPS processing with "thinking" mode
See how these capabilities are transforming workflows in education, biomedical, and law/finance → goo.gle/3Mt3UlT
45 replies · 134 reposts · 1.1K likes · 328.8K views
Kevis-Kokitsi Maninis retweeted
Demis Hassabis@demishassabis·
We’ve been intensely cooking Gemini 3 for a while now, and we’re so excited and proud to share the results with you all. Of course it tops the leaderboards, including @arena, HLE, GPQA etc, but beyond the benchmarks it’s been by far my favourite model to use for its style and depth, and what it can do to help with everyday tasks.
218 replies · 485 reposts · 5.7K likes · 589.4K views
Kevis-Kokitsi Maninis retweeted
Sundar Pichai@sundarpichai·
Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI. Excited for you to try it!
1.1K replies · 2.6K reposts · 21.4K likes · 2.9M views
Kevis-Kokitsi Maninis retweeted
Arjun Karpur@arjunkarpur·
Thanks to everyone that stopped by today! Open-source code and weights for TIPS are available at: gdm-tips.github.io #ICLR2025
0 replies · 3 reposts · 16 likes · 1.5K views
Kevis-Kokitsi Maninis retweeted
André Araujo@andrefaraujo·
Multimodal AI encoders often lack spatial understanding… but not anymore! Our #ICLR2025 TIPS model (Text-Image Pretraining with Spatial awareness) from @GoogleDeepMind can help 💡🚀 Check out our strong & versatile image-text encoder 💪 Paper & code: arxiv.org/abs/2410.16512
6 replies · 64 reposts · 323 likes · 35.4K views
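The tweet above describes TIPS as an image-text encoder whose patch features are spatially aware. As a rough illustration of what patch-text alignment means, here is a minimal numpy sketch (not the actual TIPS code; all shapes and names are invented for the example): each patch embedding is scored against a text embedding, yielding an image-level similarity plus a per-patch alignment map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs: a 14x14 grid of patch embeddings and one
# text embedding, both 64-dim (real TIPS models use different sizes).
patch_feats = rng.standard_normal((14 * 14, 64))
text_feat = rng.standard_normal(64)

# L2-normalize so dot products become cosine similarities.
patch_feats /= np.linalg.norm(patch_feats, axis=-1, keepdims=True)
text_feat /= np.linalg.norm(text_feat)

# Image-level score: mean-pooled patch features vs. the text embedding.
image_feat = patch_feats.mean(axis=0)
image_feat /= np.linalg.norm(image_feat)
image_score = float(image_feat @ text_feat)

# Patch-level alignment map: one similarity per patch, reshaped to the grid.
# A spatially aware encoder should light up the patches the text describes.
alignment_map = (patch_feats @ text_feat).reshape(14, 14)
```

The image-level score is what drives retrieval-style evaluations, while the alignment map is the kind of dense signal that segmentation- or grounding-style tasks can read off directly.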
Kevis-Kokitsi Maninis@kmaninis·
@rgilman33 @ferjadnaeem @mtschannen @XiaohuaZhai @andrefaraujo Checkpoints and code released. My guess is that "TIPS also requires registers" if the goal is clean attention maps, but in our experience the features are quite strong for dense tasks. x.com/kmaninis/statu…
Kevis-Kokitsi Maninis@kmaninis

📢📢 We released checkpoints and PyTorch/JAX code for TIPS: github.com/google-deepmin… Paper updated with distilled models, and more: arxiv.org/abs/2410.16512 #ICLR2025

1 reply · 0 reposts · 4 likes · 91 views
Rudy Gilman@rgilman33·
SigLIP needs registers. For comparison, here's DINOv2 with registers. It has five extra tokens for the model to work with: one CLS token and four "registers". Look at how smooth those attention maps are! No artifacts.
9 replies · 62 reposts · 663 likes · 131.3K views
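The tweet above refers to register tokens: extra learnable tokens appended to a ViT's sequence that give the model "scratch space", so high-norm artifacts don't get dumped into patch tokens and pollute attention maps. A toy numpy sketch of the mechanism (schematic only, not DINOv2 code; sizes and the single-head attention step are simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim, num_registers = 196, 32, 4

# Toy token sequence: patch tokens plus a CLS token, as in a ViT.
patch_tokens = rng.standard_normal((num_patches, dim))
cls_token = rng.standard_normal((1, dim))

# Registers are extra learnable tokens (random stand-ins here). They take
# part in attention like any other token.
registers = rng.standard_normal((num_registers, dim))

tokens = np.concatenate([cls_token, registers, patch_tokens], axis=0)

# One toy self-attention step over all tokens (single head, no projections).
scores = tokens @ tokens.T / np.sqrt(dim)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
out = attn @ tokens

# After the backbone, registers are simply discarded: downstream tasks read
# only the CLS token and the patch tokens.
cls_out = out[0]
patch_out = out[1 + num_registers:]
```

The key design point is that registers are dropped at the output, so they cost a few extra tokens of compute but change nothing about the downstream interface.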
Kevis-Kokitsi Maninis retweeted
André Araujo@andrefaraujo·
Want some TIPS? Well, then check out “Text-Image Pretraining with Spatial awareness” :) TIPS is a general-purpose image-text encoder, for off-the-shelf dense and image-level prediction. Finally image-text pretraining with spatially-aware representations! arxiv.org/abs/2410.16512
4 replies · 11 reposts · 49 likes · 6.2K views
Kevis-Kokitsi Maninis retweeted
Akshay Krishnan@shay_krishnan·
New paper 📢📢 TL;DR: We introduce OmniNOCS: a multi-domain NOCS dataset with 90+ object classes. Our model trained on OmniNOCS can predict 6DoF object pose and partial shape from their 2D detections, even generalizing to in-the-wild images! Background & details in 🧵 below
5 replies · 11 reposts · 41 likes · 3.2K views
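The OmniNOCS tweet above builds on NOCS, the Normalized Object Coordinate Space: each object's points are mapped into a canonical unit cube, which makes per-pixel coordinate prediction (and hence 6DoF pose and partial shape) possible across categories. A small illustrative sketch of that normalization (not the OmniNOCS code; the point cloud is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical object point cloud in camera coordinates (meters).
points = rng.uniform(low=[-0.3, -0.1, 1.0], high=[0.3, 0.5, 1.8], size=(500, 3))

def to_nocs(pts):
    """Map points into the Normalized Object Coordinate Space: center the
    object and scale by its largest extent so it fits inside [0, 1]^3."""
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    center = (mins + maxs) / 2.0
    scale = (maxs - mins).max()  # single uniform scale preserves aspect ratio
    return (pts - center) / scale + 0.5

nocs = to_nocs(points)
```

Because the mapping is a similarity transform, recovering the translation, rotation, and scale that align predicted NOCS coordinates with observed depth is exactly the 6DoF-pose-plus-size estimation problem.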
Kevis-Kokitsi Maninis@kmaninis·
@miguelbandera Ah, this looks great! NAVI is actually focused on smaller objects due to limitations of our scanning process. Curious what you mean by "the ground truth is off". Did you try aligning on images?
1 reply · 0 reposts · 1 like · 49 views
miguelbandera@miguelbandera·
@kmaninis This is great! I’ve been collecting data for creating a dataset like Tanks and Temples but for ruins and machines, but my ground truth is still a bit off
miguelbandera@miguelbandera

I'm building a dataset collection for benchmarking & synthetic input generation (#photogrammetry #nerf). Scanning objects (like a cannon) with various lenses & light conditions using @RealityCapture_. What would be the best way to publish/make it available? Kaggle?

1 reply · 0 reposts · 0 likes · 315 views
Kevis-Kokitsi Maninis@kmaninis·
Are you evaluating 3D reconstruction/dense correspondences on synthetic datasets because real datasets are "not accurate enough"? Check out NAVI, a dataset that offers near-perfect alignments of 3D shapes on real image collections: navidataset.github.io #NeurIPS2023 (1/2)
Vittorio Ferrari@VittoFerrariCV

Three papers accepted to #NeurIPS 3/3 NAVI: a dataset of image collections of objects, along with high-quality 3D object scans, near-perfect 2D-3D alignments, and accurate camera parameters. arxiv.org/abs/2306.09109 navidataset.github.io With @jampani_varun, @kmaninis, others

2 replies · 5 reposts · 49 likes · 11K views
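The NAVI thread above stresses near-perfect 2D-3D alignments and accurate camera parameters. What that enables, concretely, is pixel-level evaluation on real images: project the aligned 3D shape with the annotated camera and measure reprojection error against predictions. A minimal pinhole-projection sketch of that metric (illustrative only; the intrinsics and points are made up, not NAVI data or code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pinhole intrinsics and camera-frame 3D points in front of
# the camera (focal length 500 px, principal point at 320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
points_cam = rng.uniform(low=[-0.2, -0.2, 0.8],
                         high=[0.2, 0.2, 1.5], size=(50, 3))

def project(K, pts):
    """Project camera-frame 3D points to pixel coordinates (pinhole model)."""
    uvw = pts @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Treat projections of the (accurately aligned) shape as ground truth, and
# perturb them slightly to stand in for a method's predicted 2D keypoints.
gt_2d = project(K, points_cam)
pred_2d = gt_2d + rng.normal(scale=0.5, size=gt_2d.shape)

# Mean 2D reprojection error in pixels: a simple metric that only makes
# sense when the underlying 2D-3D alignment is itself trustworthy.
reproj_error = float(np.linalg.norm(pred_2d - gt_2d, axis=1).mean())
```

The same projection machinery underlies the NeRF, pose-estimation, and single-image-3D evaluations the tweet mentions; the dataset's contribution is making the ground-truth side of the comparison reliable.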
Kevis-Kokitsi Maninis@kmaninis·
NAVI's accurate alignments enable new ways of evaluating 3D tasks on real data (NeRF-based 3D, pose estimation, single-image 3D, etc.). With @jampani_varun, Andreas, Arjun, Howard, and many others! Tell us about your use-case (data collection still ongoing)! (2/2)
0 replies · 0 reposts · 2 likes · 174 views