Nikhil Keetha

916 posts

@Nik__V__

PhD in Robotics @CMU_Robotics @airlabcmu | Visiting Researcher @Meta | Making robots 🤖 see the 🌍

Joined November 2017
1.3K Following · 1.9K Followers
Pinned Tweet
Nikhil Keetha @Nik__V__
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀
One universal model enables SoTA for:
🔥 Mono Depth Estimation
🔥 Multi-View SfM
🔥 Multi-View Stereo
🔥 Depth Completion
🔥 Registration
… and many more possibilities! – plus everything is metric 🎯
We release code for data processing, training, benchmarking & ablations – everything Apache 2.0!
Details & Links 👇
30 replies · 132 reposts · 741 likes · 121.1K views
Nikhil Keetha retweeted
Manu Gaur @gaur_manu
Pretrained ViTs like DINOv2 or CLIP are great, but they produce fixed, generic representations that encode the most salient visual concepts (e.g., "cat"). In human vision, prior priming with language changes how people parse an image. We believe visual encoders should do the same 🚨 Introducing Steerable Visual Representations, a new family of visual features you can steer with text towards specific visual concepts.
10 replies · 122 reposts · 815 likes · 111.8K views
Nikhil Keetha @Nik__V__
@NikolausWest @OmarAlama Yes, that would be great! One of the drawbacks compared to Viser is that I can't use Rerun as an API or create things like custom camera paths.
0 replies · 0 reposts · 1 like · 7 views
Nikolaus West @NikolausWest
@OmarAlama We should really add headless rendering and clicking soon 🤔
3 replies · 0 reposts · 3 likes · 27 views
Nikhil Keetha retweeted
Omar Alama عمر الأعمى
I think this is a cool direction. I played with this a bit creating a mini "rerun-mcp". If an agent can go back and forth in time, inspect sensory feeds, and examine 3D data, it can provide global mission intelligence. Examples of Opus4.6 localizing a water tower👇
Rerun @rerundotio

Claude/Codex are great, and still struggle with spatial- and time-based problems. What helps: code-first tools like Rerun that give structure + visualization. Agents don’t just need to think, they need to see and validate as well.

2 replies · 3 reposts · 10 likes · 1.3K views
Bruno Santos🇵🇹 @brunoeducsant
I was reading MapAnything by @Nik__V__ et al., and I was wondering: is DINOv2 more general and faster than DINOv3? Or is there any reason people still prefer to use DINOv2 in 2026?
3 replies · 0 reposts · 0 likes · 397 views
Gabriele Berton @gabriberton
I have joined @GoogleDeepMind! I'll be training VLMs. I'll still keep posting about the latest developments in AI, Computer Vision, and LLMs. So no more posts on PyTorch tricks. I might post about JAX. Stay tuned...
122 replies · 64 reposts · 3.6K likes · 145.5K views
François Fleuret @francoisfleuret
I know I am probably late to the party but Claude Opus hunting bugs is uncanny. @AnthropicAI
3 replies · 0 reposts · 81 likes · 8.8K views
Nikhil Keetha retweeted
Ethan Weber @ethanjohnweber
I made a Claude Code skill that generates conference posters 🛠️ Instead of a static PDF, it outputs a single HTML file — drag to resize columns, swap sections, adjust fonts, then give your layout back to Claude. 🔁 🔗 Skill 👉 github.com/ethanweber/pos…
29 replies · 330 reposts · 2.5K likes · 183.1K views
François Fleuret @francoisfleuret
@DanielePaliotta No GPU-friendly method that I am aware of makes it possible to implement what I consider the most critical functionality of a recurrent memory: a garbage collector that removes redundant information.
6 replies · 2 reposts · 44 likes · 5.7K views
François Fleuret @francoisfleuret
The two main problems with architecture design are that 1. You have to please the GPU, so for instance anything recurrent is prohibited, 2. You have to beat baselines which have co-evolved with the data sets and training procedures.
smiz @__smiz

@francoisfleuret @ylecun When will it be easy, or even cheap, to iterate on model architectures? I suspect that’s when this will pop wide open.

8 replies · 4 reposts · 91 likes · 10.8K views
Nikhil Keetha retweeted
Shubham Tulsiani @shubhtuls
[1/N] Current visual geometry prediction models primarily rely on labeled 3D data. Our CVPR26 paper, Flow3r, allows additionally leveraging unlabeled videos (using flow supervision) for scalable visual geometry learning, enabling accurate multi-view 3D reconstruction in-the-wild.
2 replies · 26 reposts · 206 likes · 15.6K views
Nikhil Keetha @Nik__V__
The feeling when this happens again for @CVPR 2026 😇 This time the authors executed brilliantly on a scope shift and experiment I suggested in the review! 🙌 Constructive feedback ftw 💪 #CVPR2026
Nikhil Keetha @Nik__V__

Reviewed a @CVPR paper where:
Pre Rebuttal -> All Weak Reject
Post Rebuttal -> All Weak Accept
Kudos to the authors’ amazing rebuttal and my fellow co-reviewers! 🙌 Surprisingly all my reviewed papers have unanimous decisions 🤔😮 #CVPR2024 #R2 No more?!

1 reply · 0 reposts · 30 likes · 4.8K views
Lucas Beyer (bl16) @giffmana
I've been waiting forever for a video researcher to treat I-frames and P-frames differently. Another one is that JPEG/MPEG do patchification to 8x8 in the codec in the first place. Seems sensible to me to reuse that, at least if you want a super high-performance system.
Brian Li @Brian_Bo_Li

x.com/i/article/2021…

16 replies · 40 reposts · 497 likes · 71.2K views