Bhuvan Sachdeva

48 posts

@SachdevaBhuvan

Visiting Researcher at @MSFTResearch | Past research at @AmazonScience | Looking for Research Engineer roles!

Joined December 2019
910 Following · 107 Followers
Pinned Tweet
Bhuvan Sachdeva @SachdevaBhuvan
Our paper has been accepted for Oral presentation at #CVPR 🎉🎉 Kudos to the team: @karan_uppal3, @abhinav_java See you in Denver! On a side note, I am finishing my fellowship by June and looking for full-time research roles. DMs open.
Bhuvan Sachdeva @SachdevaBhuvan

New paper: Understanding Task Transfer in Vision–Language Models. How does finetuning a model on one task affect its performance on other tasks? @karan_uppal3 and @abhinav_java are presenting this work at UniReps, NeurIPS!! 📍 Ballroom 20D ⏰ 3:45 PM – 5:00 PM Come say hi! 🧵

Bhuvan Sachdeva @SachdevaBhuvan
@gabriberton I agree with the pros of vision encoders and think they are here to stay. That said, do you think there's something inherently special about the visual modality that requires separate encoding, or are the advantages merely compute efficiency?
Gabriele Berton @gabriberton
Cool paper from Meta suggesting that future MLLMs will be Native Multimodal Models (NMM), hence no more vision encoders. But I disagree. I actually think we'll go in the other direction (what? more encoders? yes! read on...). All you need to know about the future of MLLMs 🧵
Weiming Ren @wmren993

1/ 🚀 We’re excited to share Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation! Tuna-2 is a native unified multimodal model that supports visual understanding, text-to-image generation, and image editing directly from pixel embeddings. 🐟✨ 📄 Paper: arxiv.org/abs/2604.24763 🌐 Project: tuna-ai.org/tuna-2 💻 Code: github.com/facebookresear… Most unified multimodal models still rely on pretrained vision encoders, which add architectural complexity and can create representation mismatches between understanding and generation. Tuna-2 asks a simple question: Do we still need vision encoders? 👀 Our answer is No! Tuna-2 has a completely encoder-free architecture, where images are processed directly by a unified transformer together with text tokens. Take a glimpse at what our model can generate ↓ 🎨🖼️

Bhuvan Sachdeva retweeted
Saining Xie @sainingxie
vision🍌 is here: vision-banana.github.io. If you got into computer vision the way I did, starting with pixel-level labeling tasks like segmentation, edges, depth, or surface normals, you'll probably feel the same seeing these results: something big has quietly shifted, and it's going to change how we approach these problems for good 🧵
Bhuvan Sachdeva retweeted
André Araujo @andrefaraujo
True multimodal AI needs to understand the world spatially 🎯 🚀 Excited to release #CVPR2026 TIPSv2 from @GoogleDeepMind, a foundational image-text encoder with spatial awareness, leading to strong overall results and massive gains on patch-text alignment. 🔥 1/N
Bhuvan Sachdeva retweeted
Lorenzo Xiao @lrzneedresearch
Made a public RL-for-LLMs reading list because I was preparing for my interviews. 96 papers, 5 categories, 24 subtopics, mostly around the 2025-2026 wave, with notes on what's worth reading carefully vs. what you can skim. Hopefully useful if you're getting into RLHF, agentic RL, or reward modeling, or just cramming for interviews. algoroxyolo.github.io/blog/2026/rl-r… Note: the paper selection mainly reflects my and @sun_hanchi's taste... Follow me so I have the incentive to keep updating this and the agentic system design series #RLHF #LLM #AIAgents
Bhuvan Sachdeva @SachdevaBhuvan
@_Creation22 Does a given sequence have only one solution? For example, take a sequence containing all the numbers from 1 to 25 in order, except 12. You can read "12" at the start as the number 12 and split 21 into 2 and 1, so the sequence could be missing either 12 or 21.
Srajan @_Creation22
One of the hardest problems I have seen asked in a phone screen round.
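The ambiguity raised in the reply to the phone-screen problem can be checked mechanically. The actual problem statement is only in Srajan's image, so this minimal Python sketch assumes the puzzle gives the digits of 1..n written in order with exactly one number left out; `digits_without` is a hypothetical helper for illustration, not part of the original problem.

```python
from collections import Counter


def digits_without(n: int, missing: int) -> str:
    """Hypothetical helper: concatenate 1..n in order, skipping `missing`."""
    return "".join(str(k) for k in range(1, n + 1) if k != missing)


# Assumed input: 1..25 written in order with 12 left out.
s = digits_without(25, 12)

# The reading described in the reply: take "12" at the start as one
# number, then split "21" into 2 and 1 further along.
tokens = (
    ["12"]
    + [str(k) for k in range(3, 12)]   # 3 .. 11
    + [str(k) for k in range(13, 21)]  # 13 .. 20
    + ["2", "1"]                       # "21" read as two separate numbers
    + [str(k) for k in range(22, 26)]  # 22 .. 25
)

# Same digit string, but the numbers recovered are 1..25 without 21.
assert "".join(tokens) == s
recovered = sorted(int(t) for t in tokens)
print(recovered == [k for k in range(1, 26) if k != 21])  # True

# A solver that only compares digit counts cannot tell the two cases apart either.
print(Counter(digits_without(25, 12)) == Counter(digits_without(25, 21)))  # True
```

Under this assumption, the same input supports two different answers, which is the point the reply raises. Whether the real problem rules this out (for example, by requiring the parse to follow the original order) depends on constraints in the image that are not reproduced here.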
Bhuvan Sachdeva @SachdevaBhuvan
(5/5) For more details, check out the paper: arxiv.org/abs/2511.18787. Happy to answer questions! Shoutout to my amazing co-authors and mentor Vineeth N B.
Bhuvan Sachdeva @SachdevaBhuvan
(4/5) Why use PGF? PGF-guided selection enables alternative datasets that rival or surpass direct finetuning when supervision data is scarce.
Bhuvan Sachdeva @SachdevaBhuvan
New paper: Understanding Task Transfer in Vision–Language Models. How does finetuning a model on one task affect its performance on other tasks? @karan_uppal3 and @abhinav_java are presenting this work at UniReps, NeurIPS!! 📍 Ballroom 20D ⏰ 3:45 PM – 5:00 PM Come say hi! 🧵
Bhuvan Sachdeva retweeted
Amit Sharma @amt_shrma
Honored to be among the top three finalists for the 2025 TMLR Outstanding Paper Award. With the advent of LLMs, this paper helped me clarify what causal reasoning is, and I'm glad many others found it useful too. I believe it also offers a path forward in building causal agents and advancing scientific discovery. Some reflections below.