Divyanshu Mishra
@Perceptron97
518 posts

Research @AmazonScience. DPhil from @UniOfOxford @NobleLabOxford. Interested in video understanding, world foundation models.

Oxford, England · Joined December 2010
578 Following · 198 Followers
Pinned Tweet
Divyanshu Mishra @Perceptron97
🚀 We’re excited to announce that our paper, “STAN-LOC: Visual Query-based Video Clip Localization for Fetal Ultrasound Sweep Videos,” has been accepted to #MICCAI2024! 🎉
1 reply · 5 retweets · 10 likes · 1.4K views
Divyanshu Mishra retweeted
David Fan @DavidJFan
[1/9] What happens when you treat vision as a first-class citizen during multimodal pretraining? To find out, we studied the design space of training Transfusion-style models that input and output all modalities, from scratch. Here is what we learned about visual representations, data, world modeling, architecture, and scaling behavior!
Paper: arxiv.org/abs/2603.03276
Website: beyond-llms.github.io
@TongPetersb, @DavidJFan, @__JohnNguyen__, @ellisbrown, @GaoyueZhou, @JasonQSY, @boyangzheng, @webalorn, @han_junlin, @rob_fergus, @NailaMurray, @gh_marjan, @ml_perception, Nicolas Ballas, @_amirbar, Michael Rabbat, Jakob Verbeek, @LukeZettlemoyer, @koustuvsinha, @ylecun, @sainingxie
12 replies · 62 retweets · 303 likes · 49.9K views
Saar Huberman @HubermanSaar
SemanticMoments - semantic motion similarity.
How do you find videos with similar motion? It's harder than it sounds. Models like VideoMAE and V-JEPA encode motion, but their embeddings are dominated by appearance. So how do we build a compact embedding for motion similarity?
Joint work with @kfir99 @OPatashnik @BenaimSagie @MokadyRon
8 replies · 29 retweets · 182 likes · 26.4K views
Divyanshu Mishra @Perceptron97
@alifmunim Amazing work by the team 👏 Really impressive scale and results. Curious whether you considered comparisons with other recent video SSL architectures from around the V-JEPA2 timeframe (~2025), particularly to understand how different SSL methods scale for heart ultrasound.
0 replies · 0 retweets · 0 likes · 26 views
Divyanshu Mishra retweeted
Yash Bhalgat @ysbhalgat
PhD applicants take note. As @j_foerst said, the funding situation this year is not good. My advice is to consider CDTs if you are applying for an AI PhD in the UK. I am with @aims_oxford and highly recommend applying. Deadline: 28 January 2026 (check this).
Jakob Foerster@j_foerst

Hello World: I am reviewing PhD applications and the level of talent is amazing. Sadly, the funding situation is extremely challenging. SO: if you'd like to gift someone brilliant literally the opportunity of their lifetime and sponsor their PhD in my group, please let me know 🙏

0 replies · 1 retweet · 4 likes · 635 views
Divyanshu Mishra retweeted
Martin Ziqiao Ma @ziqiao_ma
NEPA: Next-Embedding Predictive Autoregression
A simple objective for visual SSL and generative pretraining. Instead of reconstructing pixels or predicting discrete tokens, we train an autoregressive model to predict the next embedding given all previous embeddings.
Key ideas:
- One self-supervised signal: cosine-style next-embedding prediction
- Autoregression runs directly on the embeddings from a native encoder (no offline encoder)
- No pixel decoder (and loss), no contrastive pairs, no task-specific heads, no random masks
Scales to modern ViT backbones and stays competitive after supervised fine-tuning:
- ImageNet-1K (Base 83.8%; Large 85.3%)
- ADE20K
Fully open-sourced with reproducibility verified:
- Homepage: sihanxu.me/nepa/
- Paper: arxiv.org/abs/2512.16922
- Code: github.com/SihanXU/nepa
- Weights: huggingface.co/collections/Si…
This work is led by @6SihanXu and advised by @SLED_AI, @sainingxie, and Stella X. Yu. Contributors: me, @wenhaocha1, @ChenXuweiyi, and @JinWeiyang18434.
20 replies · 100 retweets · 732 likes · 141.5K views
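To make the objective above concrete, here is a minimal sketch of a NEPA-style cosine next-embedding loss. This is not the released code (see github.com/SihanXU/nepa for that); the predictor module, the shapes, and the detached targets are illustrative assumptions.

```python
# Hypothetical sketch of a NEPA-style next-embedding loss (illustrative, not
# the official implementation; names, shapes, and detaching are assumptions).
import torch
import torch.nn.functional as F

def next_embedding_loss(embeddings: torch.Tensor, predictor: torch.nn.Module) -> torch.Tensor:
    """embeddings: [B, T, D] sequence of embeddings from the model's own
    (native) encoder. predictor: any causal module mapping [B, T, D] ->
    [B, T, D], e.g. a Transformer with a causal attention mask, so the
    prediction at step t only sees embeddings up to step t."""
    preds = predictor(embeddings)
    pred_next = preds[:, :-1, :]                 # predictions for steps 1..T-1
    target_next = embeddings[:, 1:, :].detach()  # the actual next embeddings
    # Cosine-style objective: maximize similarity between predicted and
    # actual next embedding; no pixel decoder, no contrastive pairs, no masks.
    cos = F.cosine_similarity(pred_next, target_next, dim=-1)  # [B, T-1]
    return (1.0 - cos).mean()
```

The point of the sketch is the "no offline encoder" claim in the tweet: the targets come from the same network that is being trained, not from a frozen teacher.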
Divyanshu Mishra retweeted
Tengda Han @TengdaHan
Humans learn from unique data -- everyone's OWN life -- but our visual representations eventually align. In our recent work "Unique Lives, Shared World" @GoogleDeepMind, we train models with "single-life" videos from distinct sources, and study their alignment and generalisation.
10 replies · 31 retweets · 146 likes · 12.8K views
Divyanshu Mishra retweeted
Sindhu Hegde @SindhuBHegde
🎉Thrilled to be awarded the 2025 Google PhD Fellowship in Machine Perception for my research on human gesture understanding! Huge thanks to my advisor Prof. Andrew Zisserman for his constant guidance & to @GoogleAI, @Googleorg for this incredible honor. @Oxford_VGG @UniofOxford
Google.org@Googleorg

🎉 We're excited to announce the 2025 Google PhD Fellows! @GoogleOrg is providing over $10 million to support 255 PhD students across 35 countries, fostering the next generation of research talent to strengthen the global scientific landscape. Read more: goo.gle/43wJWw8

0 replies · 2 retweets · 13 likes · 2.8K views
Divyanshu Mishra retweeted
Shashank @shawshank_v
An amazingly well-written blog by @ysbhalgat, a must-read for prospective PhD students.
Yash Bhalgat@ysbhalgat

💡 Should you do a PhD in AI (2025–26)? 🎓
🔗: yashbhalgat.github.io/blog/phd-or-no…
Every October, students considering PhD applications ask me: is a PhD still the right path in AI? ⚖️
⚠️ After a few years moving between academia (@UniofOxford's @Oxford_VGG) and industry (@QCOMResearch, @Meta Reality Labs, and a few startups), I've seen both sides of the research world. And the truth is: they've never felt further apart.
🌟 Today, most of the *scale-driven* work -- world models, video generation, large VLMs -- happens in industry. Compute access, data scale, and iteration speed make that inevitable. But academia still matters: it's where new ideas, theory, and deep conceptual work often begin. The difference now is knowing what not to work on.
📢 I've written a longer, no-BS post on this -- what makes a PhD worth it, when it isn't, and how to think about your timing. 🧭 Read it, share it, debate it -- just don't decide by inertia.
Full post here: yashbhalgat.github.io/blog/phd-or-no…
#PhD #AI #ArtificialIntelligence #MachineLearning #PhDLife #Research #AcademicTwitter #GradSchool #CareerAdvice

0 replies · 3 retweets · 10 likes · 2.1K views
Divyanshu Mishra retweeted
Shashank @shawshank_v
Really excited to be giving a talk on “Openness of Vision Foundation Models” at the FOUND workshop tomorrow (19 Oct) at 10:20am, room 316C. Thanks to @HirokatuKataoka and colleagues for the invite. Looking forward to interacting with you all.
Hirokatsu Kataoka | 片岡裕雄@HirokatuKataoka

At ICCV 2025, I am organizing two workshops: the LIMIT Workshop and the FOUND Workshop.
◆ LIMIT Workshop (19 Oct, PM): iccv2025-limit-workshop.limitlab.xyz
◆ FOUND Workshop (19 Oct, AM): iccv2025-found-workshop.limitlab.xyz
We warmly invite you to attend these workshops at ICCV 2025 in Hawaii!

1 reply · 11 retweets · 14 likes · 6.8K views
Divyanshu Mishra retweeted
Yuki @y_m_asano
Our paper 'Self-Labelling via Simultaneous Clustering and Representation Learning' just got its 1000th citation. On that occasion, I want to give my perspective on this question: who or what is Sinkhorn-Knopp?
Short answer: it's the little ~1960s matrix-normalization workhorse that now underpins modern self-supervised vision training -- think DINOv2, DINOv3, and Franca.
Medium-long answer: it's an entropy-regularized clustering routine that can run online. The entropy term spreads mass across prototypes (or equivalently "clusters" or "pseudolabels"), discouraging empty clusters and collapse -- hence its popularity in modern SSL losses (DINO/iBOT-style heads).
Long answer (history, intuition, relation to models like DINOv3) 👇
2 replies · 22 retweets · 125 likes · 14.7K views
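For readers new to the routine, here is a minimal NumPy sketch of the entropy-regularized Sinkhorn-Knopp assignment step as it is typically used in SSL heads. The epsilon value, iteration count, and names are illustrative assumptions, not taken from the paper.

```python
# Minimal Sinkhorn-Knopp sketch for balanced pseudo-label assignment
# (illustrative; epsilon and the iteration count are assumptions).
import numpy as np

def sinkhorn_knopp(scores: np.ndarray, epsilon: float = 0.05, n_iters: int = 3) -> np.ndarray:
    """scores: [N, K] similarities between N samples and K prototypes.
    Returns soft assignments Q where each row is a distribution over
    prototypes and every prototype receives equal total mass overall,
    which is what discourages empty clusters and collapse."""
    Q = np.exp((scores - scores.max()) / epsilon)  # entropy term: smaller epsilon -> sharper assignments
    Q /= Q.sum()
    N, K = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True); Q /= K  # columns: equal mass per prototype
        Q /= Q.sum(axis=1, keepdims=True); Q /= N  # rows: one unit of mass per sample
    return Q * N  # rescale so each row sums to 1
```

Because the normalizations only touch the current batch of scores, the routine can run online inside a training loop, which is the property the tweet highlights.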
Divyanshu Mishra retweeted
Hermione Warr @Hermionegrace76
📣 Excited to present our work at the ELAMI workshop #MICCAI_2025!
🗣️ Talk: Sept 27, 9:36am
🔍 Does the way we tokenize language affect the performance of modern LMs in Radiology?
📄 Paper: arxiv.org/abs/2508.09952
🧵 (1/5)
1 reply · 2 retweets · 6 likes · 621 views
Divyanshu Mishra retweeted
Angus Nicolson @angusjnic
We're hiring! New postdoc position in our Digital Cardiology Lab at the Medical University of Innsbruck. Check out the post on LinkedIn and feel free to reach out if you have any questions. linkedin.com/jobs/view/4288…
0 replies · 1 retweet · 5 likes · 258 views
Divyanshu Mishra retweeted
Shashank @shawshank_v
Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research 🧵
13 replies · 55 retweets · 275 likes · 56.5K views
Divyanshu Mishra retweeted
Cohere Labs @Cohere_Labs
Our Computer Vision group is excited to host David Fan and @TongPetersb next week on Tuesday, August 5th for a presentation on "Scaling Language-Free Visual Representation Learning" (arxiv.org/abs/2504.01017).
3 replies · 5 retweets · 21 likes · 4K views
Divyanshu Mishra retweeted
Yuki @y_m_asano
Today we release Franca, a new vision foundation model that matches and sometimes outperforms DINOv2. The data, the training code, and the model weights (with intermediate checkpoints) are open-source, allowing everyone to build on this.
Methodologically, we introduce two new SSL components: a multi-granularity SK clustering loss that utilizes Matryoshka representations, and a quick post-pretraining scheme to remove unwanted spatial biases.
This is the result of a close and fun collaboration with @valeoai (in France) and @FunAILab (in Franconia).
Shashank@shawshank_v

Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research 🧵

3 replies · 25 retweets · 171 likes · 13.6K views
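One plausible reading of the multi-granularity idea, sketched below: score nested Matryoshka prefixes of each embedding against granularity-specific prototype sets, then balance each granularity's assignments with Sinkhorn-Knopp as sketched earlier. The prototype shapes and the prefix slicing here are my assumptions, not Franca's actual code.

```python
# Hypothetical sketch of multi-granularity clustering logits over Matryoshka
# prefixes (an illustrative reading of the tweet, not Franca's implementation).
import torch
import torch.nn.functional as F

def multi_granularity_logits(z: torch.Tensor, prototypes: list) -> list:
    """z: [N, D] embeddings. prototypes[i]: [K_i, D_i] tensor with D_i <= D.
    Each granularity scores the first D_i dimensions of z (a nested
    Matryoshka prefix) against its own prototype set; each [N, K_i] logits
    tensor would then get balanced Sinkhorn-Knopp targets for a clustering loss."""
    logits = []
    for protos in prototypes:
        d = protos.shape[1]
        z_prefix = F.normalize(z[:, :d], dim=-1)  # nested prefix of the embedding
        c = F.normalize(protos, dim=-1)
        logits.append(z_prefix @ c.t())           # [N, K_i] cosine logits
    return logits
```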
Divyanshu Mishra retweeted
Yuki @y_m_asano
New paper accepted at @ICCVConference: MoSiC, our new post-pretraining technique for upgrading vision foundation models like DINOv2R using videos, thanks to strong point trackers and Sinkhorn clustering. Check the thread below :)
Shashank@shawshank_v

New paper out - accepted at @ICCVConference. We introduce MoSiC, a self-supervised learning framework that learns temporally consistent representations from video using motion cues. Key idea: leverage long-range point tracks to enforce dense feature coherence across time. 🧵

0 replies · 9 retweets · 37 likes · 2.7K views
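As a rough illustration of what "long-range point tracks enforcing dense feature coherence" could look like as a loss: sample the dense feature maps along each track and pull the samples toward the track's mean feature. The grid-sampling layout and the loss form below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a point-track coherence loss in the spirit of MoSiC
# (illustrative; the sampling layout and loss form are assumptions).
import torch
import torch.nn.functional as F

def track_coherence_loss(feats: torch.Tensor, tracks: torch.Tensor) -> torch.Tensor:
    """feats: [T, C, H, W] dense per-frame feature maps.
    tracks: [N, T, 2] point-track (x, y) coordinates normalized to [-1, 1].
    Pulls the features sampled along each long-range track toward that
    track's mean feature, encouraging temporally consistent representations."""
    grid = tracks.permute(1, 0, 2).unsqueeze(2)                # [T, N, 1, 2]
    sampled = F.grid_sample(feats, grid, align_corners=False)  # [T, C, N, 1]
    sampled = F.normalize(sampled.squeeze(-1).permute(2, 0, 1), dim=-1)  # [N, T, C]
    anchor = F.normalize(sampled.mean(dim=1, keepdim=True), dim=-1)      # [N, 1, C]
    cos = (sampled * anchor).sum(dim=-1)                       # [N, T] cosine similarity
    return (1.0 - cos).mean()
```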