
Arjun Karpur
19 posts

Arjun Karpur
@arjunkarpur
Head of ML @ Epsilon Health | Prev. @GoogleDeepMind @UTCompSci | Training image encoders & VLMs for radiology

Excited to release a super capable family of image-text models from our TIPS #ICLR2025 paper! github.com/google-deepmin… We have models from ViT-S to ViT-g, with spatial awareness, suitable for many multimodal AI applications. Can’t wait to see what the community will build with them!

Want some TIPS? Well, then check out “Text-Image Pretraining with Spatial awareness” :) TIPS is a general-purpose image-text encoder for off-the-shelf dense and image-level prediction. Finally, image-text pretraining with spatially-aware representations! arxiv.org/abs/2410.16512

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance @hanwenjiang1, Arjun Karpur, Bingyi Cao, @qixing_huang, @andrefaraujo tl;dr: use DINOv2 similarity to prune matching graphs, query and key use posenc, but attention not. No IMC eval arxiv.org/abs/2405.12979

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance @hanwenjiang1, Arjun Karpur, Bingyi Cao, @qixing_huang, @andrefaraujo tl;dr: DINOv2->SG/LoFTR; DINOv2->similarities between keypoints->intra&inter-image graphs->self&cross-attention arxiv.org/pdf/2405.12979

Three papers accepted to #NeurIPS 3/3 NAVI: a dataset of image collections of objects, along with high-quality 3D object scans, near-perfect 2D-3D alignments, and accurate camera parameters. arxiv.org/abs/2306.09109 navidataset.github.io With @jampani_varun, @kmaninis, others
