Raphi Kang

32 posts

@RaphiKang

Caltech PhD doing Computer Vision / Mechanistic Interpretability things · MIT '23

Joined October 2021

126 Following · 74 Followers

Pinned Tweet

Raphi Kang @RaphiKang · Jan 27

🤓 How do LVLMs/LMMs reason about space and time? This was the central question of our #ICLR2016 paper, "Linear Mechanisms For Spatiotemporal Reasoning In Vision Language Models". I'm very excited to finally share it :D 🥳🥳 A thread: [1/7]

Raphi Kang retweeted

David Bau @davidbau · 2d

NetHack is one of the most complex and longest-lived open source programs ever written, and after 46 years, v5.0 shipped today. nethack.org/common/index.h… And ... it is a VERY cool large codebase to work with in the LLM era.

Raphi Kang retweeted

Aadarsh Sahoo @SahooAadarsh · Feb 17

Perception is actionable. Humans don't just see objects, we see affordances and constraints. "Something to sit on." "Region unsafe to walk." "Something that will tip if I bump it." But today’s vision models mostly see… labels. So we built ConverSeg: Conversational Image Segmentation 🧵 glab-caltech.github.io/converseg/

Raphi Kang retweeted

Bahareh Tolooshams @BTolooshams · Feb 2

Call for Reviewers | FOMORep @ ICML 2026

We are organizing the first workshop on the Geometry of Foundation Model Representations (FOMORep, under submission) at ICML 2026. Fill out this form if you are interested in serving on the Program Committee, whose role would be to review submissions. forms.gle/9f5zCP564LuARu…

FOMORep focuses on understanding the geometry of representations learned by foundation models, specifically, what geometric structures these representations acquire, why they arise, and how they relate to performance and robustness. The workshop brings together researchers from representation learning, geometric machine learning, deep learning, and applied mathematics.

Thanks for supporting the growing community for understanding foundation models with a geometric lens.

-- Organizing Committee:
Guy Gilboa (Technion)
Raphi Kang (Caltech) @RaphiKang
Uri Shaham (Bar-Ilan University) @UXShaham
Yue Song (Caltech & Tsinghua University) @YueSong48287250
Bahareh Tolooshams (University of Alberta)
Yossi Levi (Technion)

Raphi Kang retweeted

Damiano Marsili @marsilidamiano · Jan 27

Our paper, VALOR, got accepted at #ICLR2026 ! We explore improving visual reasoning using multimodal verifiers - all without any ground truth annotations! More details below 👇 Excited to see everyone in Rio!

Damiano Marsili @marsilidamiano

(1/N): Can we improve visual reasoning models without annotations? In VALOR, we introduce an annotation-free training framework that boosts both visual reasoning and object grounding by training with multimodal verifiers instead of human labels

Raphi Kang @RaphiKang · Jan 27

@hongqiao_chen @georgiagkioxari Wow one day I will learn to tweet without mistakes. Obviously this is #ICLR2026 not 2016 🙈🙈 Oops.

Raphi Kang @RaphiKang · Jan 27

More details can be found in the paper: 📝 Arxiv: arxiv.org/pdf/2601.12626 🤖 Code: github.com/Raphoo/linear-… Shoutout to my awesome co-author @hongqiao_chen and advisors @georgiagkioxari & Pietro Perona! See you all in Rio 🇧🇷🇧🇷 :) [7/7]

Raphi Kang @RaphiKang · Jan 27

Raphi Kang retweeted

vincent! @vvhuang_ · Dec 18

We trained a decoder to read the internal activations of an LLM and answer questions about what the model will think about or do next. We find that this decoder can understand LLM behaviors, even when the model itself is confused! (for instance, if the model has been jailbroken)

Transluce @TransluceAI

Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior. Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.

Raphi Kang retweeted

Ziqi Ma @ziqi__ma · Dec 16

Generative models shouldn’t just generate. They should be steerable by your commands. Meet Steer3D🕹️: edit generated 3D assets with text📝 in one forward pass. Trained on only 100k synthetic data, it shows that we can make generative models responsive to signals from another modality🎛️. Check out: glab-caltech.github.io/steer3d/

Raphi Kang retweeted

Damiano Marsili @marsilidamiano · Dec 15

Raphi Kang retweeted

Neehar Kondapaneni @TheRealPaneni · Nov 19

Excited to share our paper Representational Difference Explanations (RDX) was accepted to #NeurIPS2025! 🎉RDX is a new method for model diffing designed to isolate 🔍 representational differences. 1/7

Raphi Kang retweeted

Amil Dravid @_AmilDravid · Nov 4

Our paper "Vision Transformers Don't Need Trained Registers" will appear as a Spotlight at NeurIPS 2025! We uncover the mechanism behind high-norm tokens and attention sinks in ViTs, propose a training-free fix, and recently added an analytical model -- more on that below. ⬇️

Nick Jiang @nickhjiang

Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵

Raphi Kang retweeted

Yisong Yue @yisongyue · Oct 27

Since she's way too shy to post this herself, please join me in congratulating my amazing colleague and friend @klbouman for receiving tenure at @caltech! 🥳🎉

Raphi Kang @RaphiKang · Oct 17

💡 We then propose Dense Cosine Similarity Maps (DCSMs): matrices preserving patch-token level topology, augmented with functional word awareness. We train a lightweight scoring module on top, which consistently outperforms CLIP-like models. [5/6]
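A rough illustration of the "dense cosine similarity map" idea, assuming it means a matrix of cosine similarities between every text-token embedding and every image-patch embedding, instead of CLIP's single pooled image-vs-text score. The helper names are hypothetical and the inputs are toy vectors; this is not the paper's code, and the functional-word handling and trained scoring module are not shown.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dense_cosine_similarity_map(patch_embs, token_embs):
    """Hypothetical helper: return a (num_tokens x num_patches) matrix whose
    entry [t][p] is the cosine similarity between text token t and image
    patch p. Patch order is preserved, so each row can be reshaped back
    into the patch grid."""
    patches = [l2_normalize(p) for p in patch_embs]
    tokens = [l2_normalize(t) for t in token_embs]
    # Dot products of unit vectors are cosine similarities.
    return [[sum(a * b for a, b in zip(t, p)) for p in patches] for t in tokens]
```

For example, `dense_cosine_similarity_map([[1, 0], [0, 1], [1, 1]], [[1, 0], [0, 2]])` gives a 2 × 3 map whose first row is roughly `[1.0, 0.0, 0.707]`. Keeping one score per (token, patch) pair is what retains the spatial layout that a single pooled cosine score throws away.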

Raphi Kang @RaphiKang · Oct 17

📐 In our work, we formalize the CLIP latent space and show that no CLIP-style joint embedding w/ unit vectors + cosine similarity can at once represent basic image content, attribute binding, spatial relationships, and negation, due to geometric constraints. [4/6]

Raphi Kang @RaphiKang · Oct 17

🚀 Sharing our #ICCV2025 paper, "Is CLIP ideal? No. Can we fix it? Yes!". We will be at Poster session 5 (10:45AM 10/23), please come find us to chat or reach out online! A thread: [1/6]

Raphi Kang @RaphiKang · Oct 17

🔗 Find the paper here: arxiv.org/pdf/2503.08723 And code: github.com/Raphoo/DCSM_Id… Shoutouts to my awesome co-authors @YueSong48287250, @georgiagkioxari, and Pietro Perona !! ♥️ [6/6]
