Neehar Kondapaneni

78 posts

@TheRealPaneni

Caltech PhD @ The Vision Lab | Researching model interpretability with a focus on model comparison/diffing.

Joined June 2010

289 Following · 131 Followers

Pinned Tweet

Neehar Kondapaneni@TheRealPaneni·Nov 19

Excited to share that our paper, Representational Difference Explanations (RDX), was accepted to #NeurIPS2025! 🎉 RDX is a new method for model diffing designed to isolate 🔍 representational differences. 1/7

3.1K

Neehar Kondapaneni retweeted

Goodfire@GoodfireAI·5d

New Goodfire research: surfacing and mitigating undesired side effects of LLM post-training via probe-based data attribution!

Santiago Aranguri@santiaranguri

Post-training can introduce harmful side effects. Probes can trace one such effect to specific training data, which can be filtered to cut the behavior by 63% — beating LLM-judge and gradient attribution at 10× lower cost. Bonus: probes can surface unknown side effects! (1/6)

118

11.1K

Neehar Kondapaneni retweeted

Goodfire@GoodfireAI·Apr 14

We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵 (1/8)

159

841

185.4K

Neehar Kondapaneni retweeted

Goodfire@GoodfireAI·Apr 1

Introducing self-correcting search: a technique to let diffusion models self-correct mid-trajectory. Working with @RadicalAI, we gave MatterGen a feedback loop from its own activations, improving viable on-target candidates by ~30%. (1/8)

465

82.7K

Neehar Kondapaneni retweeted

FAR.AI@farairesearch·Feb 23

1/ Training data attribution (TDA) is broken: methods are slow and find syntactically similar data, not actual causes. Our solution Concept Influence: semantically meaningful results, better performance, 20x faster approximations. We attribute it to concepts, not examples. 🧵

2.1K

Neehar Kondapaneni retweeted

David Chanin@chanindav·Feb 17

SAEs fail even when the Linear Representation Hypothesis holds perfectly. We built SynthSAEBench: large-scale synthetic data with 16k ground-truth features, correlation, hierarchy, and superposition. We trained 5 SAE architectures on it. None achieve perfect feature recovery.

218

9.8K

Neehar Kondapaneni retweeted

Or Shafran@OrShafran·Feb 5

It's time to look past dictionary learning for decomposing LM activations. What happens when we instead leverage local geometry? We find a natural region-based decomposition that yields better steering and localization 🧵 1/

152

22.8K

Neehar Kondapaneni retweeted

Andy Keller@t_andy_keller·Jan 15

When you're crossing the street and turn your head, you typically remember whether or not a car is coming from the other direction - so why can't today's world models? Introducing Flow Equivariant World Models flowequivariantworldmodels.github.io Led by @hansenlillemark & @huskydogewoof🧵👇

Hansen Lillemark@hansenlillemark

State of the art World Models still lack a unified world memory for representing and predicting dynamics out of their field of view. Why is that, and how can we fix it? Introducing Flow Equivariant World Models: models with memory capable of predicting out of view dynamics!🧵⬇️

3.6K

Neehar Kondapaneni retweeted

Zihan "Zenus" Wang ✈️ ICLR@wzenus·Dec 19

Everything is a world model if you squint hard enough.

112

887

55.7K

Neehar Kondapaneni retweeted

Nick Jiang@nickhjiang·Dec 16

New work! What if we used sparse autoencoders to analyze data, not models—where SAE latents act as a large set of data labels 🏷️? We find that SAEs beat baselines on 4 data analysis tasks and uncover surprising, qualitative insights about models (e.g. Grok-4, OpenAI) from data.

258

82.3K

Neehar Kondapaneni retweeted

Damiano Marsili@marsilidamiano·Dec 15

(1/N): Can we improve visual reasoning models without annotations? In VALOR, we introduce an annotation-free training framework that boosts both visual reasoning and object grounding by training with multimodal verifiers instead of human labels

9.1K

Neehar Kondapaneni@TheRealPaneni·Dec 8

@neur_reps @SuryaGanguli Are the talks recorded?

Symmetry and Geometry in Neural Representations@neur_reps·Dec 7

Thrilled to welcome Surya Ganguli (@SuryaGanguli) from Stanford as a NeurReps invited speaker! At 1:30 pm, he will present on "New Mathematical Approaches to Interpretability & Robustness that Directly Confront the High Dimensionality & Nonlinearity of Neural Representations"

9.3K

Neehar Kondapaneni retweeted

Ekdeep Singh Lubana@EkdeepL·Dec 7

Come by the Interp workshop of the day! :D

CogInterp Workshop @ NeurIPS 2025@CogInterp

We're stoked to see everyone at the first workshop on interpreting cognition in deep learning models today @NeurIPSConf! There will be an extremely exciting lineup of speakers and spotlight talks, with an equally exciting poster session in between. (1/3)

4.1K

Neehar Kondapaneni@TheRealPaneni·Dec 3

I’ll be presenting our poster at #NeurIPS2025 today at 4:30 PM, Exhibit Hall C, D, E, #1115. Come by and check out our approach for isolating model differences ⬇️!

Neehar Kondapaneni@TheRealPaneni

592

Neehar Kondapaneni@TheRealPaneni·Dec 1

@unireps @NeurIPSConf Where is the participation form?

UniReps@unireps·Dec 1

🔵🔴 Join us for the UniReps Workshop: Unifying Representations in Neural Models at @NeurIPSConf 2025! 📍 Ballroom 20D, San Diego Convention Center Dec 6 Don’t forget to fill out the participation form. Joining in person or remotely? We welcome your questions for the panel. 🔗 unireps.org

7.3K

Neehar Kondapaneni@TheRealPaneni·Dec 1

@AndrewLampinen Would love to chat. I read your work on aligning human and model representations and really enjoyed it.

157

Andrew Lampinen@AndrewLampinen·Nov 30

Heading to NeurIPS this week! Let me know if you want to chat about science of what models learn, interpretability, what models learn in context vs. from their training data, etc. A few things I'm involved in:

163

14.4K

Neehar Kondapaneni retweeted

Bidipta Sarkar@bidiptas13·Nov 21

Introducing 🥚EGGROLL 🥚(Evolution Guided General Optimization via Low-rank Learning)! 🚀 Scaling backprop-free Evolution Strategies (ES) for billion-parameter models at large population sizes ⚡100x Training Throughput 🎯Fast Convergence 🔢Pure Int8 Pretraining of RNN LLMs

152

997

293.7K

Neehar Kondapaneni retweeted

Yiming Li@YimingLi9702·Nov 26

🤔Visual-spatial reasoning requires a shift from a disembodied, passive paradigm to an embodied, active one: 🤖Grounding V* in humanoid agents! 🚀Introducing H* - a dataset, benchmark, and baseline to enable human-like visual search in real 360° environments! 🧵👇[1/n]

164

53.1K

Neehar Kondapaneni retweeted

Paul Vicol@PaulVicol·Nov 25

🚀 Introducing TMLR Beyond PDF! 🎬 This is a new, HTML-based submission format for TMLR that supports interactive figures and videos, along with the usual LaTeX and images. 🎉 Thanks to TMLR Editors in Chief @hugo_larochelle @thegautamkamath @NailaMurray Nihar B. Shah @lcharlin!

213

64.4K

Neehar Kondapaneni retweeted

Bo Wang@BoWang87·Nov 20

Tiny Models, Massive Capacity, Zero Labels: this is the future of health AI!! Thrilled to share that our paper, EVA-X: a foundation model for general chest X-ray analysis with self-supervised learning, is now published in @Nature_NPJ! In collaboration with @XinggangWang’s group, we introduce EVA-X, a universal X-ray foundation model trained on 520k+ unlabeled images, capable of analyzing 20+ chest pathologies without heavy manual annotation, with only 6M parameters!

🔗 Paper: nature.com/articles/s4174…
🔗 Code & models: github.com/hustvl/EVA-X

EVA-X is fully open-source, with pretrained models and a plug-and-play codebase. Try it out and build on it!

⭐ Highlights
- New SSL paradigm: Semantic tokenizer + masked image modeling for rich global + local features.
- Versatile: SoTA on 10 downstream tasks (classification, segmentation, localization).
- Data-efficient: 95% COVID-19 detection accuracy using 1% labeled data.
- Tiny but mighty: 6M-parameter EVA-X-Ti outperforms much larger baselines.
- Robust: Learns semantic + geometric cues, enabling broad clinical applicability.

Proud of this milestone: another step toward scalable, annotation-free medical foundation models. 💪🩻

106

10.3K

Neehar Kondapaneni retweeted

Transluce@TransluceAI·Nov 20

Is your LM secretly an SAE? Most circuit-finding interpretability methods use learned features rather than raw activations, based on the belief that neurons do not cleanly decompose computation. In our new work, we show MLP neurons actually do support sparse, faithful circuits!

370

116.6K
