Harry Thasarathan

89 posts

@HThasarathan

PhD student @YorkUniversity @LassondeSchool, I work on computer vision and interpretability.

Toronto, Ontario · Joined April 2019

443 Following · 288 Followers

Harry Thasarathan retweeted

Tiberiu Mușat@Tiberiu_Musat_·2d

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov Complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data). The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights. arxiv.org/abs/2605.10878

155

1.1K

137.2K

Harry Thasarathan retweeted

Matthew Kowal@MatthewKowal9·6d

Joining Kosta’s group was the best choice I could have made. Very thankful for your support and the other members ❤️🫡 HIGHLY recommend joining for what I can promise will be a great graduate school experience!!

Kosta Derpanis (sabbatical in Zurich)@CSProfKGD

Seeing your students excel is the best feeling in academia. HUGE congrats to @MatthewKowal9 on receiving the @YorkUniversity 2025–26 Dissertation Award 💪 🥂 Here’s to you, Matt, and to much continued success. It was an honour being your supervisor, and I’m incredibly proud of you 🥲 Don’t forget us little people 😉

2.7K

Harry Thasarathan retweeted

Alec Helbling@alec_helbling·11 May

In 2000, neuroscientists showed that the retina of ferrets could be rerouted through the auditory pathway. The auditory cortex then adapted to support visual behavior. Reminds me of VLMs: connect a visual encoder to an LLM, and suddenly it can use visual information.

2.5K

Harry Thasarathan retweeted

Arnas Uselis@a_uselis·11 Mar

How do embedding spaces of models that generalize from limited data look? We study what structure such models should exhibit. Turns out: linear and orthogonal. And modern embedding models like CLIP and SigLIP already show signs of it! 🧵 (1/n)

100

714

77.5K

Harry Thasarathan retweeted

Thomas Fel@thomas_fel_·7 May

Happy to share my first post since joining Goodfire. Neural geometry has been my obsession for years, and our team here is building a really serious research agenda around it. I can't wait to share the series of papers coming over the next few weeks... Brace for shapes 🍩

Goodfire@GoodfireAI

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

1.1K

77.1K

Harry Thasarathan retweeted

Lee Sharkey@leedsharkey·5 May

My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)

192

1.5K

241.4K

Harry Thasarathan retweeted

Sonia Joseph@soniajoseph_·1 May

"Interpreting Physics in Video World Models" has been accepted into ICML! @icmlconf See you in Seoul! 🙂

Sonia Joseph@soniajoseph_

Today we release a new paper from Meta @AIatMeta: "Interpreting Physics in Video World Models," one of the first interpretability studies of video encoders. V-JEPA 2 shows rich, counterintuitive behaviors, including brain-like population codes and high-dimensional steering.

154

17.9K

Harry Thasarathan retweeted

Bo Wang@BoWang87·30 Apr

Love seeing Silico (@GoodfireAI ) used to probe our EchoJEPA's representations! this is exactly the kind of interpretability work that's been missing for JEPA-style models. One thing that makes EchoJEPA particularly interesting to interpret: unlike MAE-based approaches, it never reconstructs pixels. The model learns entirely in latent space through masked prediction, so you can't just look at decoder outputs to understand what it captured. Attribution onto a temporally aligned 3D mesh is a much more honest probe of what the representations actually encode. What we found in building EchoJEPA: training on 18M echo videos across 300K patients, the model learns to disentangle cardiac anatomy from ultrasound noise (speckle, reverberation artifacts) almost entirely through self-supervision. With 1% labeled data it already outperforms supervised baselines trained on 100%. The latent space is doing real anatomical work, but until you can visualize it like this, "real anatomical work" is mostly a claim. Paper + code: arxiv.org/abs/2602.02603 | github.com/bowang-lab/Ech…

286

29.1K

Harry Thasarathan retweeted

Goodfire@GoodfireAI·30 Apr

Introducing Silico: the platform for building AI models with the precision of written software. Silico lets researchers and engineers see inside their models, debug failures, and intentionally design them from the ground up. Early access is open now. 🧵(1/10)

114

870

110.1K

Harry Thasarathan retweeted

Agus 🔸@austinc3301·11 Apr

Lately @hankgreen's takes on AI safety have been *so* good. I'm so glad such a great communicator is so informed. youtu.be/V6pgZKVcKpw?si…

YouTube

186

8.7K

Harry Thasarathan retweeted

Chris Offner@chrisoffner3d·9 Apr

"You might say 'I found a head that has this pattern, it must be meaningful' but you might discover that it's a side effect of training or that it was important early in training but the model has grown past it and it might even be disadvantageous later on." – @nsaphra

4.2K

Harry Thasarathan retweeted

Shruti Joshi@_shruti_joshi_·2 Apr

Check out our paper to find out more: arxiv.org/abs/2603.28744 code/docs: shrutij01.github.io/compositional-… jointly led w/ @vpacela. with @isacama_phys, @SimonLacosteJ, @klindt_david

1.7K

Harry Thasarathan retweeted

David Klindt@klindt_david·2 Apr

So excited to finally share this! Linear probes often outperform SAEs, especially out-of-distribution (OOD). @thesubhashk @JoshAEngels et al showed this convincingly (arxiv.org/abs/2502.16681). This prompted @NeelNanda5 and others to de-emphasize SAE research. Empirically, fair enough. But we think the theoretical case for dictionary learning was dismissed too quickly. @oneill_c previously showed SAEs can't do proper sparse coding (arxiv.org/abs/2411.13117). @shruti_joshi @vpacela and @isacama_phys took this further and showed how this leads to problems particularly in OOD settings. So the issue may not be with dictionary learning itself, but with the current tools. Here's the core argument: if neural representations are in superposition, i.e. more features than dimensions encoded linearly (arxiv.org/abs/2503.01824), then linear probes fundamentally cannot be the answer. This is a compressed sensing problem. There's a linear measurement (the representation) and a nonlinear inference procedure (like an SAE encoder) that recovers the higher-dimensional sparse signal. Linear algebra tells us error-free recovery is impossible if decoding is restricted to be linear. (but see this cool work if errors are acceptable arxiv.org/abs/2602.11246) Check out our video: We have some neat demonstrations here. A linear decision boundary in 3D becomes nonlinear in 2D, even though all sparse combinations of latents remain distinguishable. Compressed sensing works: we can, in principle, recover the high-dimensional latent space where linear probes work and generalize OOD. Where does this leave us? With finite data and millions of concepts, simpler methods may perform better for a while. But if we want interpretability and safety methods that work OOD, especially compositional generalization covering all possible jailbreaks and real-world failures, we'll have to build bottom up from the right theory. 
@kennylpeng @thebasepoint @tegmark @yash_j_sharma @woog09 @livgorton @EkdeepL @thomas_fel_ @nsaphra
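
The compressed-sensing argument in the post above can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the linked papers: three unit feature directions packed into 2D at 120 degrees apart, codes that are 1-sparse and non-negative, and hypothetical function names. Because the three directions sum to zero, no linear map can send each direction to its own one-hot code, while a simple nonlinear decoder (pick the best-matching direction, then read off the norm) recovers such codes exactly.

```python
import math

# Hypothetical toy dictionary: 3 unit feature directions in 2D, 120 degrees apart.
# More features (3) than dimensions (2) is a minimal case of superposition.
ANGLES = [math.pi / 2 + k * 2 * math.pi / 3 for k in range(3)]
DICTIONARY = [(math.cos(a), math.sin(a)) for a in ANGLES]


def encode(z):
    """Linear measurement: x = sum_i z_i * d_i (the 2D 'representation')."""
    x0 = sum(zi * d[0] for zi, d in zip(z, DICTIONARY))
    x1 = sum(zi * d[1] for zi, d in zip(z, DICTIONARY))
    return (x0, x1)


def linear_decode(x):
    """Linear readout z_hat_i = <d_i, x>; inactive features get nonzero values."""
    return [d[0] * x[0] + d[1] * x[1] for d in DICTIONARY]


def sparse_decode(x):
    """Nonlinear decoder, valid only for 1-sparse, non-negative codes.

    Picks the direction with the largest inner product, then uses the
    norm of x as the magnitude of that single active feature.
    """
    scores = linear_decode(x)
    best = max(range(len(DICTIONARY)), key=lambda i: scores[i])
    z_hat = [0.0] * len(DICTIONARY)
    z_hat[best] = math.hypot(x[0], x[1])
    return z_hat
```

For the code `[0.0, 2.0, 0.0]`, `linear_decode(encode(z))` leaks negative values onto the two inactive features, whereas `sparse_decode(encode(z))` returns the original code up to floating-point error. The sketch only covers the 1-sparse case; recovering codes with several active features would need a proper sparse-recovery method.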

Shruti Joshi@_shruti_joshi_

SAEs fail at OOD tasks. Why? Features in superposition are linearly representable but not linearly accessible. Instead of discarding sparse coding, we embrace the geometry of superposition and use methods equipped to handle the nonlinearity it induces.

266

29.1K

Harry Thasarathan retweeted

Kosta Derpanis (sabbatical in Zurich)@CSProfKGD·27 Mar

2-min teaser of my @ELLIS_Amsterdam #FoMo2026 talk. 341 slides. ~1 week to create from analog stage (brainstorming to sticky note storyboarding) to digital (slides). Mix of my recent research, teaching, history, comedy, and sound effects. It was fun making and delivering it!

Yuki@y_m_asano

Day 3 of the @ELLISforEurope FOMO School! We start with Europe-@CSProfKGD of @VectorInst, @YorkUniversity on understanding what's inside the black box. First: Why should we? 👉Safety. Regulation. Expand human knowledge. PS:Yours truly reporting from a standing desk today 🫡

3.9K

Harry Thasarathan retweeted

Yuki@y_m_asano·26 Mar

Finally (how quickly the time went by!) he highlighted how we can combine good old SLIC clustering with dictionary learning to make learned concepts in video models more interpretable: e.g. one that tracks an object, one that focuses on where something is dropped into & how universal they are across models, in a manner similar to the Rosetta neurons of @YGandelsman. And also using SAEs on vision foundation models like DINOv2 to discover universal concepts (while being careful to not do too much coffee-reading)

685

Harry Thasarathan retweeted

Zhijing Jin@ZhijingJin·23 Mar

Mech interp or representation interp? We need to decode the causal computational graph of #LLMs—not just cataloguing representations (steering vectors etc). Analogy: we can’t understand biology by just blood composition. We need to understand how the body works. Same for LLMs.

165

9.9K

Harry Thasarathan retweeted

Nathan Lambert@natolambert·16 Mar

Public reputation, rebuilding from years of many Americans feeling misled and left behind by big tech, is the hardest problem for the AI industry to solve. At this point, actually building AGI seems far easier.

Brad Gerstner@altcap

AI is deeply unpopular. According to Pew, sadly only 17% of Americans think AI will have a positive impact. In China, 83% believe AI will be positive. A token tax & political backlash is coming unless the narrative changes. 🇺🇸👀🧐

159

17.9K

Harry Thasarathan retweeted

Ari Holtzman@universeinanegg·2 Mar

I love Anthropic posts, but why are they written to omit benign information that would make reproductions easier?

2.5K

Harry Thasarathan retweeted

Sonia Joseph@soniajoseph_·25 Feb

627

79.1K

Harry Thasarathan retweeted

Maria Brbic@mariabrbic·17 Feb

Are neural nets across modalities really converging to the same representation as they scale, as the Platonic Representation Hypothesis suggests? We show that common representational similarity metrics are confounded by network width & depth. We propose a permutation-based null calibration that fixes this. Result❓ • Global convergence largely disappears. • Local neighborhoods persist. We propose the alternative Aristotelian Representation Hypothesis: Neural networks, trained with different objectives on different data and modalities, are converging to shared local neighborhood relationships Very proud of @FabianGroger and @ShuoWen18 for this work! Paper: arxiv.org/abs/2602.14486 Webpage: brbiclab.epfl.ch/aristotelian Code: github.com/mlbio-epfl/ari…

598

69.2K
