Harry Thasarathan

89 posts

@HThasarathan

PhD student @YorkUniversity @LassondeSchool, I work on computer vision and interpretability.

Toronto, Ontario · Joined April 2019
443 Following · 288 Followers
Harry Thasarathan retweeted
Tiberiu Mușat @Tiberiu_Musat_
Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov Complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data). The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights. arxiv.org/abs/2605.10878
26
155
1.1K
137.2K
Harry Thasarathan retweeted
Matthew Kowal @MatthewKowal9
Joining Kosta’s group was the best choice I could have made. Very thankful for your support and the other members ❤️🫡 HIGHLY recommend joining for what I can promise will be a great graduate school experience!!
Kosta Derpanis (sabbatical in Zurich) @CSProfKGD

Seeing your students excel is the best feeling in academia. HUGE congrats to @MatthewKowal9 on receiving the @YorkUniversity 2025–26 Dissertation Award 💪 🥂 Here’s to you, Matt, and to much continued success. It was an honour being your supervisor, and I’m incredibly proud of you 🥲 Don’t forget us little people 😉

1
3
17
2.7K
Harry Thasarathan retweeted
Alec Helbling @alec_helbling
In 2000, neuroscientists showed that the retina of ferrets could be rerouted through the auditory pathway. The auditory cortex then adapted to support visual behavior. Reminds me of VLMs: connect a visual encoder to an LLM, and suddenly it can use visual information.
1
6
29
2.5K
Harry Thasarathan retweeted
Arnas Uselis @a_uselis
How do embedding spaces of models that generalize from limited data look? We study what structure such models should exhibit. Turns out: linear and orthogonal. And modern embedding models like CLIP and SigLIP already show signs of it! 🧵 (1/n)
4
100
714
77.5K
Harry Thasarathan retweeted
Thomas Fel @thomas_fel_
Happy to share my first post since joining Goodfire. Neural geometry has been my obsession for years, and our team here is building a really serious research agenda around it. I can't wait to share the series of papers coming over the next few weeks... Brace for shapes 🍩
Goodfire @GoodfireAI

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

55
71
1.1K
77.1K
Harry Thasarathan retweeted
Lee Sharkey @leedsharkey
My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)
34
192
1.5K
241.4K
Harry Thasarathan retweeted
Sonia Joseph @soniajoseph_
"Interpreting Physics in Video World Models" has been accepted into ICML! @icmlconf See you in Seoul! 🙂
Sonia Joseph @soniajoseph_

Today we release a new paper from Meta @AIatMeta: "Interpreting Physics in Video World Models," one of the first interpretability studies of video encoders. V-JEPA 2 shows rich, counterintuitive behaviors, including brain-like population codes and high-dimensional steering.

4
18
154
17.9K
Harry Thasarathan retweeted
Bo Wang @BoWang87
Love seeing Silico (@GoodfireAI) used to probe our EchoJEPA's representations! This is exactly the kind of interpretability work that's been missing for JEPA-style models. One thing that makes EchoJEPA particularly interesting to interpret: unlike MAE-based approaches, it never reconstructs pixels. The model learns entirely in latent space through masked prediction, so you can't just look at decoder outputs to understand what it captured. Attribution onto a temporally aligned 3D mesh is a much more honest probe of what the representations actually encode. What we found in building EchoJEPA: training on 18M echo videos across 300K patients, the model learns to disentangle cardiac anatomy from ultrasound noise (speckle, reverberation artifacts) almost entirely through self-supervision. With 1% labeled data it already outperforms supervised baselines trained on 100%. The latent space is doing real anatomical work, but until you can visualize it like this, "real anatomical work" is mostly a claim. Paper + code: arxiv.org/abs/2602.02603 | github.com/bowang-lab/Ech…
7
47
286
29.1K
Harry Thasarathan retweeted
Goodfire @GoodfireAI
Introducing Silico: the platform for building AI models with the precision of written software. Silico lets researchers and engineers see inside their models, debug failures, and intentionally design them from the ground up. Early access is open now. 🧵(1/10)
20
114
870
110.1K
Harry Thasarathan retweeted
Agus 🔸 @austinc3301
Lately @hankgreen's takes on AI safety have been *so* good. I'm so glad such a great communicator is so informed. youtu.be/V6pgZKVcKpw?si…
6
8
186
8.7K
Harry Thasarathan retweeted
Chris Offner @chrisoffner3d
"You might say 'I found a head that has this pattern, it must be meaningful' but you might discover that it's a side effect of training or that it was important early in training but the model has grown past it and it might even be disadvantageous later on." – @nsaphra
0
2
22
4.2K
Harry Thasarathan retweeted
David Klindt @klindt_david
So excited to finally share this! Linear probes often outperform SAEs, especially out-of-distribution (OOD). @thesubhashk, @JoshAEngels et al. showed this convincingly (arxiv.org/abs/2502.16681). This prompted @NeelNanda5 and others to de-emphasize SAE research. Empirically, fair enough. But we think the theoretical case for dictionary learning was dismissed too quickly.
@oneill_c previously showed SAEs can't do proper sparse coding (arxiv.org/abs/2411.13117). @shruti_joshi, @vpacela, and @isacama_phys took this further and showed how this leads to problems, particularly in OOD settings. So the issue may not be with dictionary learning itself, but with the current tools.
Here's the core argument: if neural representations are in superposition, i.e. more features than dimensions encoded linearly (arxiv.org/abs/2503.01824), then linear probes fundamentally cannot be the answer. This is a compressed sensing problem. There's a linear measurement (the representation) and a nonlinear inference procedure (like an SAE encoder) that recovers the higher-dimensional sparse signal. Linear algebra tells us error-free recovery is impossible if decoding is restricted to be linear. (But see this cool work if errors are acceptable: arxiv.org/abs/2602.11246.)
Check out our video: we have some neat demonstrations here. A linear decision boundary in 3D becomes nonlinear in 2D, even though all sparse combinations of latents remain distinguishable. Compressed sensing works: we can, in principle, recover the high-dimensional latent space where linear probes work and generalize OOD.
Where does this leave us? With finite data and millions of concepts, simpler methods may perform better for a while. But if we want interpretability and safety methods that work OOD, especially compositional generalization covering all possible jailbreaks and real-world failures, we'll have to build bottom-up from the right theory.
@kennylpeng @thebasepoint @tegmark @yash_j_sharma @woog09 @livgorton @EkdeepL @thomas_fel_ @nsaphra
Shruti Joshi @_shruti_joshi_

SAEs fail at OOD tasks. Why? Features in superposition are linearly representable but not linearly accessible. Instead of discarding sparse coding, we embrace the geometry of superposition and use methods equipped to handle the nonlinearity it induces.

4
40
266
29.1K
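The compressed-sensing argument in David Klindt's post above can be illustrated with a toy example. This is only a minimal sketch under assumptions that are not in the post: three latent features stored in two dimensions along directions 120° apart, at most one feature active at a time, and non-negative activations. The names `D`, `linear_decode`, and `sparse_decode` are hypothetical, and the greedy decoder is a stand-in for a nonlinear inference procedure, not the authors' method.

```python
import numpy as np

# Assumed toy dictionary: 3 unit-norm feature directions in 2D (superposition).
angles = np.deg2rad([0.0, 120.0, 240.0])
D = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 3)


def linear_decode(y):
    """Best purely linear readout: the pseudoinverse of the dictionary."""
    return np.linalg.pinv(D) @ y


def sparse_decode(y):
    """Nonlinear readout: pick the best-matching direction (one greedy step).

    Assumes the latent is 1-sparse and non-negative.
    """
    scores = D.T @ y
    k = int(np.argmax(scores))
    x_hat = np.zeros(D.shape[1])
    x_hat[k] = scores[k]
    return x_hat


# A sparse latent with only feature 1 active, and its 2D representation.
x = np.array([0.0, 1.5, 0.0])
y = D @ x

x_lin = linear_decode(y)        # spreads the signal over all three features
x_sparse = sparse_decode(y)     # recovers the sparse latent

print("true latent:   ", x)
print("linear decode: ", np.round(x_lin, 3))
print("sparse decode: ", np.round(x_sparse, 3))
```

In this toy setting the linear readout reproduces the 2D measurement but not the 3D latent, while the nonlinear step recovers it. Real representations are noisy and have many simultaneously active features, so this says nothing about how well any particular SAE or probe performs in practice.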
Harry Thasarathan retweeted
Kosta Derpanis (sabbatical in Zurich)
2-min teaser of my @ELLIS_Amsterdam #FoMo2026 talk. 341 slides. ~1 week to create from analog stage (brainstorming to sticky note storyboarding) to digital (slides). Mix of my recent research, teaching, history, comedy, and sound effects. It was fun making and delivering it!
Yuki @y_m_asano

Day 3 of the @ELLISforEurope FOMO School! We start with Europe-@CSProfKGD of @VectorInst, @YorkUniversity on understanding what's inside the black box. First: Why should we? 👉Safety. Regulation. Expand human knowledge. PS: Yours truly reporting from a standing desk today 🫡

5
6
32
3.9K
Harry Thasarathan retweeted
Yuki @y_m_asano
Finally (how quickly the time went by!) he highlighted how we can combine good old SLIC clustering with dictionary learning to make learned concepts in video models more interpretable: e.g. one that tracks an object, one that focuses on where something is dropped into & how universal they are across models, in a manner similar to the Rosetta neurons of @YGandelsman. And also using SAEs on vision foundation models like DINOv2 to discover universal concepts (while being careful to not do too much coffee-reading).
0
2
5
685
Harry Thasarathan retweeted
Zhijing Jin @ZhijingJin
Mech interp or representation interp? We need to decode the causal computational graph of #LLMs—not just cataloguing representations (steering vectors etc). Analogy: we can’t understand biology by just blood composition. We need to understand how the body works. Same for LLMs.
4
25
165
9.9K
Harry Thasarathan retweeted
Nathan Lambert @natolambert
Public reputation, rebuilding from years of many Americans feeling misled and left behind by big tech, is the hardest problem for the AI industry to solve. At this point, actually building AGI seems far easier.
Brad Gerstner @altcap

AI is deeply unpopular. According to Pew, sadly only 17% of Americans think AI will have a positive impact. In China, 83% believe AI will be positive. A token tax & political backlash is coming unless the narrative changes. 🇺🇸👀🧐

14
7
159
17.9K
Harry Thasarathan retweeted
Ari Holtzman @universeinanegg
I love Anthropic posts, but why are they written to omit benign information that would make reproductions easier?
1
1
16
2.5K
Harry Thasarathan retweeted
Sonia Joseph @soniajoseph_
Today we release a new paper from Meta @AIatMeta: "Interpreting Physics in Video World Models," one of the first interpretability studies of video encoders. V-JEPA 2 shows rich, counterintuitive behaviors, including brain-like population codes and high-dimensional steering.
14
86
627
79.1K
Harry Thasarathan retweeted
Maria Brbic @mariabrbic
Are neural nets across modalities really converging to the same representation as they scale, as the Platonic Representation Hypothesis suggests? We show that common representational similarity metrics are confounded by network width & depth. We propose a permutation-based null calibration that fixes this. Result❓ • Global convergence largely disappears. • Local neighborhoods persist. We propose the alternative Aristotelian Representation Hypothesis: neural networks, trained with different objectives on different data and modalities, are converging to shared local neighborhood relationships. Very proud of @FabianGroger and @ShuoWen18 for this work! Paper: arxiv.org/abs/2602.14486 | Webpage: brbiclab.epfl.ch/aristotelian | Code: github.com/mlbio-epfl/ari…
12
91
598
69.2K
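The idea of calibrating a similarity score against a permutation null, mentioned in Maria Brbic's post above, can be sketched generically. This is not the paper's procedure: it assumes linear CKA as the similarity metric and a simple row-shuffling null, and the function names `linear_cka` and `permutation_null` are hypothetical. The linked paper and code define the actual calibration.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (samples x features) representation matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))


def permutation_null(x, y, n_perm=200, seed=0):
    """Compare the observed score to scores with sample correspondence shuffled."""
    rng = np.random.default_rng(seed)
    observed = linear_cka(x, y)
    null = np.array(
        [linear_cka(x, y[rng.permutation(len(y))]) for _ in range(n_perm)]
    )
    # Fraction of shuffled scores at least as large as the observed one.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null.mean(), p_value


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 10))

# Same representation up to a rotation: should stand well above the null.
q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
y_aligned = x @ q

# Unrelated representation: the raw score is nonzero but sits near the null.
y_random = rng.normal(size=(100, 10))

obs_a, null_a, p_a = permutation_null(x, y_aligned)
obs_r, null_r, p_r = permutation_null(x, y_random)

print(f"aligned:   score={obs_a:.3f} null_mean={null_a:.3f} p={p_a:.3f}")
print(f"unrelated: score={obs_r:.3f} null_mean={null_r:.3f} p={p_r:.3f}")
```

The point of the sketch is that the raw score for unrelated representations is not zero, so a score only becomes meaningful relative to its null baseline.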