Goodfire

475 posts

Goodfire

@GoodfireAI

Using interpretability to understand, learn from, and design AI.

San Francisco · Joined August 2024
28 Following · 14.1K Followers
Pinned Tweet
Goodfire @GoodfireAI
Introducing Silico: the platform for building AI models with the precision of written software. Silico lets researchers and engineers see inside their models, debug failures, and intentionally design them from the ground up. Early access is open now. 🧵(1/10)
20 replies · 112 reposts · 847 likes · 103.4K views
Goodfire @GoodfireAI
Takeaway for eval design: treat verbalized eval awareness as a signal that the model doesn’t find an interaction genuine, inspect reasoning across rollouts to find why, and fix what looks artificial. More realistic evals are within reach! Full post: goodfire.ai/research/verba… (7/7)
0 replies · 1 repost · 12 likes · 1.8K views
Goodfire @GoodfireAI
What about internals? We show steering vectors that reduce verbalized eval awareness — including ones in recent system cards — may do so by changing how the model represents user intent. So the eval may measure the model under a different intent than it was meant to test. (6/7)
2 replies · 0 reposts · 12 likes · 613 views
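The mechanism this tweet describes can be sketched in a few lines. A minimal numpy toy with made-up unit directions for "eval awareness" and "user intent" (nothing here is extracted from a real model): because the two directions are correlated, steering against one necessarily shifts the activation's projection onto the other.

```python
import numpy as np

# Toy illustration with invented directions; not from any real model.
rng = np.random.default_rng(0)
d = 64  # hidden size of the toy residual stream

eval_dir = rng.normal(size=d)
eval_dir /= np.linalg.norm(eval_dir)          # "eval awareness" direction

orth = rng.normal(size=d)
orth -= (orth @ eval_dir) * eval_dir          # component orthogonal to eval_dir
orth /= np.linalg.norm(orth)
intent_dir = 0.6 * eval_dir + 0.8 * orth      # unit "user intent" direction,
                                              # correlated with eval_dir (cos = 0.6)

h = rng.normal(size=d)                        # one hidden activation
alpha = 4.0
h_steered = h - alpha * eval_dir              # steer away from eval awareness

delta = (h_steered - h) @ intent_dir          # change in the intent projection
# delta = -alpha * (eval_dir @ intent_dir) = -4.0 * 0.6 = -2.4:
# steering against eval awareness also moved the represented intent.
```

The shift is unavoidable whenever the two directions are not orthogonal, which is the concern the tweet raises about intent-correlated steering vectors.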
Goodfire @GoodfireAI
New research from @AISecurityInst and Goodfire: Models sometimes recognize they're being evaluated, occasionally even identifying the benchmark. We show this verbalized eval awareness inflates safety scores, meaning safety benchmarks may not reflect real-world behavior. (1/7)
[image]
7 replies · 20 reposts · 142 likes · 13.4K views
Goodfire retweeted
Tom McGrath @banburismus_
if you're wondering what sort of thing you can do with Silico, this is a great example!
Bo Wang@BoWang87

Love seeing Silico (@GoodfireAI) used to probe our EchoJEPA's representations! This is exactly the kind of interpretability work that's been missing for JEPA-style models. …

0 replies · 6 reposts · 42 likes · 4K views
Goodfire retweeted
Bo Wang @BoWang87
Love seeing Silico (@GoodfireAI) used to probe our EchoJEPA's representations! This is exactly the kind of interpretability work that's been missing for JEPA-style models.

One thing that makes EchoJEPA particularly interesting to interpret: unlike MAE-based approaches, it never reconstructs pixels. The model learns entirely in latent space through masked prediction, so you can't just look at decoder outputs to understand what it captured. Attribution onto a temporally aligned 3D mesh is a much more honest probe of what the representations actually encode.

What we found in building EchoJEPA: training on 18M echo videos across 300K patients, the model learns to disentangle cardiac anatomy from ultrasound noise (speckle, reverberation artifacts) almost entirely through self-supervision. With 1% labeled data it already outperforms supervised baselines trained on 100%. The latent space is doing real anatomical work, but until you can visualize it like this, "real anatomical work" is mostly a claim.

Paper + code: arxiv.org/abs/2602.02603 | github.com/bowang-lab/Ech…
7 replies · 45 reposts · 281 likes · 26.7K views
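The "loss lives in latent space" point above can be made concrete. A heavily simplified numpy sketch of a JEPA-style objective, using random linear maps as stand-in encoders and an identity predictor (none of this is the real EchoJEPA architecture): the targets are the target encoder's latents at masked positions, and the loss is an MSE on those latents, never on pixels.

```python
import numpy as np

# Toy JEPA-style objective. Random linear "encoders" and an identity
# "predictor" are stand-ins, not the real EchoJEPA components.
rng = np.random.default_rng(0)
n_patches, d_in, d_lat = 16, 32, 8

ctx_enc = rng.normal(size=(d_in, d_lat)) / np.sqrt(d_in)  # context encoder
tgt_enc = ctx_enc.copy()     # target encoder (an EMA copy in real JEPA training)
predictor = np.eye(d_lat)    # predictor head (learned in practice)

x = rng.normal(size=(n_patches, d_in))   # patch features from one frame
mask = np.zeros(n_patches, dtype=bool)
mask[::4] = True                         # hide every 4th patch

targets = x[mask] @ tgt_enc              # latents to predict (treated as frozen)
preds = (x[mask] @ ctx_enc) @ predictor  # predicted latents for masked patches

loss = np.mean((preds - targets) ** 2)   # MSE in latent space, not pixel space
# With identical encoders and an identity predictor the loss is ~0 here;
# training drives the real predictor toward exactly this condition.
```

Because no decoder ever maps back to pixels, probing what the latents encode requires attribution methods like the mesh visualization described above.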
Goodfire retweeted
Sauers @Sauers_
Goodfire's model beats the standard method (CADD, used in clinical genetics) on a type of variant (small insertions/deletions) that it never saw in training (single-letter variants only), but that CADD did!
[image]
Goodfire@GoodfireAI

We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause disease by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open-source database covering all variants in the NIH's ClinVar database. 🧵(1/8)

0 replies · 3 reposts · 24 likes · 2.4K views
Goodfire @GoodfireAI
@pranavxviswa Silico lets you shape model behavior in many ways, including steering vectors, but the biggest successes are generally in shaping the training process itself!
0 replies · 0 reposts · 3 likes · 804 views
Pranav Viswanath @pranavxviswa
@GoodfireAI Congrats on the launch, super cool product! To shape model behavior does it use steering vectors based on the desired behavior, and how do you ensure it doesn’t degrade the rest of model behavior?
1 reply · 0 reposts · 2 likes · 940 views
Goodfire retweeted
Yan-David (Yanda) Erlich
What if you could get the power of AI with the precision engineering of “traditional” software? If that feels like having your cake and eating it too, then @GoodfireAI is serving up infinite cake ♾️🎂.
Goodfire@GoodfireAI

Introducing Silico: the platform for building AI models with the precision of written software. …

1 reply · 2 reposts · 19 likes · 2.8K views
Goodfire @GoodfireAI
@_virgil19 We use a broad set of tools, far more than SAEs! It's true that standard SAE features can be inconsistent across runs (though check out Archetypal SAEs). Silico is equipped with many different tools and knows how to use them with the appropriate nuance.
1 reply · 0 reposts · 7 likes · 944 views
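One way to make the run-to-run stability question quantitative is to cosine-match the decoder dictionaries of two independently trained SAEs. A numpy sketch on synthetic dictionaries (the dictionaries, sizes, and half-overlap setup are invented for illustration):

```python
import numpy as np

# Synthetic stand-in for two SAE decoder dictionaries: run B keeps half of
# run A's feature directions and replaces the rest with fresh random ones,
# simulating partial run-to-run consistency.
rng = np.random.default_rng(0)
d_model, n_feat = 32, 64

def unit_rows(m):
    """Normalize each row to unit length (feature directions)."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

run_a = unit_rows(rng.normal(size=(n_feat, d_model)))
run_b = run_a.copy()
run_b[n_feat // 2:] = unit_rows(rng.normal(size=(n_feat // 2, d_model)))

# For each run-A feature, cosine similarity to its best-matching run-B feature.
sims = run_a @ run_b.T
best = sims.max(axis=1)
consistency = best.mean()
# Shared features match at cosine ~1.0; the replaced half matches only by
# chance, dragging the mean well below 1 -- an "inconsistency" signature.
```

A low mean best-match cosine is evidence that features are artifacts of a particular training run rather than stable properties of the model, which is exactly the worry raised in the question below.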
Virgil Maro @_virgil19
@GoodfireAI The bit I keep wondering about with tools like Silico: when you find a feature, is it stable across different encoding choices, or an artifact of the SAE you trained? Engrams hit the same wall: what you tag at encoding determines what cell-set you can re-fire.
2 replies · 0 reposts · 1 like · 1.2K views
Goodfire @GoodfireAI
@subminima Yes! Model health checks include an entire set of tests that study signal propagation (forward and backward) through the model
0 replies · 0 reposts · 2 likes · 47 views
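The kind of check described above can be sketched as: push a signal forward and a gradient backward through the network and flag layers whose norm growth exceeds a threshold. A numpy toy with a deliberately oversized init so the check fires; the deep linear net, init scale, and 1.2 threshold are illustrative choices, not Silico's actual tests.

```python
import numpy as np

# Toy signal-propagation health check on a deep linear net whose init scale
# is too large, so signals and gradients both explode.
rng = np.random.default_rng(0)
depth, width = 12, 64

scale = 1.5 / np.sqrt(width)   # too large; ~1/sqrt(width) would be stable
layers = [scale * rng.normal(size=(width, width)) for _ in range(depth)]

def forward_norms(x):
    """Activation norm after each layer on the forward pass."""
    norms = []
    for w in layers:
        x = x @ w
        norms.append(np.linalg.norm(x))
    return norms

def backward_norms(g):
    """Gradient norm after each layer on the backward pass (through W^T)."""
    norms = []
    for w in reversed(layers):
        g = g @ w.T
        norms.append(np.linalg.norm(g))
    return norms

fwd = forward_norms(rng.normal(size=width))
bwd = backward_norms(rng.normal(size=width))

# Flag layer-to-layer growth above a threshold -- with this init, both the
# forward signal and the backward gradient grow roughly 1.5x per layer.
ratios = np.array(fwd[1:]) / np.array(fwd[:-1])
exploding_layers = np.flatnonzero(ratios > 1.2)
```

Running the backward pass as well as the forward pass is what distinguishes a full propagation check from simply watching activations: vanishing or exploding gradients can appear even when forward norms look healthy.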
min @subminima
@GoodfireAI can I view exploding gradients with it?
1 reply · 0 reposts · 4 likes · 813 views