Michael Pearce

88 posts

@_MichaelPearce

Interpretability @ Goodfire | Physics | Evolution | Ecology

Joined September 2015
625 Following · 174 Followers
Michael Pearce retweeted
Garyk Brixi @garykbrixi
Evo 2 is out in Nature today, showing that genome language models can predict and design across the full complexity of life, from phages to eukaryotes. A few surprises from the project, including how ignoring trillions of nucleotides was key to getting a good model. 🧵
13
208
1K
99.1K
Michael Pearce retweeted
Goodfire @GoodfireAI
We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.
30
60
488
207K
Michael Pearce retweeted
Goodfire @GoodfireAI
We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)
50
223
1.7K
394.2K
Michael Pearce retweeted
Goodfire @GoodfireAI
Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper? Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost
(1/6)
12
49
332
70K
Michael Pearce retweeted
Wes Gurnee @wesg52
New paper! We reverse engineered the mechanisms underlying Claude Haiku’s ability to perform a simple “perceptual” task. We discover beautiful feature families and manifolds, clean geometric transformations, and distributed attention algorithms!
44
323
2.5K
459.2K
Michael Pearce retweeted
Goodfire @GoodfireAI
Agents for experimental research != agents for software development. This is a key lesson we've learned after several months refining agentic workflows! More takeaways on effectively using experimenter agents + a key tool we're open-sourcing to enable them: 🧵
3
30
221
70K
Michael Pearce retweeted
Goodfire @GoodfireAI
We're excited to announce a collaboration with @MayoClinic! We're working to improve personalized patient outcomes by extracting richer, more reliable signals from genomic & digital pathology models. That could mean novel biomarkers, personalized diagnostics, & more.
3
10
72
10.1K
Michael Pearce retweeted
Goodfire @GoodfireAI
Does making an SAE bigger let you explain more of your model's features? New research from @ericjmichaud_ models SAE scaling dynamics, and explores whether SAEs will pack increasingly many latents onto a few multidimensional features, rather than learning more features.
3
19
154
27.1K
Michael Pearce @_MichaelPearce
@ATinyGreenCell @pdhsu @GoodfireAI Good questions! We used a set of 2400 prokaryote genomes with complete assemblies that are representative genomes in both the GTDB and NCBI databases. No viruses or metagenomic assemblies!
1
0
1
79
Sebastian S. Cocioba🪄🌷 @ATinyGreenCell
@pdhsu @GoodfireAI I'm assuming all these data were carefully curated and checked for contamination in the RSA data? Was this RefSeq only? Was this metagenomics as well? Do you consider MAGs as real? Does this account for the virome? Connections based on all available sequences obv relies on data quali
1
0
4
346
Michael Pearce @_MichaelPearce
@PhilEmmanuele @GoodfireAI The flow of gathering activations from random genomic regions and averaging is described in the post. We’re happy to share the resulting data—the species-averaged embeddings and the phylogenetic distances between species—for you to play with. Will let you know when it's available
0
0
0
16
Goodfire @GoodfireAI
Arc Institute trained their foundation model Evo 2 on DNA from all domains of life. What has it learned about the natural world? Our new research finds that it represents the tree of life, spanning thousands of species, as a curved manifold in its neuronal activations. (1/8)
10
53
366
77.2K
Michael Pearce @_MichaelPearce
@J33P4 @GoodfireAI We indeed expected it to be there! But what's novel is understanding how the model represents the tree of life (manifold structure and low-dim subspace) and the techniques we developed to isolate the representations, which we plan to extend to more complex bioinformatic questions
1
0
1
59
Joshua Batson @thebasepoint
@GoodfireAI A tiny question: how is alpha (a loss coefficient) learnable? Seems like if you let it vary the model would send it to zero (or negative).
2
0
6
1.4K
Michael Pearce @_MichaelPearce
@Sauers_ @GoodfireAI Agreed, more would be better! We used 2400+ species but did notice that certain clades were a bit underrepresented by the sampling. Those tended to be placed in the center of the UMAPs.
1
0
1
44
Sauers @Sauers_
@GoodfireAI Very cool. You guys should try it on more species!
1
0
0
50
Michael Pearce @_MichaelPearce
@Sauers_ @GoodfireAI We haven’t looked yet, but it’d be interesting to see how the species representation changes along sequences with different ancestry due to horizontal gene transfer.
1
0
1
16
Michael Pearce @_MichaelPearce
@Sauers_ @GoodfireAI We're definitely interested in better interpreting the directions, so those types of metrics could be useful there. Most metrics are based on direct sequence comparisons, though, which the model isn’t necessarily doing.
1
0
1
27
Michael Pearce @_MichaelPearce
The structure seems consistent with the manifold with “ripples” picture seen in LLMs. Finding similar patterns across diverse models hints at a general organizing principle behind feature geometry. Looking forward to characterizing more biological structures in genomic models!
0
0
7
170
Michael Pearce @_MichaelPearce
Excited to share our work digging into how Evo 2 represents species relatedness or phylogeny. Genetics provides a good quantitative measure of relatedness, so we could use it to probe the model and see if its internal geometry reflects it.
Goodfire @GoodfireAI

Arc Institute trained their foundation model Evo 2 on DNA from all domains of life. What has it learned about the natural world? Our new research finds that it represents the tree of life, spanning thousands of species, as a curved manifold in its neuronal activations. (1/8)

1
8
46
8K
Michael Pearce retweeted
Goodfire @GoodfireAI
Adversarial examples - a vulnerability of every AI model, and a “mystery” of deep learning - may simply come from models cramming many features into the same neurons! Less feature interference → more robust models. New research from @livgorton 🧵 (1/4)
4
25
250
28.8K
Michael Pearce retweeted
Goodfire @GoodfireAI
New research! Post-training often causes weird, unwanted behaviors that are hard to catch before deployment because they only crop up rarely - then are found by bewildered users. How can we find these efficiently? (1/7)
10
37
369
46.3K
Michael Pearce retweeted
Jack Merullo @jack_merullo_
Could we tell if gpt-oss was memorizing its training data? I.e., points where it’s reasoning vs reciting? We took a quick look at the curvature of the loss landscape of the 20B model to understand memorization and what’s happening internally during reasoning
14
53
515
46.9K