Michael Pearce

88 posts

@_MichaelPearce

Interpretability @ Goodfire | Physics | Evolution | Ecology

Joined September 2015

625 Following · 174 Followers

Michael Pearce retweeted

Garyk Brixi @garykbrixi · Mar 4

Evo 2 is out in Nature today, showing that genome language models can predict and design across the full complexity of life, from phages to eukaryotes. A few surprises from the project, including how ignoring trillions of nucleotides was key to getting a good model. 🧵

Michael Pearce retweeted

Goodfire @GoodfireAI · Feb 5

We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.

Michael Pearce retweeted

Goodfire @GoodfireAI · Jan 28

We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)

Michael Pearce retweeted

Goodfire @GoodfireAI · Oct 29

Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper? Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost
(1/6)

Michael Pearce retweeted

Wes Gurnee @wesg52 · Oct 21

New paper! We reverse engineered the mechanisms underlying Claude Haiku’s ability to perform a simple “perceptual” task. We discover beautiful feature families and manifolds, clean geometric transformations, and distributed attention algorithms!

Michael Pearce retweeted

Goodfire @GoodfireAI · Oct 2

Agents for experimental research != agents for software development. This is a key lesson we've learned after several months refining agentic workflows! More takeaways on effectively using experimenter agents + a key tool we're open-sourcing to enable them: 🧵

Michael Pearce retweeted

Goodfire @GoodfireAI · Sep 9

We're excited to announce a collaboration with @MayoClinic! We're working to improve personalized patient outcomes by extracting richer, more reliable signals from genomic & digital pathology models. That could mean novel biomarkers, personalized diagnostics, & more.

Michael Pearce retweeted

Goodfire @GoodfireAI · Sep 9

Does making an SAE bigger let you explain more of your model's features? New research from @ericjmichaud_ models SAE scaling dynamics, and explores whether SAEs will pack increasingly many latents onto a few multidimensional features, rather than learning more features.

Michael Pearce @_MichaelPearce · Aug 29

@ATinyGreenCell @pdhsu @GoodfireAI Good questions! We used a set of 2400 prokaryote genomes with complete assemblies that are representative genomes in both the GTDB and NCBI databases. No viruses or metagenomic assemblies!

Sebastian S. Cocioba🪄🌷 @ATinyGreenCell · Aug 29

@pdhsu @GoodfireAI Im assuming all these data were carefully curated and checked for contamination in the RSA data? Was this refseq only? Was this metagenomics as well? Do you consider MAGs as real? Does this account for virome? Connections based on all available sequences obv relies on data quali

Patrick Hsu @pdhsu · Aug 27

Finding the tree of life within Evo 2 - amazing work by our collaborators @GoodfireAI

Goodfire @GoodfireAI

Arc Institute trained their foundation model Evo 2 on DNA from all domains of life. What has it learned about the natural world? Our new research finds that it represents the tree of life, spanning thousands of species, as a curved manifold in its neuronal activations. (1/8)

Michael Pearce @_MichaelPearce · Aug 29

@PhilEmmanuele @GoodfireAI The flow of gathering activations from random genomic regions and averaging is described in the post. We’re happy to share the resulting data—the species-averaged embeddings and the phylogenetic distances between species—for you to play with. Will let you know when it's available
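For readers who want to try this once the data is shared, here is a minimal sketch of the "average activations per species, then compare to phylogenetic distance" idea. It is not Goodfire's pipeline: the function names, the Euclidean distance, the Pearson correlation, and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import dist, sqrt


def species_mean_embeddings(activations, species_labels):
    """Average the activation vectors that share a species label."""
    grouped = defaultdict(list)
    for vector, species in zip(activations, species_labels):
        grouped[species].append(vector)
    return {
        species: [sum(column) / len(vectors) for column in zip(*vectors)]
        for species, vectors in grouped.items()
    }


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def embedding_vs_phylogeny(mean_embeddings, phylo_distance):
    """Correlate embedding distances with phylogenetic distances over species pairs."""
    pairs = list(combinations(sorted(mean_embeddings), 2))
    embedding_d = [dist(mean_embeddings[a], mean_embeddings[b]) for a, b in pairs]
    phylo_d = [phylo_distance[(a, b)] for a, b in pairs]
    return pearson(embedding_d, phylo_d)


# Toy data (made up): two sampled regions per species, 2-dimensional activations.
activations = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [1.2, 0.0], [5.0, 0.0], [5.2, 0.0]]
species_labels = ["A", "A", "B", "B", "C", "C"]
# Hypothetical phylogenetic distances keyed by alphabetically sorted species pairs.
phylo_distance = {("A", "B"): 1.0, ("A", "C"): 5.0, ("B", "C"): 4.0}

means = species_mean_embeddings(activations, species_labels)
score = embedding_vs_phylogeny(means, phylo_distance)
```

A score near 1 would mean that species far apart on the tree are also far apart in embedding space. With real data, a rank correlation or a geodesic distance along the manifold might be a better fit than this straight-line comparison.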

Philip Emmanuele @PhilEmmanuele · Aug 28

@GoodfireAI Is this data available? I’d love to take a look at it

Goodfire @GoodfireAI · Aug 27

GIF

Michael Pearce @_MichaelPearce · Aug 28

@J33P4 @GoodfireAI We indeed expected it to be there! But what's novel is understanding how the model represents the tree of life (manifold structure and low-dim subspace) and the techniques we developed to isolate the representations, which we plan to extend to more complex bioinformatic questions

𝗝 𝟯 𝟯 𝗣 𝟰 | 𝗷𝟯𝟯𝗽𝟰.𝗲𝘁𝗵 @J33P4 · Aug 28

@GoodfireAI Why is this surprising when it was trained on the tree of life? And that it is well recognised you can detect this from phylogenetic types of comparisons. What’s new about this?

Michael Pearce @_MichaelPearce · Aug 28

@thebasepoint @GoodfireAI Good catch! That's written wrong. Beta is learnable, but alpha is just a hyperparameter. We'll update it

Joshua Batson @thebasepoint · Aug 27

@GoodfireAI A tiny question: how is alpha (a loss coefficient) learnable? Seems like if you let it vary the model would send it to zero (or negative).
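The concern can be made concrete with a short derivation. This assumes, purely for illustration, that alpha multiplies a non-negative auxiliary term; the actual loss from the post is not reproduced here, and the symbols below are hypothetical.

```latex
\mathcal{L}(\theta, \alpha) = \mathcal{L}_{\text{main}}(\theta) + \alpha \, \mathcal{L}_{\text{aux}}(\theta),
\qquad \mathcal{L}_{\text{aux}}(\theta) \ge 0
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial \alpha} = \mathcal{L}_{\text{aux}}(\theta) \ge 0
```

Under that assumption, every gradient step on alpha can only lower it, so a freely learnable alpha would drift to zero or below. Fixing it as a hyperparameter avoids this.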

Michael Pearce @_MichaelPearce · Aug 27

@Sauers_ @GoodfireAI Agree, more would be better! We used 2400+ species but did notice that certain clades were a bit underrepresented by the sampling. Those tended to be placed in the center of the UMAPs.

Sauers @Sauers_ · Aug 27

@GoodfireAI Very cool. You guys should try it on more species!

Michael Pearce @_MichaelPearce · Aug 27

@Sauers_ @GoodfireAI We haven’t looked yet, but it’d be interesting to see how the species representation changes along sequences with different ancestry due to horizontal gene transfer.

Michael Pearce @_MichaelPearce · Aug 27

@Sauers_ @GoodfireAI We're definitely interested in better interpreting the directions, so those types of metrics could be useful there. Most metrics are based on direct sequence comparisons, though, which the model isn’t necessarily doing.

Michael Pearce @_MichaelPearce · Aug 27

The structure seems consistent with the manifold with “ripples” picture seen in LLMs. Finding similar patterns across diverse models hints at a general organizing principle behind feature geometry. Looking forward to characterizing more biological structures in genomic models!

Michael Pearce @_MichaelPearce · Aug 27

Excited to share our work digging into how Evo 2 represents species relatedness or phylogeny. Genetics provides a good quantitative measure of relatedness, so we could use it to probe the model and see if its internal geometry reflects it.

Michael Pearce retweeted

Goodfire @GoodfireAI · Aug 26

Adversarial examples - a vulnerability of every AI model, and a “mystery” of deep learning - may simply come from models cramming many features into the same neurons! Less feature interference → more robust models. New research from @livgorton 🧵 (1/4)

Michael Pearce retweeted

Goodfire @GoodfireAI · Aug 21

New research! Post-training often causes weird, unwanted behaviors that are hard to catch before deployment because they only crop up rarely - then are found by bewildered users. How can we find these efficiently? (1/7)

Michael Pearce retweeted

Jack Merullo @jack_merullo_ · Aug 8

Could we tell if gpt-oss was memorizing its training data? I.e., points where it’s reasoning vs reciting? We took a quick look at the curvature of the loss landscape of the 20B model to understand memorization and what’s happening internally during reasoning
