johnny @johnnylin
@neuronpedia. prev @apple.
San Francisco, CA · Joined January 2009
3 Following · 538 Followers
22 posts
johnny @johnnylin
i still sometimes code things by hand
[image]
1 reply · 0 retweets · 1 like · 87 views
johnny retweeted
Anthropic @AnthropicAI
Researchers can use the Neuronpedia interactive interface here: neuronpedia.org/gemma-2-2b/gra… And we’ve provided an annotated walkthrough: github.com/safety-researc… This project was led by participants in our Anthropic Fellows program, in collaboration with Decode Research.
16 replies · 63 retweets · 499 likes · 53.6K views
johnny retweeted
neuronpedia @neuronpedia
Announcement: we're open sourcing Neuronpedia! 🚀 This includes all our mech interp tools: the interpretability API, steering, UI, inference, autointerp, search, plus 4 TB of data - cited by 35+ research papers and used by 50+ write-ups. What you can do with OSS Neuronpedia: 🧵
[GIF]
2 replies · 29 retweets · 152 likes · 13K views
johnny retweeted
Curt Tigges @CurtTigges
Neuronpedia now hosts Chain-of-Thought! Steer and inspect Deepseek-R1-Distill-Llama-8B with SAEs trained by @Open_MOSS on @neuronpedia (linked below). One fun initial result: the model can easily be steered into "overthinking/anxious" mode with a single latent.
[images]
2 replies · 10 retweets · 45 likes · 7.4K views
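The steering described above — nudging the model into an "overthinking/anxious" mode via a single SAE latent — amounts to adding a scaled copy of that latent's decoder direction to the residual stream at each token position. A minimal numpy sketch with toy, randomly initialized weights (the real Deepseek-R1-Distill SAE weights from @Open_MOSS are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64  # toy sizes; real models are far larger

# Toy SAE decoder: each row is one latent's direction in the residual stream.
W_dec = rng.standard_normal((d_sae, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm directions

def steer(resid, latent_idx, alpha):
    """Add alpha times one latent's decoder direction at every token position."""
    return resid + alpha * W_dec[latent_idx]

resid = rng.standard_normal((8, d_model))  # activations for 8 token positions
steered = steer(resid, latent_idx=3, alpha=5.0)
```

The sign and magnitude of `alpha` control how hard the model is pushed along that latent's direction; in practice the hook is applied inside the forward pass rather than to a standalone array.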
johnny retweeted
Google DeepMind @GoogleDeepMind
Gemma Scope allows us to study how features evolve throughout the model and interact to create more complex ones. Want to learn more? Here’s an interactive demo made by @neuronpedia - no coding necessary ↓ dpmd.ai/gemma-scope
[GIF]
2 replies · 10 retweets · 72 likes · 25.7K views
johnny retweeted
Neel Nanda @NeelNanda5
Want to learn more? @neuronpedia have made a gorgeous interactive demo walking you through what Sparse Autoencoders are, and what Gemma Scope can do. If this could happen pre-launch, I'm excited to see what the community will do with Gemma Scope now! neuronpedia.org/gemma-scope
[image]
2 replies · 5 retweets · 115 likes · 8.5K views
johnny retweeted
Neel Nanda @NeelNanda5
Sparse Autoencoders act like a microscope for AI internals. They're a powerful tool for interpretability, but training costs limit research. Announcing Gemma Scope: an open suite of SAEs on every layer & sublayer of Gemma 2 2B & 9B! We hope to enable even more ambitious work.
[GIF]
17 replies · 151 retweets · 1K likes · 211.2K views
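The SAE idea behind Gemma Scope can be sketched in a few lines: an encoder maps a model activation to a sparse (ReLU-thresholded) feature vector, and a decoder reconstructs the activation from those features. The weights below are random stand-ins, not actual Gemma Scope parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 128  # toy sizes; real SAEs use thousands of latents

# Toy SAE weights (a real Gemma Scope SAE would be loaded from its release).
W_enc = rng.standard_normal((d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.standard_normal((d_sae, d_model)) * 0.1
b_dec = np.zeros(d_model)

def encode(x):
    # Sparse feature activations: ReLU leaves only some latents nonzero.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    # Reconstruct the original activation from the sparse feature vector.
    return f @ W_dec + b_dec

x = rng.standard_normal(d_model)  # one residual-stream activation
f = encode(x)                     # which features fired, and how strongly
x_hat = decode(f)                 # approximate reconstruction of x
```

Training minimizes reconstruction error plus a sparsity penalty on `f`, which is what makes individual latents interpretable enough to browse on Neuronpedia.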
johnny @johnnylin
exciting new research from @apolloaisafety and @jordantensor: E2E SAEs (w/ ~700k features) are now live on @neuronpedia - the first to use dual UMAPs for visual comparison and exploration between SAE training methods. check it out at neuronpedia.org/gpt2sm-apollojt
[image]
Quoting Lee Sharkey @leedsharkey:
Proud to share Apollo Research's first interpretability paper! In collaboration w @JordanTensor! ⤵️ publications.apolloresearch.ai/end_to_end_spa… Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning Our SAEs explain significantly more performance than before! 1/
0 replies · 3 retweets · 16 likes · 1.4K views
johnny @johnnylin
6/ Oh and of course, @neuronpedia is publicly available for anyone to experiment and play with at neuronpedia.org. Let us know what you think!
0 replies · 1 retweet · 7 likes · 869 views
johnny @johnnylin
1/ Introducing Neuronpedia: an open platform for interpretability research with hosting, visualizations, and tooling for Sparse Autoencoders (SAEs). Let's try it out! ➡️ Neuronpedia lets us instantly test activations of SAE features with custom text. Here's a Star Wars feature:
4 replies · 32 retweets · 199 likes · 20.1K views
johnny retweeted
Joseph Bloom @JBloomAus
Super impressed by @johnnylin's Interactive Interface for exploring my GPT2 Small SAE Features. neuronpedia.org/gpt2-small/res…. First 5000 for each layer are there with the rest coming shortly! We've updated the feature-activation highlighting to better show multiple fires per context!
[image]
0 replies · 1 retweet · 8 likes · 401 views
johnny @johnnylin
best IoT feature: devices that automatically update for daylight savings time
0 replies · 0 retweets · 0 likes · 123 views
johnny @johnnylin
twitter encourages logical local optima
2 replies · 0 retweets · 0 likes · 0 views