Ching Fang (chingfang.bsky.social)

86 posts

@chingfang17

Member of Technical Staff @GoodfireAI working on AI interpretability for scientific discovery. Prev: @Harvard, neuroscience PhD @Columbia @cu_neurotheory

San Francisco, CA · Joined January 2015
643 Following · 953 Followers
Ching Fang (chingfang.bsky.social) retweeted
Goodfire @GoodfireAI
We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵(1/8)
10
154
816
177.5K
Ching Fang (chingfang.bsky.social) retweeted
Goodfire @GoodfireAI
We used interpretability to scale RL against open-ended tasks, cutting Gemma 12B’s hallucination rate in half by teaching it to self-correct in tandem with our probing harness.
13
38
344
73.8K
Ching Fang (chingfang.bsky.social) retweeted
Goodfire @GoodfireAI
We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.
30
59
496
210.8K
Ching Fang (chingfang.bsky.social)
@jwuphysics @GoodfireAI @PrimaMente We flow through the original embeddings and classifier. The SAE is inserted as an intermediate layer to compute feature attributions, but the classification model itself operates on the original activations (via a detached residual).
0
0
6
51
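The "detached residual" setup described in the reply above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes a ReLU SAE and a linear classifier, uses gradient-times-activation as the attribution score, and every name in it (`sae_encode`, `feature_attributions`, the toy weights) is hypothetical. The residual `x - x_hat` is treated as a constant, so the classifier's forward pass sees the original activations exactly while gradients still reach the SAE features through the decoder.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def sae_encode(x, W_enc, b_enc):
    """ReLU SAE encoder (assumed form): f = relu(W_enc @ x + b_enc)."""
    return [max(0.0, a + b) for a, b in zip(matvec(W_enc, x), b_enc)]

def feature_attributions(x, W_enc, b_enc, W_dec, w_clf, b_clf):
    f = sae_encode(x, W_enc, b_enc)
    x_hat = matvec(W_dec, f)                       # SAE reconstruction
    # "Detached" residual: held constant, so no gradient flows through it.
    residual = [xi - xh for xi, xh in zip(x, x_hat)]
    # Forward value equals the original activations x.
    x_in = [xh + r for xh, r in zip(x_hat, residual)]
    logit = sum(w * xi for w, xi in zip(w_clf, x_in)) + b_clf
    # Backward pass for a linear classifier: d logit / d x_in = w_clf, and
    # because the residual is constant, d x_in / d f = W_dec.
    grad_f = matvec(transpose(W_dec), w_clf)
    # Gradient-times-activation attribution per SAE feature.
    attr = [fi * g for fi, g in zip(f, grad_f)]
    return logit, attr, x_in

# Toy example: 2-d activations, 3 SAE features (all values made up).
x = [1.0, 2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]         # deliberately lossy decoder
w_clf, b_clf = [2.0, -1.0], 0.5

logit, attr, x_in = feature_attributions(x, W_enc, b_enc, W_dec, w_clf, b_clf)
```

Even though this toy decoder reconstructs only half of each activation, `x_in` matches `x`, so the classifier's prediction is unaffected by SAE reconstruction error. In an autodiff framework the same effect would typically come from a stop-gradient or detach operation on the residual.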
John F. Wu @jwuphysics
@GoodfireAI @PrimaMente Really fantastic work; love your team's approach to solving real questions via interp. Quick question: when performing grad attribution on SAE features, do you flow through SAE reconstructions or original embeddings (i.e. train hierarchical classifier on SAE decoded embeds)?
1
0
5
817
Goodfire @GoodfireAI
We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)
50
223
1.7K
394.9K
Ching Fang (chingfang.bsky.social) retweeted
Goodfire @GoodfireAI
Our infra lets us steer trillion-parameter frontier models in real time:
- live, mid-CoT edits to internal activations
- directly altering how the model reasons (not just outputs)
- stackable edits
- no added latency
We can make models more Gen Z, more concise, etc.
Gopal @gopalkraman

at @GoodfireAI, @RyanPanwar and Michael Anderson are building tools to intentionally design and control AI, moving beyond prompting to direct, real-time intervention.

7
24
237
29.2K
Ching Fang (chingfang.bsky.social) retweeted
Karim Jerbi @karimjerbineuro
🔴 Live: Panel discussion #1 @ MAIN2025: The future of Neuroscience - The role of AI? With Andreas Tolias, Siva Reddy, Joao Sacramento, Ching Fang + Eva Portelance. Moderated by Patrick Mineault @patrickmineault #MAIN2025
0
4
15
1K
Ching Fang (chingfang.bsky.social)
But overall this is encouraging for interpretability - simple mechanistic tools may be more robust to encoded reasoning than we expected. Joint work with Sam Marks @saprmarks, done during my fellowship with Cambridge Boston Alignment Initiative @cbai_ai
0
0
13
485
Ching Fang (chingfang.bsky.social)
One caveat is that we used SFT on the base model's own responses, which might preserve more of the original activation space. There's room for more testbeds here - RL-induced encoding, more complex ciphers, etc.
1
0
7
532
Ching Fang (chingfang.bsky.social) retweeted
Jack Lindsey @Jack_W_Lindsey
We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic!  We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us! job-boards.greenhouse.io/anthropic/jobs…
194
205
2.5K
468.4K