Ching Fang (chingfang.bsky.social) (@chingfang17) - Twitter profile

Ching Fang (chingfang.bsky.social) retweeted

Goodfire @GoodfireAI · 14 Apr

We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause disease by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵(1/8)

Ching Fang (chingfang.bsky.social) retweeted

Goodfire @GoodfireAI · 11 Feb

We used interpretability to scale RL against open-ended tasks, cutting Gemma 12B’s hallucination rate in half by teaching it to self-correct in tandem with our probing harness.

Ching Fang (chingfang.bsky.social) retweeted

Goodfire @GoodfireAI · 5 Feb

We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.

Ching Fang (chingfang.bsky.social) retweeted

Kempner Institute at Harvard University @KempnerInst · 3 Feb

🤖📊 NEW in the Deeper Learning blog: @AnnHuang42 & @KanakaRajanPhD break down their recent work examining how recurrent neural networks solve the same task in different ways, and why that matters. Joint work with @tweetsatpreet & @FlaviohMar bit.ly/4kj4fVd #NeuroAI #AI

Ching Fang (chingfang.bsky.social) @chingfang17 · 30 Jan

@jwuphysics @GoodfireAI @PrimaMente We flow through the original embeddings and classifier. The SAE is inserted as an intermediate layer to compute feature attributions, but the classification model itself operates on the original activations (via a detached residual).

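The reply above describes inserting the SAE as an intermediate layer whose output is the original activation plus a detached residual, so the classifier sees unchanged activations while gradients still reach SAE features. A minimal, dependency-free sketch of that idea, assuming a single-layer ReLU SAE and a linear classifier; the weights and function names are hypothetical, and the model used in the actual work is not described here. In an autograd framework such as PyTorch the pattern is usually written `x_hat + (x - x_hat).detach()`; below, the resulting gradient is written out by hand.

```python
# Hypothetical sketch: gradient-times-activation attribution to SAE features
# while the classifier still operates on the original activations.
# Toy ReLU SAE + linear classifier; all weights and names are illustrative.

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def sae_encode(x, w_enc, b_enc):
    # Feature activations f = ReLU(W_enc x + b_enc); one row of W_enc per feature
    return relu([a + b for a, b in zip(matvec(w_enc, x), b_enc)])

def sae_decode(f, w_dec, b_dec):
    # Reconstruction x_hat = W_dec f + b_dec; one column of W_dec per feature
    return [a + b for a, b in zip(matvec(w_dec, f), b_dec)]

def forward_with_detached_residual(x, w_enc, b_enc, w_dec, b_dec, w_clf, b_clf):
    f = sae_encode(x, w_enc, b_enc)
    x_hat = sae_decode(f, w_dec, b_dec)
    # x_out = x_hat + stopgrad(x - x_hat): numerically equal to x, so the
    # classifier output does not change when the SAE is inserted.
    error = [xi - hi for xi, hi in zip(x, x_hat)]  # treated as a constant
    x_out = [hi + ei for hi, ei in zip(x_hat, error)]
    logit = sum(wi * xi for wi, xi in zip(w_clf, x_out)) + b_clf
    # Gradient w.r.t. each feature flows only through the decoder:
    # d logit / d f_j = w_clf . W_dec[:, j]
    grads = [sum(w_clf[i] * w_dec[i][j] for i in range(len(x)))
             for j in range(len(f))]
    # Gradient-times-activation attribution for each feature
    attributions = [fj * gj for fj, gj in zip(f, grads)]
    # The detached error term gets its own attribution: w_clf . error
    error_attr = sum(wi * ei for wi, ei in zip(w_clf, error))
    return logit, attributions, error_attr
```

Because the error term is treated as a constant, the classifier output matches the output without the SAE. For a linear classifier the pieces also add up exactly: the feature attributions, the error-term attribution, the decoder-bias contribution (`w_clf . b_dec`), and the classifier bias sum to the logit.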

John F. Wu @jwuphysics · 29 Jan

@GoodfireAI @PrimaMente Really fantastic work; love your team's approach to solving real questions via interp. Quick question: when performing grad attribution on SAE features, do you flow through SAE reconstructions or original embeddings (i.e. train hierarchical classifier on SAE decoded embeds)?

Goodfire @GoodfireAI · 28 Jan

We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)

Ching Fang (chingfang.bsky.social) @chingfang17 · 29 Jan

A really exciting example of using interp to advance scientific progress! I had a lot of fun working on this with our scientific discovery team :)

Goodfire @GoodfireAI

We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)

Ching Fang (chingfang.bsky.social) retweeted

Ryan Panwar @RyanPanwar · 18 Dec

Kimi is just like me frfr. Steering @Kimi_Moonshot's K2 Thinking's reasoning in the Kimi CLI

Goodfire @GoodfireAI

Our infra lets us steer trillion-parameter frontier models in real time:
- live, mid-CoT edits to internal activations
- directly altering how the model reasons (not just outputs)
- stackable edits
- no added latency

We can make models more Gen Z, more concise, etc.

Ching Fang (chingfang.bsky.social) retweeted

Goodfire @GoodfireAI · 18 Dec

Our infra lets us steer trillion-parameter frontier models in real time:
- live, mid-CoT edits to internal activations
- directly altering how the model reasons (not just outputs)
- stackable edits
- no added latency

We can make models more Gen Z, more concise, etc.

Gopal @gopalkraman

At @GoodfireAI, @RyanPanwar and Michael Anderson are building tools to intentionally design and control AI, moving beyond prompting to direct, real-time intervention.

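The thread does not say how Goodfire's steering infrastructure is implemented. A common form of activation steering adds scaled feature directions to a layer's activations; assuming that form, "stackable edits" follows naturally, because additive edits compose. A minimal sketch on a toy hidden state, where `SteeringEdit`, `apply_edits`, and `steer_generation` are hypothetical names, not Goodfire's API:

```python
# Hypothetical sketch of stackable activation-steering edits.
# Assumes steering = adding scaled unit directions to a hidden state.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SteeringEdit:
    direction: List[float]  # e.g. a feature direction (assumed nonzero)
    scale: float            # how strongly to push along that direction

def normalize(v: List[float]) -> List[float]:
    norm = sum(a * a for a in v) ** 0.5
    return [a / norm for a in v]

def apply_edits(hidden: List[float], edits: List[SteeringEdit]) -> List[float]:
    # Edits compose by addition: h' = h + sum_k scale_k * unit(direction_k),
    # so they can be stacked in any order (up to floating-point rounding).
    out = list(hidden)  # leave the caller's activations untouched
    for edit in edits:
        for i, d in enumerate(normalize(edit.direction)):
            out[i] += edit.scale * d
    return out

def steer_generation(hidden_states: List[List[float]],
                     edits_by_step: Dict[int, List[SteeringEdit]]) -> List[List[float]]:
    # Apply a possibly different set of edits at each decoding step,
    # mimicking edits switched on or off mid-generation.
    return [apply_edits(h, edits_by_step.get(t, []))
            for t, h in enumerate(hidden_states)]
```

Because edits are additive, switching one on or off does not require recomputing the others. In a real model, something like `apply_edits` would run inside a hook on a chosen layer's activations at each token; whether Goodfire's system works this way is not stated in the thread.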

Ching Fang (chingfang.bsky.social) retweeted

Karim Jerbi @karimjerbineuro · 12 Dec

🔴 Live: Panel discussion #1 at MAIN2025: "The future of neuroscience: the role of AI?" with Andreas Tolias, Siva Reddy, Joao Sacramento, Ching Fang + Eva Portelance. Moderated by Patrick Mineault @patrickmineault #MAIN2025

Ching Fang (chingfang.bsky.social) @chingfang17 · 14 Dec

Had a great time at MAIN 2025! Loved chatting with everyone 😊

Karim Jerbi @karimjerbineuro

🎉 The MAIN 2025 conference just kicked off! Our opening lecture is by Ching Fang (Goodfire AI, San Francisco): "From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers". Montreal AI & Neuroscience is brought to you by @ai_unique #MAIN2025 #NeuroAI #RL

Ching Fang (chingfang.bsky.social) @chingfang17 · 6 Dec

I'll be presenting this as a spotlight at the #NeurIPS2025 mechanistic interpretability workshop this Sunday in the afternoon session!

Ching Fang (chingfang.bsky.social) @chingfang17

Sharing some new work! We asked whether we could decode a model's reasoning if it was thinking in encrypted text, using only its internal activations. The answer is surprisingly yes. arxiv.org/abs/2512.01222

Ching Fang (chingfang.bsky.social) @chingfang17 · 6 Dec

But overall this is encouraging for interpretability - simple mechanistic tools may be more robust to encoded reasoning than we expected. Joint work with Sam Marks @saprmarks, done during my fellowship with Cambridge Boston Alignment Initiative @cbai_ai

Ching Fang (chingfang.bsky.social) @chingfang17 · 6 Dec

One caveat is that we used SFT on the base model's own responses, which might preserve more of the original activation space. There's room for more testbeds here - RL-induced encoding, more complex ciphers, etc.

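The thread says only that the cipher-reasoning model was trained with SFT on the base model's own responses, and it lists "more complex ciphers" as future work. A minimal sketch of what such a data pipeline could look like, assuming a Caesar shift as the cipher; `caesar`, `build_sft_pairs`, and `decode` are hypothetical helpers, and the paper's actual ciphers and data format may differ:

```python
# Hypothetical sketch: turn a base model's own plaintext responses into SFT
# targets written in a simple cipher. The Caesar shift is an assumed
# stand-in for whatever ciphers the paper actually used.
import string

def caesar(text: str, shift: int) -> str:
    shift %= 26  # also handles negative shifts (used for decoding)
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)  # non-letters pass through unchanged

def build_sft_pairs(prompts, own_responses, shift=3):
    # Targets reuse the base model's own responses, only enciphered, so the
    # underlying content stays close to what the model would produce anyway.
    return [{"prompt": p, "target": caesar(r, shift)}
            for p, r in zip(prompts, own_responses)]

def decode(text: str, shift: int = 3) -> str:
    # Inverse of the cipher, for checking decoded reasoning against plaintext
    return caesar(text, -shift)
```

Reusing the model's own responses keeps the fine-tuning change small, which is exactly why the caveat notes it might preserve more of the original activation space than RL-induced encodings would.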

Ching Fang (chingfang.bsky.social) @chingfang17 · 6 Dec

Sharing some new work! We asked whether we could decode a model's reasoning if it was thinking in encrypted text, using only its internal activations. The answer is surprisingly yes. arxiv.org/abs/2512.01222

Ching Fang (chingfang.bsky.social) retweeted

Jack Lindsey @Jack_W_Lindsey · 24 Jul

We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us! job-boards.greenhouse.io/anthropic/jobs…
