Goodfire

415 posts


@GoodfireAI

Using interpretability to understand, learn from, and design AI.

San Francisco · Joined August 2024
28 Following · 13.1K Followers
Pinned post
Goodfire @GoodfireAI
We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.
Goodfire reposted
Nathan Labenz @labenz
In <2 years, @goodfireai has built an all-star team, landed their first blue-chip customers, published a ton of research, and now... raised $150M at a $1.25B valuation to become the first interpretability unicorn. 🦄 Here @DanJBalsam & @banburismus_ explain the motivation & vision for their new "Intentional Design" research agenda, which aims to understand and control what models learn in training. Full episode ↓
Goodfire @GoodfireAI
Interpretability methods can augment monitoring by identifying relevant information internally that isn’t reliably expressed. Plus, well-calibrated probes give us a knob to control the tradeoff between accuracy and token usage, enabling adaptive inference-time compute! (6/7)
Goodfire @GoodfireAI
LLMs often reason “performatively” well after deciding on a final answer - something that CoT monitors are slow to catch. Our new paper finds that:
- probes can help monitor for this
- it seems to track with task difficulty
- probes enable early CoT exit, saving tokens!
(1/7)
Goodfire @GoodfireAI
New blog post: how we built infrastructure to enable interp at trillion-parameter scale with minimal inference overhead. In a couple short years, interpretability has gone from toy models to the frontier. (1/6)