Goodfire

415 posts

@GoodfireAI

Using interpretability to understand, learn from, and design AI.

San Francisco · Joined August 2024
28 Following · 13.1K Followers
Pinned Tweet
Goodfire @GoodfireAI
We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.
30
60
488
204.4K
Goodfire retweeted
Nathan Labenz @labenz
In <2 years, @goodfireai has built an all-star team, landed their first blue-chip customers, published a ton of research, and now... Raised $150M at a $1.25B valuation to become the first interpretability unicorn. 🦄 Here @DanJBalsam & @banburismus_ explain the motivation & vision for their new "Intentional Design" research agenda, which aims to understand and control what models learn in training. Full episode ↓
2
5
27
5.9K
Goodfire @GoodfireAI
Interpretability methods can augment monitoring by identifying relevant information internally that isn’t reliably expressed. Plus, well-calibrated probes give us a knob to control the tradeoff between accuracy and token usage, enabling adaptive inference-time compute! (6/7)
1
1
24
1.7K
Goodfire @GoodfireAI
LLMs often reason “performatively” well after deciding on a final answer - something that CoT monitors are slow to catch. Our new paper finds that:
- probes can help monitor for this
- it seems to track with task difficulty
- probes enable early CoT exit, saving tokens!
(1/7)
8
39
327
41.4K
Goodfire @GoodfireAI
New blog post: how we built infrastructure to enable interp at trillion-parameter scale with minimal inference overhead. In a couple short years, interpretability has gone from toy models to the frontier. (1/6)
2
18
212
18.8K