Transluce

231 posts

@TransluceAI

Open and scalable technology for understanding AI systems.

Joined October 2024
17 Following, 9.2K Followers
Transluce retweeted
Jacob Steinhardt @JacobSteinhardt
New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.
Transluce @TransluceAI
Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M+ tokens of traces and found this in under an hour. Here’s how 👇
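The gap between the headline score and the timeout-excluded score comes down to how timed-out runs are counted. Below is a minimal sketch, assuming each task run is stored as a hypothetical `Trial(passed, timed_out)` record, of how a model that times out more often can score lower overall yet higher once timeouts are dropped. The record format, the helper names, and the toy counts are illustrative assumptions, not Transluce's data or analysis pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # Hypothetical per-task record; not Terminal-Bench's actual schema.
    passed: bool
    timed_out: bool


def pass_rate(trials: list[Trial], exclude_timeouts: bool = False) -> float:
    """Fraction of trials that passed, optionally dropping timed-out trials first."""
    pool = [t for t in trials if not (exclude_timeouts and t.timed_out)]
    if not pool:
        return 0.0
    return sum(t.passed for t in pool) / len(pool)


def make_trials(n_pass: int, n_fail: int, n_timeout: int) -> list[Trial]:
    # Timed-out runs count as failures in the overall score.
    return (
        [Trial(passed=True, timed_out=False)] * n_pass
        + [Trial(passed=False, timed_out=False)] * n_fail
        + [Trial(passed=False, timed_out=True)] * n_timeout
    )


# Toy counts, chosen only to illustrate the effect (100 tasks each).
model_a = make_trials(n_pass=50, n_fail=40, n_timeout=10)  # fewer timeouts
model_b = make_trials(n_pass=48, n_fail=32, n_timeout=20)  # 2x the timeouts

for name, trials in [("A", model_a), ("B", model_b)]:
    overall = pass_rate(trials)
    no_timeouts = pass_rate(trials, exclude_timeouts=True)
    print(f"model {name}: overall={overall:.3f}, excluding timeouts={no_timeouts:.3f}")
```

With these toy counts, model B scores below model A overall (0.480 vs 0.500) but above it once timed-out runs are excluded (0.600 vs 0.556), which is the same kind of reversal the thread describes.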
Transluce @TransluceAI
We're hiring a Governance & Policy Fellow to help define how independent AI evaluation works in practice: setting standards, supporting mental health evals, and supporting government evaluators. Hybrid technical + policy background, $200K–$300K. Link in replies.
Transluce retweeted
Sarah Schwettmann @cogconfluence
All @TransluceAI work that I described in my NeurIPS mech interp workshop keynote is now out! ✨ Today we released Predictive Concept Decoders, led by @vvhuang_.
Paper: arxiv.org/pdf/2512.15712
Blog: transluce.org/pcd
And here's @damichoi95's work on scalably extracting latent representations of users from model internals: transluce.org/user-modeling
Justin Angel @JustinAngel

We can train models to maximize how well they explain LLMs to humans 🤯 (@cogconfluence, paraphrased). Mechanistic Interpretability Workshop #NeurIPS2025.

Transluce @TransluceAI
Chat with a live version of our PCD at decoder.transluce.org. Try testing whether the decoder can accurately predict Llama-3.1-8B’s behavior, and check whether the decoder’s response is consistent with the encoder’s active concepts!
Transluce @TransluceAI
Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior. Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.