Transluce

231 posts

@TransluceAI

Open and scalable technology for understanding AI systems.

Joined October 2024
17 Following, 9.2K Followers
Transluce retweeted
Jacob Steinhardt (@JacobSteinhardt)
New blog post: "Building Technology to Drive AI Governance." I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and I consider case studies from other fields (food safety, climate change) that illustrate this dynamic.
Transluce (@TransluceAI)
Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M+ tokens of traces and found this in under an hour. Here’s how 👇
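A minimal Python sketch of why excluding timeouts can flip a pass-rate comparison, assuming a timed-out task is scored as a failure. The `Trial` record, the `make_trials` helper, and all counts are hypothetical illustrations, not Transluce's data or analysis tooling, and they are not meant to reproduce the 6.5% or 7.2% figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One benchmark attempt (hypothetical record format)."""
    task_id: str
    passed: bool
    timed_out: bool


def pass_rate(trials: List[Trial], exclude_timeouts: bool = False) -> float:
    """Fraction of trials that passed.

    By default a timed-out trial stays in the denominator as a failure;
    with exclude_timeouts=True it is dropped from the pool entirely.
    """
    pool = [t for t in trials if not (exclude_timeouts and t.timed_out)]
    if not pool:
        return 0.0
    return sum(t.passed for t in pool) / len(pool)


def timeout_rate(trials: List[Trial]) -> float:
    """Fraction of trials that hit the time limit."""
    return sum(t.timed_out for t in trials) / len(trials)


def make_trials(n: int, n_timeouts: int, n_passed: int) -> List[Trial]:
    """Hypothetical helper: the first n_timeouts trials time out (and fail),
    the next n_passed pass, and the rest fail without timing out."""
    trials = []
    for i in range(n):
        timed_out = i < n_timeouts
        passed = not timed_out and i < n_timeouts + n_passed
        trials.append(Trial(f"task-{i}", passed, timed_out))
    return trials


# Made-up counts: "newer" times out twice as often as "older".
older = make_trials(n=10, n_timeouts=2, n_passed=5)
newer = make_trials(n=10, n_timeouts=4, n_passed=4)

for name, trials in [("older", older), ("newer", newer)]:
    print(
        name,
        f"timeouts={timeout_rate(trials):.0%}",
        f"overall={pass_rate(trials):.3f}",
        f"excluding_timeouts={pass_rate(trials, exclude_timeouts=True):.3f}",
    )
```

With these made-up counts, the newer model scores lower overall (0.4 vs 0.5) but higher once timeouts are dropped (about 0.667 vs 0.625), the same qualitative pattern the thread describes.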
Transluce (@TransluceAI)
We're hiring a Governance & Policy Fellow to help define how independent AI evaluation works in practice: setting standards, supporting mental health evals, and assisting government evaluators. Hybrid technical + policy background, $200K–$300K. Link in replies.
Transluce retweeted
Ethan Perez (@EthanJPerez)
Transluce is a top-tier AI safety research lab; I follow their work as closely as work from our own safety teams at Anthropic. They're also well-positioned to become a strong third-party auditor for AI labs. Consider donating if you're interested in helping them out!
Transluce (@TransluceAI)

Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.

Transluce retweeted
Sarah Schwettmann (@cogconfluence)
All @TransluceAI work that I described in my NeurIPS mech interp workshop keynote is now out! ✨ Today we released Predictive Concept Decoders, led by @vvhuang_.
Paper: arxiv.org/pdf/2512.15712
Blog: transluce.org/pcd
And here's @damichoi95's work on scalably extracting latent representations of users from model internals: transluce.org/user-modeling
Justin Angel (@JustinAngel)

"We can train models to maximize how well they explain LLMs to humans" 🤯 (@cogconfluence, paraphrased). Mechanistic Interpretability Workshop #NeurIPS2025.

Transluce (@TransluceAI)
Chat with a live version of our PCD at decoder.transluce.org. Try testing whether the decoder can accurately predict Llama-3.1-8B’s behavior, and check whether the decoder’s response is consistent with the encoder’s active concepts!
Transluce (@TransluceAI)
Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior. Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.