Transluce

232 posts

@TransluceAI

Open and scalable technology for understanding AI systems.

Joined October 2024

17 Following · 9.2K Followers

Transluce retweeted

Aryaman Arora@aryaman2020·4d

This paper is now a spotlight at ICML! arxiv.org/abs/2601.22594

Transluce@TransluceAI

Is your LM secretly an SAE? Most circuit-finding interpretability methods use learned features rather than raw activations, based on the belief that neurons do not cleanly decompose computation. In our new work, we show MLP neurons actually do support sparse, faithful circuits!

314

31.3K

Transluce retweeted

Jacob Steinhardt@JacobSteinhardt·Feb 18

New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.

121

15K

Transluce@TransluceAI·Feb 17

Use Docent to analyze your own traces: docs.transluce.org/quickstart Read our Blog: transluce.org/docent/blog/te…

898

Transluce@TransluceAI·Feb 17

You can replicate our full analysis with 5 min of setup. Clone our Terminal-Bench data & follow along: transluce.org/docent/blog/te…

1.1K

Transluce@TransluceAI·Feb 17

Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M+ tokens of traces and found this in under an hour. Here’s how 👇

9.6K

Transluce@TransluceAI·Jan 29

See the full post and apply here: jobs.gem.com/transluce/am9i…

2.9K

Transluce@TransluceAI·Jan 29

We're hiring a Governance & Policy Fellow to help define how independent AI evaluation works in practice: setting standards, supporting mental health evals, and supporting government evaluators. Hybrid technical + policy background, $200K–$300K. Link in replies.

241

26K

Transluce retweeted

Aryaman Arora@aryaman2020·Jan 16

our circuit tracing codebase from this project is public now! github.com/TransluceAI/ci… please try it out and ping me if you have any questions 😄 and expect more updates soon!

Transluce@TransluceAI

147

15.1K

Transluce retweeted

Jacob Austin@jacobaustin132·Dec 23

I admire the folks at Transluce a lot. They're super smart and have a good model for how to do useful AI oversight work without being embedded in (read: beholden to) any big AI labs. Read their stuff and consider supporting!

Transluce@TransluceAI

Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.

5.8K

Transluce retweeted

Ethan Perez@EthanJPerez·Dec 23

Transluce is a top-tier AI safety research lab - I follow their work as closely as work from our own safety teams at Anthropic. They're also well-positioned to become a strong third-party auditor for AI labs. Consider donating if you're interested in helping them out!

Transluce@TransluceAI

Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.

157

14.2K

Transluce retweeted

Sarah Schwettmann@cogconfluence·Dec 18

All @TransluceAI work that I described in my NeurIPS mech interp workshop keynote is now out! ✨ Today we released Predictive Concept Decoders, led by @vvhuang_ Paper: arxiv.org/pdf/2512.15712 Blog: transluce.org/pcd And here's @damichoi95's work on scalably extracting latent representations of users from model internals: transluce.org/user-modeling

Justin Angel@JustinAngel

We can train models on maximizing how well they explain LLMs to humans 🤯@cogconfluence paraphrased. Mechanistic Interpretability Workshop #NeurIPS2025.

9.9K

Transluce@TransluceAI·Dec 18

Paper: arxiv.org/abs/2512.15712 Blog: transluce.org/pcd Authors: @vvhuang_, @damichoi95, @_ddjohnson, @cogconfluence, @JacobSteinhardt If you’re excited about building scalable interpretability assistants, visit transluce.org/company

1.5K

Transluce@TransluceAI·Dec 18

Chat with a live version of our PCD at decoder.transluce.org. Try testing whether the decoder can accurately predict Llama-3.1-8B’s behavior, and check whether the decoder’s response is consistent with the encoder’s active concepts!

3.5K

Transluce@TransluceAI·Dec 18

Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior. Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.

166

35.7K
