Jacob Steinhardt

493 posts

@JacobSteinhardt

Associate Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI

Joined December 2011

82 Following · 11.5K Followers

Pinned Tweet

Jacob Steinhardt @JacobSteinhardt · Oct 24

In July, I went on leave from UC Berkeley to found @TransluceAI, together with Sarah Schwettmann (@cogconfluence). Now, our work is finally public.

Transluce @TransluceAI

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…

Jacob Steinhardt retweeted

Dami Choi @damichoi95 · Mar 16

Code for our user modeling project is out now! github.com/TransluceAI/ob… This includes data generation, belief evaluation, and training code for our LatentQA decoders. We also uploaded our datasets and decoder checkpoints on Hugging Face: huggingface.co/collections/Tr…

Transluce @TransluceAI

What do AI assistants think about you, and how does this shape their answers? Because assistants are trained to optimize human feedback, how they model users drives issues like sycophancy, reward hacking, and bias. We provide data + methods to extract & steer these user models.

Jacob Steinhardt @JacobSteinhardt · Feb 18

You can read the full blog post here: bounded-regret.ghost.io/building-techn…

Jacob Steinhardt @JacobSteinhardt · Feb 18

For technically skilled people who care about AI governance, working on these problems is one of the highest-leveraged things you can do: it's neglected compared to other avenues of impact, and it's something that has historically been impactful in other domains.

Jacob Steinhardt @JacobSteinhardt · Feb 18

New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.

Jacob Steinhardt retweeted

Transluce @TransluceAI · Feb 17

Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M+ tokens of traces and found this in under an hour. Here’s how 👇

Jacob Steinhardt retweeted

Grace Luo @graceluo_ · Feb 9

We trained diffusion models on a billion LLM activations, and we want you to use them! New preprint: "Learning a Generative Meta-Model of LLM Activations". Joint work with @feng_jiahai, @trevordarrell, @AlecRad, @JacobSteinhardt. More in thread 🧵

Jacob Steinhardt retweeted

Lisa Dunlap @lisabdunlap · Jan 14

🌟NEW PAPER🌟 Do you know that changing a visual marker from red to blue can completely reorder VLM leaderboards? In our most recent work, we explore the fragility of visually prompted benchmarks. lisadunlap.github.io/vpbench/

Jacob Steinhardt @JacobSteinhardt · Jan 6

Read more in the full post here! bounded-regret.ghost.io/oversight-assi…

Jacob Steinhardt @JacobSteinhardt · Jan 6

This opens up an important possibility: that we can decouple *oversight* abilities from *general* capabilities. This would significantly democratize AI oversight, and could also help ensure that oversight agents categorically outmatch the systems they are overseeing.

Jacob Steinhardt @JacobSteinhardt · Jan 6

New blog post out: a position piece on "Turning Compute into Understanding", by training superhuman oversight assistants.
