Jacob Steinhardt

493 posts

@JacobSteinhardt

Associate Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI

Joined December 2011
82 Following · 11.5K Followers
Pinned Tweet
Jacob Steinhardt retweeted
Dami Choi @damichoi95
Code for our user modeling project is out now! github.com/TransluceAI/ob… This includes data generation, belief evaluation, and training code for our LatentQA decoders. We also uploaded our datasets and decoder checkpoints on Hugging Face: huggingface.co/collections/Tr…
Transluce @TransluceAI

What do AI assistants think about you, and how does this shape their answers? Because assistants are trained to optimize human feedback, how they model users drives issues like sycophancy, reward hacking, and bias. We provide data + methods to extract & steer these user models.

Jacob Steinhardt @JacobSteinhardt
For technically skilled people who care about AI governance, working on these problems is one of the highest-leveraged things you can do: it's neglected compared to other avenues of impact, and it's something that has historically been impactful in other domains.
Jacob Steinhardt @JacobSteinhardt
New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.
Jacob Steinhardt retweeted
Transluce @TransluceAI
Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M+ tokens of traces and found this in under an hour. Here’s how 👇
Jacob Steinhardt retweeted
Grace Luo @graceluo_
We trained diffusion models on a billion LLM activations, and we want you to use them! New preprint: Learning a Generative Meta-Model of LLM Activations Joint work with @feng_jiahai, @trevordarrell, @AlecRad, @JacobSteinhardt. More in thread 🧵
Jacob Steinhardt retweeted
Lisa Dunlap @lisabdunlap
🌟NEW PAPER🌟 Do you know that changing a visual marker from red to blue can completely reorder VLM leaderboards? In our most recent work, we explore the fragility of visually prompted benchmarks. lisadunlap.github.io/vpbench/
Jacob Steinhardt @JacobSteinhardt
This opens up an important possibility: that we can decouple *oversight* abilities from *general* capabilities. This would significantly democratize AI oversight, and could also help ensure that oversight agents categorically outmatch the systems they are overseeing.
Jacob Steinhardt @JacobSteinhardt
New blog post out: a position piece on "Turning Compute into Understanding", by training superhuman oversight assistants.