Constantin Venhoff

49 posts

@cvenhoff00

PhD Student at Oxford University @OxfordTVG | @MATSprogram 7.0/7.1 Scholar with Neel Nanda | Intern @Meta

Joined April 2024

120 Following · 405 Followers

Constantin Venhoff reposted

Anna Soligo@anna_soligo·10 Mar

Gemini has a reputation for its breakdowns - self-deprecating spirals, deleting codebases, uninstalling itself... Turns out Gemma is worse: “THIS is my last time with YOU. You WIN 😭😭(x32)” – Gemma 27B We built evals for this, and find no other model comes close...

109

906

83.7K

Constantin Venhoff reposted

Goodfire@GoodfireAI·28 Jan

We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)

224

1.7K

393.5K

Constantin Venhoff@cvenhoff00·6 Dec

Huge thanks to my amazing co-authors @ashk__on, @soniajoseph_, @philiptorr, and @NeelNanda5! Also grateful to the @MATSprogram for support. Come chat at Poster #4615 today at 4:30pm! Paper link: arxiv.org/abs/2512.03276

146

Constantin Venhoff@cvenhoff00·6 Dec

Key takeaway: Successful multimodal alignment requires more than representational compatibility. It depends on integrating visual information into the functional circuits of the LLM backbone!

111

Constantin Venhoff@cvenhoff00·6 Dec

Excited to present our NeurIPS paper today at 4:30pm in Exhibit Hall C,D,E (Poster #4615)! "Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval" Details 🧵👇

5.8K

Constantin Venhoff reposted

Sharan@_maiush·4 Nov

AI that is “forced to be good” v “genuinely good” Should we care about the difference? (yes!) We’re releasing the first open implementation of character training. We shape the persona of AI assistants in a more robust way than alternatives like prompting or activation steering.

193

61.3K

Constantin Venhoff reposted

Tim Hua 🇺🇦@Tim_Hua_·30 Oct

Problem: AIs can detect when they are being tested and fake good behavior. Can we suppress the “I’m being tested” concept & make them act normally? Yes! In a new paper, we show that subtracting this concept vector can elicit real-world behavior even when normal prompting fails.

245

59.2K

Constantin Venhoff@cvenhoff00·10 Oct

Work done with my awesome collaborators @IvanArcus @ArthurConmy @NeelNanda5 @philiptorr as part of the @MATSprogram

1.4K

Constantin Venhoff@cvenhoff00·10 Oct

Try it yourself! 🌐 Interactive demo: thinking-llms-interp.com 💻 Code: github.com/cvenhoff/think… 📄 Paper: arxiv.org/abs/2510.07364 Accepted at NeurIPS 2025 Mechanistic Interpretability Workshop ✨

1.5K

Constantin Venhoff@cvenhoff00·10 Oct

🚨 What do reasoning models actually learn during training? Our new paper shows base models already contain reasoning mechanisms; thinking models learn when to use them! By invoking those skills at the right time in the base model, we recover up to 91% of the performance gap 🧵

583

81.3K
