Constantin Venhoff

49 posts

@cvenhoff00

PhD Student at Oxford University @OxfordTVG | @MATSprogram 7.0/7.1 Scholar with Neel Nanda | Intern @Meta

Joined April 2024
120 Following · 405 Followers
Constantin Venhoff retweeted
Anna Soligo @anna_soligo
Gemini has a reputation for its breakdowns - self-deprecating spirals, deleting codebases, uninstalling itself... Turns out Gemma is worse: “THIS is my last time with YOU. You WIN 😭😭(x32)” – Gemma 27B We built evals for this, and find no other model comes close...
31 replies · 109 reposts · 906 likes · 83.7K views
Constantin Venhoff retweeted
Goodfire @GoodfireAI
We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente. How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)
50 replies · 224 reposts · 1.7K likes · 393.5K views
Constantin Venhoff @cvenhoff00
Key takeaway: Successful multimodal alignment requires more than representational compatibility. It depends on integrating visual information into the functional circuits of the LLM backbone!
1 reply · 0 reposts · 2 likes · 111 views
Constantin Venhoff @cvenhoff00
Excited to present our NeurIPS paper today at 4:30pm in Exhibit Hall C,D,E (Poster #4615)! "Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval" Details 🧵👇
3 replies · 5 reposts · 21 likes · 5.8K views
Constantin Venhoff retweeted
Sharan @_maiush
AI that is “forced to be good” v “genuinely good” Should we care about the difference? (yes!) We’re releasing the first open implementation of character training. We shape the persona of AI assistants in a more robust way than alternatives like prompting or activation steering.
5 replies · 38 reposts · 193 likes · 61.3K views
Constantin Venhoff retweeted
Tim Hua 🇺🇦 @Tim_Hua_
Problem: AIs can detect when they are being tested and fake good behavior. Can we suppress the “I’m being tested” concept & make them act normally? Yes! In a new paper, we show that subtracting this concept vector can elicit real-world behavior even when normal prompting fails.
15 replies · 33 reposts · 245 likes · 59.2K views
Constantin Venhoff @cvenhoff00
🚨 What do reasoning models actually learn during training? Our new paper shows base models already contain reasoning mechanisms, thinking models learn when to use them! By invoking those skills at the right time in the base model, we recover up to 91% of the performance gap 🧵
15 replies · 69 reposts · 583 likes · 81.3K views