
Daniel Paleka
1.4K posts

@dpaleka
ai safety researcher | phd @CSatETH | https://t.co/hCoh5RJgZD
Zurich | Joined March 2012
926 Following | 4.6K Followers
Pinned Tweet

Reminder: if you like what you see here, you should subscribe to my newsletter. newsletter.danielpaleka.com

@Afinetheorem This is interesting. I think the Avg Dist metric makes ~no sense as a metric of capability, unless the model knows it's optimizing for this. I like the % success here better. In general a different scoring func would produce different optimal guesses
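A toy illustration of the point about scoring functions, as a sketch only: the three locations, their positions on a line, and the belief probabilities below are made-up numbers, not data from the benchmark. Under the same belief, the guess that minimizes expected distance differs from the guess that maximizes the chance of being right.

```python
# Hypothetical 1-D world: three candidate locations and a model's belief over them.
# All numbers are invented for illustration.
positions_km = {"A": 0.0, "B": 500.0, "C": 1000.0}
belief = {"A": 0.48, "B": 0.10, "C": 0.42}


def expected_distance(guess: str) -> float:
    """Average distance (km) between the guess and the true location under the belief."""
    return sum(
        p * abs(positions_km[guess] - positions_km[loc]) for loc, p in belief.items()
    )


def success_probability(guess: str, radius_km: float = 100.0) -> float:
    """Probability that the true location lies within radius_km of the guess."""
    return sum(
        p
        for loc, p in belief.items()
        if abs(positions_km[guess] - positions_km[loc]) <= radius_km
    )


# Optimal guess under each scoring rule.
best_by_distance = min(positions_km, key=expected_distance)
best_by_success = max(positions_km, key=success_probability)

print("minimizes average distance:", best_by_distance)
print("maximizes success rate:", best_by_success)
```

Here the distance-optimal guess is the unlikely middle location, while the success-optimal guess is the single most probable one, so a model that is not told which rule it is scored on cannot be optimal for both.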

(Gemini models also smoke other ones on my 'where is this not-GeoGuessrable photograph taken' benchmark: kevinbryanecon.com/HardGeoBench/. But all of this tells you, with multiple tasks strung together with agents, single question logic and a bad harness is not a good combination.)


@panickssery 'tis a benchmark. take an existing set of qs and search how early in the question LLMs know the answer.

Out of curiosity, at what point in this quiz-bowl question do you know the answer? (poll in next tweet)
Late in this battle, command was shifted to the Euryalus led by Captain Cuthbert Collingwood. John Pasco consulted on the wording of a message before this battle, which began when the losing side broke out of the port of (*) Cádiz. The flag signal “England expects that every man will do his duty” was sent just prior to—for 10 points—what 1805 naval battle that led to the death of the victorious admiral, Horatio Nelson?
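The benchmark idea mentioned in this thread (finding how early in a question a model knows the answer) could be sketched roughly as follows. This is a minimal sketch, not the actual benchmark code: `earliest_known_prefix` and `answer_fn` are hypothetical names, `answer_fn` stands in for whatever model call is used, and exact string matching is a simplifying assumption about grading.

```python
from typing import Callable, Optional


def earliest_known_prefix(
    question: str,
    correct: str,
    answer_fn: Callable[[str], str],
) -> Optional[int]:
    """Return the smallest number of leading words after which answer_fn is correct.

    answer_fn is a hypothetical stand-in for a model call: it receives a
    question prefix and returns the model's answer as a string.
    Returns None if the answer is never correct, even on the full question.
    """
    words = question.split()
    target = correct.strip().lower()
    # Linear scan: no assumption that correctness is monotone in prefix length.
    for n in range(1, len(words) + 1):
        prefix = " ".join(words[:n])
        if answer_fn(prefix).strip().lower() == target:
            return n
    return None


if __name__ == "__main__":
    # Stub "model" that only recognizes the battle once Collingwood is named.
    def stub_model(prefix: str) -> str:
        return "Trafalgar" if "Collingwood" in prefix else "unknown"

    q = "Late in this battle, command was shifted to the Euryalus led by Captain Cuthbert Collingwood."
    print(earliest_known_prefix(q, "Trafalgar", stub_model))
```

A linear scan costs one model call per word; a binary search over prefix length would be cheaper but assumes the model never "forgets" the answer once it has it, which may not hold.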

Daniel Paleka retweeted

Timely research.
We've all tried to figure out who someone is online. Now LLMs can do this at scale and better.
I'm sure no one would misuse this.
Daniel Paleka@dpaleka
Can LLMs figure out who you are from your anonymous posts? From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵

Andreas 2022 had foresight 20/20 on the persona emulation concept and 0/20 on picking a name for the concept ("Language Models as Agent Models")
Anthropic@AnthropicAI
AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why? In a new post we describe a theory that explains why AIs act like humans: the persona selection model. anthropic.com/research/perso…

@spion @YonatanCale @RosieCampbell @allTheYud they don't tell you this but you can make your own METR plot, the data and code are public

@dpaleka @YonatanCale @RosieCampbell @allTheYud I am happy you left in the confidence intervals though :)

@spion @YonatanCale @RosieCampbell @allTheYud this is a joke plot, the 2020-2023 period is squeezed. it's an exponential, not a cubic

Privacy online is fundamentally at odds with intelligence getting cheaper.
Anonymity on the internet has always relied on practical obscurity. We publish in hopes that people can adapt to LLMs changing this.
Paper: arxiv.org/abs/2602.16800

Can LLMs figure out who you are from your anonymous posts?
From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web.
New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵




