Daniel Paleka

1.4K posts

@dpaleka

ai safety researcher | phd @CSatETH | https://t.co/hCoh5RJgZD

Zurich · Joined March 2012

926 Following · 4.6K Followers

Pinned Tweet

Daniel Paleka@dpaleka·8 Dec

Reminder: if you like what you see here, you should subscribe to my newsletter. newsletter.danielpaleka.com

3.5K

Daniel Paleka@dpaleka·9 Mar

@IvanVendrov nikolajurkovic.substack.com/p/mourning-a-l…

467

ivan@IvanVendrov·8 Mar

a mood I'm really missing in the current AI discourse is grief yes things might go terribly and yes we might see glories beyond imagining but no matter what, we will lose much of what it has meant to be human, forever. I'd like to be with that grief more, and held in it.

824

58.2K

Daniel Paleka@dpaleka·5 Mar

@Afinetheorem This is interesting. I think the Avg Dist metric makes ~no sense as a metric of capability, unless the model knows it's optimizing for this. I like the % success here better. In general a different scoring func would produce different optimal guesses
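A minimal sketch of the scoring-function point, under loud assumptions: the belief distribution, the 1-D "map", and all function names below are hypothetical and are not taken from the benchmark. It only illustrates that a guess minimizing expected distance can differ from a guess maximizing the chance of landing within a success radius.

```python
# Hypothetical belief about where a photo was taken:
# position on a 1-D line (km) -> probability. Illustrative numbers only.
belief = {0.0: 0.4, 100.0: 0.3, 110.0: 0.3}


def expected_distance(guess, belief):
    """Average distance from the guess to the true location under the belief."""
    return sum(p * abs(guess - loc) for loc, p in belief.items())


def success_probability(guess, belief, radius):
    """Probability that the guess lands within `radius` of the true location."""
    return sum(p for loc, p in belief.items() if abs(guess - loc) <= radius)


def best_guess_by_distance(belief):
    # For simplicity, guesses are restricted to the candidate locations.
    return min(belief, key=lambda g: expected_distance(g, belief))


def best_guess_by_success(belief, radius):
    return max(belief, key=lambda g: success_probability(g, belief, radius))


distance_optimal = best_guess_by_distance(belief)
success_optimal = best_guess_by_success(belief, radius=5.0)
print(distance_optimal, success_optimal)
```

With these made-up numbers, the distance-optimal guess sits in the cluster of two nearby candidates, while the success-optimal guess is the single most likely location, so a model that is not told which score it is graded on cannot pick the right one.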

Kevin A. Bryan@Afinetheorem·4 Mar

(Gemini models also smoke other ones on my 'where is this not-GeoGuessrable photograph taken' benchmark: kevinbryanecon.com/HardGeoBench/. But all of this tells you, with multiple tasks strung together with agents, single question logic and a bad harness is not a good combination.)

1.1K

Kevin A. Bryan@Afinetheorem·4 Mar

Last month, I wrote benchmark questions for a big tech company. They are hard - not math or coding, linked to real-world tasks. Gemini 3 Pro *smoked* other frontier models: like 2x more right. It just needs better integrations, agent harness, "longer" think time/less laziness.

5.3K

Daniel Paleka@dpaleka·4 Mar

@panickssery 'tis a benchmark. take an existing set of qs and search how early in the question LLMs know the answer.
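A rough sketch of what such a search could look like, assuming a hypothetical `ask` callable that maps a question prefix to the model's answer string. The toy model here is a stand-in and not a real LLM call. A linear scan is used because nothing guarantees that correctness is monotone in prefix length.

```python
from typing import Callable, Optional


def earliest_known_fraction(
    question: str, answer: str, ask: Callable[[str], str]
) -> Optional[float]:
    """Return the fraction of the question (in words) read before `ask`
    first produces the correct answer, or None if it never does."""
    words = question.split()
    for n in range(1, len(words) + 1):
        prefix = " ".join(words[:n])
        if ask(prefix).strip().lower() == answer.strip().lower():
            return n / len(words)
    return None


def toy_model(prefix: str) -> str:
    # Hypothetical stand-in for an LLM: it "knows" the answer once a
    # giveaway clue has appeared in the prefix.
    if "England expects" in prefix or "Nelson" in prefix:
        return "Trafalgar"
    return "unknown"


question = (
    "The flag signal England expects was sent before "
    "what battle that killed Nelson"
)
fraction = earliest_known_fraction(question, "Trafalgar", toy_model)
print(fraction)
```

Averaging this fraction over an existing question set would give one possible per-model score; exact-match grading is a simplification, and a real harness would need a more forgiving answer check.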

501

Arjun Panickssery@panickssery·4 Mar

Out of curiosity, at what point in this quiz-bowl question do you know the answer? (poll in next tweet) Late in this battle, command was shifted to the Euryalus led by Captain Cuthbert Collingwood. John Pasco consulted on the wording of a message before this battle, which began when the losing side broke out of the port of (*) Cádiz. The flag signal “England expects that every man will do his duty” was sent just prior to—for 10 points—what 1805 naval battle that led to the death of the victorious admiral, Horatio Nelson?

2.1K

Daniel Paleka@dpaleka·3 Mar

newsletter.danielpaleka.com/p/you-should-d…

3.4K

Daniel Paleka@dpaleka·3 Mar

It begins

Yaron (Ron) Minsky@yminsky

I wonder if we're starting to hit a deflationary era in software engineering. For the first time, we're starting to talk about this in a planning context; it can make sense to put off some projects because we expect they'll be easier to achieve in the future than today.

152K

Daniel Paleka retweeted

Lennart Heim@ohlennart·3 Mar

Timely research. We've all tried to figure out who someone is online. Now LLMs can do this at scale and better. I'm sure no one would misuse this.

Daniel Paleka@dpaleka

Can LLMs figure out who you are from your anonymous posts? From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵

3.4K

Daniel Paleka@dpaleka·24 Feb

arxiv.org/abs/2212.01681

419

Daniel Paleka@dpaleka·24 Feb

Andreas 2022 had foresight 20/20 on the persona emulation concept and 0/20 on picking a name for the concept ("Language Models as Agent Models")

Anthropic@AnthropicAI

AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why? In a new post we describe a theory that explains why AIs act like humans: the persona selection model. anthropic.com/research/perso…

1.9K

Daniel Paleka@dpaleka·24 Feb

@spion @YonatanCale @RosieCampbell @allTheYud they don't tell you this but you can make your own METR plot, the data and code are public

spion@spion·23 Feb

@dpaleka @YonatanCale @RosieCampbell @allTheYud I am happy you left in the confidence intervals though :)

Rosie Campbell@RosieCampbell·21 Feb

The sigmoid can stay exponential longer than you can stay relevant

667

30.8K

Daniel Paleka@dpaleka·23 Feb

@spion @YonatanCale @RosieCampbell @allTheYud this is a joke plot, the 2020-2023 period is squeezed. it's an exponential, not a cubic

spion@spion·23 Feb

@YonatanCale @RosieCampbell @allTheYud yes x.com/dpaleka/status…

Daniel Paleka@dpaleka

Found the sigmoid!

Daniel Paleka@dpaleka·22 Feb

Found the sigmoid!

350

21K

Daniel Paleka@dpaleka·20 Feb

Privacy online is fundamentally at odds with intelligence getting cheaper. Anonymity on the internet has always relied on practical obscurity. We publish in hopes that people can adapt to LLMs changing this. Paper: arxiv.org/abs/2602.16800

1.3K

Daniel Paleka@dpaleka·20 Feb

If you're anonymous, what should you do? Avoid sharing specific details, and adopt a security mindset: if a team of smart investigators were trying to identify you from your posts, could they plausibly figure out who you are? If yes, LLM agents will soon be able to do the same.

1.4K

Daniel Paleka@dpaleka·20 Feb

237

50.2K
