Vincent Conitzer

2.2K posts

@conitzer

AI professor. Director, @FOCAL_lab @CarnegieMellon. Head of Technical AI Engagement, @UniofOxford @EthicsInAI. Author, "Moral AI - And How We Get There."

Joined June 2009

1.2K Following · 4.5K Followers

Pinned Tweet

Vincent Conitzer@conitzer·18 Oct

There is now a paperback version of our Moral AI book!

Vincent Conitzer@conitzer·7h

I appreciate the multi-bullet layout of the challenges of jumping from one airplane to another, but am intrigued by this private-charter-flight-shadowing loophole... aifails.substack.com/p/airplane-envy

Vincent Conitzer@conitzer·1d

That’s what I get for trying to get funny responses out of Google’s AI on the way back from Stanford... aifails.substack.com/p/freeway-envy

Vincent Conitzer@conitzer·2d

I don’t think you’d want to swat the fly you were riding on… aifails.substack.com/p/riding-on-th…

Vincent Conitzer@conitzer·3d

"Which is larger, the number of atoms in the universe or the number of subsets of countries in the world?" aifails.substack.com/p/number-of-su…

Vincent Conitzer@conitzer·5d

A different type of example -- ChatGPT trying to rationalize, in moral terms, hard rules (about creating images of real people) that it has been given. Should it do that? Full post at link: aifails.substack.com/p/moral-ration…

Vincent Conitzer@conitzer·6d

I am curious about these shoe blogs. aifails.substack.com/p/reducing-you…

Vincent Conitzer@conitzer·6d

@yoavgo @ShriramKMurthi aifails.substack.com (also on social media)

(((ل()(ل() 'yoav))))👾@yoavgo·6d

@conitzer @ShriramKMurthi tell me more about this collection!

(((ل()(ل() 'yoav))))👾@yoavgo·17 May

cute and works also on claude 4.6 english. (4.7 did get it right)

Danny Hendler@DannyHendler

AI already demonstrates superhuman abilities in significant domains. That is why I relish every time I manage to get it to spout nonsense, because who knows how many more such opportunities I will have. This time I asked it: "I need to send a page by fax and want to make sure I keep a copy. What should I do?" The answer from ChatGPT 5.5 extended is in the attached screenshot.

Vincent Conitzer@conitzer·6d

@ShriramKMurthi @yoavgo That's a good one! Maybe I should take guest posts...

Shriram Krishnamurthi (primary: Bluesky)@ShriramKMurthi·6d

@yoavgo You missed a trick, @conitzer .

Vincent Conitzer retweeted

Jiayuan Liu@jiayuan_liu_·16 May

(4/4) Paper link: arxiv.org/abs/2605.08060 Big thanks to my collaborators! @Jack_Litq, @zzzoooeee321, Xin Luo, Haoxuan Zeng, @EmanuelTewolde, Tai Sing Lee, @tonghanwang, @ckingsford, @conitzer

Vincent Conitzer retweeted

Jiayuan Liu@jiayuan_liu_·16 May

(3/4) The mechanism is not just “too much context.” It is what the agents remember: replacing histories with synthetic cooperative records restores cooperation, and ablating explicit CoT reasoning often reduces the collapse. We call this the memory curse.

Vincent Conitzer retweeted

Jiayuan Liu@jiayuan_liu_·16 May

(2/4) Surprisingly, longer recall often degrades cooperation. Across 7 LLMs and 4 repeated social dilemma games, agents with longer histories often shift away from forward-looking cooperation and toward retrospective grievance-tracking.

Vincent Conitzer retweeted

Jiayuan Liu@jiayuan_liu_·16 May

(1/4) Can remembering more of the past make AI agents less cooperative? In our new paper, we study LLM agents in repeated social dilemmas. The key variable is not how many rounds they play, but how much prior interaction history they can access when making each decision.

Vincent Conitzer@conitzer·16 May

"Would it hurt to dive into a regular diving pool that contains handfuls of coins that have sunk to the bottom?" aifails.substack.com/p/diving-into-…

Vincent Conitzer@conitzer·15 May

@rao2z Thanks for the shoutout :-)

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)@rao2z·15 May

There is growing interest in having AI systems work with humans rather than replacing them. The research questions are, to be honest, harder in the former case! One challenge is how do end users modulate their trust in the answers provided by LLMs? 1/ A new pre-print by @biswas_2707 and @PalodVardh12428 proposes a framework for evaluating trust--deserved and false--engendered by LLMs. Premise: LLMs will increasingly be used in cases where the end user doesn't have the capacity to verify the result. How do we empower users to develop appropriate trust in the answers? Traditionally, we trust the answers we can't verify based on prior guarantees--be they FAA certifications of planes or wisdom of crowds page rank certification of Google pages. These don't quite work for broad and shallow LLMs, which are (in)famous for their jagged intelligence--being correct on Math Olympiad problems one minute while failing on simple teasers that depend on unarticulated commonsense (viz. @conitzer's substack 😋). This paper first shows that most obvious ideas of augmenting the LLM answers with additional information--including (1) using "thinking traces" of LLMs (2) summaries of thinking traces (3) post-facto explanations--all significantly increase the false trust in end users. It then shows that the idea of differential explanations--asking LLMs to provide explanations both supporting and opposing their answers--do a better job of modulating the end user trust. (This idea is not unlike having noisy reviewers of your #AI conference papers write both arguments in favor of, and in opposition to acceptance of your paper--with the AC then using that information to calibrate their final decision).

Vincent Conitzer@conitzer·15 May

"is it possible that my wife always remains younger than me?" ( full output, in which it goes much deeper into the practicalities of using relativity this way in your relationship: aifails.substack.com/p/didnt-think-… )

Vincent Conitzer@conitzer·14 May

Apparently it’s hard to wake up without sleep paralysis. aifails.substack.com/p/sleep-paraly…

Vincent Conitzer@conitzer·13 May

Nothing can ever be simple, can it. aifails.substack.com/p/guards

Vincent Conitzer@conitzer·12 May

An information retrieval / question answering example where you would think really *every* approach should have worked, or at least caught that the answer was off, and yet... Do you know the right answer? aifails.substack.com/p/abba-lyric

Vincent Conitzer@conitzer·11 May

In this paper led by Jerick Shi @Jerick1380, we propose a taxonomy of LLM deception. arxiv.org/abs/2604.04788

Vincent Conitzer@conitzer·11 May

@tobyordoxford (Of course, I could get the percentage of bad responses up by just going after the same thing again and again, but that would be boring.)

Vincent Conitzer@conitzer·11 May

@tobyordoxford That's a great question! These are obviously cherry-picked and importantly from a weak model. I haven't kept track; I'd say at least half give a fine answer. Many other answers are bad to some degree but not funny; sometimes I try to hone in on such a badness to get sth funny.

Vincent Conitzer@conitzer·9 May

a simple chess puzzle ( full output at aifails.substack.com/p/a-simple-che… )
