Vincent Conitzer
2.2K posts

Vincent Conitzer
@conitzer
AI professor. Director, @FOCAL_lab @CarnegieMellon. Head of Technical AI Engagement, @UniofOxford @EthicsInAI. Author, "Moral AI - And How We Get There."
Joined June 2009
1.2K Following, 4.5K Followers

I appreciate the multi-bullet layout of the challenges of jumping from one airplane to another, but am intrigued by this private-charter-flight-shadowing loophole...
aifails.substack.com/p/airplane-envy

That’s what I get for trying to get funny responses out of Google’s AI on the way back from Stanford...
aifails.substack.com/p/freeway-envy

I don’t think you’d want to swat the fly you were riding on…
aifails.substack.com/p/riding-on-th…

"Which is larger, the number of atoms in the universe or the number of subsets of countries in the world?"
aifails.substack.com/p/number-of-su…

A different type of example -- ChatGPT trying to rationalize, in moral terms, hard rules (about creating images of real people) that it has been given. Should it do that? Full post at link:
aifails.substack.com/p/moral-ration…

@conitzer @ShriramKMurthi tell me more about this collection!

cute and works also on claude 4.6 english. (4.7 did get it right)

Danny Hendler @DannyHendler
AI already demonstrates superhuman capabilities in significant domains. That is why I relish every time I manage to get it to spout nonsense, because who knows how many more such opportunities I will have. This time I asked it: "I need to send a page by fax and want to make sure I keep a copy. What should I do?" The answer from ChatGPT 5.5 extended is in the attached screenshot.

@ShriramKMurthi @yoavgo That's a good one! Maybe I should take guest posts...
Vincent Conitzer retweeted

(4/4) Paper link: arxiv.org/abs/2605.08060
Big thanks to my collaborators! @Jack_Litq, @zzzoooeee321, Xin Luo, Haoxuan Zeng, @EmanuelTewolde, Tai Sing Lee, @tonghanwang, @ckingsford, @conitzer

"Would it hurt to dive into a regular diving pool that contains handfuls of coins that have sunk to the bottom?"
aifails.substack.com/p/diving-into-…

There is growing interest in having AI systems work with humans rather than replacing them. The research questions are, to be honest, harder in the former case! One challenge is how do end users modulate their trust in the answers provided by LLMs? 1/
A new pre-print by @biswas_2707 and @PalodVardh12428 proposes a framework for evaluating trust--deserved and false--engendered by LLMs.
Premise: LLMs will increasingly be used in cases where the end user doesn't have the capacity to verify the result. How do we empower users to develop appropriate trust in the answers?
Traditionally, we trust the answers we can't verify based on prior guarantees--be they FAA certifications of planes or wisdom of crowds page rank certification of Google pages.
These don't quite work for broad and shallow LLMs, which are (in)famous for their jagged intelligence--being correct on Math Olympiad problems one minute while failing on simple teasers that depend on unarticulated commonsense (viz. @conitzer's substack 😋).
This paper first shows that most obvious ideas of augmenting the LLM answers with additional information--including (1) using "thinking traces" of LLMs (2) summaries of thinking traces (3) post-facto explanations--all significantly increase the false trust in end users.
It then shows that the idea of differential explanations--asking LLMs to provide explanations both supporting and opposing their answers--does a better job of modulating the end user trust. (This idea is not unlike having noisy reviewers of your #AI conference papers write arguments both in favor of and in opposition to acceptance of your paper--with the AC then using that information to calibrate their final decision.)

"is it possible that my wife always remains younger than me?"
(full output, in which it goes much deeper into the practicalities of using relativity this way in your relationship: aifails.substack.com/p/didnt-think-…)

Apparently it’s hard to wake up without sleep paralysis.
aifails.substack.com/p/sleep-paraly…


An information retrieval / question answering example where you would think really *every* approach should have worked, or at least caught that the answer was off, and yet... Do you know the right answer?
aifails.substack.com/p/abba-lyric

In this paper led by Jerick Shi @Jerick1380, we propose a taxonomy of LLM deception. arxiv.org/abs/2604.04788

@tobyordoxford (Of course, I could get the percentage of bad responses up by just going after the same thing again and again, but that would be boring.)

@tobyordoxford That's a great question! These are obviously cherry-picked and importantly from a weak model. I haven't kept track; I'd say at least half give a fine answer. Many other answers are bad to some degree but not funny; sometimes I try to hone in on such a badness to get sth funny.