Vincent Conitzer

2.2K posts

@conitzer

AI professor. Director, @FOCAL_lab @CarnegieMellon. Head of Technical AI Engagement, @UniofOxford @EthicsInAI. Author, "Moral AI - And How We Get There."

Joined June 2009
1.2K Following, 4.5K Followers
Pinned Tweet
Vincent Conitzer @conitzer
There is now a paperback version of our Moral AI book!
Vincent Conitzer @conitzer
A different type of example -- ChatGPT trying to rationalize, in moral terms, hard rules (about creating images of real people) that it has been given. Should it do that? Full post at link: aifails.substack.com/p/moral-ration…
Vincent Conitzer retweeted
Jiayuan Liu @jiayuan_liu_
(3/4) The mechanism is not just “too much context.” It is what the agents remember: replacing histories with synthetic cooperative records restores cooperation, and ablating explicit CoT reasoning often reduces the collapse. We call this the memory curse.
Vincent Conitzer retweeted
Jiayuan Liu @jiayuan_liu_
(2/4) Surprisingly, longer recall often degrades cooperation. Across 7 LLMs and 4 repeated social dilemma games, agents with longer histories often shift away from forward-looking cooperation and toward retrospective grievance-tracking.
Vincent Conitzer retweeted
Jiayuan Liu @jiayuan_liu_
(1/4) Can remembering more of the past make AI agents less cooperative? In our new paper, we study LLM agents in repeated social dilemmas. The key variable is not how many rounds they play, but how much prior interaction history they can access when making each decision.
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
There is growing interest in having AI systems work with humans rather than replacing them. The research questions are, to be honest, harder in the former case! One challenge: how do end users modulate their trust in the answers provided by LLMs? 1/

A new pre-print by @biswas_2707 and @PalodVardh12428 proposes a framework for evaluating trust--deserved and false--engendered by LLMs.

Premise: LLMs will increasingly be used in cases where the end user doesn't have the capacity to verify the result. How do we empower users to develop appropriate trust in the answers?

Traditionally, we trust answers we can't verify based on prior guarantees--be they FAA certifications of planes or the wisdom-of-crowds PageRank certification of Google pages. These don't quite work for broad and shallow LLMs, which are (in)famous for their jagged intelligence--being correct on Math Olympiad problems one minute while failing on simple teasers that depend on unarticulated commonsense (viz. @conitzer's substack 😋).

This paper first shows that the most obvious ideas for augmenting LLM answers with additional information--including (1) using "thinking traces" of LLMs, (2) summaries of thinking traces, and (3) post-facto explanations--all significantly increase false trust in end users.

It then shows that differential explanations--asking LLMs to provide explanations both supporting and opposing their answers--do a better job of modulating end-user trust. (This idea is not unlike having noisy reviewers of your #AI conference papers write arguments both in favor of and in opposition to acceptance of your paper--with the AC then using that information to calibrate their final decision.)
Vincent Conitzer @conitzer
An information retrieval / question answering example where you would think really *every* approach should have worked, or at least caught that the answer was off, and yet... Do you know the right answer? aifails.substack.com/p/abba-lyric
Vincent Conitzer @conitzer
@tobyordoxford (Of course, I could get the percentage of bad responses up by just going after the same thing again and again, but that would be boring.)
Vincent Conitzer @conitzer
@tobyordoxford That's a great question! These are obviously cherry-picked and, importantly, from a weak model. I haven't kept track; I'd say at least half give a fine answer. Many other answers are bad to some degree but not funny; sometimes I try to home in on that kind of badness to get something funny.