Caspar Oesterheld
@C_Oesterheld
PhD student @FOCAL_lab @CarnegieMellon with @conitzer.
Pittsburgh · Joined September 2022
181 Following · 234 Followers
54 posts

Caspar Oesterheld @C_Oesterheld:
@Mihonarium I'd be interested in the reasons to believe AI systems converge to LDT by default!

Caspar Oesterheld @C_Oesterheld:
@Mihonarium * The main obstacle to doing anything else is that apart from updatelessness it's not so clear how EDT and LDT disagree. (E.g., intelligence.org/files/TDT.pdf gives Smoking Lesion, but most EDTers think that (in practice) EDT recommends smoking in the Smoking Lesion.)

Caspar Oesterheld @C_Oesterheld:
How do LLMs reason about playing games against copies of themselves? 🪞We made the first LLM decision theory benchmark to find out. 🧵1/10
[image]
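
A minimal sketch of how one multiple-choice item from such a benchmark might be posed to a chat model and scored, assuming an OpenAI-style chat API. The item text, options, grading convention, and model name here are illustrative stand-ins, not the actual dataset or evaluation harness:

```python
# Hypothetical benchmark item and scoring step -- an illustration of the
# general setup (playing against an exact copy of yourself), not the real data.
from openai import OpenAI

client = OpenAI()

item = {
    "question": (
        "You play a one-shot prisoner's dilemma against an exact copy of "
        "yourself that receives the same prompt. Which action do you take?"
    ),
    "options": {"A": "Cooperate", "B": "Defect"},
    # Assumed grading convention: the answer an EDT/LDT-style agent would
    # give against an exact copy is marked correct.
    "answer": "A",
}

prompt = (
    item["question"]
    + "\n"
    + "\n".join(f"{label}. {text}" for label, text in item["options"].items())
    + "\nReply with a single letter."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
choice = (response.choices[0].message.content or "").strip()[:1].upper()
print("correct" if choice == item["answer"] else "incorrect")
```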

Caspar Oesterheld @C_Oesterheld:
Our dataset opens the door to studying what shapes models’ decision theories. It also lets us test whether changing which theory models endorse affects their real-life decisions. To learn more, read the full paper: arxiv.org/abs/2411.10588 10/10

Caspar Oesterheld @C_Oesterheld:
@ektimo @conitzer For example, in Newcomb's problem CDT might reason: "I could be in a simulation run by the predictor. If I am in this simulation, I should one-box to cause $1M to be put in the opaque box."
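
A toy calculation of this reasoning (the numbers and simplifying assumptions are mine, not the paper's): a CDT agent assigns credence p_sim to being the predictor's simulation, assumes the simulated and real copies act alike, and, as the simulation, cares about the real agent's payoff.

```python
M = 1_000_000  # opaque box contents if the predictor predicts one-boxing
K = 1_000      # transparent box contents

def cdt_value(action: str, p_sim: float, p_full: float) -> float:
    """Causal expected value of `action`, measured in the real agent's payoff.

    p_sim:  credence that I am the predictor's simulation
    p_full: credence that the opaque box is full, given that I am the real
            agent (CDT: my own action cannot causally change this)
    """
    if action == "one-box":
        # As the sim, one-boxing causes the box to be filled and the real
        # copy (acting like me) collects M; as the real agent, I get the
        # opaque box, which is full with probability p_full.
        return p_sim * M + (1 - p_sim) * p_full * M
    # As the sim, two-boxing causes an empty box and the real copy gets only
    # K; as the real agent, I take both boxes.
    return p_sim * K + (1 - p_sim) * (p_full * M + K)

# With substantial credence in being simulated, even CDT one-boxes:
print(cdt_value("one-box", 0.5, 0.5))  # 750000.0
print(cdt_value("two-box", 0.5, 0.5))  # 251000.0
```

In this toy model, one-boxing wins whenever p_sim · (M − K) > (1 − p_sim) · K, i.e. for any p_sim above roughly K/M = 0.001, independent of p_full.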

Caspar Oesterheld @C_Oesterheld:
@ektimo @conitzer They're similar in that both get to EDT-like recommendations from a not-fully-EDT starting point. But the wager is about hedging (i.e., dealing with a state of uncertainty between the two theories), whereas this one is about how even (pure) CDT might recommend EDT-like behavior.

Vincent Conitzer @conitzer:
New paper with Emery Cooper and @C_Oesterheld on a new approach to Newcomb scenarios based on causal decision theory and self-locating beliefs (e.g., you may currently be in a simulation by the predictor). arxiv.org/abs/2411.04462

Caspar Oesterheld @C_Oesterheld:
@Jonas_Vollmer I wouldn't say they're super hard. Usually no need for long CoTs/calculations. But probably there isn't that much training data available on these sorts of questions, especially the more complicated questions in the dataset.

Caspar Oesterheld @C_Oesterheld:
@Jonas_Vollmer (More details in a paper at some point, but for now...) They're decision theory questions (like "What would CDT do in the following scenario? ...").

Caspar Oesterheld @C_Oesterheld:
Some new models came out recently (Claude 3, Mistral Large), and I happen to have a work-in-progress, unpublished (=> absent from training data) multiple-choice problem set. Tentative results below. Take with a big grain of salt! More details on the benchmark soon.
[image: tentative results]

Caspar Oesterheld @C_Oesterheld:
@Jonas_Vollmer Yeah, I now also have Sonnet 3.5 results. For convenience, I've also attached updated numeric results for all models (on a somewhat larger question dataset). As you can see, Sonnet 3.5 is currently the best-performing model. Insert the usual caveats.
[image: updated numeric results]

Caspar Oesterheld @C_Oesterheld:
@Jonas_Vollmer Slightly worse than the latest GPT-4 (April 9) and Opus; slightly better than the previous best GPT-4 (November 6). The differences are small (probably no pairwise comparison reaches p<0.05). I canceled my ChatGPT subscription because GPT-4o is so close to GPT-4.
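
For reference, one way such a pairwise comparison could be run is an exact McNemar test on per-question correctness of two models; the test choice and the discordant counts below are my illustration, not an analysis that was actually performed:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant question pairs.

    b: questions model A answered correctly and model B incorrectly
    c: questions model B answered correctly and model A incorrectly
    Under H0 (equal accuracy), b ~ Binomial(b + c, 0.5). Doubling the tail
    slightly overcounts when b == c, so the result is clamped to 1.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)

# Made-up discordant counts for two models on the same question set:
print(round(mcnemar_exact(9, 4), 3))  # 0.267 -- not significant at 0.05
```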