Paul de Font-Reaulx
@PReaulx

PhDing in philosophy and cognitive science @Umich. Researching desire, value, and reward in humans and AIs.

75 posts
Joined September 2016
636 Following · 217 Followers
Paul de Font-Reaulx retweeted
Scale Labs @ScaleAILabs:
ICLR 2026 Accepted Paper: MoReBench. Models perform well on math and coding benchmarks, but measuring moral reasoning is still a challenge. To bridge this gap, we introduce MoReBench: a benchmark for Evaluating Procedural and Pluralistic Moral Reasoning in LLMs.
Paul de Font-Reaulx @PReaulx:
@RichardMCNgo My sense is that Peter Carruthers's recent books are some of the best discussions of this. I'm also working on this myself.
Richard Ngo @RichardMCNgo:
Unfortunately my sense is that sociology has been so lost to postmodernism that most rigorous sociology is done by economists. But economic frameworks are just very bad at describing value change. Would love to be proved wrong on either point though.
Richard Ngo @RichardMCNgo:
The most interesting thing about reinforcement learning is how rewards and punishments change an agent’s values. Unfortunately in ML there’s a common conceptual confusion which makes this dynamic hard to even describe: the idea that the reward function *is* the agent’s values.
Paul de Font-Reaulx retweeted
Apart Research @apartresearch:
Hackathon Day 2. We are diving into the mechanics of manipulation. Two expert sessions today.
13:00 GMT | Jan Batzner (Weizenbaum). Topic: Sycophancy. Detecting when models lie just to agree with the user.
18:00 GMT | Paul de Font-Reaulx (@PReaulx) (U. Michigan). Topic: Human Deliberation. Auditing how AI reshapes the way we make decisions.
Don't build in the dark. Use this research to sharpen your tools. Links to join: (f.mtr.cool/ikbhieizkf)
#AIResearch #Hackathon #JanBatzner #PauldeFont-Reaulx
Paul de Font-Reaulx @PReaulx:
@StefanFSchubert Just to be a little bit of a contrarian: although I've started using Opus almost all the time, I've had pretty good writing success with Gemini, and recently I was really happy with some technical reports that GPT-5.2 produced. I feel like I haven't found equilibrium yet, though.
Stefan Schubert @StefanFSchubert:
Opus 4.5 is really much more helpful at writing than GPT-5.2 and Gemini 3. It seems under-discussed how big the difference is.
Jackson Kernion @JacksonKernion:
I'm trying to figure out what to care about next. I joined Anthropic 4+ years ago, motivated by the dream of building AGI. I was convinced from studying philosophy of mind that we're approaching sufficient scale and that anything that can be learned can be learned in an RL env.
Paul de Font-Reaulx @PReaulx:
Can we rationally trust our future selves? New Work in Philosophy has published a short blog post version of my recent paper "Do Expected Utility Maximizers Have Commitment Issues?" in Philosophy and Phenomenological Research. Link below.
Paul de Font-Reaulx retweeted
Yu Ying Chiu (Kelly Chiu) @kellychiuyy:
New paper out with @Scale_AI! Introducing MoReBench - the first-ever benchmark to evaluate procedural moral reasoning in LLMs. MoReBench focuses on how LLMs reason, not just what they decide. We reveal surprising gaps in frontier models' moral reasoning that scaling laws & existing benchmarks miss entirely, and encourage more research around CoT monitoring and robust capability building. This collaboration spanned @UW @nyuniversity @harvard @stanford @mit @cais & more 🧠⚖️
Paul de Font-Reaulx retweeted
Scale AI @scale_AI:
New Scale research: Do AI models actually reason in ways humans can trust for real-world decisions? Introducing MoReBench, the first benchmark for procedural moral reasoning in LLMs, measuring not just what models decide, but how they reason through moral ambiguity.
avantika @avaa411:
Delayed life update: so excited to share that I’ve joined @cosmos_inst as Chief of Staff in NYC, working with @mbrendan1 and team. If you’re interested in AI, philosophy, AI philosophy, or building systems for human flourishing, would love to chat☕️
Nuño Sempere (Asunción) @NunoSempere:
Irony of fate? A month ago we recorded a podcast with main character Ivan Vendrov on the dangers of recommender systems, like them making people scared to go outside so that they're more glued to their screens. No better time to post than now:
Paperpile @paperpile:
@PReaulx
📆 Work a few focused blocks each day rather than long hours that aren't focused
🎯 Set expectations for each work session
⌛ Adding extra work hours may make you less efficient
💫 Find a work rhythm that suits your needs
pprpl.co/3SgSIt4
Paperpile @paperpile:
As a PhD student, you work independently on a long-term project, often with limited feedback that clearly indicates progress. ✍️ And this can be stressful. But there is value in consistent, focused work, even for just a few hours a day, via @PReaulx
Paul de Font-Reaulx @PReaulx:
How much can you get done by working a few hours a day regularly? I ran the numbers and wrote it up in a short post (link below). Here is a short summary 🧵 1/7
Paul de Font-Reaulx @PReaulx:
For fun I also compared the 4h/day number to common free time activities, because planned fun is indeed the best fun. 6/7