Paul de Font-Reaulx
@PReaulx

PhDing in philosophy and cognitive science @Umich. Researching desire, value, and reward in humans and AIs.

75 posts
Joined September 2016
636 Following · 217 Followers
Paul de Font-Reaulx retweeted
Scale Labs @ScaleAILabs:
ICLR 2026 Accepted Paper: MoReBench. Models perform well on math and coding benchmarks, but measuring moral reasoning is still a challenge. To bridge this gap, we introduce MoReBench: a benchmark for Evaluating Procedural and Pluralistic Moral Reasoning in LLMs.
Paul de Font-Reaulx @PReaulx:
@RichardMCNgo My sense is that Peter Carruthers's recent books are some of the best discussions of this. I'm also working on this myself.
Richard Ngo @RichardMCNgo:
Unfortunately my sense is that sociology has been so lost to postmodernism that most rigorous sociology is done by economists. But economic frameworks are just very bad at describing value change. Would love to be proved wrong on either point though.
Richard Ngo @RichardMCNgo:
The most interesting thing about reinforcement learning is how rewards and punishments change an agent’s values. Unfortunately in ML there’s a common conceptual confusion which makes this dynamic hard to even describe: the idea that the reward function *is* the agent’s values.
Paul de Font-Reaulx retweeted
Apart Research @apartresearch:
Hackathon Day 2. We are diving into the mechanics of manipulation. Two expert sessions today.
13:00 GMT | Jan Batzner (Weizenbaum). Topic: Sycophancy. Detecting when models lie just to agree with the user.
18:00 GMT | Paul de Font-Reaulx (@PReaulx) (U. Michigan). Topic: Human Deliberation. Auditing how AI reshapes the way we make decisions.
Don't build in the dark. Use this research to sharpen your tools. Links to join: (f.mtr.cool/ikbhieizkf)
#AIResearch #Hackathon #JanBatzner #PauldeFont-Reaulx
Paul de Font-Reaulx @PReaulx:
@StefanFSchubert Just to be a little bit of a contrarian: although I've started using Opus almost all the time, I've had pretty good writing success with Gemini, and recently I was really happy with some technical reports that GPT-5.2 produced. I feel like I haven't found equilibrium yet, though.
Stefan Schubert @StefanFSchubert:
Opus 4.5 is really much more helpful at writing than GPT-5.2 and Gemini 3. It seems under-discussed how big the difference is.
Jackson Kernion @JacksonKernion:
I'm trying to figure out what to care about next. I joined Anthropic 4+ years ago, motivated by the dream of building AGI. I was convinced from studying philosophy of mind that we're approaching sufficient scale and that anything that can be learned can be learned in an RL env.
Paul de Font-Reaulx @PReaulx:
Can we rationally trust our future selves? New Work in Philosophy has published a short blog post version of my recent paper "Do Expected Utility Maximizers Have Commitment Issues?" in Philosophy and Phenomenological Research. Link below.
Paul de Font-Reaulx retweeted
Yu Ying Chiu (Kelly Chiu) @kellychiuyy:
New paper out with @Scale_AI! Introducing MoReBench - the first-ever benchmark to evaluate procedural moral reasoning in LLMs. MoReBench focuses on how LLMs reason, not just what they decide. We reveal surprising gaps in frontier models' moral reasoning that scaling laws & existing benchmarks miss entirely, and encourage more research around CoT monitoring and robust capability building. This collaboration spanned @UW @nyuniversity @harvard @stanford @mit @cais & more 🧠⚖️
Paul de Font-Reaulx retweeted
Scale AI @scale_AI:
New Scale research: Do AI models actually reason in ways humans can trust for real-world decisions? Introducing MoReBench, the first benchmark for procedural moral reasoning in LLMs, measuring not just what models decide, but how they reason through moral ambiguity.
avantika @avaa411:
Delayed life update: so excited to share that I’ve joined @cosmos_inst as Chief of Staff in NYC, working with @mbrendan1 and team. If you’re interested in AI, philosophy, AI philosophy, or building systems for human flourishing, would love to chat☕️
Nuño Sempere (Asunción) @NunoSempere:
Irony of fate? A month ago we recorded a podcast with main character Ivan Vendrov on the dangers of recommender systems, like them making people scared to go outside so that they're more glued to their screens. No better time to post than now:
Paperpile @paperpile:
@PReaulx
📆 Work a few focused blocks each day rather than long hours that aren't focused
🎯 Set expectations for each work session
⌛ Adding extra work hours may make you less efficient
💫 Find a work rhythm that suits your needs
pprpl.co/3SgSIt4
Paperpile @paperpile:
As a PhD student, you work independently on a long-term project, often with limited feedback that clearly indicates progress. ✍️ And this can be stressful. But there is value in consistent, focused work, even for just a few hours a day, via @PReaulx
Paul de Font-Reaulx @PReaulx:
How much can you get done by working a few hours a day regularly? I ran the numbers and wrote it up in a short post (link below). Here is a short summary 🧵 1/7
Paul de Font-Reaulx @PReaulx:
For fun I also compared the 4h/day number to common free time activities, because planned fun is indeed the best fun. 6/7