AI alignment is not only a reward-design problem; it is an incentive-design problem.
In our new paper, we propose a strategic post-training signal for agentic AI pipelines, inspired by the economics of deterrence and enforcement.
With my students @Rohit_Writes and @JoshuaLinML, and my colleague Mark Braverman. Link to paper:
Life update🙂: I’m on sabbatical from Princeton and have started at OpenAI, working on building AGI. Happy to be back in the Bay Area after 6 years! Bay Area friends—DMs open for food & hikes.
Excited to share that I’ve been promoted to Associate Professor with tenure at Princeton!🎉 Six years may not be long, but AI research has evolved significantly during this period. Grateful to all my students, collaborators, and colleagues for being with me on this remarkable journey!