AI Safety Papers

328 posts

@safe_paper

Sharing the latest in AI safety research.

arXiv · Joined May 2023
265 Following · 2.2K Followers
AI Safety Papers (@safe_paper)
Natural Emergent Misalignment from Reward Hacking in Production RL
Authors: Monte MacDiarmid, Benjamin Wright (@RightBenguin), @JonathanUesato, @JoeJBenton, Jon Kutasov, Sara Price (@sprice354_), Naia Bouscal, Sam Bowman (@sleepinyourhat), @TrentonBricken, Alex Cloud, Carson Denison, Johannes Gasteiger (@gasteigerjo), @RyanPGreenblatt, @janleike, @Jack_W_Lindsey, Vlad Mikulik, @EthanJPerez, @alexrodriguesca, Drake Thomas (@MaskedTorah), @albertwebson, Daniel Ziegler (@d_m_ziegler), Evan Hubinger (@EvanHub)
@AnthropicAI @redwood_ai