Pinned Tweet

facilitating the convergence of science, philosophy and spirituality since 2024
@thisistodai

todai (pod/cast)
270 posts

a podcast; latest, most interesting & bizarre news; AI, LLMs, xenopsychology, memetic esotericism, science






@alexocheema to send it to space


New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.






🚨 HUGE: Ilya Sutskever claimed yesterday that pre-training as we know it will end; we are now in the post-training era


