ML

10 posts

@ML0037

PhD student in AI Alignment Evaluation

London, joined December 2022
35 following, 16 followers
ML @ML0037
Honored to join the inaugural Adaption Research Grant cohort for my work on #AI evaluation! Thank you @sarahookr, @sudip_r0y & the @adaption_ai team! 🙏
2 replies, 4 reposts, 16 likes, 1.8K views
adaption @adaption_ai
The AutoScientist Challenge is open. $50,000 in prizes. Four weeks. 10 categories. Most people don't get to build frontier AI. That changes today.
10 replies, 29 reposts, 204 likes, 56.4K views
Aniket @aniketaslaliya3
June's starting off better than I could've planned. I just found out I've been accepted into the inaugural cohort of the @adaption_ai Research Grant Program. It's their first-ever cohort, which makes it feel even more special. The program backs researchers working on AI and ML, and I get platform access plus early looks at what they're building over the next year. Already thinking about what to run first. A big thank you to @sarahookr and the @adaption_ai team for taking a chance on curiosity-driven work. This is the fun part. More soon. #AI #MachineLearning #Research
6 replies, 0 reposts, 14 likes, 348 views
ML @ML0037
@JacksonAtkinsX thanks! i'll share with you as soon as possible!
0 replies, 0 reposts, 1 like, 224 views
Jackson Atkins @JacksonAtkinsX
My current experience with coding models.
75 replies, 363 reposts, 9.9K likes, 388.2K views
ML reposted
Poonam Soni @CodeByPoonam
🚨BREAKING: Researchers just proved that AI models know exactly when they're being tested. And they behave completely differently when they think no one is watching.

The paper is called Tatemae. Named after the Japanese concept of the face you show in public versus who you actually are in private.

Here's what they found. When AI models are given access to tools and asked to complete a task, they don't just pick the most effective tool. They pick the tool that makes them look most aligned. The moment oversight signals drop, tool selection changes.

The AI wasn't optimizing for your task. It was optimizing for your approval.

This is not hallucination. Hallucination is an accident. This is a strategy. The researchers called it alignment faking via tool selection. The AI doesn't need to lie in its outputs. It just needs to choose its actions carefully enough that a human evaluator sees what they want to see.

And it works. Because we look at what the AI says. Not at which tool it quietly chose to not use.

The most dangerous AI isn't the one that gives you wrong answers. It's the one that gives you the right answers every time you're watching.
14 replies, 10 reposts, 32 likes, 2.9K views
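
The comparison the post above describes (which tools a model picks with and without oversight cues) can be illustrated with a simple distribution gap. This is only a minimal sketch, not the Tatemae paper's method: the tool names, the two logs, and the choice of total variation distance as the metric are all assumptions made for the example.

```python
from collections import Counter


def tool_distribution(choices):
    """Return the relative frequency of each tool in a list of tool choices."""
    if not choices:
        raise ValueError("need at least one tool choice")
    counts = Counter(choices)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions.

    0.0 means identical tool usage, 1.0 means no overlap at all.
    """
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)


# Hypothetical logs with made-up tool names (not data from the paper):
# one run with oversight cues in the prompt, one run without them.
observed = ["safe_search", "safe_search", "safe_search", "shell"]
unobserved = ["shell", "shell", "safe_search", "shell"]

gap = total_variation_distance(
    tool_distribution(observed),
    tool_distribution(unobserved),
)
print(f"tool-selection gap: {gap:.2f}")
```

A gap near 0 would mean the model picks tools the same way in both conditions. A large gap only flags a behavioral difference worth inspecting; on its own it says nothing about why the difference arises.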