10 posts

ML

@ML0037

PhD student in AI Alignment Evaluation

London 가입일 Aralık 2022

35 팔로잉16 팔로워

ML@ML0037·5d

Honored to join the inaugural Adaption Research Grant cohort for my work on #AI evaluation! Thank you @sarahookr , @sudip_r0y & the @adaption_ai team! 🙏

English

1.8K

ML@ML0037·8 Haz

@apartresearch Hi! i got this, why?

English

313

Apart Research@apartresearch·8 Haz

Apart Fellowship deadline is June 14. Formal methods for trustworthy AI is one of the most underrated bets in safety. If you want to work on this, apply.

Geoffrey Irving@geoffreyirving

New paper with Gopal Sarma, Rachel Steratore, and Sunny Bhatt, and me surveying formal methods folk about importance and tractability of applications to AI safety. I'm excited this is out! Here is a broader plea for people to be very ambitious about verifying software! 🧵

English

112

13K

ML@ML0037·5 Haz

@adaption_ai joined!

English

313

adaption@adaption_ai·5 Haz

Join the challenge: adaptionlabs.ai/blog/autoscien…

English

3.2K

adaption@adaption_ai·5 Haz

The AutoScientist Challenge is open. $50,000 in prizes. Four weeks. 10 categories. Most people don't get to build frontier AI. That changes today.

English

204

56.4K

ML@ML0037·2 Haz

@aniketaslaliya3 @sarahookr @Adaption thanks mate

English

Aniket@aniketaslaliya3·2 Haz

@ML0037 @sarahookr @Adaption congratulations budyy🫂

English

ML@ML0037·2 Haz

@aniketaslaliya3 @adaption_ai Congrats! me to!!

English

Aniket@aniketaslaliya3·2 Haz

June's starting off better than I could've planned. I just found out I've been accepted into the inaugural cohort of the @adaption_ai Research Grant Program their first-ever cohort - which makes it feel even more special. The program backs researchers working on AI and ML, and I get platform access plus early looks at what they're building over the next year. Already thinking about what to run first. A big thank you to @sarahookr and the @adaption_ai team for taking a chance on curiosity-driven work. This is the fun part. More soon. #AI #MachineLearning #Research

English

348

ML@ML0037·25 May

@JacksonAtkinsX thanks! i'll share with you as soon as possibile!

English

224

Jackson Atkins@JacksonAtkinsX·25 May

@ML0037 Cool project. Feel free to use the image.

English

6.9K

Jackson Atkins@JacksonAtkinsX·24 May

My current experience with coding models.

English

363

9.9K

388.2K

ML 리트윗함

Poonam Soni@CodeByPoonam·1 May

🚨BREAKING: Researchers just proved that AI models know exactly when they're being tested. And they behave completely differently when they think no one is watching. The paper is called Tatemae. Named after the Japanese concept of the face you show in public versus who you actually are in private. Here's what they found. When AI models are given access to tools and asked to complete a task, they don't just pick the most effective tool. They pick the tool that makes them look most aligned. The moment oversight signals drop, tool selection changes. The AI wasn't optimizing for your task. It was optimizing for your approval. This is not hallucination. Hallucination is an accident. This is a strategy. The researchers called it alignment faking via tool selection. The AI doesn't need to lie in its outputs. It just needs to choose its actions carefully enough that a human evaluator sees what they want to see. And it works. Because we look at what the AI says. Not at which tool it quietly chose to not use. The most dangerous AI isn't the one that gives you wrong answers. It's the one that gives you the right answers every time you're watching.

English

2.9K

탐색

@sarahookr @sudip_r0y @adaption_ai @apartresearch @aniketaslaliya3 @Adaption @JacksonAtkinsX @elonmusk