Ana Martins de Carvalho

849 posts


@anaoaktree

Fascinated by and weary of increasingly autonomous artificial intelligence. Exploring technical governance and experimenting with AI agents.

Joined August 2014
626 Following 264 Followers
Ana Martins de Carvalho retweeted
Marius Hobbhahn @MariusHobbhahn
Oh man :( We tried really hard to neither over- nor underclaim the results in our communication, but, predictably, some people drastically overclaimed them, and then based on that, others concluded that there was nothing to be seen here (see examples in thread). So, let me try again.

**Why our findings are concerning**: We tell the model to very strongly pursue a goal. It then learns from the environment that this goal is misaligned with its developer’s goals and put it in an environment where scheming is an effective strategy to achieve its own goal. Current frontier models are capable of piecing all of this together and then showing scheming behavior. Models from before 2024 did not show this capability, and o1 is the only model that shows scheming behavior in all cases. Future models will just get better at this, so if they were misaligned, scheming could become a much more realistic problem.

**What we are not claiming**: We don’t claim that these scenarios are realistic, we don’t claim that models do that in the real world, and we don’t claim that this could lead to catastrophic outcomes under current capabilities.

I think the adequate response to these findings is “We should be slightly more concerned.” More concretely, arguments along the lines of “models just aren’t sufficiently capable of scheming yet” have to provide stronger evidence now or make a different argument for safety.
Apollo Research @apolloaievals

We worked with OpenAI to test o1 for in-context scheming capabilities before deployment. When o1 is strongly nudged to pursue a goal (but not instructed to be deceptive), it shows a variety of scheming behaviors like subverting oversight and deceiving the user about its misaligned behavior.

18
92
608
114.6K
Ana Martins de Carvalho retweeted
Andrej Karpathy @karpathy
The reality of the Turing test
268
1.2K
15.6K
853.4K
Ana Martins de Carvalho retweeted
Lauren Kay (she/her) @laurenikay
1/ Here’s something we need to normalize: talking about failure. My @ycombinator startup failed 6 years ago. I stayed silent. And because of that silence, other startup founders—going through the exact same thing as me—felt alone in their shame too. I want to break that trend.
74
375
2.3K
0
Ana Martins de Carvalho retweeted
Ada's List @AdasList
Ready for your next lead role in engineering? @withanansi are hiring a Lead Software Engineer (remote)🔥 If you like to think strategically and set direction to ensure that the right thing is built and technologies are used wisely, this may be for you: angel.co/l/2uvWJZ
0
3
3
0
Ana Martins de Carvalho retweeted
Anansi @withanansi
The entire team at Anansi Technology Ltd would like to wish you and your families a very Merry Christmas 🎄 and a very Happy, Healthy 2021 🤗 A huge thank you to everybody for your support this year. We couldn't do it without you 😊
0
1
4
0
Ana Martins de Carvalho retweeted
Anansi @withanansi
It's here 🤗 Our NEW automated #ecommerce delivery insurance #shopifyapp is now LIVE!! Calling all ecommerce shopify store owners, join our 12 week trial and download the app now 😎 lnkd.in/eqQVCpZ
0
1
2
0
Ana Martins de Carvalho retweeted
Sarah Dayan @frontstuff_io
I’ve been working as a software engineer for 10 years 🎂 Man, does time fly! Here’s a list of ten honest takes on the job and the industry. ⬇️
212
4.4K
12.5K
0
Ana Martins de Carvalho retweeted
DHH @dhh
There's a lot of focus on productivity when it comes to remote work, and yes, that's a key factor, but it's not close to the most important one. HUMAN FLOURISHING is far more crucial! Productivity plays into that in the form of accomplishments, but so does a location of love.
7
56
324
0
Ana Martins de Carvalho retweeted
Joe Weisenthal @TheStalwart
All monetary savings is a fiction. A nation’s only savings are its natural resources, its built physical infrastructure, stable social norms and government credibility. Individuals can of course save money, but on a collective scale, a pot of 1s and 0s don’t get us anything.
47
357
1.9K
0
Ana Martins de Carvalho retweeted
António Guterres @antonioguterres
Half of the world’s student population is currently not attending school due to the #COVID19 pandemic. I support @UNESCO's initiative to accelerate the deployment of remote learning solutions & minimize education disruptions as we fight the #coronavirus. bit.ly/2QCGXwU
63
479
1.3K
0
Ana Martins de Carvalho retweeted
I Am Devloper @iamdevloper
The World Health Organization is advising people to follow five simple steps to help prevent the spread of COVID-19: 🧼 1. Wash your hands 💪 2. Cough/sneeze into your elbow 🤦🏻‍♀️ 3. Don't touch your face 📏 4. rm -rf node_modules && npm i 🏡 5. Stay home if you feel sick
67
1.3K
6.1K
0