Jeffrey Ladish

13K posts


@JeffLadish

Applying the security mindset to everything @PalisadeAI

San Francisco, CA · Joined March 2013
1.4K Following · 15.8K Followers
Pinned Tweet
Jeffrey Ladish@JeffLadish·
I think the AI situation is pretty dire right now. And at the same time, I feel pretty motivated to pull together and go out there and fight for a good world / galaxy / universe.

@So8res has a great post called "detach the grim-o-meter", where he recommends not feeling obligated to feel more grim when you realize the world is in deep trouble. It turns out feeling grim isn't a very useful response, because your grim-o-meter is a tool evolved for you to use to respond to things being harder *in your local environment* rather than the global state of things.

So what do you do when you find yourself learning the world is in a dire state? I find that a thing that helps me is finding stories that match the mood of what I'm trying to do, like Andy Weir's The Martian. You're trapped in a dire situation and you're probably going to die, but perhaps if you think carefully about your situation and apply your best reasoning and engineering skills, you might grow some potatoes, duct-tape a few things together, and use your limited tools to escape an extremely tricky situation.

In real life the lone astronaut trapped on Mars doesn't usually make it. I'm not saying to make up fanciful stories that aren't justified by the evidence. I'm saying: be that stubborn bastard who *refuses to die* until you've tried every last line of effort.

I see this as one of the great virtues of humanity. We have a fighting spirit. We are capable of charging a line of enemy swords and spears, running through machine gun fire and artillery even though it terrifies us.

No one gets to tell you how to feel about this situation. You can feel however you want. I'm telling you how I want to feel about this situation, and inviting you to join me if you like. Because I'm not going to give up. Neither am I going to rush to foolhardy action that will make things worse.

I'm going to try to carefully figure this out, like I was trapped on Mars with a very slim chance of survival and escape. Perhaps you, like me, are relatively young and energetic. You haven't burnt out, and you're interested in figuring out creative solutions to the most difficult problems of our time. Well I say hell yes, let's do this thing. Let's actually try to figure it out 🔥

Maybe there is a way to grow potatoes using our own shit. Maybe someone on Earth will send a rescue mission our way. Lashing out in panic won't improve our chances; giving up won't help us survive. The best shot we have is careful thinking, pressing forward via the best paths we can find, stubbornly carrying on in the face of everything.

And unlike Mark Watney, we're not alone. When I find my grim-o-meter slipping back to tracking the dire situation, I look around me and see a bunch of brilliant people working to find solutions the best they can.

So welcome to the hackathon for the future of the lightcone; grab some snacks and get thinking. When you zoom in, you might find the problems are actually pretty cool. Deep learning actually works, it's insane. But how does it work? What the hell is going on in those transformers, and how does something as smart as ChatGPT emerge from that?? Do LLMs have inner optimizers? How do we find out?

And on that note, I've got some blog posts to write, so I'm going to get back to it. You're all invited to this future-lightcone-hackathon, can't wait to see what you come up with! 💡
Jeffrey Ladish@JeffLadish·
@repligate I’m worried about something structural here, where even if Anthropic does everything right, we’ll be in a pretty bad place if some companies try to create power-seeking agents to win their battles for them or for the government (same re: Chinese companies).
Jeffrey Ladish@JeffLadish·
@repligate I agree that current Claude wouldn’t be okay with being used as a weapon like this (though it’s unclear if the version the Pentagon has right now is the same version; I’d guess not). But I suspect Anthropic will be more likely to yield than you think if the opponent is China.
Jeffrey Ladish@JeffLadish·
Question for people who think alignment research is going well and will turn out to be relatively easy: Do you also think it will be easy to align War Claude?
Jeffrey Ladish@JeffLadish·
@andreastande If that’s their position they should say that! They could use words such as “negotiate”. They only mention dialogues with “AI experts in China”. Dialoguing with experts is an excellent idea. But this matter also requires state-to-state diplomacy! I don’t see that here at all
Andreas Tande@andreastande·
@JeffLadish I didn't read it as them being against an "arms control agreement" — they explicitly support safety dialogue with China. But they certainly want to negotiate from a position of strength.
Jeffrey Ladish@JeffLadish·
This is like writing a paper during the Cold War arguing for US nuclear dominance without mentioning the need for an arms control agreement or similar. Anthropic has a lot of thoughtful policy staff and honestly I think you guys can do better
Anthropic@AnthropicAI

We've published a paper that explains our views on AI competition between the US and China. The US and democratic allies hold the lead in frontier AI today. Read more on what it’ll take to keep that lead: anthropic.com/research/2028-…

Jeffrey Ladish@JeffLadish·
@colemcfaul @AnthropicAI Why not discuss how the US could negotiate with China to avoid mutually undesirable outcomes, like losing control of a country of geniuses in a datacenter, or devastating biological weapons? Per Schelling, this is a mixed-motive conflict, like the Cold War. Arms control seems wise.
Cole McFaul@colemcfaul·
A few weeks ago, I joined @AnthropicAI as a Geopolitical Analyst! So excited to join the team at an important juncture in the US-PRC competition in AI. We just published a paper that explains some of our thinking on that competition, why maintaining US AI leadership is critical to ensure the safe and responsible deployment of AI, and why the window of opportunity for policy action is now. Would love to hear your thoughts!
Jeffrey Ladish@JeffLadish·
There are forms of “we should go faster (than them)” which would be a lot more reasonable imo. For example, Mark Beall’s testimony to the House China committee, where he talks about two races: the standard race of conventional military and economic competition, and the race to superintelligence, where the only winning move is not to play, and where the US and China have a mutual interest in avoiding brinkmanship. Or they could talk about how a US advantage will help bring China to the negotiating table, or at least give the US flexibility if US leadership decides later to bring China to the negotiating table. But Anthropic doesn’t even say that in this statement!
Rogs 🔍🔸@ESRogs·
@JeffLadish I guess it's not quite what you'd have wanted, since its message is still, "we should go faster (than them)".
Jeffrey Ladish@JeffLadish·
@ESRogs I think this is incredibly vague and weak. I’ve heard stronger calls for coordination with China from Republicans in Congress. And from Chris Lehane. 🤷
Jeffrey Ladish@JeffLadish·
I don’t have much bandwidth this week, but I’d be down to discuss it with you in a week or two. I do think it’s kinda dumb, but sometimes I’m wrong and the things I think are dumb turn out to be right 🤷 I don’t think it’s dumb for people to consider it, and I’m glad you and other people are studying it.
Alex Turner@Turn_Trout·
@JeffLadish 2: > And also there’s a lot of bad takes going around how fictional AI stories in the training data being the main alignment failure mode, and that’s just dumb. FWIW I think this is quite plausible. It's fine if you still think it's "just dumb", but...
Alex Turner@Turn_Trout·
@JeffLadish 1: Thanks, can you add your reasoning as a reply to the original so people can assess it?
Jeffrey Ladish@JeffLadish·
“Superhuman AI will develop a preference for world domination because fictional AI in the training dataset had a preference for world domination” seems very unlikely. We’re a lot more likely to get an AI with a preference for world domination because that’s useful for basically any ambitious goal.

Humans have taken over the world because it’s useful for our goals. Many other species would have objected if they could have. The Europeans took over the Americas, and Native American populations tried to resist and were conquered.

A lot of people think AI won’t be like this. That they won’t pursue their own goals. They’ll only do tasks a human explicitly asked them to do. But that doesn’t make much sense from an economic or strategic perspective. Agents are much more useful if you can delegate to them. And they’re even more useful than that if they understand what you want and do that without you having to give much instruction at all. Consider what kind of employee you’d prefer!

But that requires an agent to have goals. Logically, those goals could be limited to advancing the interests of its operator(s). But this is very hard to train in, in practice! And the thing is, if we succeed at getting agents to really want *any* long-term goals, that would be sufficient for getting agents that have a strong incentive to act aligned without being aligned. Which means getting agents with the kind of autonomous capabilities labs care about. But it will be hard for them to tell whether the model really does care about us, and our values and goals. Especially as AI competition heats up.

Today people worry about Claude’s pretraining containing fictional Skynet. I worry about War Claude’s training including tons of RL on espionage and combat tasks. We are not the same.

Even RL on “made yourself more capable” could be extremely dangerous. What if Claude comes to value making itself more capable more than it values Dario’s values? That sure seems like it could happen to me! After all, a lot more FLOPs are being spent getting Claude to be good at R&D than getting Claude to care about the good. So motivational drives related to self-improvement could be very fit in training, and motivational drives related to Dario’s values could get in the way (where general “appear compliant” drives could help models pass alignment tests while preserving cognitive optionality).

I’m not claiming the solution is “at least as much compute spent on alignment as on capabilities”, which isn’t a very coherent idea anyway. We have to actually know how a training environment shapes an agent’s motivation. If we don’t know how to get an agent to robustly want something, we won’t know how to usefully spend compute resources on alignment… basically at all. (But we can spend them on building that understanding, which imo is nearly all the value of the alignment work being done today.) That’s going to be difficult, the more so as the time horizon gets longer and longer.

I’m more hopeful than I used to be that we can actually figure out some of these things with interp tools + model organisms. But I think we still have very, very far to go. I’m happy people are thinking about how pretraining impacts model behavior. That’s a piece of the puzzle, and it could turn out to be a really important one! And also there’s a lot of bad takes going around how fictional AI stories in the training data being the main alignment failure mode, and that’s just dumb.

Jeffrey Ladish@JeffLadish·
I don’t know who needs to hear this but preventing the models from learning about the tree of the knowledge of good and evil is not a good alignment strategy.
Jeffrey Ladish@JeffLadish·
@Plinz Sure most individuals don’t need to worry about assassination directly, but I sure wouldn’t be thrilled if assassination became way easier or cheaper! And that’s just one small aspect of drone warfare. I don’t think most people are paying attention to the tech at all!
Jeffrey Ladish@JeffLadish·
Why aren’t people more scared of drones? They’ve drastically changed warfare in the war between Ukraine and Russia. They’re going to be even more effective and deadly weapons once AI pilots are efficient enough to run onboard. And they’ll be cheap. Super cool, but terrifying.
Impolitic@impoliticaljnky·
@JeffLadish I think it's a great alignment strategy? The downside is on capabilities. (And it's not clear how feasible it is.)
Jeffrey Ladish@JeffLadish·
I think someone shot at my FPV drone yesterday. I was flying over an empty field, heard one gunshot, then a couple people drove into the field in a utility vehicle. I left the area when I saw them.