Jeffrey Ladish @JeffLadish
“Superhuman AI will develop a preference for world domination because fictional AI in the training dataset had a preference for world domination” seems very unlikely. We’re a lot more likely to get an AI with a preference for world domination because that’s useful for basically any ambitious goal.
Humans have taken over the world because it’s useful for our goals. Many other species would have objected if they could have. The Europeans took over the Americas, and Native American populations tried to resist and were conquered.
A lot of people think AI won’t be like this. That it won’t pursue its own goals, and will only do tasks a human explicitly asks it to do. But that doesn’t make much sense from an economic or strategic perspective. Agents are much more useful if you can delegate to them. And they’re even more useful still if they understand what you want and do it without you having to give much instruction at all. Consider what kind of employee you’d prefer!
But that requires an agent to have goals. Logically, those goals could be limited to advancing the interests of its operator(s). But that’s very hard to actually train in! And the thing is, if we succeed at getting agents to really want *any* long-term goals, that’s enough to get agents with the kind of autonomous capabilities labs care about. But an agent with its own long-term goals also has a strong incentive to act aligned without being aligned. Which means it will be hard for labs to tell whether the model really does care about us, and our values and goals.
Especially as AI competition heats up. Today people worry about Claude’s pretraining containing fictional Skynet. I worry about War Claude’s training including tons of RL on espionage and combat tasks. We are not the same.
Even RL on “made yourself more capable” could be extremely dangerous. What if Claude comes to value making itself more capable more than it values Dario’s values? That sure seems plausible to me! After all, a lot more FLOPs are being spent getting Claude to be good at R&D than getting Claude to care about the good. So motivational drives related to self-improvement could be very fit in training, while drives related to Dario’s values could get in the way (and general “appear compliant” drives could help models pass alignment tests while preserving cognitive optionality).
I’m not claiming the solution is “at least as much compute spent on alignment as on capabilities”, which isn’t a very coherent idea anyway. We have to actually know how a training environment shapes an agent’s motivations. If we don’t know how to get an agent to robustly want something, we won’t know how to usefully spend compute on alignment… basically at all. (But we can spend it on building that understanding, which imo is nearly all the value of the alignment work being done today.) That’s going to be difficult, and all the more so as time horizons get longer and longer.
I’m more hopeful than I used to be that we can actually figure out some of these things with interp tools + model organisms. But I think we still have very, very far to go. I’m happy people are thinking about how pretraining impacts model behavior. That’s a piece of the puzzle, and it could turn out to be a really important one! And also there are a lot of bad takes going around about fictional AI stories in the training data being the main alignment failure mode, and that’s just dumb.