Matthew Gray ⏸️

1.2K posts

Matthew Gray ⏸️

@Vaniver

Vaniver lots of places; I mostly post to LessWrong.

Berkeley, CA · Joined June 2009
73 Following · 225 Followers
Alex Trembath@atrembath·
Do folks think the future depicted after the Butlerian jihad is…good?
39 · 15 · 265 · 24.3K
Matthew Gray ⏸️@Vaniver·
@benlandautaylor Hmmm. I don't think the "But that's probably" part is true b/c people developed Harberger taxes, I think, specifically as a structural solution to corruption. Also, I think looking for the parts of society that are the most corrupt to figure out what to reform makes sense!
1 · 0 · 0 · 50
Ben Landau-Taylor@benlandautaylor·
Every property tax advocate comes to "Man the imaginary version of this based on pure econbrain theory sounds amazing. Too bad every time it's actually tried is a corrupt disaster. But that’s probably a random coincidence with no structural causes, not a gap in my theory."
Abraham Ash / 𐤀𐤁𐤓𐤄𐤌@Historycourses

I have actually come to decouple my personal aversion towards property tax and my theoretical understanding of it. It really is in principle the best kind of tax - shame that in my town it is so arbitrary, extortionate, and blatantly corrupt that there's naught to do but hate it.

11 · 0 · 38 · 3K
Matthew Gray ⏸️@Vaniver·
@ZyMazza Short answer yes. Long answer: there are lots of pieces of "AI doom" which have _different_ "ok that was a false alarm" points, and so a complete answer is long and made of lots of partial invalidations.
0 · 0 · 1 · 32
Zy@ZyMazza·
Here’s a serious question for the AI doomers: do you have exit criteria? Is there a predetermined stage of development or capabilities where, having not destroyed humanity, you’re willing to say it was a false alarm? Or is it an eschatological religious belief and unfalsifiable?
106 · 10 · 322 · 16K
Matthew Gray ⏸️@Vaniver·
@slatestarcodex @robbensinger I think it's quite possible that OpenAI without its alignment team would have been much less successful; for example I think they might not have gotten a commercially successful chatbot without them (or would have much later).
0 · 0 · 6 · 172
Scott Alexander@slatestarcodex·
It's possible I'm over-reacting to a thread that was in the context of many other threads I hadn't followed all the way, but I don't think this is an uncommonly-stated or subtle position that you hold.

I am less condemnatory about OpenPhil -> OpenAI than you are. Ex post it was a bad call, but everyone in this space has made calls that turned out to be bad or even disastrous in retrospect (again, I meant to object-level insult MIRI and drag up its dirty laundry probably less than it sounded like), and I don't think it was ex ante bad either consequentially or deontologically. I thought of it as something like "we can't prevent them from existing, but we can get some seats on their board and make sure they have a good safety team". They *did* get seats on the board, and they *did* make sure they had a good safety team. The board seats plausibly could have kicked out Sam if the crisis around that got handled better, and the safety team eventually resulted in Anthropic, SSI, the concept of scaleable oversight, some bracing clarification of how Sam Altman actually thinks, and other good things that I like better than being in the world where OAI never had a decent safety team. I think of it as a reasonable bet which didn't pay off directly due to very chaotic dynamics (eg exactly what happened over the board firing weekend) and did have some indirect payoffs. I realize I'm in a tiny minority here, but I don't think anyone's 2016-era strategy has come out looking great. I realize you have a broader story about how there was a moral flaw involved that should override all discussion of ex post vs. ex ante and indirect benefits, but you have yet to convince me of it. I also think that even if you 100% convinced me of this, I wouldn't propagate it enough to make me suspicious of all OpenPhil, let alone all EA.

>>> "I feel like this is a case in point. Like, sure, counting up from 0 (“the average corporation building the average product doesn’t try to warn the public about their product, except in ways mandated by law!”), Anthropic’s doing great. Or if the baseline is “is Anthropic doing better than pathological liar Sam Altman?”, then sure, Anthropic is doing better than OpenAI on candor. If we’re instead anchoring to “trying to build a product that massively endangers everyone in the world is an incredibly evil sort of thing to do by default, and to even begin to justify it you need to be doing a truly excellent job of raising the loudest possible alarm bells alongside dozens of other things”, then I don’t think Anthropic is coming close to clearing that bar."

I will admit that I don't have a good model of Dario and my confidence interval runs all the way from "only slightly better than Sam" to "extremely smart and ethical person doing his best in a terrible situation", but I don't think this really matters. I think the right comparison point is the counterfactual where Anthropic doesn't exist, and it seems like you agree most of the other companies (GDM maybe excepted) are worse. I would pay infinity dollars to upgrade from Sam to Dario, and this is mostly robust to my wide uncertainty bars in how good a person Dario actually is.

I think you are arguing for some kind of deontological bar of ever supporting a frontier AI lab, even if it is better than other frontier AI labs (or at least up to some very high standard Dario doesn't meet). I don't think this is an actual deontological bar. I'm having trouble explaining why because I don't see why it *would* be a deontological bar - I mostly think of these as about avoiding lose-lose equilibria, and I don't think there's anyone else waiting to support an equal and opposite AI company if and only if we support Anthropic. I think there's wide room for disagreement about whether to devote the marginal extra resource to Anthropic or to PauseAI is better (not really, Anthropic has enough resources now, but let's say five years ago), but they both seem like equally non-deontologically-barred things that we have good reason to think will improve the situation over the counterfactual even though they have some risk of backlash and ending up net negative.

>>> "“Things go really, really badly”? Nobody outside the x-risk ecosystem has any idea what that means. And this is not the kind of claim Anthropic or Dario has ever tried to spotlight. You won’t find a big urgent-looking banner on the front page of Anthropic loudly warning the public, in plain terms, about this technology, and asking them to write their congressman about it. You won’t even find it tucked away in a press release somewhere. Dario gave a number when explicitly asked, in an on-stage interview."

I agree they have not met the bar that you or I would have. But I also think that when normal people see them, they end up getting the sort of impression spotlighted at nytimes.com/2023/07/11/tec… . I agree that I have very wide error bars in my assessment of Dario personally and can't get a good read on how serious he is, but, as I said before, I'm much happier with him than the counterfactual. I would love it if Eliezer ran Anthropic, but there are dozens of reasons why that's impossible, and I think Dario is about as good as I could expect of a balance between takes-it-seriously and is-an-actual-AI-company-leader-and-not-a-fantasy.

I'm also more willing to trust that there are reasons behind some of their strategies. For example, a couple of people have told me that they can't openly talk about any work they might be doing to coordinate a pause, because then the Trump administration might hit them with an antitrust lawsuit. I think until you have wargamed these sorts of considerations with someone who is very familiar with the corporate comms world, you don't realize how many of these there are and how much they limit options. I am torn between the possibility that they're trying to help but limited in certain types of public statements (in which case demanding that they make those statements will be making it harder for them to help us) and the possibility that they're simply not that serious (in which case we need to put more pressure on them). Dario's weird and schizophrenic public statements don't really help me make this determination, but I'm not yet certain enough that this should be resolved in the negative direction that I think it's worth lots of effort to portray them as the bad guys. I also think this sort of "genuinely on the edge between good and evil" situation is one where I'm especially excited about good people joining them and inserting themselves in the decision-making process.

>>> "I agree it’s too glib as an argument for “international coordination to ban superintelligence is easy”. It isn’t easy. In the context of a conversation where most people are seriously underweighting the possibility, “governments have been known to ban scary or weird tech” and “governments have been known to enact policies that cost them money” are useful correctives, but they should be correctives pointing toward “this seems hard but maybe doable”, not “this seems easy”."

I think what I'm trying to get at with this is that (IIRC) you are pushing a position of "alignment seems impossible, but a pause seems doable". To me, both alignment and pausing seem on an anxiety-inducing border of "near-impossible, but maybe doable if we try incredibly hard". You of course have every right to disagree and to think that pausing is importantly easier than alignment, but my impression of some of "your side's" comms (I want to say MIRI, but I can't say for sure it's not someone else like David or Holly) is that they are underplaying the difficulty of pausing, and overplaying the difficulty of alignment, in order to make it seem like dropping alignment and pivoting entirely to pausing is the only responsible choice.

>>> "Like, this is one of the most foregrounded claims in Dario’s essay. He repeats a bunch of easily-checked falsehoods about the MIRI argument, at the very start of the essay, while warning that this view's skepticism about alignment tractability is a “self-fulfilling belief”. He then proceeds to shit on the possibility of the US coordinating with China to avoid building superintelligence, which seems like a much more classic example of “belief that could easily be self-fulfilling”."

I don't want to fully defend Dario's approach in AoT as "good", but I think it's, idk, "within acceptable levels of bad for a person who can still be worked with". I think that some of Dario's criticisms of "doomers" are true, although they are not all true of the same people and not all true of MIRI in particular. I do hold the "death with dignity" thing against MIRI a little - you have the ironclad right to say your true epistemic beliefs, but I think it was both totally false and very damaging to the project of getting people to fight these risks, and I think it's potentially fine for Dario to call that out although I think his exact framing went too far. But I also think if you read that section (which is pretty short), he is using it as a segue to say that "now the pendulum has swung" and people aren't worried enough. This strikes me as the sort of thing that a good communicator would do to reassure people that he is on their side: "Sure, I share your suspicions of the extremists over there, now that you understand I'm on your side, can we agree on the moderate position that we're not doomed with 100% certainty but AI might be very dangerous?" I do think he crossed a line from "acceptable writing technique" to "unwarranted attack", and I condemned him at the time, just as I'm condemning you for what seems like an overly-strong unwarranted attack now. I appreciate why both sides are doing this, but I would like to get them both to stop as far as is possible.

>>> "Anthropic may or may not be slightly better than OpenAI. OpenAI may or may not be slightly better than DeepMind. I don’t think the lesson of history is that OpenPhil-cluster people are good at telling the difference between “this is marginally better than what the other guys are doing” and “this is good enough to actually succeed”."

I think this is maybe the closest to our crux? I think OAI wasn't better than DeepMind (it might have ex ante appeared to be at first), and Anthropic is much better than OAI. I think this difference between "may or may not be slightly better" and "is much better" is load-bearing, and, since we're talking about the people building superintelligence, is extremely significant. I think your argument only works because you trivialize this very significant thing. I understand why you do this (you think that without a pause/slowdown, alignment is so hard it's not worth splitting hairs on who will do it better or worse), but I don't agree with that, and I hope that you agree that, under my assumptions on this one topic, all my other positions flow naturally and make perfect sense.
6 · 0 · 35 · 2.6K
Rob Bensinger ⏹️@robbensinger·
To clarify the claim I’m making: I’m not trying to throw EA under a bus. This thread spun off from a discussion where I said I thought EA’s net impact on AI x-risk was probably positive, but I was highly uncertain (x.com/robbensinger/s…). Somebody asked what the bad components of EA’s impact were, and I went off on Anthropic, and on EA’s (and especially OpenPhil’s) entanglement with the company and their support for Anthropic’s operations. (To the extent that a lot of x-risk-adjacent EA seems to function, in practice, as a talent pipeline for Anthropic.) I also said that I think OpenPhil’s bet on OpenAI was a disaster. And I said that there’s a culture of caginess, soft-pedaling, and trying-to-sound-reassuringly-mundane that I think has damaged AI risk discourse a fair amount, and that various people in and around OpenPhil have contributed to.

I’m restating this partly to be clear about what my exact claims are. E.g., I’m not claiming that items 1+2+3 are things OpenPhil and Anthropic leadership would happily endorse as stated. I deliberately phrased them in ways that highlight what I see as the flaws in these views and memes, in the hope that this could help wake up some people in and around OpenPhil+Anthropic to the road they’re walking. This may have been the wrong conversational tack, but my vague sense is that there have been a lot of milder conversations about these topics over the years, and they don’t seem to have produced a serious reckoning, retrospective, or course change of the kind I would have expected. I hoped it was obvious from the phrasing that 1-3 were attempting to embed the obvious critiques into the view summary, rather than attempting to phrase things in a way that would make the proponent go “Hell yeah, I love that view, what a great view it is!” If this confused anyone, I apologize for that.

I wasn’t centrally thinking of Holden’s public communication in the OP, though I think if he were consistently solid at this, Aysja Johnson wouldn’t have needed to write x.com/robbensinger/s… in response to Holden’s defense of Anthropic ditching its core safety commitments.

“Dario said there's a 25% chance ‘things go really, really badly’”

I feel like this is a case in point. Like, sure, counting up from 0 (“the average corporation building the average product doesn’t try to warn the public about their product, except in ways mandated by law!”), Anthropic’s doing great. Or if the baseline is “is Anthropic doing better than pathological liar Sam Altman?”, then sure, Anthropic is doing better than OpenAI on candor. If we’re instead anchoring to “trying to build a product that massively endangers everyone in the world is an incredibly evil sort of thing to do by default, and to even begin to justify it you need to be doing a truly excellent job of raising the loudest possible alarm bells alongside dozens of other things”, then I don’t think Anthropic is coming close to clearing that bar.

“Things go really, really badly”? Nobody outside the x-risk ecosystem has any idea what that means. And this is not the kind of claim Anthropic or Dario has ever tried to spotlight. You won’t find a big urgent-looking banner on the front page of Anthropic loudly warning the public, in plain terms, about this technology, and asking them to write their congressman about it. You won’t even find it tucked away in a press release somewhere. Dario gave a number when explicitly asked, in an on-stage interview. If we’re setting the bar at 0, then maybe we want to call this an amazing act of courage, when he could have ducked the question entirely. But why on earth would we set the bar at 0? Is the social embarrassment of talking about AI risk in 2025 so great that we should be amazed when Dario doesn’t totally dodge the topic, while running one of the main companies building the tech?

“Meanwhile, you seem to be treating all these people as basically equivalent to Gary Marcus.”

I think Dario has been more reasonable on this issue than Gary Marcus. I also don’t think “clearing Gary Marcus” is the criterion we should be using to judge the CEO of Anthropic.

“I think this ‘debate’ isn't about OpenPhil or Anthropic failing to say they're extremely worried”

Specifically, this debate (from my perspective) isn’t about whether Anthropic or others have ever said anything scary-sounding, if an x-risk person goes digging for cherry-picked quotes to signal-boost. The question is whether the average statement from Anthropic, weighted by how visible Anthropic tries to make that statement, is adequate for informing the uninformed about the insane situation we’re in.

Is the average statement from Dario or Anthropic communicating, “Holy shit, the technology we and our competitors are building has a high chance of killing us all or otherwise devastating the world, on a timescale of years, not decades. This is terrifying, and we urgently call on policymakers and researchers to help find a solution right now”? Or is it communicating, “Mythos is our most aligned model yet! 😊 Powerful AI could have benefits, but it could have costs too. AI is a big deal, and it could have impacts and pose challenges! We are taking these very seriously! Also, unlike our competitors, Claude will always be ad-free! We’re a normal company talking about the importance of safety and responsibility in this transformative period. 😊” (Case in point: x.com/HumanHarlan/st…)

If Anthropic’s messaging were awful, but Dario’s personal communications were reliably great, then I’d at least give partial credit. But Dario’s messaging is often even worse than that. Dario has been the AI CEO agitating the earliest and loudest for racing against China. He’s the one who’s been loudest about there being no point in trying to coordinate with China on this issue. “The Adolescence of Technology” opens with a tirade full of strawmen of what seems to be Yudkowsky/Soares’ position (x.com/robbensinger/s…), and per Ryan Greenblatt, the essay sends a super misleading message about whether Anthropic “has things covered” on the technical alignment side (x.com/RyanPGreenblat…):

“Dario strongly implies that Anthropic ‘has this covered’ and wouldn't be imposing a massively unreasonable amount of risk if Anthropic proceeded as the leading AI company with a small buffer to spend on building powerful AI more carefully. I do not think Anthropic has this covered[....] I think it's unhealthy and bad for AI companies to give off a ‘we have this covered and will do a good job’ vibe if they actually believe that even if they were in the lead, risk would be very high. At the very least, I expect many employees at Anthropic working on alignment, safety, and security don't believe Anthropic has the situation covered.”

I also strongly agree with Ryan re:

- “I think it's important to emphasize the severity of outcomes and I think people skimming the essay may not realize exactly what Dario thinks is at stake. A substantial possibility of the majority of humans being killed should be jarring.”

- “I wish Dario more clearly distinguished between what he thinks a reasonable government should do given his understanding of the situation and what he thinks should happen given limited political will. I'd guess Dario thinks that very strong government action would be justified without further evidence of risk (but perhaps with evidence of capabilities) if there was high political will for action (reducing backlash risks).”

(And I claim that Anthropic leadership has been doing this for years; "The Adolescence of Technology" is not a one-off.)

On podcast interviews, Dario sometimes lets slip an unusually candid and striking statement about how insane and dangerous the situation is, without couching it in caveats about how Everything Is Uncertain and More Evidence Is Needed and It’s Premature For Governments To Do Much About This. Sometimes, he even says it in a way that non-insiders are likely to understand. But when he talks to lawmakers, he says things like:

"However, the abstract and distant nature of long-term risks makes them hard to approach from a policy perspective: our view is that it may be best to approach them indirectly by addressing more imminent risks that serve as practice for them." (judiciary.senate.gov/imo/media/doc/…)

Never mind the merits of “the policy world should totally ignore superintelligence”. Even if you agree with that (IMO extreme and false) claim, there is no justifying calling these risks “long-term”, “abstract”, and “distant” when you have timelines a fraction as aggressive as Dario’s!!

See also Jack Clark’s communication on this issue (x.com/robbensinger/s…), and my criticism at the time (x.com/robbensinger/s…). This was in 2024. I don’t think it’s great for Dario to be systematically making the same incredibly misleading elisions two years after this pretty major issue was pointed out to his co-founder.

“It's about OpenPhil in particular being pretty careful how they phrase things for public consumption. And I think any attempt to attack them for this should start with an acknowledgement that MIRI is directly responsible for all of our current problems”

I’m not criticizing Anthropic or Open Phil for being “careful how they phrase things”. I’m criticizing them for being careful in exactly the wrong direction. Any communication they send out that sends a “we have things covered, this is business-as-usual, no need to worry” signal is potentially not just factually misleading, but destructive of society’s ability to orient to what’s happening and course-correct. Anthropic is the “Machines of Loving Grace” company; it’s exactly the company that has put way more effort, early and often, into communicating how powerful and cool this technology is, while being consistently nervous and hedged about alerting others to the hazards. This is exactly the opposite of what “being careful how you phrase things” should look like. Anthropic should have internal processes for catching any tweet that risks implicitly sending a "this is business-as-normal" or "we have everything handled" message, to either filter those out or flag them for evaluation. Sending that kind of message is much more dangerous than any ordinary reputational risk a company faces.

Re ‘MIRI is saying strategy is bad, but if MIRI had been strategic then they might not have started the deep learning revolution’: I think that this just didn't happen. Per the x.com/allTheYud/stat… thread, I think this is just a myth that propagates because it’s funny. (And because Sam Altman is good at spreading narratives that help him out.) I don’t think MIRI accelerated timelines on net, and if it did, I don’t think the effect was large. I’d also say that if this happened, it was in spite of one of MIRI’s top obsessions for the last 20+ years being “be ultra cautious around messaging that could shorten AI timelines”. (Like, as someone who’s been at MIRI for 13 years, this is literally one of the top annoying things constraining everything I've written and all the major projects I've seen my colleagues work on. Not because we think we're geniuses sitting on a trove of capabilities insights, but just because we take the responsibility of not-accidentally-contributing-to-the-race extraordinarily seriously.)

But whatever, sure. If you want to accuse MIRI of hypocrisy and say that we’re just as culpable as the AI labs, go for it. You can think MIRI is terrible in every way and also think that the Anthropic cluster is not handling AI risk in a remotely responsible way.

Set aside the years of Anthropic poisoning the commons with its public messaging, poisoning efforts at international coordination by being the top lab preemptively shitting on the possibility of US-China coordination, and poisoning the US government’s ability to orient to what’s happening by selling half-truths and absurd frames to Senate committees. Even without looking at their broad public communications, and without critiquing what passes for a superintelligence alignment or deployment plan in Anthropic’s public communications, Anthropic has behaved absurdly irresponsibly, lying to the public about their RSP being a binding commitment (lesswrong.com/posts/HzKuzrKf…), lying to their investors re ‘we’re not going to accelerate capabilities progress’ (lesswrong.com/posts/JbE7Kynw…), and specifically targeting the most dangerous and difficult-to-control AI capabilities (recursive self-improvement) in a way that may burn years off of the remaining timeline.

“What they haven't said is ‘the situation is totally hopeless and every strategy except pausing has literally no chance of working’, but that isn't a comms problem, that's because they genuinely believe something different from you.”

Just to be clear: nowhere in this thread, or anywhere else, have I asked Anthropic to say something like that. Everything I’ve said above is compatible with thinking that Anthropic has a chance at solving superintelligence alignment. “I think I have a chance at solving superintelligence alignment!” is not an excuse for Anthropic or Dario’s behavior.

“Your claim that ‘governments are incredibly trigger-happy about banning things...there's a long history of governments successfully coordinating to ban things dramatically less dangerous than superintelligent AI’ is too glib”

I agree it’s too glib as an argument for “international coordination to ban superintelligence is easy”. It isn’t easy. In the context of a conversation where most people are seriously underweighting the possibility, “governments have been known to ban scary or weird tech” and “governments have been known to enact policies that cost them money” are useful correctives, but they should be correctives pointing toward “this seems hard but maybe doable”, not “this seems easy”.

“But my impression is that the rest of the field is executing this portfolio plan admirably, but MIRI and a few other PauseAI people are trying to sabotage every other strategy in the portfolio in the hope of forcing people into theirs.”

How are we doing that, exactly?

Like, this is one of the most foregrounded claims in Dario’s essay. He repeats a bunch of easily-checked falsehoods about the MIRI argument, at the very start of the essay, while warning that this view's skepticism about alignment tractability is a “self-fulfilling belief”. He then proceeds to shit on the possibility of the US coordinating with China to avoid building superintelligence, which seems like a much more classic example of “belief that could easily be self-fulfilling”.

What is the mechanism whereby Dario criticizing MIRI is “cooperating” (is it that he didn’t mention us by name, preventing people from fact-checking any of his claims?), and MIRI staff criticizing Dario is “defecting”? What, specifically, is the wrench I’m throwing in Anthropic’s plans by tweeting about this? Is a key researcher on Chris Olah's team going to get depressed and stop doing interpretability research unless I contribute to the “Anthropic is the Good Guys and OpenAI is the Bad Guys” narrative? Is Anthropic at risk of losing its lead in the race if MIRI people are open about their view that all the labs are behaving atrociously? Should I have dropped in a claim that everyone who disagrees with me is "quasi-religious", the same way Dario's cooperative essay begins?

If you think I’m factually mistaken, as you said at the start of your reply, then that makes sense. But surely that would be an equally valid criticism whether I were saying pro-Anthropic stuff or anti-Anthropic stuff. Why this separate “MIRI is defecting” idea?

“I worry that any support or oxygen you guys get will be spent knifing other safety advocates, while Sam Altman happily builds AGI regardless.”

Yeah. And when MIRI voiced early skepticism of OpenAI in private conversation, we were told that it was crucial to support Sam and Elon’s effort because Demis was untrustworthy. Counting up from zero, OpenAI could be framed as amazing progress: a nonprofit! Run by people vocally alarmed about x-risk! And they’re struggling for cash in the near term (in spite of verbal promises of funding from Musk), which gives us an opportunity to buy seats on the board!

Anthropic may or may not be slightly better than OpenAI. OpenAI may or may not be slightly better than DeepMind. I don’t think the lesson of history is that OpenPhil-cluster people are good at telling the difference between “this is marginally better than what the other guys are doing” and “this is good enough to actually succeed”.

But nothing I’ve said above depends on that claim. You can disagree with me about how likely Anthropic is to save the world, and still think there’s an egregious candor gap between the average Anthropic public statement and the scariest paragraphs buried in “The Adolescence of Technology”, and a further egregious candor gap between “The Adolescence of Technology” and e.g. Ryan Greenblatt’s post or x.com/MaskedTorah/st….

I don’t think the “circle-the-wagon” approach has served EA well throughout its history, and I don’t think people self-censoring to that degree is good for governments’ or labs’ ability to orient to reality.
Rob Bensinger ⏹️@robbensinger

EA's principles are great, but I think I'm only at like 55% probability that EA has been net positive for the world? I think there's clearly been a lot of harm done re AI risk, and my main uncertainty is about the benefits done (which tend to be more amorphous than the harms), and about how many of those harms would have occurred regardless in the absence of EA.

3 · 6 · 84 · 13.9K
SE Gyges@segyges·
'being a major funder and talent source for two of the leading ai companies'? Yud personally introduced Demis Hassabis and Shane Legg to Peter Thiel so they could get funded for Deepmind. Literally nobody on earth is more directly to blame for this than him personally.
Rob Bensinger ⏹️@robbensinger

In response to "What did EAs do re AI risk that is bad?":

Aside from the obvious 'being a major early funder and a major early talent source for two of the leading AI companies burning the commons', I think EAs en masse have tended to bring a toxic combination of heuristics/leanings/memes into the AI risk space. I'm especially thinking of some combination of: 'be extremely strategic and game-playing about how you spin the things you say, rather than just straightforwardly reporting on your impressions of things' plus 'opportunistically use Modest Epistemology to dismiss unpalatable views and strategies, and to try to win PR battles'.

Normally, I'm at least a little skeptical of the counterfactual impact of people who have worsened the AI race, because if they hadn't done it, someone else might have done it in their place. But this is a bit harder to justify with EAs, because EAs legitimately have a pretty unusual combination of traits and views. Dario and a cluster of Open-Phil-ish people seem to have a very strange and perverse set of views (at least insofar as their public statements to date represent their actual view of the situation):

---

1. AI is going to become vastly superhuman in the near future; but being a good scientist means refusing to speculate about the potential novel risks this may pose. Instead, we should only expect risks that we can clearly see today, and that seem difficult to address today. If there is some argument for why a problem P might only show up at a higher capability level, or some argument for why a solution S that works well today will likely stop working in the future... well, those are just arguments. Arguments have a terrible track record in AI; the field is full of surprises. So we should stick to only worrying about things when the data mandates it. This is especially important to do insofar as it will help us look more credible and thereby increase our political power and influence.

2. When it comes to technical solutions to AI, the burden of proof is on the skeptic: in the absence of proof that alignment is intractable, we should behave as though we've got everything under control. At the same time, when it comes to international coordination on AI, we will treat the burden of proof as being on the non-skeptic. Absent proof that governments can coordinate on AI, we should assume that they can't coordinate. And since they can't coordinate, there's no harm in us doing a lot of things to make coordination even harder, to make our lives a bit more convenient as we work on the technical problems.

3. In general, people worried about AI risk should coordinate as much as possible to play down our concerns, so as not to look like alarmists. This is very important in order to build allies and accumulate political influence, so that we're well-positioned to act if and when an important opportunity arises. If you're claiming that now is an important opportunity, and that we should be speaking out loudly about this issue today... well, that sounds risky and downright immodest. Many things are possible, and the future is hard to predict! Taking political risks means sacrificing enormous option value. The humble and safe thing to do is to generally not make too much of a fuss, and just make sure we're powerful later in case the need arises.

---

1-3 really does seem like an unusually toxic set of heuristics to propagate, potentially worse than replacement.

- In an engineering context, the normal mindset is to place the burden of proof on the engineer to establish safety. There's no mature engineering discipline that accepts "you can't prove this is going to kill a ton of people" as a valid argument. The standard engineering mindset sounds almost more virtue-ethics-y or deontological rather than EA-ish -- less "ehh it's totally fine for me to put billions of lives at risk as long as my back-of-the-envelope cost-benefit analysis says the benefits are even greater!", more "I have a sacred responsibility and duty to not build things that will bring others to harm." Certainly the casualness about p(doom) and about gambling with billions of people's lives is something that has no counterpart in any normal scientific discipline.

- Likewise, I suspect that the typical scientist or academic that would have replaced EAs / Open Phil would have been at least somewhat more inclined to just state their actual concerns about AI, and somewhat less inclined to dissemble and play political games. Scientists are often bad at such games, they often know they're bad at such games, and they often don't like those games. EAs' fusion of "we're playing the role of a wonkish Expert community" with "we're 100% into playing political games" is plausibly a fair bit worse than the normal situation with experts.

- And EAs' attempts to play eleven-dimensional chess with the Overton window are plausibly worse than how scientists, the general public, and policymakers normally react to any technology under the sun that sounds remotely scary or concerning or creepy: "Ban it!" Governments are incredibly trigger-happy about banning things. There's a long history of governments successfully coordinating to ban things dramatically less dangerous than superintelligent AI. And in fact, when my colleagues and I have gone out and talked to most populations about AI risk, people mostly have much more sensible and natural responses than EAs to this issue.

A way of summarizing the issue, I think, is that society depends on people blurting out their views pretty regularly, or on people having pretty simple and understandable agendas (e.g., "I want to make money" or "I want the Democrats to win"). Society's ability to do sense-making is eroded when a large fraction of the "specialists" talking about an issue are visibly dissembling and stretching the truth on the basis of agendas that are legitimately complicated and hard to understand. Better would be to either exit the conversation, or contribute your actual pretty-full object-level thoughts to the conversation. Your sense of what's in the Overton window, and what people will listen to, has failed you a thousand times over in recent years. Stop pretending at mastery of these tricky social issues, and instead do your duty as an expert and inform people about what's happening.

4 · 0 · 53 · 4.1K
Matthew Gray ⏸️@Vaniver·
@segyges @allTheYud @MatriceJacobine @Simon248 Actually, sorry, that's the wrong question. Do you see why, if I believe in the 'two progress bars' model, the claim that every alignment advance is a capabilities advance _isn't_ evidence against the two progress bars model?
1 · 0 · 0 · 15
Matthew Gray ⏸️@Vaniver·
@segyges @allTheYud @MatriceJacobine @Simon248 That is, yes, aligning models generally makes them more capable. But does making the model more capable align it? Can you find things that do both, instead of just making the models more capable? That's alignment research. I don't think it's impossible in principle.
0 · 0 · 0 · 12
Matthew Gray ⏸️@Vaniver·
@segyges @allTheYud @MatriceJacobine @Simon248 in the first tweet I brought up "differential advancement." I don't think I've seen that in any of your tweets that followed; I think "differential advancement" works pretty well as a reply to all of them. That's the thing I was hoping Opus might be able to walk you thru.
2 · 0 · 0 · 26
Matthew Gray ⏸️@Vaniver·
@segyges @allTheYud @MatriceJacobine @Simon248 But this can be basically independent of the capabilities of those models, or the domains that they operate in, or so on. (Of course it's easier to write utility functions for smaller worlds, but that's distinct from the tech breakthrus required to be able to make that agent.)
1 · 0 · 0 · 26
Matthew Gray ⏸️@Vaniver·
@segyges @allTheYud @MatriceJacobine @Simon248 Hmm, I don't think we mean the same thing by 'corrigibility'. An agent which has a clearly identifiable utility function / reward system, and also somehow is not incentivized to prevent a particular source from modifying it, is more corrigible than an opaque policy model.
1 · 0 · 0 · 23
Matthew Gray ⏸️@Vaniver·
@segyges @allTheYud @MatriceJacobine @Simon248 I don't yet see how this is true; I don't see how knowing how to design an agent architecture that is corrigible also makes it so that I make it better at, say, classifying images or predicting the winningness of Go moves. Could you spell that out for me more clearly?
1 · 0 · 0 · 26
SE Gyges@segyges·
@Vaniver @allTheYud @MatriceJacobine @Simon248 unfortunately the entire concept of corrigibility is an error, but if we pretend that it is not one, the ability to predict and cause whether a model will or will not take a broad class of actions relating to "corrigibility" will also enable you to do this regarding any other goal
2 · 0 · 0 · 31
SE Gyges@segyges·
@Vaniver @allTheYud @MatriceJacobine @Simon248 The division into virtuous alignment researchers and horrifying capabilities researchers is fundamentally not real, doesn't make any sense, cannot be real even in principle
2 · 0 · 0 · 33
Matthew Gray ⏸️@Vaniver·
@segyges @allTheYud @MatriceJacobine @Simon248 I don't agree with the distance metric that you're using? Like, I think there is some separation in concept-space (and underlying code-space / weight-space) between models that are unaligned to humanity and models that are, and the core question is "which gets made first?"
2 · 0 · 0 · 31
Matthew Gray ⏸️@Vaniver·
@segyges @allTheYud @MatriceJacobine @Simon248 When we have more characters, we normally talk about 'differential advancement', that is, increasing alignment more than it increases capabilities. I think it's nonobvious whether this is true of lots of tech (like interpretability seems pretty even to me).
1 · 0 · 0 · 28