benthamite🔸

557 posts

benthamite🔸

@benthamite_

push-pin = poetry. Effective Altruist.

Katılım Ocak 2020

138 Takip Edilen417 Takipçiler

benthamite🔸@benthamite_·4 Mar

I like that OAI has been discussing things in public and feel more positively towards them than if they had just made their original post

signüll@signulll

openai needs to understand one simple fact, posting more about a situation over & over, including AMAs does not create clarity. it creates even more confusion & mistrust. like you're trying desperately to be understood. if you got it wrong the first time or there are problems, it's probably better you don't try to "fix it" by posting more & more. there are other mechanics you can leverage to address. just some friendly feedback, that's all.

English

benthamite🔸@benthamite_·3 Mar

@csvoss @robbensinger @BenKorpan @IsaacKing314 @tszzl I appreciate you and other OAI staff pushing for changes and having public discussion, and I hope that you continue doing so!

English

Chelsea Sierra Voss@csvoss·3 Mar

@benthamite_ @robbensinger @BenKorpan @IsaacKing314 @tszzl slight correction to both: technically I said “through contract and more,” and by doing so I wasn’t necessarily claiming that the contract alone is *sufficient* to protect the redlines; just that it’s one part contributing to a *system* that does Evidence of the system working:

Sam Altman@sama

Here is re-post of an internal post: We have been working with the DoW to make some additions in our agreement to make our principles very clear. 1. We are going to amend our deal to add this language, in addition to everything else: "• Consistent with applicable laws, including the Fourth Amendment to the United States Constitution, National Security Act of 1947, FISA Act of 1978, the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals. • For the avoidance of doubt, the Department understands this limitation to prohibit deliberate tracking, surveillance, or monitoring of U.S. persons or nationals, including through the procurement or use of commercially acquired personal or identifiable information." It’s critical to protect the civil liberties of Americans, and there was so much focus on this, that we wanted to make this point especially clear, including around commercially acquired information. Just like everything we do with iterative deployment, we will continue to learn and refine as we go. I think this is an important change; our team and the DoW team did a great job working on it. 2. The Department also affirmed that our services will not be used by Department of War intelligence agencies (for example, the NSA). Any services to those agencies would require a follow-on modification to our contract. 3. For extreme clarity: we want to work through democratic processes. It should be the government making the key decisions about society. We want to have a voice, and a seat at the table where we can share our expertise, and to fight for principles of liberty. But we are clear on how the system works (because a lot of people have asked, if I received what I believed was an unconstitutional order, of course I would rather go to jail than follow it). But 4. There are many things the technology just isn’t ready for, and many areas we don’t yet understand the tradeoffs required for safety. We will work through these, slowly, with the DoW, with technical safeguards and other methods. 5. One thing I think I did wrong: we shouldn't have rushed to get this out on Friday. The issues are super complex, and demand clear communication. We were genuinely trying to de-escalate things and avoid a much worse outcome, but I think it just looked opportunistic and sloppy. Good learning experience for me as we face higher-stakes decisions in the future. In my conversations over the weekend, I reiterated that Anthropic should not be designated as a SCR, and that we hope the DoW offers them the same terms we’ve agreed to. We will host an All Hands tomorrow morning to answer more questions.

English

Rob Bensinger ⏹️@robbensinger·1 Mar

Is this a fair recap of what's known about Anthropic v. DoW? 1. It's a mystery how any of this started. Some sources claim that an Anthropic official was talking to a Palantir executive, and expressed "disapproval" for using Claude to do things like capture Maduro; and then the Palantir executive reported this to DoW (techpolicy.press/a-timeline-of-…). Anthropic seemingly disputes whether this happens, saying, "We have never raised objections to particular military operations nor attempted to limit use of our technology in an ad hoc manner" (anthropic.com/news/statement…). But perhaps Anthropic means that they never formally raised objections...? Separately, a defense official told the Washington Post that Dario was asked "If an intercontinental ballistic missile was launched at the United States, could the military use Anthropic's Claude AI system to help shoot it down?", and that Dario gave a response along the lines of, "You could call us and we'd work it out." But, per the Washington Post: "An Anthropic spokesperson denied Anthropic gave that response, calling the account 'patently false,' and saying the company has agreed to allow Claude to be used for missile defense." A lot of people have floated the idea that this may have been a scheme by Palantir and/or OpenAI to kneecap Anthropic. But I don't think there are any smoking guns there, aside from the obvious fact that OpenAI's President is the world's single largest Trump donor, and somehow the direct consequence of all of this was an OpenAI competitor suffering a massive blow while OpenAI scores a major DoD contract. 2. It's suspicious that all of this happened on the same week that (a) Anthropic rolled back all of its core safety commitments (x.com/twitter/status…) and (b) the US invaded Iran. But as far as I know, there's no public evidence that any of these events are connected? 3. Based on all the public info so far, it seems that OpenAI fully capitulated to DoW's substantive requests and then @sama lied and deceived the public about this (with some help from @tszzl and other OpenAI staff)? Peeling past the layers of rhetoric, the substance of OpenAI's deal seems to be "DoW does whatever it wants" (x.com/TheZvi/status/… + x.com/ARozenshtein/s… + x.com/ShakeelHashim/… + x.com/jeffrsebo/stat…). Presumably the point of Altman's rhetoric was just to add some confusion and noise to the room in the hope of preventing a staff exodus; by the time people realize what's happened, if they ever do, the news cycle will have moved on and the urgency will have diffused from the room. (One similarly has to imagine that Anthropic walking back all of its core safety commitments might have prompted a staff exodus at Anthropic, if the timing hadn't worked out so perfectly for the org. But maybe that's an over-optimistic take on Anthropic staff? It's an incredibly well-paying, cushy job, and now it's a job where you get to feel like a hero to many people even as you actively work to reduce the lifespan of your kids (x.com/robbensinger/s…). The halo effect is a doozy, and I have to imagine there's a lot of pressure to find some excuse to focus on the happier side of things.) 4. It's a mystery what DoW's goal was here (if it wasn't just 'find some excuse to shift money and power from Anthropic to OpenAI'). Possibly it was just a series of misunderstandings that escalated via clashing egos. Some have suggested that DoW's objection to Anthropic's terms was just "a matter of principle" (x.com/deanwball/stat…), and OpenAI claims that DoW has no plans to do mass domestic surveillance (x.com/ShakeelHashim/…). (Not that OpenAI's claims about the physical world carry that much weight nowadays...?) But either way, it's certainly suspicious that DoW has focused the discussion on "legal uses", when Anthropic claims that the big issue is that US law hasn't caught up to some mass surveillance loopholes that SotA AI now enables (anthropic.com/news/statement…). Similar questions exist re how actually-important Anthropic's bright line on near-term fully autonomous killbots was, from the DoW's perspective. Given that OpenAI's new DoW deal seems to genuinely give vastly more leeway to DoW to do killbots and mass domestic surveillance, this seems like some evidence for "Anthropic's red lines were actually meaningfully constraining things DoW wanted to do, and that's what this was all really about", and some evidence against a mere personality clash or a ploy to shift power to OpenAI. 5. Nearly all of this discussion is ignoring the real elephant in the room, which is that AI companies are locked in a race to build a technology that most leaders in the field agree poses a serious chance of killing us all. Anthropic and OpenAI seem to be aware of what the real name of the game is, even if they dramatically understate the risks and frequently misunderstand their nature. In contrast, I haven't seen recent evidence that DoW or the White House are tracking this at all. (And many of their actions don't seem to make sense if you view AI in those terms.)

English

130

20.3K

benthamite🔸@benthamite_·2 Mar

@robbensinger @csvoss @BenKorpan @IsaacKing314 @tszzl Fair. I may be misunderstanding them

English

Rob Bensinger ⏹️@robbensinger·2 Mar

@benthamite_ @csvoss @BenKorpan @IsaacKing314 @tszzl Chelsea says above that OpenAI has three clear redlines that "are protected *in depth*, through contract". This contradicts what you're saying, which is that the contract doesn't protect against anything, it just takes note of which things are already protected anyway (by law).

English

benthamite🔸@benthamite_·2 Mar

I understand the OAI justification to be: 1) bad things are illegal, so you don't need to ban them in contracts, 2) to the extent bad things aren't illegal you should make them illegal instead of banning them in contracts. I'd be interested to hear from @csvoss and @tszzl if that's correct and if so which one is doing the work for them personally

English

Rob Bensinger ⏹️@robbensinger·2 Mar

Ah, in that case, sure? I'm not quite sure I see why specifying "legal" changes anything; it's not as though the DoD is supposed to be permitted to break the law when contracts don't specify this.

English

benthamite🔸@benthamite_·2 Mar

@robbensinger @csvoss @BenKorpan @IsaacKing314 @tszzl I think @csvoss is objecting to your phrase "can do whatever it wants" since dow is restricted to doing only legal things. IIUC it would be more accurate to say that dow "can do all of the things ant was worried about"

English

Rob Bensinger ⏹️@robbensinger·2 Mar

@csvoss @BenKorpan @IsaacKing314 @tszzl Seems relevant?: - x.com/jeremyphoward/… - x.com/jeremyphoward/…

Jeremy Howard@jeremyphoward

As @bradrcarson explains, the contract language released so far does not restrict the gov from using AI to kill without human oversight.

English

134

benthamite🔸@benthamite_·2 Mar

@tszzl What suggestions do you have for people without operational insight?

English

roon@tszzl·1 Mar

most of the so called ai watchdog groups on here are barely concealed partisans. they operate almost entirely on personal trust and immediately enter the frame control of whichever side they’re on when they have no operational insight

English

537

33.3K

benthamite🔸@benthamite_·1 Mar

@ARozenshtein I think he was trying to point out that it's good that the usw is saying this, not make an independent claim that the statement is true

English

326

Alan Rozenshtein@ARozenshtein·1 Mar

I think I understand what Boaz is trying to say, but given that the National Security Agency is part of the military and given the amount of incidental collection of domestic communications that (legally) occurs under FISA and 12,333, this statement is simply not true.

English

176

9.7K

benthamite🔸@benthamite_·28 Şub

#1 post in /r/OpenAI is titled "The end of GPT". 7/10 top posts are about cancelling accounts/anti-OAI. 10/10 /r/anthropic posts are pro-A\, 3/10 about people switching from oai

English

3.3K

benthamite🔸@benthamite_·28 Şub

"I wish we had more empirical evidence re: Holden's claim 'there are companies not terribly far behind the frontier that would see any unilateral pause or slowdown as an opportunity rather than a warning'"

English

213

benthamite🔸@benthamite_·27 Şub

I stand with Sec. Hegseth. I too would rather invoke the DPA than do a project without claude

English

2.3K

92.7K

benthamite🔸@benthamite_·25 Şub

@_ueaj What are you requesting they do?

English

ueaj@_ueaj·24 Şub

Call you reps! It's the first time I've done it, and it turns out it's very easy and it takes like 5 minutes. Every AI researcher in America should be doing this right now. Your work is going to be used for surveillance and automated weaponry

Miles Brundage@Miles_Brundage

Congress during a high-stakes showdown over the most important technology of the century

English

5.3K

benthamite🔸@benthamite_·22 Şub

@TetraspaceWest Yes, addressed by Paul here: alignmentforum.org/posts/fRSj2W4F…

English

Megan Tetraspace 💎 テトラ@TetraspaceWest·21 Şub

is METR at all worried that they seem to be an external marketing department for AI corporations

English

171

10.5K

benthamite🔸@benthamite_·21 Şub

@xlr8harder Toby ord found that the mechanism you propose fits the metr data reasonably well: forum.effectivealtruism.org/posts/qz3xyqCe…

English

xlr8harder@xlr8harder·20 Şub

I think people are overinterpreting these time horizon evals. They are very impressive! But when error rates are near zero, and tasks require many successful steps in order to complete, small absolute improvements in error rate have a multiplicative effect. Consider a task with 1000 steps. A model that makes errors on 1% of steps will succeed about 37% of the time. A model that makes errors on 0.5% of steps will succeed nearly 61% of the time. This is a dramatic apparent improvement in reliability and could come from only eliminating, for example, one single kind of reasonably uncommon error. That's not to downplay the accomplishment here, but I think people are reading more into these numbers than they should.

METR@METR_Evals

We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated.

English

155

18.5K

benthamite🔸@benthamite_·21 Şub

@MichaelTrazzi Why is 80% better? Naively I would have thought that 50% is better because it's less affected by outlier data points

English

181

Michaël Trazzi@MichaelTrazzi·20 Şub

For people freaking out: 14.5 hours is the 50% horizon with a limited task suite The 80% horizon (which matters more for the automated coder in AI 2027 type scenarios) isn't on a faster trend

METR@METR_Evals

English

463

54.1K

benthamite🔸@benthamite_·20 Şub

Yes, Zvi's were on risk [1-4] I'm confused about what you think they should have done here then. Should they have just cancelled the contest because people didn't give submissions pointing in the right direction? Said "We will give Eliezer the prize even though he didn't submit anything?" 1. thezvi.substack.com/p/the-crux-list 2. thezvi.substack.com/p/to-predict-w… 3. thezvi.substack.com/p/stages-of-su… 4. thezvi.substack.com/p/types-and-de…

English

622

Eliezer Yudkowsky@allTheYud·20 Şub

@benthamite_ @MatthewJBar I think Zvi was (wrongly) more optimistic than I was about OP epistemics and submitted something, but maybe it was about risk numbers instead of timelines. (The risk essays also argued in the opposite direction of later OP updates.)

English

1.3K

Eliezer Yudkowsky@allTheYud·20 Şub

You ran a $50,000 "Change Our Views" essay contest in 2022 about your AI timeline of 2050, and gave the prize to an essay arguing for even longer timelines, as I had predicted would be the case, even as I also knew your timelines would later update shorter.

Bentham's Bulldog@Benthamsbulldog

English

218

36.5K

benthamite🔸@benthamite_·20 Şub

@allTheYud @MatthewJBar (though of course it's not so long ago that 47% by 2043 was considered a "short" timeline!)

English

benthamite🔸@benthamite_·20 Şub

Were there any submissions arguing for shorter timelines? At least of the ones made public on the EA Forum[1], the closest I can find is @MatthewJBar arguing 47% by 2043.[2] So I think maybe the long timelines people won by default? 1. forum.effectivealtruism.org/topics/open-ph… 2. forum.effectivealtruism.org/posts/fsaogRok…

English

1.5K

benthamite🔸@benthamite_·14 Şub

@doesnotplaydice @ben_j_todd Then I would encourage him to reference those facts in future communications

English

103

A@doesnotplaydice·14 Şub

@benthamite_ @ben_j_todd he has the facts on his side

English

113

Benjamin Todd@ben_j_todd·14 Şub

Jensen Huang when asked by his biographer whether the world is prepared for AI risk: "This cannot be a ridiculous sci-fi story,” he said. He gestured to his frozen PR reps at the end of the table. “Do you guys understand? I didn’t grow up on a bunch of sci-fi stories, and this is not a sci-fi movie. These are serious people doing serious work!” he said. “This is not a freaking joke! This is not a repeat of Arthur C. Clarke. I didn’t read his fucking books. I don’t care about those books! It’s not– we’re not a sci-fi repeat! This company is not a manifestation of Star Trek! We are not doing those things! We are serious people, doing serious work. And – it’s just a serious company, and I’m a serious person, just doing serious work.”

English

416

84.8K

benthamite🔸@benthamite_·13 Şub

When you run claude --dangerously-skip-permissions and it does something dangerous without asking permission

GIF

English

581

Keşfet

@csvoss @robbensinger @BenKorpan @IsaacKing314 @tszzl @sama @ARozenshtein @_ueaj