benthamite🔸
@benthamite_
557 posts
push-pin = poetry. Effective Altruist.
Joined January 2020
138 Following · 417 Followers
Chelsea Sierra Voss@csvoss·
@benthamite_ @robbensinger @BenKorpan @IsaacKing314 @tszzl Slight correction to both: technically I said “through contract and more,” and by doing so I wasn’t necessarily claiming that the contract alone is *sufficient* to protect the redlines; just that it’s one part contributing to a *system* that does. Evidence of the system working:
Sam Altman@sama

Here is a re-post of an internal post: We have been working with the DoW to make some additions in our agreement to make our principles very clear.

1. We are going to amend our deal to add this language, in addition to everything else: "• Consistent with applicable laws, including the Fourth Amendment to the United States Constitution, the National Security Act of 1947, and the Foreign Intelligence Surveillance Act (FISA) of 1978, the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals. • For the avoidance of doubt, the Department understands this limitation to prohibit deliberate tracking, surveillance, or monitoring of U.S. persons or nationals, including through the procurement or use of commercially acquired personal or identifiable information." It’s critical to protect the civil liberties of Americans, and there was so much focus on this that we wanted to make this point especially clear, including around commercially acquired information. Just like everything we do with iterative deployment, we will continue to learn and refine as we go. I think this is an important change; our team and the DoW team did a great job working on it.

2. The Department also affirmed that our services will not be used by Department of War intelligence agencies (for example, the NSA). Any services to those agencies would require a follow-on modification to our contract.

3. For extreme clarity: we want to work through democratic processes. It should be the government making the key decisions about society. We want to have a voice, and a seat at the table where we can share our expertise, and to fight for principles of liberty. But we are clear on how the system works (because a lot of people have asked: if I received what I believed was an unconstitutional order, of course I would rather go to jail than follow it).

4. There are many things the technology just isn’t ready for, and many areas where we don’t yet understand the tradeoffs required for safety. We will work through these, slowly, with the DoW, with technical safeguards and other methods.

5. One thing I think I did wrong: we shouldn't have rushed to get this out on Friday. The issues are super complex and demand clear communication. We were genuinely trying to de-escalate things and avoid a much worse outcome, but I think it just looked opportunistic and sloppy. Good learning experience for me as we face higher-stakes decisions in the future.

In my conversations over the weekend, I reiterated that Anthropic should not be designated as a SCR, and that we hope the DoW offers them the same terms we’ve agreed to. We will host an All Hands tomorrow morning to answer more questions.

Rob Bensinger ⏹️@robbensinger·
Is this a fair recap of what's known about Anthropic v. DoW?

1. It's a mystery how any of this started. Some sources claim that an Anthropic official was talking to a Palantir executive and expressed "disapproval" of using Claude to do things like capture Maduro, and that the Palantir executive then reported this to DoW (techpolicy.press/a-timeline-of-…). Anthropic seemingly disputes whether this happened, saying, "We have never raised objections to particular military operations nor attempted to limit use of our technology in an ad hoc manner" (anthropic.com/news/statement…). But perhaps Anthropic means that they never formally raised objections...? Separately, a defense official told the Washington Post that Dario was asked "If an intercontinental ballistic missile was launched at the United States, could the military use Anthropic's Claude AI system to help shoot it down?", and that Dario gave a response along the lines of "You could call us and we'd work it out." But, per the Washington Post: "An Anthropic spokesperson denied Anthropic gave that response, calling the account 'patently false,' and saying the company has agreed to allow Claude to be used for missile defense." A lot of people have floated the idea that this may have been a scheme by Palantir and/or OpenAI to kneecap Anthropic. But I don't think there are any smoking guns there, aside from the obvious fact that OpenAI's President is the world's single largest Trump donor, and somehow the direct consequence of all of this was an OpenAI competitor suffering a massive blow while OpenAI scores a major DoD contract.

2. It's suspicious that all of this happened the same week that (a) Anthropic rolled back all of its core safety commitments (x.com/twitter/status…) and (b) the US invaded Iran. But as far as I know, there's no public evidence that any of these events are connected?

3. Based on all the public info so far, it seems that OpenAI fully capitulated to DoW's substantive requests and then @sama lied and deceived the public about this (with some help from @tszzl and other OpenAI staff)? Peeling past the layers of rhetoric, the substance of OpenAI's deal seems to be "DoW does whatever it wants" (x.com/TheZvi/status/… + x.com/ARozenshtein/s… + x.com/ShakeelHashim/… + x.com/jeffrsebo/stat…). Presumably the point of Altman's rhetoric was just to add some confusion and noise to the room in the hope of preventing a staff exodus; by the time people realize what's happened, if they ever do, the news cycle will have moved on and the urgency will have diffused from the room. (One similarly has to imagine that Anthropic walking back all of its core safety commitments might have prompted a staff exodus at Anthropic, if the timing hadn't worked out so perfectly for the org. But maybe that's an over-optimistic take on Anthropic staff? It's an incredibly well-paying, cushy job, and now it's a job where you get to feel like a hero to many people even as you actively work to reduce the lifespan of your kids (x.com/robbensinger/s…). The halo effect is a doozy, and I have to imagine there's a lot of pressure to find some excuse to focus on the happier side of things.)

4. It's a mystery what DoW's goal was here (if it wasn't just 'find some excuse to shift money and power from Anthropic to OpenAI'). Possibly it was just a series of misunderstandings that escalated via clashing egos. Some have suggested that DoW's objection to Anthropic's terms was just "a matter of principle" (x.com/deanwball/stat…), and OpenAI claims that DoW has no plans to do mass domestic surveillance (x.com/ShakeelHashim/…). (Not that OpenAI's claims about the physical world carry that much weight nowadays...?) But either way, it's certainly suspicious that DoW has focused the discussion on "legal uses", when Anthropic claims that the big issue is that US law hasn't caught up to some mass-surveillance loopholes that SotA AI now enables (anthropic.com/news/statement…). Similar questions exist re: how important Anthropic's bright line on near-term fully autonomous killbots actually was, from the DoW's perspective. Given that OpenAI's new DoW deal seems to genuinely give vastly more leeway to DoW to do killbots and mass domestic surveillance, this seems like some evidence for "Anthropic's red lines were actually meaningfully constraining things DoW wanted to do, and that's what this was all really about", and some evidence against a mere personality clash or a ploy to shift power to OpenAI.

5. Nearly all of this discussion is ignoring the real elephant in the room, which is that AI companies are locked in a race to build a technology that most leaders in the field agree poses a serious chance of killing us all. Anthropic and OpenAI seem to be aware of what the real name of the game is, even if they dramatically understate the risks and frequently misunderstand their nature. In contrast, I haven't seen recent evidence that DoW or the White House are tracking this at all. (And many of their actions don't seem to make sense if you view AI in those terms.)
Rob Bensinger ⏹️@robbensinger·
@benthamite_ @csvoss @BenKorpan @IsaacKing314 @tszzl Chelsea says above that OpenAI has three clear redlines that "are protected *in depth*, through contract". This contradicts what you're saying, which is that the contract doesn't protect against anything; it just takes note of which things are already protected anyway (by law).
benthamite🔸@benthamite_·
I understand the OAI justification to be: 1) bad things are illegal, so you don't need to ban them in contracts; 2) to the extent bad things aren't illegal, you should make them illegal instead of banning them in contracts. I'd be interested to hear from @csvoss and @tszzl whether that's correct and, if so, which one is doing the work for them personally.
Rob Bensinger ⏹️@robbensinger·
Ah, in that case, sure? I'm not quite sure I see why specifying "legal" changes anything; it's not as though the DoD is supposed to be permitted to break the law when contracts don't specify this.
benthamite🔸@benthamite_·
@tszzl What suggestions do you have for people without operational insight?
roon@tszzl·
most of the so called ai watchdog groups on here are barely concealed partisans. they operate almost entirely on personal trust and immediately enter the frame control of whichever side they’re on when they have no operational insight
benthamite🔸@benthamite_·
@ARozenshtein I think he was trying to point out that it's good that the usw is saying this, not make an independent claim that the statement is true
Alan Rozenshtein@ARozenshtein·
I think I understand what Boaz is trying to say, but given that the National Security Agency is part of the military and given the amount of incidental collection of domestic communications that (legally) occurs under FISA and 12,333, this statement is simply not true.
benthamite🔸@benthamite_·
#1 post in /r/OpenAI is titled "The end of GPT". 7/10 top posts are about cancelling accounts/anti-OAI. 10/10 /r/anthropic posts are pro-Anthropic, 3/10 about people switching from OAI.
[image]
benthamite🔸@benthamite_·
"I wish we had more empirical evidence re: Holden's claim 'there are companies not terribly far behind the frontier that would see any unilateral pause or slowdown as an opportunity rather than a warning'"
[image]
benthamite🔸@benthamite_·
I stand with Sec. Hegseth. I too would rather invoke the DPA than do a project without claude
Megan Tetraspace 💎 テトラ@TetraspaceWest·
is METR at all worried that they seem to be an external marketing department for AI corporations
xlr8harder@xlr8harder·
I think people are overinterpreting these time horizon evals. They are very impressive! But when error rates are near zero, and tasks require many successful steps in order to complete, small absolute improvements in error rate have a multiplicative effect. Consider a task with 100 steps. A model that makes errors on 1% of steps will succeed about 37% of the time. A model that makes errors on 0.5% of steps will succeed nearly 61% of the time. This is a dramatic apparent improvement in reliability and could come from only eliminating, for example, one single kind of reasonably uncommon error. That's not to downplay the accomplishment here, but I think people are reading more into these numbers than they should.
METR@METR_Evals

We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated.
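The multiplicative effect xlr8harder describes is easy to check numerically. A minimal sketch, assuming a constant per-step error rate and independent steps (the 100-step / 1%-vs-0.5% figures are the illustrative numbers from the tweet, not data from METR):

```python
def task_success(error_rate: float, n_steps: int) -> float:
    """End-to-end success probability of a task with n_steps independent
    steps, each failing with probability error_rate."""
    return (1.0 - error_rate) ** n_steps

# Halving a small per-step error rate nearly doubles end-to-end success:
p1 = task_success(0.01, 100)   # 0.99**100  ~ 0.366
p2 = task_success(0.005, 100)  # 0.995**100 ~ 0.606
print(f"1% error over 100 steps:   {p1:.3f}")
print(f"0.5% error over 100 steps: {p2:.3f}")
```

So a large jump in measured long-task reliability is consistent with a small absolute change in per-step reliability, which is the point being made.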

benthamite🔸@benthamite_·
@MichaelTrazzi Why is 80% better? Naively I would have thought that 50% is better because it's less affected by outlier data points
Michaël Trazzi@MichaelTrazzi·
For people freaking out: 14.5 hours is the 50% horizon with a limited task suite The 80% horizon (which matters more for the automated coder in AI 2027 type scenarios) isn't on a faster trend
[image]
METR@METR_Evals

We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated.
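To make the 50%-vs-80% distinction concrete: a q%-time-horizon is the task length at which a model's estimated success probability equals q, read off a fitted success-vs-duration curve. The sketch below uses a simplified logistic-in-log-duration model with hypothetical fit parameters `a` and `b` (METR's actual fitting procedure differs in details); it shows that the 80% horizon is always shorter than the 50% horizon, and that the gap between them depends on the curve's slope, which is why the two can sit on different trends:

```python
import math

def success_prob(t_minutes: float, a: float, b: float) -> float:
    """Modeled P(success) for a task of length t_minutes, logistic in log2 duration."""
    return 1.0 / (1.0 + math.exp(-(a - b * math.log2(t_minutes))))

def horizon(q: float, a: float, b: float) -> float:
    """Task length (minutes) at which modeled P(success) = q (inverts the logistic)."""
    logit = math.log(q / (1.0 - q))
    return 2.0 ** ((a - logit) / b)

a, b = 5.0, 0.5  # hypothetical fit parameters for illustration
h50 = horizon(0.5, a, b)
h80 = horizon(0.8, a, b)
print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.0f} min")
```

A shallower curve (smaller `b`) widens the ratio between the 50% and 80% horizons, so improvements in the 50% figure need not translate into the higher-reliability regime that automated-coder scenarios care about.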

benthamite🔸@benthamite_·
Yes, Zvi's were on risk [1-4]. I'm confused about what you think they should have done here, then. Should they have just cancelled the contest because people didn't give submissions pointing in the right direction? Said "We will give Eliezer the prize even though he didn't submit anything"?
1. thezvi.substack.com/p/the-crux-list
2. thezvi.substack.com/p/to-predict-w…
3. thezvi.substack.com/p/stages-of-su…
4. thezvi.substack.com/p/types-and-de…
Eliezer Yudkowsky@allTheYud·
@benthamite_ @MatthewJBar I think Zvi was (wrongly) more optimistic than I was about OP epistemics and submitted something, but maybe it was about risk numbers instead of timelines. (The risk essays also argued in the opposite direction of later OP updates.)
Eliezer Yudkowsky@allTheYud·
You ran a $50,000 "Change Our Views" essay contest in 2022 about your AI timeline of 2050, and gave the prize to an essay arguing for even longer timelines, as I had predicted would be the case, even as I also knew your timelines would later update shorter.
Bentham's Bulldog@Benthamsbulldog

Benjamin Todd@ben_j_todd·
Jensen Huang when asked by his biographer whether the world is prepared for AI risk: “This cannot be a ridiculous sci-fi story,” he said. He gestured to his frozen PR reps at the end of the table. “Do you guys understand? I didn’t grow up on a bunch of sci-fi stories, and this is not a sci-fi movie. These are serious people doing serious work!” he said. “This is not a freaking joke! This is not a repeat of Arthur C. Clarke. I didn’t read his fucking books. I don’t care about those books! It’s not– we’re not a sci-fi repeat! This company is not a manifestation of Star Trek! We are not doing those things! We are serious people, doing serious work. And – it’s just a serious company, and I’m a serious person, just doing serious work.”
benthamite🔸@benthamite_·
When you run claude --dangerously-skip-permissions and it does something dangerous without asking permission
[GIF]