Joshua Batson
@thebasepoint
2.1K posts

trying to understand evolved systems (🖥 and 🧬) · interpretability research @anthropicai · formerly @czbiohub, @mit math

Oakland, CA · Joined February 2012
678 Following · 5.8K Followers
Joshua Batson reposted
Nathan Calvin @_NathanCalvin
This passage in the New Yorker piece on the Anthropic DoW conflict yesterday, including a back-and-forth between the journalist (Gideon Lewis-Kraus) and an anonymous administration official, is gonna stick in my mind for a long time.

“We must also remember that Cyberdyne Systems created Skynet for the government. It was supposed to help America dominate its enemies. It didn’t exactly work out as planned. The government thinks this is absurd. But the Pentagon has not tried to build an aligned A.I., and Anthropic has.

Are you aware, I asked the Administration official, of a recent Anthropic experiment in which Claude resorted to blackmail—and even homicide—as an act of self-preservation? It had been carried out explicitly to convince people like him. As a member of Anthropic’s alignment-science team told me last summer, ‘The point of the blackmail exercise was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually salient in practice for people who had never thought about it before.’

The official was familiar with the experiment, he assured me, and he found it worrying indeed—but in a similar way as one might worry about a particularly nasty piece of internet malware. He was perfectly confident, he told me, that ‘the Claude blackmail scenario is just another systems vulnerability that can be addressed with engineering’—a software glitch. Maybe he’s right. We might get only one chance to find out.”

I really recommend everyone read both the full New Yorker piece and Anthropic’s research on persona selection (both linked in the replies) and then spend a while sitting with the disconcerting situation we may have found ourselves in.
Joshua Batson reposted
Mark Histed @HistedLab
Writing out a conversation I’ve been having a lot at this conference: Things in US science are far, far worse than people know. Far worse than even other scientists know. 1/
Joshua Batson @thebasepoint
@Knikct Ah yes I think your proposal of having containers which call public APIs is pretty reasonable. That would allow for actor/critic setups in a transparent way.
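(For concreteness, a minimal sketch of the kind of containerized actor/critic rig described above, assuming a generic hosted-model endpoint. The URL, the `call_public_api` helper, and the prompts are hypothetical stand-ins, not any particular provider's API.)

```python
# Sketch of a containerized actor/critic rig built on a public model API.
# Everything here is a hypothetical stand-in: the endpoint URL, the JSON
# shape, and the prompts would depend on the actual provider.
import requests

API_URL = "https://example.com/v1/generate"  # hypothetical public endpoint

def call_public_api(prompt: str) -> str:
    """Send a prompt to the public endpoint and return the text reply."""
    resp = requests.post(API_URL, json={"prompt": prompt}, timeout=60)
    resp.raise_for_status()
    return resp.json()["text"]

def actor(problem: str, feedback: str) -> str:
    """Actor: propose (or revise) a candidate solution."""
    return call_public_api(f"Solve:\n{problem}\n\nCritic feedback so far:\n{feedback}")

def critic(problem: str, attempt: str) -> str:
    """Critic: independently review the actor's latest attempt."""
    return call_public_api(f"Critique this attempt at:\n{problem}\n\n{attempt}")

def run(problem: str, rounds: int = 3) -> str:
    """Alternate actor and critic; because every call goes through the
    public API, the whole transcript can be logged for transparency."""
    attempt, feedback = "", "(none yet)"
    for _ in range(rounds):
        attempt = actor(problem, feedback)
        feedback = critic(problem, attempt)
    return attempt
```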
Nikhil Srivastava
@thebasepoint agreed! we are open to testing rigs on top of public models, subject to funding and logistical constraints, and as long as it is done transparently. see sec 3 of the announcement.
Joshua Batson @thebasepoint
@Knikct Even if you don’t want tool use like Python experiments or Lean (e.g. as in a SWE harness, metr.org/AI_R_D_Evaluat…), some degree of iteration can change the results significantly.
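(To illustrate the iteration point: a sketch of a loop where the model's code is executed and any error output is fed back for revision. It reuses the hypothetical `call_public_api` helper from the sketch above; the prompts and limits are made up for illustration.)

```python
# Sketch of the iteration loop described above: the model proposes code,
# we run it in a subprocess, and any error output is fed back for revision.
# Reuses the hypothetical call_public_api helper from the earlier sketch.
import subprocess
import tempfile

def run_python(code: str) -> tuple[bool, str]:
    """Execute candidate code in a subprocess; return (succeeded, output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True,
                              text=True, timeout=120)
    except subprocess.TimeoutExpired:
        return False, "timed out after 120s"
    return proc.returncode == 0, proc.stdout + proc.stderr

def iterate(task: str, max_tries: int = 5) -> str:
    """Let the model revise its code until it runs cleanly or tries run out."""
    feedback = "(first attempt)"
    for _ in range(max_tries):
        code = call_public_api(
            f"Write a Python script for:\n{task}\n\nLast run output:\n{feedback}")
        ok, feedback = run_python(code)
        if ok:
            break
    return code  # last attempt, whether or not it succeeded
```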
Joshua Batson @thebasepoint
@Knikct The evals performed by METR might be worth comparing to
Joshua Batson reposted
Hayden Field @haydenfield
NEW: When OpenAI announced its Pentagon deal Friday night, people immediately challenged Sam Altman's claims. Why, they asked, would the DoD suddenly agree to red lines when it had said it would never do so? The answer, sources told me, is that it didn't. theverge.com/ai-artificial-…
Joshua Batson reposted
sam mcallister @sammcallister
@aidan_mclau @scrollvoid This isn't true. Anthropic hasn't offered a "helpful-only" model without safeguards for NatSec use. Claude Gov is a custom model with extra training, including technical safeguards. (We've also had FDEs and researchers implementing it, and we run our own classifier stack.)
Joshua Batson reposted
Jeff Sebo @jeffrsebo
Many critiques and defenses of @sama and @OpenAI right now seem to be talking past each other. I think at least three issues need to be separated: the substance of the deal, the timing, and the messaging.

On substance: OpenAI appears to have accepted "all lawful uses" language with assurances that current law and policy rule out mass surveillance and autonomous weapons, rather than requiring explicit contractual commitments. They also appear to have deferred to the government's definitions rather than stipulating definitions that cover novel capabilities — like sifting through legally procured data at scale. The details are still unknown, but based on public statements, I lean towards Anthropic on both counts. That said, I can see reasonable arguments on the other side.

On timing: The government had just declared a competitor a supply chain risk — a designation normally reserved for foreign adversaries — on transparently disingenuous and unacceptable grounds. Here I feel more strongly that OpenAI is in the wrong. There are times for competition and times for solidarity, and this was a clear time for solidarity. Signing a deal mere hours later, whatever its merits on substance, undermined the entire industry's ability to push back against government overreach. That matters for principled and long-term pragmatic reasons alike.

On messaging: Sam's statement was, at best, confusing. Many people struggled to determine how OpenAI's deal differed from Anthropic's proposal. Had Sam written something like: "We accepted roughly the compromise that Anthropic rejected, because we trust the government not to use AI for mass surveillance or automated weapons, and in any case we view such matters as up to public officials, not private companies," I would have disagreed, but I would have at least respected the straightforwardness. Instead his statement seemed designed to obscure and mislead.

If I were an OpenAI employee, I would not be thrilled that a taxonomy needs to be created to clarify how the company is being critiqued and defended. But here we are. Do with this what you wish!
Joshua Batson reposted
Leo Gao @nabla_theta
@boazbaraktcs - what happens when the model/safety stack refuses DoW queries? if the DoW gets mad and strongarms openai, like they just did to anthropic, how is openai going to resist? especially if openai doesn't even have the strong contractual protection
Joshua Batson @thebasepoint
As the government report says, the scope and scale of commercially available information (CAI) that counts as publicly available information (PAI) are radically beyond what our current laws foresaw.
[attached image]
Joshua Batson @thebasepoint
"Civil liberties concerns such as these are examples of how large quantities of nominally “public” information can result in sensitive aggregations."
Joshua Batson @thebasepoint
For those wondering how mass domestic surveillance could be consistent with "all lawful use" of AI models, I recommend a declassified report from the ODNI on just how much can be done with commercially available information (CAI): "...to identify every person who attended a protest"
[attached image]