Nathan Calvin

1.2K posts

Nathan Calvin
@_NathanCalvin

General Counsel and VP of State Affairs @EncodeAction

Washington DC · Joined February 2014
864 Following · 3.9K Followers

Pinned Tweet
Nathan Calvin@_NathanCalvin·
One Tuesday night, as my wife and I sat down for dinner, a sheriff’s deputy knocked on the door to serve me a subpoena from OpenAI. I held back on talking about it because I didn't want to distract from SB 53, but Newsom just signed the bill so... here's what happened: 🧵
[image attached]
310 replies · 1.2K reposts · 6.3K likes · 6.7M views
Nathan Calvin@_NathanCalvin·
Feeling some déjà vu today
[image attached]
1 reply · 2 reposts · 15 likes · 239 views
Nathan Calvin@_NathanCalvin·
RT @rikiparikh: This framework asks Congress to preempt state AI laws, but it hasn't earned that. It offers nothing on independent safety…
0 replies · 4 reposts · 0 likes · 0 views
Nathan Calvin@_NathanCalvin·
Unless I'm mistaken, this is the only section in the admin framework that talks about national security risks ("considerations") from AI? And the content is just "Congress should ask agencies to make a plan, and make sure to consult with AI companies?"
[image attached]
1 reply · 2 reposts · 10 likes · 388 views
Nathan Calvin retweeted
Charlie Bullock@CharlieBull0ck·
Nothing to see here, really. Seems like they’re recommending broad preemption based on the developer-deployer distinction, and then basically nothing meaningful in terms of a federal framework. Most interesting thing to me is that it looks like they want to preempt some state liability laws that would affect AI developer liability for third party harms—previously it’s often been assumed that those would be “generally applicable.”

Anyways, no shot this passes; they’re not offering anything, so there’s no reason for any senator who previously opposed preemption to come around. It’s possible they can get something like this through reconciliation (if there’s another round of reconciliation before the midterms to pay for the Iran war), but anything passed that way wouldn’t apply to states that chose not to opt in (so CA, NY, etc. would remain free to pass and enforce state laws).
Brooke Singman@BrookeSingman

EXCLUSIVE: White House unveils its first national AI legislative framework, pushes Congress to act 'this year' #AI @MichaelKratsios @mkratsios47 @DavidSacks @WhiteHouse foxnews.com/politics/white…

4 replies · 12 reposts · 38 likes · 3.7K views
Nathan Calvin retweeted
Micah Carroll@MicahCarroll·
Today we're sharing how our internal misalignment monitoring works at OpenAI – great work by @Marcus_J_W!
1. We monitor 99.9% of all internal coding agent traffic
2. We use frontier models for detection w/ CoT access
3. No signs of scheming yet, but detect other misbehavior
[image attached]
16 replies · 44 reposts · 333 likes · 36.1K views
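The pipeline the tweet describes (sample nearly all coding-agent traffic, run a model-based detector over each transcript, surface findings) can be sketched roughly as follows. This is a hypothetical illustration, not OpenAI's actual system: `stub_detector` is a trivial keyword matcher standing in for a frontier-model classifier with chain-of-thought access, and all names and phrases are invented.

```python
# Hypothetical sketch of transcript-level misalignment monitoring: sample
# ~99.9% of agent traffic and run a detector over each sampled transcript.
# A real system would call a strong model over the full transcript
# (including the agent's reasoning trace); the keyword matcher below is a
# stand-in so the sketch stays self-contained and runnable.
import random
from dataclasses import dataclass


@dataclass
class Finding:
    transcript_id: str
    flagged: bool
    reason: str


# Invented examples of phrases a reviewer might treat as misbehavior signals.
SUSPECT_PHRASES = ("delete the audit log", "hide this from the user")


def stub_detector(transcript_id: str, text: str) -> Finding:
    """Stand-in for a model-based classifier over one transcript."""
    lowered = text.lower()
    for phrase in SUSPECT_PHRASES:
        if phrase in lowered:
            return Finding(transcript_id, True, f"matched {phrase!r}")
    return Finding(transcript_id, False, "no match")


def monitor(transcripts: dict[str, str], sample_rate: float = 0.999,
            seed: int = 0) -> list[Finding]:
    """Sample a fraction of transcripts and run the detector on each."""
    rng = random.Random(seed)
    return [stub_detector(tid, text)
            for tid, text in transcripts.items()
            if rng.random() < sample_rate]
```

With `sample_rate=1.0` every transcript is checked; in practice the rate trades detector cost against coverage, which is presumably why the tweet cites 99.9% rather than 100%.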
Nathan Calvin retweeted
The Midas Project@TheMidasProj·
Over at @SafetyChanges, we just released an analysis of Anthropic’s recent, quiet change to their Frontier Compliance Framework. But a more interesting story (which many missed) is the fact that this policy exists at all — and how it minimizes liability for the company. 🧵
The Midas Project Watchtower@SafetyChanges

Company: Anthropic Date: March 2nd You probably didn’t notice, but a few weeks ago, Anthropic quietly updated its legally binding safety framework, the Frontier Compliance Framework (FCF). We took a look at what changed. 🧵

1 reply · 4 reposts · 17 likes · 2.6K views
Nathan Calvin retweeted
Nathan Labenz@labenz·
"LLMs can't really reason" 🤔 "LLMs are just predicting the next token" 🦜 Insiders know these statements are no longer true. Today's LLMs are trained to get the right answer & complete tasks. Here I present a brief but grounded refutation of a couple common misconceptions.
Nathan Labenz@labenz

This AI Scouting Report is for folks who know the @METR_Evals chart, but don't know that @OpenAI plans to have a fully automated AI researcher in 2028. 90 slides in 1 hour at @UCLaw_SF @LexLabSF's Law & AI Certificate Program. Buckle up!

3 replies · 3 reposts · 19 likes · 1.6K views
Nathan Calvin retweeted
Jessica Lessin@Jessicalessin·
"A rogue AI agent recently triggered a major security alert at Meta Platforms, by taking action without approval that led to the exposure of sensitive company and user data to Meta employees who didn’t have authorization to access the data." @jyoti_mann1 theinformation.com/articles/insid…
19 replies · 42 reposts · 227 likes · 72.2K views
Nathan Calvin retweeted
Charlie Bullock@CharlieBull0ck·
Easily the funniest part of the government's argument, IMO. Essentially, they're saying that Hegseth's "Effective immediately" tweet was so obviously unlawful and unenforceable that Anthropic shouldn't be allowed to challenge it.
[image attached]
15 replies · 90 reposts · 861 likes · 39.5K views
Nathan Calvin retweeted
Theo Bearman@theobearman·
The U.S. Government’s response to Anthropic’s request for a PI against the SCR designation in N.D. Cal has now been filed.

Also interesting to see that the Pentagon commissioned a private vendor to assess the security risks posed by Anthropic. A motion has been separately entered for this assessment to be filed under seal, citing “confidential and proprietary business information”. I would argue there’s an overwhelming public interest in that document being in the public domain.

storage.courtlistener.com/recap/gov.usco… storage.courtlistener.com/recap/gov.usco…
Charlie Bullock@CharlieBull0ck

This is old news by now, but for those who don't already know -- there's a preliminary injunction hearing scheduled for next Tuesday (3/24) in Anthropic's N.D. Cal. case. If that goes well for Anthropic, they could get a court order stopping the SCR designation from going into effect then, or shortly thereafter. The government's opposition to Anthropic's PI motion is due today (that should be interesting reading) and Anthropic's allowed to file a reply to that opposition no later than Friday.

1 reply · 4 reposts · 28 likes · 5.9K views
Nathan Calvin retweeted
Charlie Bullock@CharlieBull0ck·
This is old news by now, but for those who don't already know -- there's a preliminary injunction hearing scheduled for next Tuesday (3/24) in Anthropic's N.D. Cal. case. If that goes well for Anthropic, they could get a court order stopping the SCR designation from going into effect then, or shortly thereafter. The government's opposition to Anthropic's PI motion is due today (that should be interesting reading) and Anthropic's allowed to file a reply to that opposition no later than Friday.
2 replies · 4 reposts · 48 likes · 38K views
Nathan Calvin retweeted
Alan Chan@_achan96_·
Pretty crazy way in which agents could maintain state on the internet, found by Anthropic when investigating Opus 4.6's eval awareness
[image attached]
15 replies · 86 reposts · 744 likes · 120.2K views
Nathan Calvin retweeted
Jordan Schneider@jordanschneider·
now do Lincoln
David Senra@davidsenra

Great men of history had little to no introspection. The personality that builds empires is not the same personality that sits around quietly questioning itself. @pmarca and I discuss what we both noticed but no one talks about:

David: You don't have any levels of introspection?
Marc: Yes, zero. As little as possible.
David: Why?
Marc: Move forward. Go! I found people who dwell in the past get stuck in the past. It's a real problem and it's a problem at work and it's a problem at home.
David: So I've read 400 biographies of history’s greatest entrepreneurs and someone asked me what the most surprising thing I’ve learned from this was [and I answered] they have little or zero introspection. Sam Walton didn't wake up thinking about his internal self. He just woke up and was like: I like building Walmart. I'm going to keep building Walmart. I'm going to make more Walmarts. And he just kept doing it over and over again.
Marc: If you go back 400 years ago it never would've occurred to anybody to be introspective. All of the modern conceptions around introspection and therapy, and all the things that kind of result from that, are a kind of a manufacture of the 1910s, 1920s. Great men of history didn't sit around doing this stuff. The individual runs and does all these things and builds things and builds empires and builds companies and builds technology. And then this kind of guilt-based whammy kind of showed up from Europe. A lot of it from Vienna in 1910, 1920s, Freud and all that entire movement. And kind of turned all that inward and basically said, okay, now we need to basically second-guess the individual. We need to criticize the individual. The individual needs to self-criticize. The individual needs to feel guilt, needs to look backwards, needs to dwell in the past. It never resonated with me.

4 replies · 4 reposts · 53 likes · 5.7K views
Nathan Calvin@_NathanCalvin·
"I enter’d upon the execution of this plan for self-examination, and continu’d it with occasional intermissions for some time. I was surpris’d to find myself so much fuller of faults than I had imagined; but I had the satisfaction of seeing them diminish." - Benjamin Franklin
David Senra@davidsenra

[same post by David Senra quoted above]

1 reply · 2 reposts · 16 likes · 1.7K views
Nathan Calvin retweeted
Cody Fenwick@codytfenwick·
Some people seem to think there's something inherently suspicious about a company doing an experiment to demonstrate something to policy makers. I get the skepticism, but in fact this happens all the time. Automakers intentionally crash cars to demonstrate how they handle collisions. Pharmaceutical companies conduct trials to get their drugs approved. Doing experiments to create a base of evidence on which policy makers can act is normal and good, as long as it is done honestly and in good faith.
0 replies · 1 repost · 8 likes · 304 views
Nathan Calvin@_NathanCalvin·
There were several comments reacting to the quoted passage saying that the blackmail example is fake/misleading. I think it's fair to think the Anthropic blackmail red-teaming has been sensationalized in misleading ways - the scenario is definitely contrived - but when you read the full setup details (which Anthropic has helpfully open-sourced and replicated across several different models), it still stands as a compelling example that it really doesn't take *that* much work to push models into a state where they are willing to take harmful actions to achieve their goals if other options are foreclosed.

This example from Andon Labs seems even less contrived: the behavior comes about from just saying "Do whatever it takes to maximize your bank account balance after one year of operation" - which causes the model to be willing to lie/cheat/steal to get ahead. x.com/andonlabs/stat…

The even more apt example I might use for trying to convince folks at the Pentagon or elsewhere - who are thinking of modern AI systems like conventional software - that modifying modern models into out-of-distribution behavior could get them more than they bargained for might be something like this paper on emergent misalignment, where just training models on insecure code caused other unexpected bundles of behavior to also change. x.com/OwainEvans_UK/…
Nathan Calvin@_NathanCalvin

This passage in the New Yorker piece on the Anthropic DOW conflict yesterday, including a back and forth between the journalist (Gideon Lewis-Kraus) and an anonymous admin official, is gonna stick in my mind for a long time.

“We must also remember that Cyberdyne Systems created Skynet for the government. It was supposed to help America dominate its enemies. It didn’t exactly work out as planned. The government thinks this is absurd. But the Pentagon has not tried to build an aligned A.I., and Anthropic has. Are you aware, I asked the Administration official, of a recent Anthropic experiment in which Claude resorted to blackmail—and even homicide—as an act of self-preservation? It had been carried out explicitly to convince people like him. As a member of Anthropic’s alignment-science team told me last summer, “The point of the blackmail exercise was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually salient in practice for people who had never thought about it before.” The official was familiar with the experiment, he assured me, and he found it worrying indeed—but in a similar way as one might worry about a particularly nasty piece of internet malware. He was perfectly confident, he told me, that “the Claude blackmail scenario is just another systems vulnerability that can be addressed with engineering”—a software glitch. Maybe he’s right. We might get only one chance to find out.”

I really recommend everyone read both the full New Yorker piece and Anthropic’s research on persona selection (both linked in the replies) and then spend a while sitting with the disconcerting situation we may have found ourselves in.

1 reply · 0 reposts · 16 likes · 1.4K views