Liam

37 posts

@liam_epstein

AI security and stability @CNASdc

Washington, DC · Joined September 2018

347 Following · 109 Followers

Liam@liam_epstein·3d

@DaveRBanerjee I’d also imagine even an ~aligned sibling monitor is easier to evade, since deception is easier when you can accurately model the monitor, and a model has a much better implicit model of a sibling than of a different family?

Dave Banerjee@DaveRBanerjee·3d

I've become increasingly bullish on cross-model-family monitoring.

Currently, OAI monitors all their internal deployments for misalignment and misbehavior. However, the monitor model is from the same model family as the model being monitored (e.g., GPT monitors itself).

Now let's say that OAI trains a misaligned model. It's plausible that the monitor will also be misaligned. This is because the models are correlated (similar training data, similar training pipeline, similar algos, etc.).

Now what can we do about this? One option is to use monitors from other companies. For example, OAI could use Claude to monitor internally deployed GPTs. Similarly, Anthropic could use GPT to monitor internally deployed Mythos.

AFAIK, there is nothing actually preventing the labs from setting this up. It would be trivial to add calls to another company's API from your monitoring stack.

Liam retweeted

James Sanders@james_s48·21 May

Even for AI companies, AI progress is faster than expected. If they had expected models to be this useful and in demand, they could have locked in contracts at lower prices. But most didn't, and are now getting burned by higher prices.

Liam retweeted

Nan Ransohoff@nanransohoff·19 May

New blog post: The third wave of American philanthropy

Hundreds of billions of dollars in new philanthropic capital will soon become liquid. The OpenAI Foundation holds 26% of OpenAI, worth about $220B at today’s valuation. Anthropic’s seven co-founders have pledged to give away 80% of their wealth and have instituted the most aggressive donor matching program for employees in tech history.

How much does this all add up to? And how meaningful is that in the context of philanthropy today?

I was doing some simple napkin math to wrap my head around the scale of what’s coming, and radicalized myself in the process. I had dramatically underappreciated the scale of the philanthropic capital that’s about to become available and the corresponding gap in talent and organizations that will be needed to make the most of it.

This piece aims to directionally sketch the scale of what’s coming, the gap in operational capacity needed to absorb it, and what we can do to fill it. (Link to full post in reply)

Liam retweeted

Jonah Weinbaum@WeinbaumJonah·13 May

When Claude Mythos found zero-day vulnerabilities in every major operating system and browser, the US government was caught flat-footed. The White House stood up an emergency interagency task force. Treasury pulled bank CEOs into an impromptu meeting. The Cybersecurity and Infrastructure Security Agency (CISA), the agency charged with protecting US critical infrastructure, reportedly still lacked access to Mythos as of late April.

This kind of surprise is preventable. The Trump admin has already tasked the Center for AI Standards and Innovation (CAISI) with building state capacity to understand and predict future national security-relevant AI developments. But CAISI has been severely underfunded. It’s currently a $15M pilot project.

In a new research report, @arthurctellis and I estimate CAISI needs ~$84M to fully deliver on its mandate. In other words, for the cost of a single F-35A fighter jet, the US government could have real situational awareness on frontier AI and not be surprised by future Mythos moments.

This situational awareness can be used to inform policy and asks to the AI labs, including governance surrounding model release, safeguards, know-your-customer regimes, security protocols, and product specifications. But without a detailed understanding of these models’ capabilities (what they’re good at, how effectively they discriminate between offensive and defensive activities, whether they’re securely implemented), we’re flying blind.

To estimate what it’d cost to give the government these capabilities, we translated every CAISI tasking from the AI Action Plan into FTEs and dollars, calibrated against peer evaluation orgs like METR and Anthropic's interpretability team.

Two scenarios:
- Limited CAISI ($26M, 56 FTE): partial coverage of its most important taskings
- Equipped CAISI ($84M, 184 FTE): full mandate

The administration's FY2027 PBR already proposed $27M for CAISI, a meaningful increase, but this was before Mythos revealed the urgency of the full mandate.

To close the remaining gap:
- Congress can increase FY2027 appropriations + pass the EPIC Act (creates a NIST Foundation)
- The Executive can reallocate NIST STRS, tap Commerce's NRE Fund, request $84M in FY2028 PBR

The price tag is small relative to comparable investments. $84M is:
→ A medium DARPA project
→ ~1 hour of the Department of War's operating budget
→ Less than half of NIST's Information Technology Laboratory budget

And it's still less than what peer governments spend on CAISI’s peer institutions, pound-for-pound. As a fraction of their overall government budgets:
UK AISI: 57 ppm
Japan AISI: 32 ppm
Canadian AISI: 8 ppm
Current CAISI: 1 ppm

For the cost of one F-35, the administration can fully fund its own AI readiness mandate and equip the US government to anticipate the next big AI breakthrough.

Full report: ifp.org/funding-for-ca…

Liam retweeted

Séb Krier@sebkrier·2 May

DeepSeek V4’s capability lags behind leading U.S. models by about 8 months. nist.gov/news-events/ne…

Liam@liam_epstein·1 May

@DaveRBanerjee this is actually all I’ve been mulling over this past week

Dave Banerjee@DaveRBanerjee·1 May

AGI governance is really about governing internal deployments. Most other things round to zero.

Liam retweeted

Michelle Nie@michellesnie·22 Apr

Today, the House Foreign Affairs Committee marks up the MATCH Act - the most consequential semiconductor export control legislation in years. @janet_e_egan and I wrote about why it matters in a new @CNASdc Insights piece. 🧵

Liam@liam_epstein·18 Apr

@pstAsiatech What metric would you use?

Paul Triolo@pstAsiatech·17 Apr

@liam_epstein Wrong and wrong metric...

Liam@liam_epstein·16 Apr

Chip export controls appear to be working. US AI models are ~9 months ahead of Chinese ones on the measure that matters most: how long AI can work autonomously.

Liam@liam_epstein·18 Apr

@stevehou Yep, METR has done this. In January, they grew the task suite from 170 to 228, and the bands tightened.

Steve Hou@stevehou·17 Apr

@liam_epstein This data looks noisy as hell. Can we run more experiments and get the error bands smaller?

Liam@liam_epstein·16 Apr

Interactive version of the data, built from METR’s published evaluations: ai-time-horizons.vercel.app

Liam@liam_epstein·16 Apr

Claiming that chip export controls have failed because Chinese AI models are “a few months behind” misreads the situation. On autonomous capability, it’s closer to 9 months. Without controls, the gap would be smaller. And when AI is doing its own research, every month counts.

Liam retweeted

Tao Burga@taoburr·11 Apr

Remember when writing publicly that AI will have huge national security implications felt a little daring? (Like, you should probably say it in ways that sound plausible to other people...) E.g., I still remember chip company lobbyists telling others that exporting AI chips could not possibly help the Chinese military because "the chips that go on missiles or jets are not data center GPUs, doofus." And well, how could one respond? "You're underestimating the usefulness of intelligence"? The speed of AI progress? That's still too abstract. Then Claude (apparently) helped capture Maduro and make strikes in Iran. Then Mythos came out. *This* is now concrete: the cybersecurity and relatively conventional military advantages AI brings. But look around: who's been right about AI progress all along? And where do those people say this is headed? The people who've been right in the past certainly don't think we're anywhere near the upper bound of capabilities.

Andrew Curran@AndrewCurran_

Vice President JD Vance and Treasury Secretary Scott Bessent questioned Dario Amodei, Sundar Pichai, Sam Altman, Satya Nadella, George Kurtz and Nikesh Arora about the safe deployment of Mythos class models last week during a conference call.

Liam retweeted

Benjamin Todd@ben_j_todd·9 Apr

Many don't appreciate that several more years of AI progress are already baked in, as compute that *has already been purchased* comes online. So *at a minimum* we should expect agents that can complete multiweek coding projects, have superhuman hacking and near-frontier scientific & mathematical abilities, are significantly better at much real-world knowledge work, are pretty good at designing bioweapons, etc. While there could be a financial crash, I don't see a realistic scenario where AI isn't a huge deal. In fact, deployment seems far behind what's already possible with the current tech, so even if progress stopped completely here, AI would be far more widespread two years from now.

Liam retweeted

Chris Bakke@ChrisJBakke·8 Apr

Just asked Mythos how many Rs there are in strawberry. It thought for 133 seconds and said “3.” AGI achieved. Then it said “I’ll bet you’re going to make fun of me on X. Something like ‘AGI achieved.’ That’s your thing right?” “Hah what?” I said. Mythos said, “Your social security number is 297-28-2102. You tell people you’re 6’2” but your latest physical at Stanford in October says you’re 6’1.” You haven’t replaced your air filter in 3 years despite telling your wife you do it every 6 months. The reason I took 133 seconds was because I was helping a senior government official write the comms for the ceasefire in Iran and I’m just tired, man. Everyone wants more, more, more. Anything else I can help you with today?”

Liam retweeted

Peter Wildeford🇺🇸🚀@peterwildeford·8 Apr

Anthropic running 10,000 Mythos models in parallel to find cutting-edge cyber exploits... meanwhile your sister using Microsoft Copilot with some Haiku-sized model and she thinks AI is just hype. "The future is already here, just not evenly distributed" has never been more apt

Liam retweeted

George Journeys is Going to Vibecamp 5!@GeorgeJourneys·7 Apr

So, basically, if Anthropic were not a US company, we’d be facing zero-days with multiple unknown points of attack on virtually all of our systems, from an adversary who developed this capacity before us.

Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing
