Arvind Narayanan
@random_walker
13.1K posts

Princeton CS prof and Director @PrincetonCITP. Coauthor of "AI Snake Oil" and "AI as Normal Technology". https://t.co/ZwebetjZ4n Views mine.

Princeton, NJ · Joined December 2007
526 Following · 126.5K Followers
Pinned tweet
Arvind Narayanan @random_walker
If a fact or chart is surprising, it might be because it’s new information, or it might be something deeper — a sign that our mental model is wrong. Anthropic’s economic gap chart is the latter. anthropic.com/research/labor…

A big source of confusion in AI discourse is not recognizing that the speed of adoption follows its own logic that’s far slower than the speed of capability progress. I’m biased but I think AI as Normal Technology is still the best exposition of the many different speed limits to diffusion. Once we internalize this, the gap shown in the chart is what we should expect.

How does this square with the “AI is the most rapidly adopted technology” narrative and all the graphs that are frequently shared to push that view? Unfortunately they lump together too many kinds of “AI use” to really tell us anything meaningful.

On the one hand there are many marginal uses of AI (such as using chatbots instead of traditional search) that are being quickly adopted. But what will make a true economic impact are deeper changes to workflows that incorporate verification and accountability, manage the risk of deskilling, and are accompanied by organizational changes that take advantage of productivity improvements. Those changes happen at human timescales and are barely getting started. And that’s not even accounting for regulatory barriers.

Finally, I’m also not sure how credible the “theoretical capability” estimates are. In particular, I don’t think they account for the capability-reliability gap, for which the AI community didn’t even have measurements until our work two weeks ago normaltech.ai/p/new-paper-to…
[image]
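To make the capability-reliability gap mentioned above concrete, here is a minimal sketch (my illustration, not code or numbers from the linked paper) contrasting a pass@k-style capability estimate with a pass^k-style reliability estimate. All per-task success probabilities are made-up placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-task success probabilities for an agent.
# Purely illustrative, not measurements of any real model.
task_success_probs = np.array([0.95, 0.9, 0.8, 0.6, 0.5])
k = 8  # independent attempts per task

# Simulate k attempts per task (True = success).
attempts = rng.random((len(task_success_probs), k)) < task_success_probs[:, None]

# Capability-style metric: pass@k. A task counts as solved if ANY attempt succeeds.
pass_at_k = attempts.any(axis=1).mean()

# Reliability-style metric: pass^k. A task counts as solved only if ALL attempts succeed.
pass_pow_k = attempts.all(axis=1).mean()

print(f"pass@{k} (capability):  {pass_at_k:.2f}")
print(f"pass^{k} (reliability): {pass_pow_k:.2f}")
```

With these placeholder numbers pass@k comes out near 1 while pass^k is far lower; the spread between the two is one simple way to put a number on the gap between what a model can do once and what it does dependably.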
Arvind Narayanan reposted
Stephan Rabanser @steverab
In our paper "Towards a Science of AI Agent Reliability" we put numbers on the capability-reliability gap. Now we're showing what's behind them! We conducted an extensive analysis of failures on GAIA across Claude Opus 4.5, Gemini 2.5 Pro, and GPT 5.4. Here's what we found ⬇️
[image]
Arvind Narayanan reposted
Justin Curl @curl_justin
State lawmakers introduced over 1,200 AI bills in 2025. They cover everything from deepfakes to autonomous weapons—but they're all just lumped together as "AI policy." @ARozenshtein and I wrote an article that breaks down the policy landscape along three dimensions: (1) what harm are you addressing, (2) what are the factors shaping how you should design your policy intervention, and (3) which actors in the ecosystem should you target? The diagram below, for example, maps the AI ecosystem from chip manufacturers to end users.
[image]
Arvind Narayanan reposted
AI Security Institute @AISecurityInst
Can AI agents conduct advanced cyber-attacks autonomously? We tested seven models released between August 2024 and February 2026 on two custom-built cyber ranges designed to replicate complex attack environments. Here’s what we found🧵
[image]
Arvind Narayanan reposted
Hunter📈🌈📊 @StatisticUrban
His predictions weren't "premature." They were just wrong. They didn't happen, and they never will.
[image]
Arvind Narayanan reposted
John Arnold @johnarnold
The Atlantic has a sobering, first-person look at the ramifications of legalized online sports betting. Here are a few of the more telling passages. 1/5
[image]
Arvind Narayanan reposted
Princeton University @Princeton
Through various initiatives, @PrincetonSPIA is informing lawmakers about the latest research on AI, and educating current and future public servants about policy challenges and innovation opportunities. bit.ly/4sJT6zR
Arvind Narayanan @random_walker
At first glance this is a totally reasonable perspective. Training PhD students is a duty! But consider this — *effectively* advising a PhD student over a 5-year period is well over 1,000 hours of work, not to mention bringing in hundreds of thousands of dollars in grants.

Professors will do some things for mostly altruistic reasons (peer review) but the time commitment for advising is not something that's reasonable to ask of someone without some form of compensation. So there are two options. One is to make advising a job requirement. Unfortunately this doesn't work, because the *quality* of advising is unobservable and can't be quantified by metrics, leading to a race to the bottom. The other option is the current system — advising helps advance the professor's research agenda because PhD students do most of the work, so they take on students voluntarily.

Which means it's important to ask if this subtle alignment of incentives will continue despite advancing AI capabilities. Academia has many such "subtle alignments of incentives" that the system relies on in order to function — rarely articulated, poorly understood, and fragile. Maybe the advisor-advisee relationship in CS will survive the AI transition, as @sayashk predicts, but many processes and structures will surely break. Best to rethink the system now, before it's too late.
Alison | AlisonBob.eth @AlisonbobEth

@sayashk @random_walker They only have PhD students to do work? I would have thought that training successors would be important in and of itself 🫠

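For scale, a back-of-envelope version of the 1,000-hour claim. The weekly-hours figure below is my own illustrative assumption, not a number from the thread.

```python
# Rough arithmetic behind "well over 1,000 hours" of advising.
# All inputs are illustrative assumptions.
years = 5                    # typical PhD duration
working_weeks_per_year = 40  # excluding breaks, travel, deadlines
hours_per_week = 6           # meetings, reading drafts, editing papers

print(years * working_weeks_per_year * hours_per_week)  # 1200
```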
Arvind Narayanan reposted
Sayash Kapoor @sayashk
In the last few months, I've spoken to many CS professors who asked me if we even need CS PhD students anymore. Now that we have coding agents, can't professors work directly with agents? My view is that equipping PhD students with coding agents will allow them to do work that is orders of magnitude more impressive than they otherwise could. And they can be *accountable* for their outcomes in a way agents can't (yet). For example, who checks the agent's outputs are correct? Who is responsible for mistakes or errors?
Arvind Narayanan reposted
Andy Masley @AndyMasley
Each frontier AI model seems to use a little under a year's worth of a square mile of farmland's water to train. I think about this as the country having 4 square miles of farmland sectioned off to grow some of the most popular consumer products in history.
[image]
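As a unit-conversion sanity check only, the sketch below shows the shape of the arithmetic behind a comparison like this. The irrigation rate and the training-run water figure are placeholder assumptions for illustration, not sourced estimates.

```python
# Back-of-envelope check of the farmland-water comparison.
ACRES_PER_SQ_MILE = 640          # exact
GALLONS_PER_ACRE_FOOT = 325_851  # standard conversion

# Placeholder assumption: irrigated farmland consuming about
# 2 acre-feet of water per acre per year (varies widely by crop/region).
acre_feet_per_acre_year = 2.0

farmland_gal_per_year = (
    ACRES_PER_SQ_MILE * acre_feet_per_acre_year * GALLONS_PER_ACRE_FOOT
)

# Placeholder assumption for one frontier training run's water use.
training_run_gallons = 300e6

print(f"1 sq mi of farmland, 1 year: {farmland_gal_per_year:,.0f} gal")
print(f"assumed training run:        {training_run_gallons:,.0f} gal")
print(f"ratio: {training_run_gallons / farmland_gal_per_year:.2f}")
```

Under these assumptions the ratio lands a bit under 1, matching the "a little under a year's worth" framing, but the conclusion is only as good as the two placeholder inputs.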
Arvind Narayanan @random_walker
AI isn't replacing programmers, but it *is* making it harder to survive as a programmer with purely technical skills and no interest or expertise in how those skills translate to business or societal value. Funny thing is, this has always been true—it's just being accelerated a bit due to AI. There's a famous essay by @patio11 from 15 years ago called "Don't Call Yourself A Programmer, And Other Career Advice". kalzumeus.com/2011/10/28/don…
[image]
Arvind Narayanan @random_walker
📢 Excited to announce that we're doing the AI Policy Precepts in DC again! Open to all federal employees. Interactive roundtable discussions between federal officials/advisors and many of Princeton's leading AI policy experts including me (Sayash Kapoor, Mihir Kshirsagar, Andrés Monroy-Hernández, Arvind Narayanan, Miranda Wei). Apply by March 20. Offered by @PrincetonSPIADC and @PrincetonCITP. Details and application: mailchi.mp/princeton.edu/…
[image]
Arvind Narayanan @random_walker
"Metric saturation" is a long-overdue concept. If the whole eval community focuses on a single metric, we kneecap our ability to understand the real-world impacts of AI progress.
Sayash Kapoor @sayashk

Hey @METR_Evals—love your work, but we think it's the *metric* that's saturated, not the task suite. For example, despite rapid gains in accuracy, we found limited gains in reliability. We'd love to work together to see if this holds up on the time-horizon task suite.

Arvind Narayanan reposted
Sayash Kapoor @sayashk
Hey @METR_Evals—love your work, but we think it's the *metric* that's saturated, not the task suite. For example, despite rapid gains in accuracy, we found limited gains in reliability. We'd love to work together to see if this holds up on the time-horizon task suite.
[image]
METR @METR_Evals

We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated.

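For readers unfamiliar with the metric: a 50% time horizon is generally estimated by fitting success/failure outcomes against (log) human task length and reading off the length at which predicted success crosses 50%. Here is a minimal sketch of that recipe; the data is invented for illustration and this is not METR's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: human time-to-complete (hours) per task, and whether
# the agent succeeded on it. Not real measurements.
task_hours = np.array([0.1, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64])
succeeded  = np.array([1,   1,    1,   1, 1, 1, 0, 1,  0,  0])

# Model success probability as a function of log2(task length).
X = np.log2(task_hours).reshape(-1, 1)
model = LogisticRegression().fit(X, succeeded)

# The 50% horizon is where the predicted probability crosses 0.5,
# i.e. where the logit is zero: x = -intercept / coefficient.
log2_horizon = -model.intercept_[0] / model.coef_[0, 0]
print(f"estimated 50% time horizon: {2 ** log2_horizon:.1f} hours")
```

The "nearly saturated" worry in the quoted tweet maps directly onto this picture: when the agent succeeds on almost every task in the suite, the fitted crossing point sits beyond the data and its confidence interval blows up, which is why the quoted 95% CI spans 6 to 98 hours.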
Arvind Narayanan reposted
Peter Henderson @PeterHndrsn
I’m really excited about our new paper! I think we will ultimately need to draw on expertise from both law and AI to get alignment right, and this paper lays out that vision in more detail. As an aside, my PhD thesis was titled ‘Aligning law, policy, and machine learning for responsible real-world deployments’ for a reason. I think this is a very important area, and I’m excited to see so many excellent researchers working together to move it forward.
[image]
Arvind Narayanan reposted
Kelsey Piper @KelseyTuoc
Understand this: Waymo in DC is not being delayed because the City Council wants a study. Instead, the City Council is asking for a study because they want to delay Waymo in DC.
Arvind Narayanan reposted
Arpit Gupta @arpitrage
This is why I believe AI will be a “normal technology” — despite scaling laws driving rapid progress on specific technical benchmarks, real-world usefulness and effectiveness are going to lag behind a lot
Joel Becker @joel_bkr

new @METR_Evals research note from @whitfill_parker, @cherylwoooo, nate rush, and me. (chiefly parker!) we find that *half* of SWE-bench Verified solutions from Sonnet 3.5-to-4.5 generation AIs *which are graded as passing* are rejected by project maintainers.

Arvind Narayanan @random_walker
Efforts to improve the security of AI agents should recognize that many security failures occur even in the absence of adversaries. The unreliability issue has largely flown under the radar and there hasn't been much work on defining, measuring, or mitigating the problem. More on this in our response to NIST's request for information on AI Agent Security, by @steverab, @sayashk, @PKirgis, @CitpMihir, and me: sage.cs.princeton.edu/documents/RFC_… This is based on our recent paper: normaltech.ai/p/new-paper-to…
[image]
Arvind Narayanan reposted
Arvind Narayanan @random_walker
Is the rise of coding agents surprising or consistent with our predictions? Thanks for the question, @_NathanCalvin. x.com/_NathanCalvin/…

The answer is: both surprising and consistent. AI as Normal Technology (AINT) doesn't give us a way to predict the timing of specific capability advances, and we haven't tried to do that. But when it comes to understanding why coding agents work so well and what their impacts are likely to be, AINT is extremely helpful (and its predictions are consistent with what we observe so far).

1. Products, not just models. One key prediction is that model capability advances are generally not useful by themselves; building products is still necessary in order to meet people where they are, instead of forcing people to contort their workflows to fit the affordances of raw LLMs. That's exactly what we see with Claude Code and other agents. If we try to understand the success of coding agents as the result of model capability leaps, it doesn't make sense. Rather, coding agents have dozens if not hundreds of features, both big (like memory) and small (like rewinding or interruptibility), that allow software engineers to integrate them into workflows.

2. Early adoption. Despite everything we hear on X, we're still in the early adoption phase. The median programmer (keep in mind that they might work in a regulated industry like finance or healthcare) has barely heard of coding agents and is not yet using them in any serious way.

3. The speed of diffusion. As I've written before, the software industry has uniquely low diffusion barriers, and programmers have a long history of embracing productivity improvements to continually migrate up the abstraction chain (machine code -> assembly -> compiled languages -> high-level languages -> frameworks -> AI-assisted programming). Because of this, software "has never had time or the cultural inclination to ossify institutional processes around particular ways of doing things." I highly doubt that we are going to see the same speed of diffusion in other sectors. For example, see our analysis of AI in legal services here: lawfaremedia.org/article/ai-won…

4. Labor market impacts. AINT predicted that in most cognitive jobs the result of AI adoption won't be replacing humans but shifting the role of humans to supervising AI systems. Of course we were hardly alone in making that prediction, but it's good to see that this is what is happening in software. There's also the fact that in most white-collar jobs, if it gets cheaper to produce a unit of work, we will simply produce more of it — orders of magnitude more in the case of software (related to "Jevons paradox"). This is another factor that mitigates job loss risks.
[image]
Nathan Calvin @_NathanCalvin

Generally seems reasonable and appreciate your contributions here. One question - has the speed and purported efficacy of AI coding agent adoption surprised you? Or does it feel consistent with predictions you would have made from the AI as a normal technology worldview?
