Dan Hendrycks

1.6K posts

@hendrycks

• Center for AI Safety Director • xAI and Scale AI advisor • GELU/MMLU/MATH/HLE • PhD in AI • Analyzing AI models, companies, policies, and geopolitics

San Francisco · Joined August 2009
111 Following · 44.5K Followers
Pinned Tweet
Dan Hendrycks@hendrycks·
Superintelligence is destabilizing. If China were on the cusp of building it first, Russia or the US would not sit idly by—they'd potentially threaten cyberattacks to deter its creation. @ericschmidt @alexandr_wang and I propose a new strategy for superintelligence. 🧵
Dan Hendrycks@hendrycks·
@jonasfreund_ Thank you for responding. “GovAI’s funders have no influence over what we say.” But they do, because the organization’s survival depends on them. That dependence creates substantial pressure to make sure you stay aligned with their tastes.
Jonas Freund@jonasfreund_·
@hendrycks Not sure if this counts as a COI to be honest. GovAI’s funders have no influence over what we say. The piece represents Sophie's and my personal views. I do think it's an independent analysis.
Dan Hendrycks retweeted
Dan Hendrycks@hendrycks·
People aren't thinking through the implications of the military controlling AI development. It's plausible AI companies won't be shaping AI development in a few years, and that would dramatically change AI risk management.

Possible trigger: AI might suddenly become viewed as the top priority for national security. This perspective shift could happen when AIs gain the capability of hacking critical infrastructure (~a few years). In this case, the military would want exclusive access to the most powerful AI systems.

Defense Production Act, budget, data: The US military could compel AI organizations to make their AIs for them. It could also demand that NVIDIA's next GPUs go to their chosen organization. The military also has an enormous budget and could pay hundreds of billions for a GPU cluster. They can also get more training data from the NSA and many companies like Google.

Military systems are more hazardous: Military systems are sanctioned to hack and use lethal force, so they will have capabilities that others will not. Moreover, some will explicitly be given ruthless propensities. In an anarchic international system, the main objective of states is to compete for power for self-preservation, according to neorealists. Later-stage AIs could be permitted to be explicitly power-seeking, deceptive, and so on, since these propensities make the systems more competitive.

Futility of AI weapon red lines: Some are trying to create "red lines" that would trigger a pause on AI development. These hoped-for "red lines" often relate to weaponization capabilities such as "when is an AI able to create a novel zero-day?" This red line strategy seems to assume no military involvement. Many of these red lines are actually progress indicators or checkpoints for a military and would not trigger a pause in AI development.

Regulation: When militaries get involved, competitive pressures become more intense. Racing dynamics can't be mitigated with corporate liability laws or various forms of regulation, as they don't apply to the military. For example, the EU AI Act and the White House Executive Order do not apply to the military. Militaries racing could result in a classic security dilemma, as with nuclear weapons. Much of the playbook for "making AI go well" is impotent.

This is not to suggest that I am against the military. I'm pointing out that everyone is acting as though corporations will forever be allowed to autonomously develop what will become the most powerful technology ever.
Dan Hendrycks@hendrycks·
"AI agents might not have the protection of a higher authority. AI systems could face a variety of situations where no central authority defends them against external threats. We give four examples. First, if there are some autonomous AI systems outside of corporate or government control, they would not necessarily have rights, and they would be responsible for their own security and survival. Second, for AI systems involved in criminal activities, seeking protection from official channels could jeopardize their existence, leaving them to amass power for themselves, much like crime syndicates. Third, instability could cause AI systems to exist in a self-help system. If a corporation could be destroyed by a competitor, an AI may not have a higher authority to protect it; if the world faces an extremely lethal pandemic or world war, civilization may become unstable and turbulent, which means AIs would not have a sound source of protection. These AI systems might use cyber attacks to break out of human-controlled servers and spread themselves across the internet. There, they can autonomously defend their own interests, bringing us back to the first example. Fourth, in the future, AI systems could be tasked with advising political leaders or helping operate militaries. In these cases, they would seek power for the same reasons that states today seek power." #structural-pressures-towards-power-seeking-ai" target="_blank" rel="nofollow noopener">aisafetybook.com/textbook/align…
Dan Hendrycks retweeted
Center for AI Safety
AI agents are getting good at coding, but how close are they to automating all digital labor? New Remote Labor Index results: Opus 4.5 is able to automate 3.75% of remote labor projects, with GPT-5.2 in second place.
Dan Hendrycks retweeted
Center for AI Safety
Humanity's Last Exam is now published in Nature. Since its release, HLE has become a leading frontier benchmark, used by OpenAI, Anthropic, DeepMind, and xAI. Thank you to our partners at @scale_AI and the 1,000+ co-authors who made this benchmark possible.
Nathan Calvin@_NathanCalvin·
He even talks some about @ajeya_cotra's concept of self-sufficient AI! x.com/ajeya_cotra/st… "Each race is dependent upon the other for innumerable benefits, and, until the reproductive organs of the machines have been developed in a manner which we are hardly yet able to conceive, they are entirely dependent upon man for even the continuance of their species. It is true that these organs may be ultimately developed, inasmuch as man’s interest lies in that direction; there is nothing which our infatuated race would desire more than to see a fertile union between two steam engines; it is true that machinery is even at this present time employed in begetting machinery, in becoming the parent of machines often after its own kind, but the days of flirtation, courtship, and matrimony appear to be very remote, and indeed can hardly be realised by our feeble and imperfect imagination."
Ajeya Cotra@ajeya_cotra

Different people seem to mean radically different things by "AGI." More concrete & vivid milestones can better highlight underlying disagreements. In a new post, I define *self-sufficient AI:*

Pedro Domingos@pmddomingos·
Engineers don't understand the difference between research and engineering. Researchers do.
Valerio Capraro@ValerioCapraro·
Major preprint just out! We compare how humans and LLMs form judgments across seven epistemological stages. We highlight seven fault lines, points at which humans and LLMs fundamentally diverge:

The Grounding fault: Humans anchor judgment in perceptual, embodied, and social experience, whereas LLMs begin from text alone, reconstructing meaning indirectly from symbols.

The Parsing fault: Humans parse situations through integrated perceptual and conceptual processes; LLMs perform mechanical tokenization that yields a structurally convenient but semantically thin representation.

The Experience fault: Humans rely on episodic memory, intuitive physics and psychology, and learned concepts; LLMs rely solely on statistical associations encoded in embeddings.

The Motivation fault: Human judgment is guided by emotions, goals, values, and evolutionarily shaped motivations; LLMs have no intrinsic preferences, aims, or affective significance.

The Causality fault: Humans reason using causal models, counterfactuals, and principled evaluation; LLMs integrate textual context without constructing causal explanations, depending instead on surface correlations.

The Metacognitive fault: Humans monitor uncertainty, detect errors, and can suspend judgment; LLMs lack metacognition and must always produce an output, making hallucinations structurally unavoidable.

The Value fault: Human judgments reflect identity, morality, and real-world stakes; LLM "judgments" are probabilistic next-token predictions without intrinsic valuation or accountability.

Despite these fault lines, humans systematically over-believe LLM outputs, because fluent and confident language produces a credibility bias. We argue that this creates a structural condition, Epistemia: linguistic plausibility substitutes for epistemic evaluation, producing the feeling of knowing without actually knowing.

To address Epistemia, we propose three complementary strategies: epistemic evaluation, epistemic governance, and epistemic literacy.

Full paper in the first reply. Joint with @Walter4C & @matjazperc
Dan Hendrycks retweeted
Dan Hendrycks@hendrycks·
1. “Sensory and social information vs Textual input”
Many modern AIs are multimodal, not just textual, and can receive textual, visual, and auditory inputs. These can encode "social information."

2. “Perceptual and situational parsing vs Tokenization and preprocessing”
AIs can definitely perceive things and parse context. We can very easily say the retina does massive "preprocessing" before sending information through the optic nerve.

3. “Memory, intuitions, and learned concepts vs Pattern recognition in embeddings”
AIs definitely have memory of various learned concepts, and they can "intuit" the answer to commonsense questions (e.g., temporal commonsense, intuitive physics in textually described scenarios, and so on). “Pattern recognition” sounds more detached than “learned concepts,” but “pattern” is a very abstract word and can cover anything worth learning, remembering, or having intuitions about.

4. “Emotions, motivations, goals vs. Statistical inference via neural layers”
It is not clear what “statistical inference via neural layers” means. Deep networks don’t have much to do with usual statistical concepts (e.g., t-tests, MCMC, RCTs). Separately, AIs increasingly have value systems, have self-preservation tendencies, say things they otherwise act as though are false in order to accomplish tasks, and so on. x.com/DanHendrycks/s… idais.ai/dialogue/idais… There's a literature on this.

5. “Reasoning, information integration vs. Textual context integration”
AIs can solve various problems that require inductive or deductive reasoning (arxiv.org/pdf/2007.08124). If this is making a distinction between “information” and “textual,” recall that AIs can process many types of information (visual and auditory, not just textual).

6. “Meta-cognition and error-monitoring vs. Forced confidence and hallucination”
AIs can assign calibrated probabilities to their statements (arxiv.org/abs/2207.05221). They can be more calibrated than people on various questions. They can also correct their mistakes (very common when they're solving mathematics problems). “Hallucinations” is a popular term that should have been called “confabulation.” Confabulation is something both AIs and humans do. AIs confabulate more, but there is solid progress on reducing this rate each year.

7. “Value-sensitive judgment vs. Probabilistic judgment”
It’s unclear what this is pointing at. AIs can handle normative claims, not just descriptive claims. AIs can be sensitive to various normative factors (arxiv.org/pdf/2008.02275) and can answer common sense morality questions (“Is it wrong to burn children just for the fun of it?”) as well as more complicated value-sensitive questions, such as tort or criminal law questions.

---

I gave Gemini 3 a screenshot of your human judgment column, excluding the LLM judgment one, and asked it to generate an LLM judgment one: "Recreate the diagram with a new column added: LLM judgment. Use deflationary terms in the second column to make humans seem more special and AIs seem flawed (be brief)." Gemini generated the following, which suggests it's easy to just use deflationary language to make it seem like important distinctions are being drawn.
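On point 6 above, to make "calibrated" concrete: calibration is typically scored with something like expected calibration error, which checks whether statements made with roughly X% confidence are correct roughly X% of the time. A minimal illustrative sketch; the function and toy numbers here are the editor's, not taken from the linked paper.

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by stated confidence, then average the gap between each
    bin's accuracy and its mean confidence, weighted by the fraction of items in the bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences >= lo) & ((confidences < hi) if hi < 1.0 else (confidences <= hi))
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy example: a model that says "90% sure" and is right 9 times out of 10 is well
# calibrated on those statements, so the error is essentially zero.
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))  # ~0.0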
Mustafa@oprydai·
if you’re drawn to:
richard feynman
alan turing
claude shannon
john von neumann
nikola tesla
seymour papert
marvin minsky
judea pearl
dennis ritchie
donald knuth
rodney brooks
yann lecun
john carmack
dieter rams
elon musk
bjarne stroustrup
steve jobs
you’re not just into tech; you’re into the art of engineering, the philosophy of systems, and the beauty of how things work. you’ve found your people.
Dan Hendrycks@hendrycks·
@ben_j_todd The selected benchmarks are probably leaning more on reasoning than on knowledge, hence the higher slope.
Benjamin Todd@ben_j_todd·
It's not only the METR horizon trend that accelerated in 2024. A composite of all major benchmarks did:
Gabriele Berton@gabriberton·
Yes, ARC-AGI is slowing down progress. I feel like the whole HRM craze is a bubble and won't take us anywhere. I don't understand the excitement about a model that can overfit to a task. But hey, there's a picture of a brain in the paper, so it must be the path to AGI.
Ariel@redtachyon

It really seems like ARC-AGI is only derailing the quest for AGI, mainly due to its popularity. I actually think it's good for what it is - a necessary (but not sufficient) condition for AGI. That is, if your model can't solve it, it's probably not AGI. But so much effort is now being poured into approaches designed to solve colored grid puzzles, and I can't see most of those methods being good for anything else.

Dan Hendrycks@hendrycks·
@Miles_M_K This meme was in other major works beforehand (e.g., Situational Awareness, Superintelligence Strategy).
Miles Kodama@Miles_M_K·
AI 2027's main contribution to the discourse is still underappreciated IMO. Many people think the main idea of AI 2027 was "AI will be very big very soon." But the bigger idea was "AI R&D automation is very important." AI R&D automation is the reason why AI 2027's timeline is so short, and it's also the key vector for loss of control. A scheming AI's first and best chance to attack us will be by sabotaging the AI R&D we delegate to it.
Dan Hendrycks@hendrycks·
@liuzhuang1234 I wonder what synergies dynamic error functions have with Gaussian error linear units. Maybe in combination they can become something simpler.
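For context on the connection being floated here (standard definitions, not from either paper): GELU is defined through the Gaussian CDF, which is itself written with erf, so a dynamic erf layer and GELU are built from the same primitive:

\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\Bigl(1 + \operatorname{erf}\bigl(x/\sqrt{2}\bigr)\Bigr)

The Derf-style form sketched after the paper announcement below, gamma * erf(alpha * x) + beta, is an assumed parameterization; if accurate, it reuses the same erf but drops the multiplicative x, which is the kind of overlap this reply seems to be gesturing at.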
Zhuang Liu@liuzhuang1234·
Derf matches or outperforms normalization layers, and consistently beats DyT, with the same training recipe, across domains.
1. ImageNet - higher top-1 in ViT-B/L
2. Diffusion Transformers - lower FID across the DiT family
3. Genomics (HyenaDNA, Caduceus) - higher DNA classification accuracy
4. Speech (wav2vec 2.0) - lower validation loss
5. Language (GPT-2) - matches LayerNorm, clearly beats DyT
A simple point-wise layer can make Transformers stronger, not just as good.
Zhuang Liu@liuzhuang1234·
Stronger Normalization-Free Transformers – new paper. We introduce Derf (Dynamic erf), a simple point-wise layer that lets norm-free Transformers not only work, but actually outperform their normalized counterparts.
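The announcement doesn't spell out the layer itself, so the following is only an illustrative sketch: assuming Derf follows the same recipe as DyT (a learnable point-wise squashing in place of normalization) but swaps tanh for erf, a drop-in module might look roughly like this. The class name, the scalar alpha, and the per-channel gamma/beta parameterization are assumptions, not taken from the paper.

import torch
import torch.nn as nn

class DynamicErf(nn.Module):
    """Hypothetical point-wise stand-in for LayerNorm: y = gamma * erf(alpha * x) + beta,
    applied elementwise with no statistics computed over the batch or token dimension.
    Parameterization is assumed, not taken from the Derf paper."""
    def __init__(self, dim: int, init_alpha: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(float(init_alpha)))  # learnable input scale
        self.gamma = nn.Parameter(torch.ones(dim))                  # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))                  # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.erf(self.alpha * x) + self.beta

# Usage: wherever a pre-norm Transformer block would otherwise call LayerNorm(dim).
x = torch.randn(2, 16, 768)
print(DynamicErf(768)(x).shape)  # torch.Size([2, 16, 768])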
David Scott Patterson@davidpattersonx·
GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. This is how I have been visualizing progress toward the human-to-AI transition point in my mind. By the end of 2026, AI will be near 100% across all domains and will be capable of doing full jobs.
Dan Hendrycks@hendrycks

The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵

Dan Hendrycks@hendrycks·
I'm also thankful to @NeelNanda5 for writing this. Usually people just quietly pivot.