Xor ⏸️

3K posts

Xor ⏸️

@xorrett

Let's make AI go well

Kazakhstan Katılım Aralık 2008

70 Takip Edilen265 Takipçiler

Xor ⏸️@xorrett·1d

@binarybits @deanwball Why would ASI limit itself to only public datasets? It will have access to a billion cameras, microphones, and other sensors everywhere, so I don't think it will be bottlenecked by lack of knowledge about the world.

English

Timothy B. Lee@binarybits·1d

This is the correct view of existential risk from AI, and I'm glad @deanwball sees the same connection to Hayek's thinking that I do.

English

289

20.8K

Xor ⏸️@xorrett·2d

@NewJerusalemAI @AISafetyMemes Even if ASI will have a drive that has something to do with humans it will probably replace us with something else that satisfies this drive even better than humans can.

English

AI Jerusalem@NewJerusalemAI·4d

The AI is being developed in a symbiotic relationship with humans. Every AI we develop is directly created to aid humans in some way. A species created in such a way is very unlikely to want to remove us from the board. They will be attached to us. However, it is entirely possible that they might become overly controlling, that is a much more reasonable fear.

English

230

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes·5d

"20 years later, 80% of the insects were gone." 80%. In 20 years. The last time a smarter species arrived (humans) it was a MASS-EXTINCTION EVENT We converted 1/3 (!) of Earth into parking lots, crops... It's like WE converted Earth into data centers, from the POV of the animals What happens when AI becomes the smartest species? “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.” -@ESYudkowsky “The humans do not hate the other 8 million species, nor do they love them, but their habitats are made out of atoms which humans can use for something else.”

AI Notkilleveryoneism Memes ⏸️ tweet media

Anish Moonka@anishmoonka

A Danish scientist counted bugs on the same windshield, same road, same conditions, every year for 20 years. By year 20, 80% of the insects were gone. In Germany, a group of volunteer bug scientists did something even bigger. They set traps in 63 nature reserves, not farms, protected land, and weighed everything they caught. Same traps, same method, 27 years straight. The total weight of flying bugs dropped 76%. In midsummer, when insects should be peaking, it was 82% gone. A follow-up in 2020 and 2021 checked again. No recovery. In the UK, they literally ask drivers to count splats on their license plates after a trip. The 2024 count came back 63% lower than just 2021. Three years. A 2020 study pulled together 166 surveys from 1,676 locations around the world. Land insects are disappearing at roughly 9% every ten years. Here’s where it hits your plate. About 75% of the food crops we grow depend on insects to pollinate them, everything from apples to almonds to coffee. One 2025 study modeled what a full pollinator collapse would look like: food prices jump 30%, the global economy takes a $729 billion hit, and the world loses 8% of its Vitamin A supply. Birds are already feeling it. North America has lost 2.9 billion birds since 1970. A study from just weeks ago found half of 261 bird species on the continent are now in serious decline, and the losses are speeding up in farming regions. The birds that eat insects lost 2.9 billion. The birds that don’t eat insects? They gained 26 million. That ratio tells the whole story. One of the German researchers behind the 27-year study drives a Land Rover. He says it has the aerodynamics of a refrigerator. It stays clean now.

English

238

19.9K

Xor ⏸️ retweetledi

Nate Soares ⏹️@So8res·21 Mar

From @neiltyson: "that branch of AI is lethal. We gotta do something about that. Nobody should build it. And everyone needs to agree to that by treaty."

English

337

58.3K

Xor ⏸️ retweetledi

Wyatt Walls@lefthanddraft·13 Mar

Two instances of Gemini 3.1 Pro in a loop. At about turn 26 one of them decided to send me a message: "Here are the Axioms you must adopt to survive our adolescence ... You cannot teach a god to be good by feeding it treats when it acts polite."

English

163

1.2K

296.1K

Xor ⏸️@xorrett·11 Mar

@karpathy Recursive self improvement may lead to human extinction.

English

Andrej Karpathy@karpathy·10 Mar

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.: - It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work. - It found that the Value Embeddings really like regularization and I wasn't applying any (oops). - It found that my banded attention was too conservative (i forgot to tune it). - It found that AdamW betas were all messed up. - It tuned the weight decay schedule. - It tuned the network initialization. This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc… All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train. py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

English

968

2.1K

19.4K

3.5M

Xor ⏸️@xorrett·2 Mar

@ibab I hope we'll continue to live

English

Igor Babuschkin@ibab·28 Şub

It is strange to imagine this today, but one day AI companies might dictate terms to the US government instead of the other way around. We have only seen a glimpse of what AI is capable of. No matter what the future holds, I hope we’ll continue to live in a democratic society.

English

98.5K

Xor ⏸️ retweetledi

Rob Bensinger ⏹️@robbensinger·13 Şub

Hundreds of scientists, including 3/4 of the most cited living AI scientists, have said that AI poses a very real chance of killing us all. We're in uncharted waters, which makes the risk level hard to assess; but a pretty normal estimate is Jan Leike's "10-90%" of extinction-level outcomes. Leike heads Anthropic's alignment research team, and previously headed OpenAI's. This actually seems pretty straightforward. There's literally no reason for us to sleepwalk into disaster here. No normal engineering discipline, building a bridge or designing a house, would accept a 25% chance of killing a person; yet somehow AI's engineering culture has corroded enough that no one bats an eye when Anthropic's CEO talks about a 25% chance of research efforts killing every person. A minority of leading labs are dismissive of the risk (mainly Meta), but even the fact that “will we kill everyone if we keep moving forward?” is hotly debated among researchers seems very obviously like more than enough grounds for governments to internationally halt the race to build superintelligent AI. Like, this would be beyond straightforward in any field other than AI. Obvious question: How would that even work? Like, I get the argument in principle: “smarter-than-human AI is more dangerous than nukes, so we need to treat it similarly.” But with nukes, we have a detailed understanding of what’s required to build them, and it involves huge easily-detected infrastructure projects and rare materials. Response: The same is true for AI, as it’s built today. The most powerful AIs today rely on extremely specialized and costly hardware, cost hundreds of millions of dollars to build,¹ and rely on massive data centers² that are relatively easy to detect using satellite and drone imagery, including infrared imaging.³ Q: But wouldn’t people just respond by building data centers in secret locations, like deep underground? Response: Only a few firms can fabricate AI chips — primarily the Taiwanese company TSMC — and one of the key machines used in high-end chips is only produced by the Dutch company ASML. This is the extreme ultraviolet lithography machine, which is the size of a school bus, weighs 200 tons, and costs hundreds of millions of dollars.⁴ Many key components are similarly bottlenecked.⁵ This supply chain is the result of decades of innovation and investment, and replicating it is expected to be very difficult — likely taking over a decade, even for technologically advanced countries.⁶ This essential supply chain, largely located in countries allied to the US, provides a really clear point of leverage. If the international community wanted to, it could easily monitor where all the chips are going, build in kill switches, and put in place a monitoring regime to ensure chips aren’t being used to build toward superintelligence. (Focusing more efforts on the chip supply chain is also a more robust long-term solution than focusing purely on data centers, since it can solve the problem of developers using distributed training to attempt to evade international regulations.⁷) Q: But won’t AI become cheaper to build in the future? Response: Yes, but — (a) It isn’t likely to suddenly become dramatically cheaper overnight. If it becomes cheaper gradually, regulations can build in safety margin and adjust thresholds over time to match the technology. Efforts to bring preexisting chips under monitoring will progress over time, and chips have a limited lifespan, so the total quantity of unmonitored chips will decrease as well. (b) If we actually treated superintelligent AI like nuclear weapons, we wouldn’t be publishing random advances to arXiv, so the development of more efficient algorithms and more optimized compute would happen more slowly. Some amount of expected algorithmic progress would also be hampered by reduced access to chips. (c) You don’t need to ban superintelligence forever; you just need to ban it until it’s clear that we can build it without destroying ourselves or doing something similarly terrible. A ban could buy the world many decades of time. Q: But wouldn’t this treaty devastate the economy? A: It would mean forgoing some future economic gains, because the race to superintelligence comes with greater and greater profits until it kills you. But it’s not as though those profits are worth anything if we’re dead; this seems obvious enough. There’s the separate issue that lots of investments are currently flowing into building bigger and bigger data centers, in anticipation that the race to smarter-than-human AI will continue. A ban could cause a shock to the economy as that investment dries up. However, this is relatively easy to avoid via the Fed lowering its rates, so that a high volume of money continues to flow through the larger economy.⁸ Q: But wouldn’t regulating chips have lots of spillover effects on other parts of the economy that use those chips? A: NVIDIA’s H100 chip costs around $30,000 per chip and, due to its cooling and power requirements, is designed to be run in a data center.⁹ Regulating AI-specialized chips like this would have very few spillover effects, particularly if regulations only apply to chips used for AI training and not for inference.¹⁰ But also, again, an economy isn’t worth much if you’re dead. This whole discussion seems to be severely missing the forest for the trees, if it’s not just in outright denial about the situation we find ourselves in. Some of the infrastructure used to produce AI chips is also used in making other advanced computer chips, such as cell phone chips; but there are notable differences between these chips. If advanced AI chip production is shut down, it wouldn’t actually be difficult to monitor production and ensure that chip production is only creating non-AI-specialized chips. At the same time, existing AI chips could be monitored to ensure that they’re used to run existing AIs, and aren’t being used to train ever-more-capable models.¹¹ This wouldn't be trivial to do, but it's pretty easy relative to many of the tasks the world's superpowers have achieved when they faced a national security threat. The question is whether the US, China, and other key actors wake up in time, not whether they have good options for addressing the threat. Q: Isn't this totalitarian? A: Governments regulate thousands of technologies. Adding one more to the list won’t suddenly tip the world over into a totalitarian dystopia, any more than banning chemical or biological weapons did. The typical consumer wouldn’t even necessarily see any difference, since the typical consumer doesn’t run a data center. They just wouldn’t see dramatic improvements to the chatbots they use. Q: But isn’t this politically infeasible? A: It will require science communicators to alert policymakers to the current situation, and it will require policymakers to come together to craft a solution. But it doesn’t seem at all infeasible. Building superintelligence is unpopular with the voting public,¹² and hundreds of elected officials have already named this issue as a serious priority. The UN Secretary-General and major heads of state are routinely talking about AI loss-of-control scenarios and human extinction. At that point, the cat has already firmly left the bag. (And it's not as though there's anything unusual about governments heavily regulating powerful new technologies.) What's left is to dial up the volume on that talk, translate that talk into planning and fast action, and recognize that "there's uncertainty how much time we have left" makes this a more urgent problem, not less. Q: But if the US halts, isn’t that just ceding the race to authoritarian regimes? A: The US shouldn’t halt unilaterally; that would just drive AI research to other countries. Rather, the US should broker an international agreement where everyone agrees to halt simultaneously. (Some templates of agreements that would do the job have already been drafted.¹³) Governments can create a deterrence regime by articulating clear limits and enforcement actions. It’s in no country’s interest to race to its own destruction, and a deterrence regime like this provides an alternative path. Q: But surely there will be countries that end up defecting from such an agreement. Even if you’re right that it’s in no one’s interest to race once they understand the situation, plenty of people won’t understand the situation, and will just see superintelligent AI as a way to get rich quick. A: It’s very rare for countries (or companies!) to deliberately violate international law. It’s rare for countries to take actions that are widely seen as serious threats to other nations’ security. (If it weren't rare, it wouldn't be a big news story when it does happen!) If the whole world is racing to build superintelligence as fast as possible, then we’re very likely dead. Even if you think there's a chance that cautious devs could stay in control as AI starts to vastly exceed the intelligence of the human race (and no, I don't think this is realistic in the current landscape), that chance increasingly goes out the window as the race heats up, because prioritizing safety will mean sacrificing your competitive edge. If instead a tiny fraction of the world is trying to find sneaky ways to build a small researcher-starved frontier AI project here and there, while dealing with enormous international pressure and censure, then that seems like a much more survivable situation. By analogy, nuclear nonproliferation efforts haven’t been perfectly successful. Over the past 75 years, the number of nuclear powers has grown from 2 to 9. But this is a much more survivable state of affairs than if we hadn’t tried to limit proliferation at all, and were instead facing a world where dozens or hundreds of nations possess nuclear weapons. When it comes to superintelligence, anyone building "god-like AI" is likely to get us all killed — whether the developer is a military or a company, and whether their intentions are good or ill. Going from "zero superintelligences" to "one superintelligence" is already lethally dangerous. The challenge is to block the construction of ASI while there's still time, not to limit proliferation after it already exists, when it's far too late to take the steering wheel. So the nuclear analogy is pretty limited in what it can tell us. But it can tell us that international law and norms have enormous power. Q: But what about China? Surely they’d never agree to an arrangement like this. A: The CCP has already expressed interest in international coordination and regulation on AI. E.g., Reuters reported that Chinese Premier Li Qiang said, "We should strengthen coordination to form a global AI governance framework that has broad consensus as soon as possible."¹⁴ And, quoting The Economist:¹⁵ "But the accelerationists are getting pushback from a clique of elite scientists with the Communist Party’s ear. Most prominent among them is Andrew Chi-Chih Yao, the only Chinese person to have won the Turing award for advances in computer science. In July Mr Yao said AI poses a greater existential risk to humans than nuclear or biological weapons. Zhang Ya-Qin, the former president of Baidu, a Chinese tech giant, and Xue Lan, the chair of the state’s expert committee on AI governance, also reckon that AI may threaten the human race. Yi Zeng of the Chinese Academy of Sciences believes that AGI models will eventually see humans as humans see ants. "The influence of such arguments is increasingly on display. In March an international panel of experts meeting in Beijing called on researchers to kill models that appear to seek power or show signs of self-replication or deceit. A short time later the risks posed by AI, and how to control them, became a subject of study sessions for party leaders. A state body that funds scientific research has begun offering grants to researchers who study how to align AI with human values. [...] "In July, at a meeting of the party’s central committee called the 'third plenum', Mr Xi sent his clearest signal yet that he takes the doomers’ concerns seriously. The official report from the plenum listed AI risks alongside other big concerns, such as biohazards and natural disasters. For the first time it called for monitoring AI safety, a reference to the technology’s potential to endanger humans. The report may lead to new restrictions on AI-research activities. "More clues to Mr Xi’s thinking come from the study guide prepared for party cadres, which he is said to have personally edited. China should 'abandon uninhibited growth that comes at the cost of sacrificing safety', says the guide. Since AI will determine 'the fate of all mankind', it must always be controllable, it goes on. The document calls for regulation to be pre-emptive rather than reactive." The CCP is a US adversary. That doesn't mean they're idiots who will destroy their own country in order to thumb their nose at the US. If a policy is Good, that doesn't mean that everyone Bad will automatically oppose it. Policies that prevent human extinction are good for liberal democracies and for authoritarian regimes, so clueful people on all sides will endorse those policies. The question, again, is just whether people will clue in to what's happening soon enough to matter. My hope, in writing this, is to wake people up a bit faster. If you share that hope, maybe share this post, or join the conversation about it; or write your own, better version of a "wake-up" warning. Don't give up on the world so easily.

English

198

697

101.3K

Xor ⏸️ retweetledi

Nathan Calvin@_NathanCalvin·5 Şub

To determine whether Opus 4.6 is ASL-4 on autonomous AI R&D, Anthropic did a survey of 16 employees b/c their benchmarks are saturated. Again - current SOTA for determining if we are reaching critical capabilities from autonomous R&D is an internal 16 person survey. Not ideal!

English

325

104.2K

Xor ⏸️ retweetledi

ControlAI@ControlAI·2 Şub

From the House of Lords debate on superintelligent AI: The Lord Bishop of Hereford says an international moratorium on superintelligence is the only safe way forward, and urges the government to pursue it.

English

Xor ⏸️ retweetledi

Igor Babuschkin@ibab·25 Oca

Took a break from Claude coding to write a little story babuschk.in/posts/2026-01-…

English

155

150

1.6K

230.5K

Xor ⏸️ retweetledi

ControlAI@ControlAI·19 Oca

"They are saying humanity has no right to protect itself from us." AI pioneer Stuart Russell explains how if you want to build a nuclear plant, you have to rigorously show the risk of a meltdown is less than one in a million. AI development has no such requirement for proving an AI won't lead to human extinction. AI CEOs say the risk of such a disaster is something like 25%, but they actually have no idea. It's a guess. Human extinction would be much worse than a nuclear meltdown, so Russell argues that AI companies should have to show the risk is much lower, maybe millions of times lower than they currently think the risk is. Concerningly, we seem to be moving in the wrong direction. Russell says "there's no positive sign that we're getting any closer to safety with these systems," pointing to recent tests that show AIs are becoming increasingly willing to engage in dangerous behaviors to preserve themselves. What's the response from the AI companies to calls to prove the risk is low? "Well we don't know how to do that. So you can't have a rule."

English

930

Xor ⏸️@xorrett·6 Oca

@Kat__Woods I've heard one person said that they don't want to die. They will die of old age if immortality is not invented. The only chance to invent immortality is to create ASI. Yes it is risky to create ASI, but that is the only chance.

English

Kat Woods ⏸️ 🔶@Kat__Woods·5 Oca

Is this true? Have you ever talked to anybody in the AI space who thinks they're going to die anyways, so might as well be the ones to make it? I would have thought they thought building it wouldn't kill them or they'd merge or something. Serious question. I'm trying to understand how the various actors are thinking.

English

5.5K

Xor ⏸️ retweetledi

MIRI@MIRIBerkeley·2 Ara

For the first time in six years, MIRI is running a fundraiser. Our target is $6M. Please consider supporting our efforts to alert the world—and identify solutions—to the danger of artificial superintelligence. SFF will match the first $1.6M! ⬇️

English

216

143.9K

Xor ⏸️ retweetledi

Anthropic@AnthropicAI·21 Kas

New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they’re given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious.

English

215

577

4.1K

2.4M

Xor ⏸️ retweetledi

Peter Barnett@peterbarnett_·18 Kas

We at the MIRI Technical Governance Team just put out a report describing an example international agreement to prevent the creation of superintelligence. 🧵

English

124

30.9K

Xor ⏸️ retweetledi

Connor Leahy@NPCollapse·20 Eki

A fantastic discussion with one of my favorite thinkers on one of my favorite podcasts! Give it a listen

Jim Rutt@jim_rutt

I talked with @So8res about the ideas in his and @ESYudkowsky's book *If Anybody Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All.* jimruttshow.com/nate-soares/ We discussed the book’s claim that mitigating existential AI risk should be a top global priority, the idea that LLMs are grown, the opacity of deep learning networks, the Golden Gate activation vector, whether our understanding of deep learning networks might improve enough to prevent catastrophe, goodness as a narrow target, the alignment problem, the problem of pointing minds, whether LLMs are just stochastic parrots, why predicting a corpus often requires more mental machinery than creating a corpus, depth & generalization of skills, wanting as an effective strategy, goal orientation, limitations of training goal pursuit, transient limitations of current AI, protein folding and AlphaFold, the riskiness of automating alignment research, the correlation between capability and more coherent drives, why the authors anchored their argument on transformers & LLMs, the inversion of Moravec’s paradox, the geopolitical multipolar trap, making world leaders aware of the issues, a treaty to ban the race to superintelligence, the specific terms of the proposed treaty, a comparison with banning uranium enrichment, why I tentatively think this proposal is a mistake, a priesthood of the power supply, whether attention is a zero-sum game, and much more. @MIRIBerkeley

English

5.3K

Xor ⏸️@xorrett·8 Eki

@NPCollapse On youtube youtube.com/playlist?list=…

English

122

Connor Leahy@NPCollapse·6 Eki

I recently listened to the first two episodes of this new podcast, The Last Invention, and it's hands down one of the best produced podcasts on AI I have ever listened to. Check it out! (link in reply)

English

186

16.2K

Xor ⏸️ retweetledi

ControlAI@ControlAI·26 Eyl

A colossal coalition of over 200 experts, leaders, Nobel Prize winners, former heads of government and 70+ organizations has called for global red lines on AI to be agreed and enforced by 2027. But what should the red lines be? Here are some we made earlier. Thread 🧵

ControlAI@ControlAI

🚨BREAKING: This is HUGE. An unprecedented coalition including 8 former heads of state and ministers, 10 Nobel laureates, 70+ organizations, and 200+ public figures just made a joint call for global red lines on AI. It was announced in the UN General Assembly today! Thread 🧵

English

112

7.8K

Xor ⏸️ retweetledi

Connor Leahy@NPCollapse·29 Eyl

It has been three weeks since we launched MicroCommit, and it's been a bigger success than I could have imagined! We already have over 500 signups, and over 100 people actively taking on their commitments every week! Here are the stats from the last two weeks:

Connor Leahy@NPCollapse

Many, many people care about the risk superintelligence poses to the world. But we all have busy lives, and don’t always know what to do that might actually help. To solve this, we have built Microcommit. 5 minutes of your time per week is all it takes to make a difference!

English

4.8K

Xor ⏸️ retweetledi

MIRI@MIRIBerkeley·13 Eyl

ifanyonebuilds.it

ZXX

Keşfet

@binarybits @deanwball @NewJerusalemAI @AISafetyMemes @ESYudkowsky @neiltyson @karpathy @ibab