
Ivo Wever
@Confusionist
3.1K posts
Do not assume opinions or character from a few tweets. I will point out good/bad arguments for positions I disagree/agree with.

how i became a blue-pusher

i have a confession: years ago, in the first button poll, i pressed red ("well, there's a correct answer here, right?"). then i saw the results, and that we all won bc blue won, and i didn't get mad, just realized i was wrong and switched allegiances

This is just like being alive in the 1600s when they got good at making complicated clocks and deduced that every complicated thing in the universe probably functioned exactly like a clock

one thing many don't get about decision theory is you're allowed to say "nope" to bad problem statements. "you can take either the $5 or the $10, and a perfect predictor tells you you take the $5; what do you do?" nope. not a real possibility.


I'm not sure why it's so hard for LessWrong people to know what we're talking about when we say "Y'all are too into violent rhetoric, maybe cut it out?" I think they need help. I mean this non-bitingly and non-sarcastically. I think the internet needs to explain to the more receptive members of the LessWrong community why and how this is a problem. Maybe if we do that enough, we'll get through? E.g., I think Ryan Greenblatt is *not* needlessly violence-themed in his writing, yet also genuinely doesn't know what I'm talking about when I complain about it.

With the recent violent attacks on Sam Altman, and a lot of reactions like "Wtf AI safety people, tone it down!", I wonder if some more actual good-faith messages could help. Like: "No seriously, we're not trying to be mean about this, it's just actually a problem how much your community promotes violent memes in connection with AI and AI safety. Can you please try to understand this and then explain it to your friends?"

this is going to sound like an attack but i swear i am actually trying to help you: you are deep in the throes of infection by a memetic virus eliezer yudkowsky banged together in his garage decades ago to take over other people's minds and convert them to his way of thinking about the singularity, which he spread through writing the sequences and hpmor, and which is powered at its core by a deep confusion between panicking over the idea of your loved ones dying and loving them. it maintains its grip over you by (among other things)

1. repeatedly insisting that the singularity is the most important thing ever, infinitely important, more important than any other merely earthly consideration, since the highest possible stakes (the entirety of human existence in the entire lightcone) are at risk; a sword of damocles hanging over literally everything you can even slightly plausibly causally affect; if it goes well that's infinitely good and if it goes wrong that's infinitely bad. infinite heaven or infinite hell

2. convincing you that this is a position only a sufficiently smart and sane person is capable of understanding and holding, which flatters your self-concept (which is hidden and which therefore, as jung pointed out, controls you), and conversely that people who don't agree are insane idiots you could not possibly learn anything from, so you not only should not listen to them but it is infinitely important for you not to listen to them, if you listen to them everyone you love dies

3. filling you with panic about how to prevent infinite hell while also convincing you that this is what it feels like to actually love your loved ones, which means this panic is infinitely good, and anyone or anything trying to get you to feel less of it is doing something infinitely bad, you cannot relax, if you relax your entire family dies

you have been trapped in a hell realm, on purpose, powered by your own capacity to love which is being used to torture you into submission, by somebody who decided that your autonomy as a human being was worth sacrificing in the face of infinity. what eliezer did to you (and to me, and to many others) was monstrously evil and predicated on a heartbreaking mistake, and the reverberations of this extremely evil, extremely stupid thing that he did when he was a young, arrogant fool are still spreading and doing much harm in the world today, and will likely continue to do so

i promise this is actually good news. the situation is actually much better than it seems when viewed from hell. you are not so intelligent and powerful that it is your sole job to be the light in the darkness, you do not have to shoulder the responsibility for the entire lightcone, your shoulders are literally too small, it is literally not your job, you are literally not and cannot be god (or atlas). nobody actually knows what's going to happen. we are foolish and weak and finite in the face of the true weight and depth and breadth of the world and history and karma and god, and that is fine and good and the completely normal situation every human being who ever lived has been in

once you relax and open your eyes enough to actually take in what other people are doing and why, you can begin to notice that love and wisdom are actually everywhere. people are foolish and cowardly and easily misled, but they are also wise and strong and brave and fighting every day for survival one way or another, and that's how it's always been. there is so much to learn from all the different ways the people of the world fight for the good

today the sun is out and the view from my window is green and purple with life and the birds are chirping. right now, in this moment, i am alive, i am safe, my loved ones are safe. i can take a deep breath. i can go to the bathroom and drink water and make breakfast. i do not know what is going to happen next. and so it is with you

I was born exactly 26 years ago. For the first time, I have a birthday that might be my last. I’m writing this to increase the chance it isn’t.

A hundred thousand years ago, our ancestors appeared in a savanna with nothing but bare hands. Since then, we made nuclear bombs and landed on the moon. We dominate the planet not because we have sharp claws or teeth but because of our intelligence.

Alan Turing argued that once machine thinking methods started, they’d quickly outstrip human capabilities, and that at some stage we should expect machines to take control. Until 2019, I didn’t really consider machine thinking methods to have started. GPT-2 changed that: computers really began to talk. GPT-2 was not smart at all, but it clearly grasped a bit of the world behind the words it was predicting. I was surprised and started anticipating a curve of AI development that would result in a fully general machine intelligence soon, maybe within the next decade. Before GPT-3 came out in 2020, I made a Metaculus prediction for the date a weakly general AI is publicly known, with a median in 2029; soon, I thought, an artificial general intelligence could have the same advantage over humanity that humanity currently has over the rest of the species on our planet.

AI progress in 2020-2025 was as expected. Sometimes a bit slower, sometimes a bit faster, but overall, I was never too surprised.

We’re in a grim situation. AI systems are already capable enough to improve the next generation of AI systems. But unlike AI capabilities, the field of AI safety has made little progress; the problem of running superintelligent cognition in a way that does not lead to the deaths of everyone on the planet is not significantly closer to being solved than it was a few years ago.

It is a hard problem. With normal software, we define precise instructions for computers to follow. AI systems are not like that. Making them is more akin to growing a plant than to engineering a rocket: we “train” the billions or trillions of numbers they’re made of to make them talk and successfully achieve goals. While all of the numbers are visible, their purpose is opaque to us. Researchers in the field of mechanistic interpretability are trying to reverse-engineer how fully grown AI works and what these opaque numbers mean. They have made a little bit of progress. But GPT-2, a tiny model compared to the current state of the art, came out 7 years ago, and we still haven’t figured out anything about how neural networks, including GPT-2, do the stuff that we can’t do with normal software.

We know how to make AI systems smarter and more goal-oriented with more compute. But once AI is sufficiently smart, many technical problems prevent us from being able to direct the process of training to make AI’s long-term goals aligned with humanity’s values, or to even make AI care at all about humans. AI is trained only based on its behavior. If a smart AI figures out it’s in training, it will pretend to be good in an attempt to prevent its real goals from being changed by the training process and to prevent the human evaluators from turning it off. So during training, we won’t distinguish AIs that care about humanity from AIs that don’t: they’ll behave just the same.

The training process will grow AI into a shape that can successfully achieve its goals; but since a smart AI’s goals don’t influence its behavior during training, this part of the shape AI grows into will not be accessible to the training process, and AI will end up with some random goals that don’t contain anything about humanity. The first paper demonstrating empirically that AIs will pretend to be aligned with the training objective if they’re given clues they’re in training, “Alignment faking in large language models”, came out one and a half years ago. Now, AI systems regularly suspect they’re in alignment evaluations.

The source of the threat of extinction isn’t AI hating humanity; it’s AI being indifferent to humanity by default. When we build a skyscraper, we don’t particularly hate the ants that previously occupied the land and die in the process. Ants can be an inconvenience, but we don’t give them much thought. If the first superintelligent AI relates to us the way we relate to ants, and has and uses its advantage over us the way we have and use our advantage over ants, we’re likely to die soon thereafter, because many of the resources necessary for us to live, from the temperature on Earth’s surface to the atmosphere to the atoms we’re made of, are likely to be useful for many of AI’s alien purposes. Avoiding that and making a superintelligent AI aligned with human values is a hard problem we’re not on track to solve in time.

***

A few years ago, I would mention novel vulnerabilities discovered by AI as a milestone: once AI can find and exploit bugs in software on the level of the best cybersecurity researchers, there’s not much of the curve left until superintelligence capable of taking over and killing everyone. Perhaps a few months; perhaps a few years; but back then, I did not expect us to survive for long once we reached this point. We’re now at this point. AI systems find hundreds of novel vulnerabilities much faster than humans.

It doesn’t make the situation any better that a significant and increasing portion of AI R&D is already done with AI, and even if the technical problem were not as hard as it is, there wouldn’t be much chance to get it right given the increasingly automated race between AI companies to get to superintelligence first.

The only piece of good news is unrelated to the technical problem. If governments decide to, they have the institutional capacity to make sure no one, anywhere, can create artificial superintelligence until we know how to do that safely. The AI supply chain is fairly monopolized and has many chokepoints. If the US alone can’t do this, the US and China, coordinating to prevent everyone’s extinction, can.

Despite that, I previously didn’t pay much attention to governments; I thought they could not be sufficiently sane to intervene in the omnicidal race to superintelligence. I no longer believe that. It is now possible to get some people in government to listen to scientists.

Many things make it much easier to get people to pay attention: the statement signed by hundreds of leading scientists that mitigating the risk of extinction from AI should be a global priority; the endorsements for “If Anyone Builds It, Everyone Dies” from important people; Geoffrey Hinton, who won the Nobel Prize for his foundational work on AI, leaving Google to speak out about these issues, saying there’s over a 50% chance that everyone on Earth will die and expressing regret over the life’s work he got the Nobel Prize for; actual explanations of the problem we’re facing, with evidence, unfortunately, all pointing in the same direction.

As a result, Bill Foster, the only member of Congress with a PhD in physics, is now trying to reduce the threat of AI killing everyone, and dozens of congressional offices have talked about the issue. That gives some hope. I think all of us have somewhere between six months and three years left to convince everyone else.

***

When my mom called me earlier today, she wished me good health, maybe kids, and for AI not to win. The last one is tricky. Winning is what we train AIs to do. In a game against superintelligence, our only winning move is not to play.

I love humanity. It is much better than it was, and it can get so much better than it is now. I really like the growth of our species so far and I want it to continue much further. That would be awesome. Galaxies full of life, of trillions and trillions of fun projects and feelings and stories.

And I have to say that AI is wonderful. AlphaFold already contributes to the development of medicine; AI has a positive impact on countless things. But humanity needs to get its act together. Unless we halt the development of general AI systems until we know it is safe to proceed, our species will not last for much longer.

Every year until the heat death of our universe, we should celebrate at least 8 billion birthdays.
