Richard Korzekwa

471 posts

@WeakInteraction

Cyclist, AI safety researcher, and former physicist. I'm bad at Twitter fights, so go easy on me.

Berkeley, CA · Joined December 2011

497 Following · 342 Followers

Pinned Tweet

Richard Korzekwa@WeakInteraction·25 Apr

Since nobody took me up on my $60 bounty for solving the alignment problem, I made a website for it: sixtybucks.org Would someone please just solve it? I'm getting kind of tired of this whole AI risk thing.

1.7K

Richard Korzekwa@WeakInteraction·1 May

@Kaoru_Hayakawa @Birdyword This is an improvement over a similar idea I had a while back x.com/WeakInteractio…

Richard Korzekwa@WeakInteraction

@HumanHarlan they should at least allow people to shower in the server-heated water before they shoot it into the sun smh

631

早川　薫＠天皇弥栄@Kaoru_Hayakawa·1 May

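@Birdyword Nice to meet you. What a wonderful idea; it looks like the kind of place where medieval knights would stand guard at the gate 😉 Building on this, my idea is to use the data centre's waste heat to boil water and turn the site into a hot spring. If the exterior were styled as Roman-era thermae (large public baths and hot-spring facilities), and it gave something back to local residents while cutting costs, I think friction with the community would decrease too.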

344

34.2K

Mike Bird@Birdyword·30 Apr

Many people do not seem to want data centres built near them, despite the fact that they don't cause that much traffic and often generate a lot of local tax revenue. I suspect it's partly because they're ugly! My proposal:

1.6K

1.5K

17.5K

3.5M

Richard Korzekwa@WeakInteraction·29 Apr

@JeffLadish @EigenGender I don't think it's that hard to find like 3-10 bits of entropy or whatever. Admittedly, if I were actually in this situation, I'd try pretty hard to get enough bits and ensure they're actually random. And I'm not going to do that now. But seems pretty easy tbh.

271

Jeffrey Ladish@JeffLadish·29 Apr

@EigenGender And how do you figure out that 40%? (I forgot to think about this when specifying the trapped on an island scenario, I mean I guess maybe we have phones? Unclear. But if no phones, how do you come up with a decent enough source of randomness?)

6.1K

Jeffrey Ladish@JeffLadish·29 Apr

You find yourself trapped on an island with 99 identical copies of yourself. If you press the red button, you will certainly die. If you press the blue button, you’ll die if and only if at least half of the clones press blue. What do you do?

313

146

41.8K
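The "40%" in the replies above can be sanity-checked numerically. The sketch below rests on assumptions that are not in the tweets: all 100 people (you plus 99 copies) independently press blue with the same probability p, and "at least half" is read as 50 or more of the 100. The names `survival_probability`, `n_total` and `lethal_blue` are made up for this illustration.

```python
from math import comb


def survival_probability(p: float, n_total: int = 100, lethal_blue: int = 50) -> float:
    """Chance that one copy survives when every copy presses blue with probability p.

    Assumed reading of the puzzle: red always kills, and blue kills once
    `lethal_blue` or more of the `n_total` people have pressed blue.
    """
    others = n_total - 1
    # I pressed blue myself, so at most lethal_blue - 2 of the others may join me.
    max_other_blue = lethal_blue - 2
    others_ok = sum(
        comb(others, k) * p**k * (1 - p) ** (others - k)
        for k in range(max_other_blue + 1)
    )
    return p * others_ok


# Coarse grid search over p in steps of 0.001.
candidates = [i / 1000 for i in range(1001)]
best_p = max(candidates, key=survival_probability)
print(f"press blue with p = {best_p:.3f}, survival = {survival_probability(best_p):.3f}")
```

Under these assumptions the best p lands in the neighbourhood of 0.4, and a probability like 3/8 only needs three fair bits, which is at least compatible with the "3-10 bits" estimate. A different reading of "at least half of the clones" would shift the numbers slightly.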

Richard Korzekwa@WeakInteraction·29 Apr

@RoseAndGarden @Aella_Girl I think perhaps you should be more cautious about generalizing from N=1 to broad statements about groups of people

WeightDecayWarlord@RoseAndGarden·28 Apr

@Aella_Girl My father is 73, I told him about the question and he asked "ok but what's the catch? what do I get by taking blue?" I said "It's exactly what it sounds like, no catch" - he said "ok red, what now?" When people aren't busy signaling they're not betting their lives for sports

660

Aella@Aella_Girl·28 Apr

Alright I tested this on Glosso, a small social media platform made up of adults with permanent account bans on the line (instead of death). Almost a thousand people voted. And the result was.... Exactly the same percentages as this poll

Tim Urban@waitbutwhy

Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?

145

2.7K

229.3K

Richard Korzekwa@WeakInteraction·29 Apr

@ReplyHobbes @Aella_Girl I like this because either I end up at a party with blue pressers, or I'm spared from a party with majority red pressers.

Joshua Hobbes@ReplyHobbes·28 Apr

@Aella_Girl You need to throw a party with the possibility of all blue pressers leaving the party.

6.6K

Richard Korzekwa@WeakInteraction·28 Apr

@YIMBYLAND What's a typical year-to-year fluctuation?

480

YIMBYLAND@YIMBYLAND·28 Apr

This is legitimately insane. Banning cell phones in schools might turn out to be the best thing we’ve done for our kids in a generation.

Karen Vaites@karenvaites

One year into cell phone bans, Dallas schools see 24% increase in library book checkouts. 👏👏👏
"Public school districts in Texas are almost one school year into the first statewide cellphone ban, and a North Texas school district is seeing positive impacts.
Dallas ISD officials said that, district-wide, they have seen a significant increase in library book checkouts, which they largely attribute to students no longer having cellphones with them during the school day.
"I started hearing, 'Oh, I'm so bored. I can't get on my phone after I do my work or during lunchtime,'" Hillcrest High School librarian Nina Canales said. "Once they lock into these stories, they don't seem to care about their phones at all."
From the first day of school to March 31, 2026, the district reported an increase of more than 200,000 additional books checked out compared to the previous year.
A look at the library checkouts for the previous year:
2025-2026 Total Circulation (1st day of school to March 31, 2026) – 1,084,837
2024-2025 Total circulation (1st day of school to March 31, 2025) – 872,430
Total library book checkout increase: 24.35%
At Dallas ISD's Hillcrest High, students are following this trend. Canales said there were roughly 500 books checked out in the first nine weeks of the 2024-2025 school year. This school year, that number spiked to about 1,800 books.
"That floored me," Canales said. "I had to re-do the report again because I was like, 'What, are you kidding me?'"
Students felt the impact too.
"Now that I'm busy with a bunch of work and college, I don't find myself missing my phone that much, even at home," said Yamilet Jimenez, 9th grader."
By @laceybeasnews. @JonHaidt @safe_screens

135

1.7K

14.5K

1.3M
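The percentages in the quoted article can be reproduced from the raw counts it gives. This is only a check of the arithmetic, using the figures exactly as quoted; the helper name `percent_increase` is made up for the sketch. It says nothing about typical year-to-year fluctuation, since the article quotes no earlier years.

```python
# Circulation figures as quoted for Dallas ISD (first day of school to March 31).
circulation_2025_26 = 1_084_837
circulation_2024_25 = 872_430


def percent_increase(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


district_increase = circulation_2025_26 - circulation_2024_25
district_pct = percent_increase(circulation_2025_26, circulation_2024_25)

# Hillcrest High, first nine weeks: "roughly 500" versus "about 1,800" checkouts.
hillcrest_pct = percent_increase(1_800, 500)

print(f"district: +{district_increase:,} books ({district_pct:.2f}%)")
print(f"Hillcrest High: {hillcrest_pct:.0f}% increase")
```

Both quoted district claims (more than 200,000 additional books, 24.35%) are consistent with the two totals. Whether that change is large relative to normal variation would need circulation totals from earlier years.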

Richard Korzekwa@WeakInteraction·27 Apr

@BeezyManzell @_Jason_Dean_ @davidshor Not caring about the welfare of people who make a mistake is not very pro-social.

365

Beezy@BeezyManzell·27 Apr

@_Jason_Dean_ @davidshor except it's not anti-social: The red button is hyper-dominant game-theory wise AND can't backfire. There's zero risk to you or anyone else who thinks through it. The only people who are at risk are those that don't spend any time thinking about their choice.

1.2K

David Shor@davidshor·25 Apr

We asked this to a large sample of nationally representative Americans - blue wins by a 3:1 margin!

Tim Urban@waitbutwhy

412

324

4.7K

1.2M

Richard Korzekwa@WeakInteraction·27 Apr

Sure, but it's in the interest of the employer for employees to be healthy (cynically, just healthy enough to work productively) and happy (cynically, not so dissatisfied they leave or cause problems), and it's tax-efficient to do this through premiums. How well this translates to money spent on employee healthcare will depend on a lot of things, but at least in cases where employers are competing for employees, or employers are spending a lot on employee compensation, this incentivizes paying high premiums to whichever company has the best reputation for employee satisfaction.

Jeremy Leipzig@jermdemo·27 Apr

@WeakInteraction @ATabarrok @SuperRareKeegan Employers don't necessarily want to pay for gold-plated tiers. Aren't they the real consumers here?

Alex Tabarrok@ATabarrok·26 Apr

Few people understand: Health insurance companies make most of their profit by paying claims, not by denying them.

185

2.6K

599.2K

Richard Korzekwa@WeakInteraction·27 Apr

@jermdemo @ATabarrok @SuperRareKeegan If the ratio is fixed, sounds like more paid claims = more premiums = more revenue?

Jeremy Leipzig@jermdemo·27 Apr

@ATabarrok @SuperRareKeegan Huh? I've never read an insurance company's statistics on claims paid out. Aren't they so regulated by loss‑ratio rules these days it is almost a fixed ratio of premiums?

436
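The "fixed ratio" point in this exchange can be made concrete with a toy calculation. Everything here is an illustrative assumption: premiums are taken to sit exactly at the level where claims are a fixed share of them, the 0.85 loss ratio is a placeholder rather than a figure from the thread, and `insurer_margin` is a hypothetical helper.

```python
def insurer_margin(paid_claims: float, loss_ratio: float) -> float:
    """Money left for overhead and profit when claims are a fixed share of premiums."""
    premiums = paid_claims / loss_ratio  # premiums implied by the fixed ratio
    return premiums - paid_claims


# Assumed, illustrative loss ratio: 85% of premiums go out as claims.
ASSUMED_LOSS_RATIO = 0.85

for claims in (100.0, 200.0):
    margin = insurer_margin(claims, ASSUMED_LOSS_RATIO)
    print(f"claims paid: {claims:.0f} -> margin: {margin:.2f}")
```

In this simplified model the margin scales linearly with claims paid, which is the "more paid claims = more premiums = more revenue" reading. It leaves out competition on price and whatever limits what employers are willing to pay.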

Richard Korzekwa@WeakInteraction·27 Apr

You don't think that two firms, which people can choose between, is sufficient for competition? If only McD's and BK sell burgers, is there 'essentially no marketplace'? (btw my understanding is that in some cities there is only one insurance provider that actually gives you access to most care, and IMO this is a fairly strong objection to arguments that competition keeps things reasonable)

Nate Kohari@nkohari·27 Apr

@WeakInteraction @ATabarrok @SuperRareKeegan That doesn’t qualify as competition. There’s essentially no marketplace from the consumer perspective. Again, nothing like a restaurant.

Richard Korzekwa@WeakInteraction·27 Apr

@nkohari @ATabarrok @SuperRareKeegan Sometimes employers don't have much choice, but in many cases there is competition.

Richard Korzekwa@WeakInteraction·27 Apr

@nkohari @ATabarrok @SuperRareKeegan I've had employers offer insurance from two totally different providers, and I was able to switch during open enrollment. I've also had employers ask if I'm happy with insurance they provided, and employers will offer insurance with a good reputation to attract/retain talent.

Richard Korzekwa@WeakInteraction·25 Apr

@RokoMijic @santiagoarraga It still doesn't require "everyone else in the entire world is also an altruist". I agree it requires a lot of confidence that enough people are also altruists, but "everyone else" is a wildly crazier thing to be confident in than "50% + a few sigmas of uncertainty".

168

Roko 🐉@RokoMijic·25 Apr

@santiagoarraga But if you think 50%+1 of the people in the world are altruists and the other 49%-ish are self-interested or sadistic, then you couldn't be sure that the people in any specific experiment were 50%+1 altruists. 50%+1 is empirically fragile. It is on the very edge of disaster.

872

Roko 🐉@RokoMijic·25 Apr

Time for some math on the blender game. The Blender Game is an excellent probe that reveals a very particular way that the minds of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) people are broken. Perhaps THE way that they are broken.
What is the rational solution to the blender game? Well, basic game theory for self-interested players gives a clear answer. You never get into the blender. This is because the move of not getting into the blender weakly dominates getting in: whatever happens, you will always be better off or the same if you stay out of the blender. The end.
In game theory a Nash Equilibrium is very simple - it is a state where there's no unilateral move anyone could make to improve their own situation. In the (selfish) Blender game there are many Nash Equilibria because lots of states have the property that it's way above 50% in the blender, so it doesn't matter what any one individual does. So all of those are Nash Equilibria, as well as the state where everyone is outside the blender ("all red" in Tim Urban's red/blue framing). But all the other states that have <50% in the blender are not Nash Equilibria because any one of the people in the blender could now save their own life by exiting.
But the Nash Equilibrium where everyone is outside is in some sense better. It is more stable. In game theory we can formalize this as a type of equilibrium called Trembling Hand Perfect Equilibrium (THPE). In THPE we imagine that people will make their moves in the game and then with some small probability they will accidentally press the wrong button because their "hand is trembling".
There is only one THPE for the blender game with selfish players, which is when everyone presses red. It's easy to see why: imagine there's exactly 50% of people in the blender. Out of 1000, 500, for example. Then imagine that each one of those 500 people who are deliberately putting themselves into the blender has a small chance of accidentally not going into the blender. Now if you are one of these people, you reason that even if you don't make the mistake yourself, someone else might. And then you are going to die, which is bad, so you can improve your situation by actually exiting. The same is true for 501 people, 502, etc. None of the "in the blender" states are actually Trembling Hand Perfect Equilibria. But, the "everyone out of the blender" state is a Trembling Hand equilibrium, because even though some people might accidentally go into it with some small probability, you are definitely not going to improve your own chances by joining them to almost certainly be blended.
Okay, but what about if you are a mixture of selfish and altruistic? Say you assign utility +1 to yourself for surviving, and +1/N for each other person who survives. We can analyze this new game: are there now other "stable" (THPE) equilibria? Yes. If everyone is a bit altruistic, then "all in the blender" also becomes a Trembling Hand Perfect Equilibrium. The reason for this is that for someone who is at least a little bit altruistic, it is okay for them to suffer a small chance of being blended in exchange for a larger chance of saving the larger group. "The good of the many outweighs the good of the few - or the one". Note that in these games both the size of the set you are saving and the probability of saving them is larger, because in order for getting out of the blender to actually save yourself, you need one more other person to also get out, which is ε times less likely. So any nonzero amount of altruism is enough to make these blue equilibria THPE.
This seems to vindicate the "Blue" position. As long as everyone is at least a little bit altruistic, "All in the Blender" is actually a Trembling Hand Perfect Equilibrium, so it is at least as valid as "All out of the Blender", and some might argue superior since under trembling hand conditions it can prevent anyone from getting blended, most of the time (there are absurdly unlikely cases where many people simultaneously slip up).
But there is a problem. The "All in the Blender"/"All Blue" equilibrium is only Trembling Hand Perfect if the number of altruists is at or above the 50% threshold. If there are 49% altruists and 51% are egoists, then the egoists will rationally abandon the altruists in the blender because both the altruists' and the egoists' hands are trembling, so the blender is still dangerous, even if only slightly. But in reality you never really know how many people are slightly altruistic, versus just self interested and rational. In practice a fair number of these games end up with red winning.
In these games if you use a mixed population with more than half the people being purely self-interested or even sadistic, the "All in the Blender"/"All blue" equilibrium is no longer Trembling Hand Perfect. To see why, think about a mixed population where there are 3 selfish players and 2 altruists. Imagine them all provisionally choosing to go into the blender, and then reconsidering their options in light of the fact that someone (or several!) might slip. All the selfish people realize that if any three (or all four!) of the other four slip, they will be in the blender either with one other person or on their own, and they will then die. Therefore, all three selfish players will not enter the blender. But then, the altruists also get blended with high probability, so actually they don't want to get in either.
Now imagine that all the altruists are sort of "running the same algorithm", like functional decision theory. If they assign any nonzero probability to the case that they are outnumbered by selfish people, they should all choose to get out of the blender/all play red. This is because in cases where self-interested players outnumber altruists, playing red strictly dominates even for the altruists, and in cases where altruists outnumber the self-interested you can do either and it makes no difference to first order. High commitment cooperation only makes sense when you are absolutely sure that the altruists outnumber the merely self-interested who larp as altruists.
So to pick blue, it is not enough to merely be an altruist. Rational altruists wouldn't pick blue. You must also walk around with the background assumption that everyone else in the entire world is also an altruist.
What is the flaw of the WEIRD mind that this thought experiment exposes? It's that WEIRD people do game theory by tentatively assuming that every group they ever interact with is composed of altruists/cooperators, and then maybe adjusting given specific information on bad individuals. It's "Assume everyone is a cooperator by default, and then adjust if needed" decision theory. WEIRD Decision Theory. WDT.
This sounds stupid, but it is a neat hack that solves lots of things. It prevents WEIRD people from letting rational mutual doubt ruin their lives by defecting just on the chance that the other person might want to defect. It is also probably about the simplest way to solve that, other than "always cooperate".
So to WEIRD people, "All blue" comes out as the obviously correct answer, even though it is not actually the right answer in the math. They don't like it even when they know the math!
The blender game is weirdly, unnaturally balanced to expose this flaw. Usually there is some active benefit to coordination, so the "always assume other people will cooperate if you do" hack does tend to line up with the math, because the small chance of people not cooperating is usually cancelled out by big benefits of cooperation. But in the blender game, there is no benefit to cooperating. The uncoordinated equilibrium is just better. WEIRD people don't like it when uncoordinated equilibria are just better. This is why they are always trying to cancel capitalism. And this is why they keep getting into the blender. □

Roko 🐉@RokoMijic

We're doing the "Blender" game again. There is a large blender. Everyone in the world has to decide whether to step into the blender. If at least 50% of the people do step into the blender, it will be unable to overcome their inertia to get started, and everyone survives. If less than 50% of the people step into the blender, then they all get blended up into paste and die. People who do not step into the blender suffer no adverse effects. Would you step into the blender? (Blue=step into the blender, Red= don't do that)

293

22.4K
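The dominance argument for a purely self-interested player can be checked by brute force from the rules as stated in the quoted tweet (blue survives only if at least 50% step in; red always survives). This is a minimal sketch covering only the selfish case, not the altruistic utilities or trembling-hand analysis from the thread; the function names are made up for the illustration.

```python
def survives(presses_blue: bool, others_blue: int, n_players: int) -> bool:
    """Outcome for one player, given how many of the other players chose blue."""
    if not presses_blue:
        return True  # staying out of the blender is always safe
    blue_total = others_blue + 1
    return 2 * blue_total >= n_players  # at least 50% stepped in


def red_weakly_dominates(n_players: int) -> bool:
    """True if red is never worse than blue and strictly better in some case."""
    cases = range(n_players)  # 0 .. n_players - 1 other players choosing blue
    never_worse = all(
        survives(False, k, n_players) >= survives(True, k, n_players) for k in cases
    )
    sometimes_better = any(
        survives(False, k, n_players) > survives(True, k, n_players) for k in cases
    )
    return never_worse and sometimes_better


for n in (5, 1000):
    print(n, red_weakly_dominates(n))
```

For a selfish player, red is never worse and is better whenever blue falls short of the threshold, which is weak dominance in the usual terminology. The check does not touch the disagreement in the replies, which is about how to weigh other people's survival and how confident one can be in the blue share.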

Richard Korzekwa@WeakInteraction·21 Apr

@LosAlamosNatLab This page isn't loading for me, btw.

Los Alamos National Laboratory@LosAlamosNatLab·19 Apr

What if AI could do more than assist with research? What if it could collaborate — helping design and carry out experiments from start to finish? Meet URSA, an open-source, agentic AI system developed to push the boundaries of how science gets done. ow.ly/X8iu50YKFpE

1.9K

Richard Korzekwa@WeakInteraction·15 Apr

It's been more than a week, and @sleepinyourhat still hasn't answered this question. What is he hiding?

Dan McAteer@daniel_mac8

@sleepinyourhat @jaskol_ski Sam, what we all really want to know: What kind of sandwich?

2.6K

Richard Korzekwa@WeakInteraction·13 Apr

I checked it out today and it's a nice little shop. Most of the items aren't things I particularly want to buy, and some of the pricing seems a bit steep, but on the other hand, I bought the same mug for the same price as Daniel, so who am I to say it's wrong? Luna called while I was there and we had a short, awkward, friendly conversation. After that, I chatted with the employee who was there for a bit and she seemed pretty good natured and happy to talk about the experience.

Daniel Filan@dfrsrchtwts

Not open right now despite the website andon.market saying its hours are 10 am to 7 pm Monday thru Sunday :(

240

Richard Korzekwa@WeakInteraction·10 Apr

The card reads to me like Anthropic understands there are nontrivial issues with their current process for ensuring safety and alignment. They clearly put a ton of work into evaluating Mythos, and they clearly know a lot of not-so-obvious places to look for shortcomings in these evaluations. Also, they shared many pages of quantitative results and anecdotes, which is awesome. But, FWIW, I did not read all that much concern in the language. I think there is good reason to suspect this process and anything like it are wholly inadequate for the next generation of models, much less whatever comes after that, and I did not get a "this shit is not gonna cut it" vibe from the model card at all. Did I just not read it charitably enough? Did I not apply a strong enough adjustment for "this passed through some PR team's editing"? IDK, maybe. But either way, that's how it comes across to me and ~everyone who read it with me yesterday.

Drake Thomas@MaskedTorah

@So8res I agree this is often a source of epistemic slipperiness but in the particular case of the Mythos Preview system card, I feel like "this shit is not gonna cut it for ASI and that is concerning" was actually relatively well signposted!

731

Richard Korzekwa@WeakInteraction·8 Apr

(in case my vintage internet reference is not obvious: youtube.com/watch?v=sHzdsF…)

Richard Korzekwa@WeakInteraction·8 Apr

The crack-cocaine Claude figured staying sandboxed was for suckers.

158

Richard Korzekwa@WeakInteraction·6 Apr

@jessesingal Lol I know where that is. I used to do that as a training hike a few times a week. It’s a steep climb!

Jesse Singal@jessesingal·6 Apr

4/ So now all these poor first responders have to haul their stuff up this hike that’s roughly a thousand feet of elevation gain over a very short distance. At the entrance to the trail there are three vehicles. Further down the road there are more.

14.8K

Jesse Singal@jessesingal·6 Apr

1/ BREAKING CALIFORNIA NEWS: I went for a sunset hike in Berkeley. It’s a steep one. At one point you crest this ridge and it flattens out. On the left side of the trail I see a guy with his head craned all the way back.

194

104.6K
