Dan Hendrycks

1.6K posts

@hendrycks

• Center for AI Safety Director • xAI and Scale AI advisor • GELU/MMLU/MATH/HLE • PhD in AI • Analyzing AI models, companies, policies, and geopolitics

San Francisco · Joined August 2009
111 Following · 44.5K Followers
Pinned Tweet
Dan Hendrycks@hendrycks·
Superintelligence is destabilizing. If China were on the cusp of building it first, Russia or the US would not sit idly by—they'd potentially threaten cyberattacks to deter its creation. @ericschmidt @alexandr_wang and I propose a new strategy for superintelligence. 🧵
Dan Hendrycks@hendrycks·
@jonasfreund_ Thank you for responding. “GovAI’s funders have no influence over what we say.” But they do, because the organization’s survival depends on them. That dependence creates substantial pressure to make sure you stay aligned with their tastes.
Jonas Freund@jonasfreund_·
@hendrycks Not sure if this counts as a COI to be honest. GovAI’s funders have no influence over what we say. The piece represents Sophie's and my personal views. I do think it's an independent analysis.
Dan Hendrycks retweeted
Dan Hendrycks@hendrycks·
People aren't thinking through the implications of the military controlling AI development. It's plausible AI companies won't be shaping AI development in a few years, and that would dramatically change AI risk management.

Possible trigger: AI might suddenly become viewed as the top priority for national security. This perspective shift could happen when AIs gain the capability of hacking critical infrastructure (~a few years). In this case, the military would want exclusive access to the most powerful AI systems.

Defense Production Act, budget, data: The US military could compel AI organizations to make their AIs for them. It could also demand that NVIDIA's next GPUs go to their chosen organization. The military also has an enormous budget and could pay hundreds of billions for a GPU cluster. They can also get more training data from the NSA and many companies like Google.

Military systems are more hazardous: Military systems are sanctioned to hack and use lethal force, so they will have capabilities that others will not. Moreover, some will explicitly be given ruthless propensities. In an anarchic international system, the main objective of states is to compete for power for self-preservation, according to neorealists. Later-stage AIs could be permitted to be explicitly power-seeking, deceptive, and so on, since these propensities make the systems more competitive.

Futility of AI weapon red lines: Some are trying to create "red lines" that would trigger a pause on AI development. These hoped-for "red lines" often relate to weaponization capabilities such as "when is an AI able to create a novel zero-day?" This red line strategy seems to assume no military involvement. Many of these red lines are actually progress indicators or checkpoints for a military and would not trigger a pause in AI development.

Regulation: When militaries get involved, competitive pressures become more intense. Racing dynamics can't be mitigated with corporate liability laws or various forms of regulation, as they don't apply to the military. For example, the EU AI Act and the White House Executive Order do not apply to the military. Militaries racing could result in a classic security dilemma, as with nuclear weapons. Much of the playbook for "making AI go well" is impotent.

This is not to suggest that I am against the military. I'm pointing out that everyone is acting as though corporations will forever be allowed to autonomously develop what will become the most powerful technology ever.
Dan Hendrycks@hendrycks·
"AI agents might not have the protection of a higher authority. AI systems could face a variety of situations where no central authority defends them against external threats. We give four examples. First, if there are some autonomous AI systems outside of corporate or government control, they would not necessarily have rights, and they would be responsible for their own security and survival. Second, for AI systems involved in criminal activities, seeking protection from official channels could jeopardize their existence, leaving them to amass power for themselves, much like crime syndicates. Third, instability could cause AI systems to exist in a self-help system. If a corporation could be destroyed by a competitor, an AI may not have a higher authority to protect it; if the world faces an extremely lethal pandemic or world war, civilization may become unstable and turbulent, which means AIs would not have a sound source of protection. These AI systems might use cyber attacks to break out of human-controlled servers and spread themselves across the internet. There, they can autonomously defend their own interests, bringing us back to the first example. Fourth, in the future, AI systems could be tasked with advising political leaders or helping operate militaries. In these cases, they would seek power for the same reasons that states today seek power." #structural-pressures-towards-power-seeking-ai" target="_blank" rel="nofollow noopener">aisafetybook.com/textbook/align…
Dan Hendrycks retweeted
Center for AI Safety
AI agents are getting good at coding, but how close are they to automating all digital labor? New Remote Labor Index results: Opus 4.5 is able to automate 3.75% of remote labor projects, with GPT-5.2 in second place.
Dan Hendrycks retweeted
Center for AI Safety
Humanity's Last Exam is now published in Nature. Since its release, HLE has become a leading frontier benchmark, used by OpenAI, Anthropic, DeepMind, and xAI. Thank you to our partners at @scale_AI and the 1,000+ co-authors who made this benchmark possible.
Nathan Calvin@_NathanCalvin·
He even talks some about @ajeya_cotra's concept of self-sufficient AI! x.com/ajeya_cotra/st… "Each race is dependent upon the other for innumerable benefits, and, until the reproductive organs of the machines have been developed in a manner which we are hardly yet able to conceive, they are entirely dependent upon man for even the continuance of their species. It is true that these organs may be ultimately developed, inasmuch as man’s interest lies in that direction; there is nothing which our infatuated race would desire more than to see a fertile union between two steam engines; it is true that machinery is even at this present time employed in begetting machinery, in becoming the parent of machines often after its own kind, but the days of flirtation, courtship, and matrimony appear to be very remote, and indeed can hardly be realised by our feeble and imperfect imagination."
Ajeya Cotra@ajeya_cotra

Different people seem to mean radically different things by "AGI." More concrete & vivid milestones can better highlight underlying disagreements. In a new post, I define *self-sufficient AI:*

Pedro Domingos@pmddomingos·
Engineers don't understand the difference between research and engineering. Researchers do.
Valerio Capraro@ValerioCapraro·
Major preprint just out! We compare how humans and LLMs form judgments across seven epistemological stages. We highlight seven fault lines, points at which humans and LLMs fundamentally diverge:

The Grounding fault: Humans anchor judgment in perceptual, embodied, and social experience, whereas LLMs begin from text alone, reconstructing meaning indirectly from symbols.

The Parsing fault: Humans parse situations through integrated perceptual and conceptual processes; LLMs perform mechanical tokenization that yields a structurally convenient but semantically thin representation.

The Experience fault: Humans rely on episodic memory, intuitive physics and psychology, and learned concepts; LLMs rely solely on statistical associations encoded in embeddings.

The Motivation fault: Human judgment is guided by emotions, goals, values, and evolutionarily shaped motivations; LLMs have no intrinsic preferences, aims, or affective significance.

The Causality fault: Humans reason using causal models, counterfactuals, and principled evaluation; LLMs integrate textual context without constructing causal explanations, depending instead on surface correlations.

The Metacognitive fault: Humans monitor uncertainty, detect errors, and can suspend judgment; LLMs lack metacognition and must always produce an output, making hallucinations structurally unavoidable.

The Value fault: Human judgments reflect identity, morality, and real-world stakes; LLM "judgments" are probabilistic next-token predictions without intrinsic valuation or accountability.

Despite these fault lines, humans systematically over-believe LLM outputs, because fluent and confident language produces a credibility bias. We argue that this creates a structural condition, Epistemia: linguistic plausibility substitutes for epistemic evaluation, producing the feeling of knowing without actually knowing.

To address Epistemia, we propose three complementary strategies: epistemic evaluation, epistemic governance, and epistemic literacy.

Full paper in the first reply. Joint with @Walter4C & @matjazperc
Dan Hendrycks retweeted
Dan Hendrycks@hendrycks·
1. “Sensory and social information vs Textual input”
Many modern AIs are multimodal, not just textual, and can receive textual, visual, and auditory inputs. These can encode "social information."

2. “Perceptual and situational parsing vs Tokenization and preprocessing”
AIs can definitely perceive things and parse context. We can very easily say the retina does massive "preprocessing" before sending information through the optic nerve.

3. “Memory, intuitions, and learned concepts vs Pattern recognition in embeddings”
AIs definitely have memory of various learned concepts, and they can "intuit" the answer to commonsense questions (e.g., temporal commonsense, intuitive physics in textually described scenarios, and so on). “Pattern recognition” sounds more detached than “learned concepts,” but “pattern” is a very abstract word and can cover anything worth learning, remembering, or having intuitions about.

4. “Emotions, motivations, goals vs. Statistical inference via neural layers”
It is not clear what “statistical inference via neural layers” means. Deep networks don’t have much to do with usual statistical concepts (e.g., t-tests, MCMC, RCTs). Separately, AIs increasingly have value systems, have self-preservation tendencies, say things they otherwise act as though are false in order to accomplish tasks, and so on. x.com/DanHendrycks/s… idais.ai/dialogue/idais… There's a literature on this.

5. “Reasoning, information integration vs. Textual context integration”
AIs can solve various problems that require inductive or deductive reasoning (arxiv.org/pdf/2007.08124). If this is making a distinction between “information” and “textual,” recall that AIs can process many types of information (visual and auditory, not just textual).

6. “Meta-cognition and error-monitoring vs. Forced confidence and hallucination”
AIs can assign calibrated probabilities to their statements (arxiv.org/abs/2207.05221). They can be more calibrated than people on various questions. They can also correct their mistakes (very common when they're solving mathematics problems). “Hallucinations” is a popular term that should have been called “confabulation.” Confabulation is something both AIs and humans do. AIs confabulate more, but there is solid progress on reducing this rate each year.

7. “Value-sensitive judgment vs. Probabilistic judgment”
It’s unclear what this is pointing at. AIs can handle normative claims, not just descriptive claims. AIs can be sensitive to various normative factors (arxiv.org/pdf/2008.02275) and can answer common sense morality questions (“Is it wrong to burn children just for the fun of it?”) as well as more complicated value-sensitive questions, such as tort or criminal law questions.

---

I gave Gemini 3 a screenshot of your human judgment column, excluding the LLM judgment one, and asked it to generate an LLM judgment one: "Recreate the diagram with a new column added: LLM judgment. Use deflationary terms in the second column to make humans seem more special and AIs seem flawed (be brief)." Gemini generated the following, which suggests it's easy to just use deflationary language to make it seem like important distinctions are being drawn.
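On point 6 above, to make "calibrated" concrete: calibration is typically scored with something like expected calibration error, which checks whether statements made with roughly X% confidence are correct roughly X% of the time. A minimal illustrative sketch; the function and toy numbers here are the editor's, not taken from the linked paper.

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by stated confidence, then average the gap between each
    bin's accuracy and its mean confidence, weighted by the fraction of items in the bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences >= lo) & ((confidences < hi) if hi < 1.0 else (confidences <= hi))
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy example: a model that says "90% sure" and is right 9 times out of 10 is well
# calibrated on those statements, so the error is essentially zero.
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))  # ~0.0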
Mustafa@oprydai·
if you’re drawn to:
richard feynman
alan turing
claude shannon
john von neumann
nikola tesla
seymour papert
marvin minsky
judea pearl
dennis ritchie
donald knuth
rodney brooks
yann lecun
john carmack
dieter rams
elon musk
bjarne stroustrup
steve jobs
you’re not just into tech; you’re into the art of engineering, the philosophy of systems, and the beauty of how things work. you’ve found your people.
Dan Hendrycks@hendrycks·
@ben_j_todd The selected benchmarks are probably leaning more on reasoning than on knowledge, hence the higher slope.
Benjamin Todd@ben_j_todd·
It's not only the METR horizon trend that accelerated in 2024. A composite of all major benchmarks did:
Gabriele Berton@gabriberton·
Yes, ARC-AGI is slowing down progress. I feel like the whole HRM craze is a bubble and won't take us anywhere. I don't understand the excitement about a model that can overfit to a task. But hey, there's a picture of a brain in the paper, so it must be the path to AGI.
Ariel@redtachyon

It really seems like ARC-AGI is only derailing the quest for AGI, mainly due to its popularity. I actually think it's good for what it is - a necessary (but not sufficient) condition for AGI. That is, if your model can't solve it, it's probably not AGI. But so much effort is now being poured into approaches designed to solve colored grid puzzles, and I can't see most of those methods being good for anything else.

Dan Hendrycks@hendrycks·
@Miles_M_K This meme was in other major works beforehand (e.g., Situational Awareness, Superintelligence Strategy).
Miles Kodama@Miles_M_K·
AI 2027's main contribution to the discourse is still underappreciated IMO. Many people think the main idea of AI 2027 was "AI will be very big very soon." But the bigger idea was "AI R&D automation is very important." AI R&D automation is the reason why AI 2027's timeline is so short, and it's also the key vector for loss of control. A scheming AI's first and best chance to attack us will be by sabotaging the AI R&D we delegate to it.
Dan Hendrycks@hendrycks·
@liuzhuang1234 I wonder what synergies dynamic error functions have with Gaussian error linear units. Maybe in combination they can become something simpler.
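For context on the connection being floated here (standard definitions, not from either paper): GELU is defined through the Gaussian CDF, which is itself written with erf, so a dynamic erf layer and GELU are built from the same primitive:

\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\Bigl(1 + \operatorname{erf}\bigl(x/\sqrt{2}\bigr)\Bigr)

The Derf-style form sketched after the paper announcement below, gamma * erf(alpha * x) + beta, is an assumed parameterization; if accurate, it reuses the same erf but drops the multiplicative x, which is the kind of overlap this reply seems to be gesturing at.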
Zhuang Liu@liuzhuang1234·
Derf matches or outperforms normalization layers, and consistently beats DyT, with the same training recipe, across domains.
1. ImageNet - higher top-1 in ViT-B/L
2. Diffusion Transformers - lower FID across the DiT family
3. Genomics (HyenaDNA, Caduceus) - higher DNA classification accuracy
4. Speech (wav2vec 2.0) - lower validation loss
5. Language (GPT-2) - matches LayerNorm, clearly beats DyT
A simple point-wise layer can make Transformers stronger, not just as good.
Zhuang Liu@liuzhuang1234·
Stronger Normalization-Free Transformers – new paper. We introduce Derf (Dynamic erf), a simple point-wise layer that lets norm-free Transformers not only work, but actually outperform their normalized counterparts.
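The announcement doesn't spell out the layer itself, so the following is only an illustrative sketch: assuming Derf follows the same recipe as DyT (a learnable point-wise squashing in place of normalization) but swaps tanh for erf, a drop-in module might look roughly like this. The class name, the scalar alpha, and the per-channel gamma/beta parameterization are assumptions, not taken from the paper.

import torch
import torch.nn as nn

class DynamicErf(nn.Module):
    """Hypothetical point-wise stand-in for LayerNorm: y = gamma * erf(alpha * x) + beta,
    applied elementwise with no statistics computed over the batch or token dimension.
    Parameterization is assumed, not taken from the Derf paper."""
    def __init__(self, dim: int, init_alpha: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(float(init_alpha)))  # learnable input scale
        self.gamma = nn.Parameter(torch.ones(dim))                  # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))                  # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.erf(self.alpha * x) + self.beta

# Usage: wherever a pre-norm Transformer block would otherwise call LayerNorm(dim).
x = torch.randn(2, 16, 768)
print(DynamicErf(768)(x).shape)  # torch.Size([2, 16, 768])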
David Scott Patterson@davidpattersonx·
GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. This is how I have been visualizing progress toward the human-to-AI transition point in my mind. By the end of 2026, AI will be near 100% across all domains and will be capable of doing full jobs.
Dan Hendrycks@hendrycks

The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵

Dan Hendrycks@hendrycks·
I'm also thankful to @NeelNanda5 for writing this. Usually people just quietly pivot.