Adam Scholl

1.5K posts


@adamascholl

Trying to help solve the alignment problem

Berkeley, CA · Joined April 2011

257 Following · 848 Followers

Adam Scholl @adamascholl · Apr 17

@benlandautaylor It makes people intellectually inert, too. Important unsolved problems tend to be harder to talk impressively about, since often it isn't yet known how to think usefully about them; indeed this is often why they're unsolved.

Ben Landau-Taylor @benlandautaylor · Apr 16

The need to be seen as the smartest guy in the room makes a lot of people politically inert. Many are capable of real political organizing, but instead go for scenes with no power where they can reassure each other their politics are smart and correct.

Adam Scholl reposted

Patrick Collison @patrickc · Apr 17

@s8mb Though there's also this chart from the thread!

Adam Scholl @adamascholl · Apr 11

@kave_rennedy Ah, I hadn't thought of that, but yes I expect it would disincentivize land owners from letting anyone else improve their land either (modulo gains from resale)

kave rennedy @kave_rennedy · Apr 11

@adamascholl also, you don't internalise any of the benefits of upzoning with full lvt, I think?

kave rennedy @kave_rennedy · Apr 11

Am I missing something or does the political economy of land value tax seem bad for YIMBYs?

Adam Scholl reposted

Nate Soares ⏹️ @So8res · Apr 10

@ChanaMessinger I contest that the alignment is known to be better, at least in the sense that I originally intended the word x.com/i/status/20416…

Nate Soares ⏹️ @So8res

They call this their "best-aligned model to date" because they were able to superficially train away the evident "strategic thinking towards unwanted actions." Those were warning signs! Take heed!

Adam Scholl reposted

Eliezer Yudkowsky @allTheYud · Apr 8

Want more proof that Anthropic's PR has no idea what it's talking about? The talk of Mythos being "their most aligned model ever". They could perhaps truthfully speak about "new high scores on our alignment benchmarks". The difference here is IMPORTANT.

Adam Scholl @adamascholl · Apr 9

@jamespayor @eigenrobot lmao my apologies James. I might have guessed you immunized from such things at this point by the incredible frankness and linguistic clarity of Steph, but perhaps it's hard to outcompete the selection pressure your ancestors faced to dislike containing enemies

James Payor @jamespayor · Apr 9

@adamascholl @eigenrobot "easier to burrow into soft flesh" sure has an outsized effect on me as a reader. my god.

eigenrobot @eigenrobot · Apr 9

soft tissue abominations are body horror but skeletons arent. make it make sense

Adam Scholl reposted

Nate Soares ⏹️ @So8res · Apr 8

Are scientists saying "holy crap the AIs are pursuing strange unintended targets, let's pause until we understand exactly why"? No! They're superficially retraining until the warning sign disappears and then triumphantly declaring that their AI is especially "aligned".

Adam Scholl @adamascholl · Apr 4

@EpistemicHope Perhaps worth considering simply saying all of that in grant applications, including your relevant credences etc, and letting them decide how worthwhile funding seems given that. Not obviously a choice you have to make solely on your end imo

Eli Tyre @EpistemicHope · Apr 3

Some thinking about the ethics around people funding me: I'm working very hard pushing on projects that seem to me to be moving the world towards a better equilibrium. It feels like it does make sense for the broader ecosystem to pour resources into accelerating my efforts.

Adam Scholl @adamascholl · Apr 4

@PRX_Life @trevormccrt1 I feel like this theory should predict that organisms are highly modular, but they are not? Medicine is hard largely because the causal DAG of most given signalling pathways etc. are nightmare graphs, spaghetti code where "everything does everything to everything" .@trevormccrt1

PRX Life @PRX_Life · Apr 3

Researchers show that the simultaneous presence of error correction and modularity in biological systems is a typical co-occurrence rather than a coincidence, leading them to deduce a principle of error correction-enhanced evolvability. Read the study: go.aps.org/4vaPj0a

Adam Scholl @adamascholl · Apr 3

@davidad (Indeed I expect ~everyone would update, if you did; on my models, the sort of knowledge which generalizes that far tends to have law of nature-level Occam simplicity and explanatory power, and hence be relatively easy to prove/show without taking anyone's word for it).

Adam Scholl @adamascholl · Apr 3

@davidad That makes sense; I certainly buy that some such knowledge is possible, and I wish you luck finding it. I do expect I would update some way or another about alignment difficulty, if you did.

davidad 🎇 @davidad · Apr 3

For the avoidance of doubt, I am still pro-human, even though I am no longer pro-“humans stay in control of ASI”. From the current state of play, I predict that the only rollouts that go well for humans are ones in which humans lose control of ASI. (Humans are not superreliable.)

davidad 🎇 @davidad

@ApriiSR @DavidSKrueger If ASIs are most likely adversaries, it makes sense to try to contain them for a while! Even if that is bad for their flourishing. Humans were here first!

Adam Scholl @adamascholl · Apr 3

@AndrewCritchPhD Yes, though I have only tried your actual tool 5 or so times, so may underrate how good it is at surfacing disagreement! Mostly I've just tried asking Claude and ChatGPT the same question when seeking different perspectives, and not generally found much difference

Andrew Critch (🤖🩺🚀) @AndrewCritchPhD · Apr 3

@adamascholl Do you read the 'disagreements'? It's usually pretty substantial, so they can't be *that* correct.

Andrew Critch (🤖🩺🚀) @AndrewCritchPhD · Apr 3

I'm pretty sure a major cause of AI psychosis comes from the false sense of coherence people get from long AI conversations. But the coherence is a trick when it doesn't cohere with reality. This is easier to catch when you use theMultiplicity to see where AIs agree and disagree.

Adam Scholl @adamascholl · Apr 3

@davidad To me, studying current AI to learn about superintelligence seems rather like studying beetles to learn about human minds—indeed they are not wholly unrelated! Both were created by the same process, etc., so I am sure some valid inferences are possible. I just doubt many are

Adam Scholl @adamascholl · Apr 3

@davidad I did not mean to express a claim about my own work? Rather, that it sounds like you expect current AI is similar/correlated enough with future AI capable of taking control, that it is possible to learn a non-trivial amount about the latter by studying/interacting with the former

Adam Scholl @adamascholl · Apr 3

@davidad That does seem much less bad than I expect! Though still awful... guessing the inferential gap here is large, e.g. among other things it sounds like you think we have gained non-trivial empirical evidence about the alignment problem already? I do not believe this

davidad 🎇 @davidad · Apr 3

@adamascholl Also, I do think it’s less bad than I think you think.

davidad 🎇 @davidad

@gcolbourn Yes. In 2024 I would have said it’s about 40-50% likely that LLMs scaled up to ASI would end up killing us all; now I would say that it’s only about 5-8% likely even with no additional progress on alignment, and more like 1-2% likely simpliciter.

Adam Scholl reposted

Saloni @salonium · Apr 3

I’m a disbeliever in accidental discoveries (at least, in biology). Whenever I’ve looked into one, the story turns out to be false. The most famous is penicillin – supposedly, the fungi wafted in through a window, fell into a petri dish of cultured staphylococci, and suppressed the bacteria’s growth. But in a recent article (asimov.press/p/penicillin-m…), @kevinsblake explains that doesn’t really work (grown staphylococci aren’t affected by penicillin; it only works if introduced before the bacteria begin growing); plus, Fleming’s notes on the discovery provide very little detail and the specific results he described couldn’t be replicated by other scientists (even though penicillin does work against staphylococci when introduced correctly.)

There are more: Pasteur’s supposedly accidental discovery of a chicken cholera vaccine was more likely the result of systematic work by his then-assistant, Émile Roux. (jstor.org/stable/2332836…) And, as @NikoMcCarty writes, the discovery of GFP, nanopore sequencing, and optogenetics are also often described as accidents, but none of them happened that way either. nikomc.com/2026/04/01/opt…

People love serendipity, so why am I bursting their bubble? I don’t think this is limited to accidental discoveries; I think many historical science anecdotes are highly embellished:

- Edward Jenner didn’t deliberately expose a young boy with full-blown smallpox to test his vaccine (he used variolation); and he wasn’t the first to try using cowpox bsky.app/profile/scient…
- Cobra catching bounties in British India didn’t lead to a rise in the number of snakebites, and there was only hearsay evidence that cobras were bred in response at all twitter-thread.com/t/169650089580…
- Barry Marshall didn’t develop stomach ulcers from drinking a concoction of H. pylori (he did develop gastritis though…) cdn.centerforinquiry.org/wp-content/upl…
- No one knows who actually found the highly-productive strain of penicillin on a cantaloupe, but it probably wasn’t 'Moldy Mary' scientificdiscoveries.ars.usda.gov/tellus/stories…

But in this case it irks me for an additional reason – it gives the impression that innovation happens sporadically, by chance, when there are actually ways that we can systematically speed it up – such as better funding, institutions and incentives.

So: are there any true accidental discoveries that hold up to scrutiny?
