Drake Thomas

3.3K posts

Drake Thomas

@MaskedTorah

System cards, risk reports, and misc safety takes at Anthropic; math; puzzles; spaced repetition. Writes with too many caveats for Twitter.

Berkeley, CA Katılım Nisan 2014

487 Takip Edilen2.1K Takipçiler

Drake Thomas@MaskedTorah·19 May

@AaronBergman18 @AndyMasley (But I should confess a bias that some part of my brain feels on a gut level that, oh my god it's just normal air, if you don't actively feel bad right now it's gotta be fine, how could this be a problem. 1890-Drake probably dislikes germ theory for this reason and is wrong.)

English

Drake Thomas@MaskedTorah·19 May

@AaronBergman18 @AndyMasley (See also: everyone being extremely anal about CO2 when nuclear submarine personnel seem to function fine at thousands of PPM for months and the good papers don't see major cognitive effects til a 100x scaleup. I think it's nocebo + random other correlates of poor ventilation.)

English

Andy Masley@AndyMasley·17 May

Sorry I besmirched farmers, I found a worse one.

English

468

29K

Drake Thomas@MaskedTorah·18 May

@DanielCHTan97 (I have no particular insider knowledge here) Seems a bit weird since there's a numbered x axis for a similar graph further down, but FWIW this kind of thing isn't unique to A\: eg OAI's recent blog post on CoT grading doesn't give an absolute x scale. alignment.openai.com/accidental-cot…

English

128

Daniel Tan@DanielCHTan97·18 May

i'm annoyed that anthropic has stopped writing papers. yeah blogposts are nice but this graph would never pass peer review. like, what is the X axis supposed to be? 'training steps' of what exactly? what are the baselines? how are they trained? from: anthropic.com/research/teach…

English

108

5.2K

Drake Thomas@MaskedTorah·18 May

@deanwball I'm also worried about AI persuasion that's much more multimodal, or involves human impersonation, but that feels like moving goalposts and is less centrally what I think of as the superpersuasion threat model (eg it doesn't matter much for alignment oversight concerns).

English

Drake Thomas@MaskedTorah·18 May

@deanwball But on the other hand I look at things like 4o, or the right tail of human persuasive writing, and I don't feel like I can easily rule out much more untapped potential from very very high speaker intellect + modeling ability.

English

Dean W. Ball@deanwball·17 May

What do people mean when they say things like “right message, wrong messenger”? Why does the “aw, you’re sweet/hello, Human Resources?” meme resonate? Persuasion does not scale arbitrarily with the intellect of the speaker or the intellectual content of his speech!

Séb Krier@sebkrier

When I persuade someone to buy something I offer, the basis of that persuasion is that I provide value. The 'buyer' retains the ability to evaluate, refuse, or choose someone else. Is this actually 'power-seeking' as understood by AI safety? If it is, then we should distinguish being influential (Apple is more 'powerful' than my side hustle) from illegitimately seeking power/control in ways that degrade the checks (deception, capture, misrepresentation etc). I think we should distinguish 'capability to persuade' vs 'using that capability legitimately/for ill'.

English

9.6K

Drake Thomas@MaskedTorah·16 May

@ericneyman @GuiveAssadi Huh, what does "way better" mean? Would love to see a bit more gears in the model of why we delay. (I assume "earth" generalizes to mercury etc too here, since otherwise we could just parallelize?)

English

Eric Neyman@ericneyman·15 May

@GuiveAssadi A friend tells me that Dyson Spheres aren't worth building until pretty far past the singularity, because for a while it'll be way better to use the Earth's materials for nuclear fusion, instead of using those same materials to build a Dyson sphere.

English

727

Guive Assadi@GuiveAssadi·15 May

I don’t think there will be a Dyson sphere in the 2030s or 2040s

Ajeya Cotra@ajeya_cotra

@Jsevillamol @binarybits I think we're more likely to get Dyson sphere in the 2030s if something goes wrong (breakneck military-industrial competition between US and China, AI takeover that makes human prefs+regs moot). But even in a "leisurely" world I expect it by the 2040s. Elon Musk would do it.

English

16.9K

Drake Thomas@MaskedTorah·15 May

@rmushkatblat @max_spero_ Yeah, I’m generally happy with GAP 4 or higher meat, sometimes GAP 3 or Certified Humane in a pinch. Animal Welfare Institute has a nice guide: awionline.org/sites/default/…

English

Robert Mushkatblat@rmushkatblat·14 May

@max_spero_ You probably want GAP 4+ Certified Beef. @MaskedTorah might have more detail, not sure.

English

152

Max Spero@max_spero_·14 May

Can anyone explain to me why we don’t have ethical meat? Surely we can give the farm animal a significantly better life than it would have lived in the wild.

English

187

408

72.9K

Drake Thomas@MaskedTorah·12 May

@NinaPanickssery they do vegan egg white powder now, I think from GMO bacteria? haven't tried baking with it yet but might work as a substitute amazon.com/Healthier-Comf…

English

Nina@NinaPanickssery·10 May

Oh, maybe another problem is that it's more risky to eat raw egg whites in the US compared to UK? It's hard to make Tiramisu without raw egg whites, maybe there's a way to do this if you whip it with hot sugar syrup?

English

521

Nina@NinaPanickssery·10 May

When I was a teenager I used to make the most awesome Tiramisu. I should get back to this.

English

1.3K

Drake Thomas@MaskedTorah·8 May

@prerat @jkcarlsmith @AmandaAskell since before joining anthropic I have had an anki card for SIA vs SSA with the mnemonic "SSA is ass-backwards". I want to say credit for this one is due to @KatjaGrace?

English

655

prerat@prerat·8 May

UPDATE: @jkcarlsmith has written at length advocating SIA over SSA. first known anthropic staff member with anthropic opinions (well known to me). he also thanks @amandaaskell in acknowledgements so she is thinking about it too. we are so back

prerat@prerat

i asked claude to deep research this and there were mere crumbs claude is living a sleeping beauty type existence on the daily and the team has no stance on halfer vs thirder??? outrageous!!!

English

3.2K

Drake Thomas@MaskedTorah·8 May

@RatOrthodox (which we might still do ofc, but I think this is a really important strategic consideration and think generic talk of ASI without specifying the degree of superhuman capability muddies the waters here. I try to always say "[mildly/moderately/wildly] superhuman AI")

English

124

Drake Thomas@MaskedTorah·8 May

@RatOrthodox and so I do care a lot about "can we survive at least some levels of superintelligence" to which I think the answer at this point is pretty clearly "yes if we don't fuck up too badly on executing existing techniques"

English

Brangus🔍⏹️@RatOrthodox·7 May

I bet you could use a similar method to read human minds. Train a model that goes from neural activations to words to neural activations. The loss is just KL divergence between input and output. Seems like the bottleneck on human mind reading is neural measurement accuracy.

Anthropic@AnthropicAI

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.

English

Drake Thomas@MaskedTorah·8 May

@peterbarnett_ @saprmarks What do you mean by "train SAEs on the NLAs" / what evidence would point towards or against reasonable representations?

English

Peter Barnett@peterbarnett_·7 May

@saprmarks How long until you train SAEs on the NLAs to see if they are actually representing reasonable things?

English

336

Samuel Marks@saprmarks·7 May

In a new paper, we present NLAs, an unsupervised method for converting an LLM's internal state into human-readable text. I've personally been astonished by our results. I think NLAs substantively advance our ability to understand what LLMs are thinking and audit them for safety

Anthropic@AnthropicAI

English

184

43.2K

Drake Thomas@MaskedTorah·8 May

@RatOrthodox I don't think I could defeat a civilization of moderately-dumber-than-me humans who had constant mind reading access to my thoughts and could reset me any time I started doing suspicious seeming things!

English

Drake Thomas@MaskedTorah·8 May

@RatOrthodox Huh, why are you so sure about that? Seems pretty plausible to me that moderately superintelligent systems are very survivable with this kind of mind reading.

English

116

Keşfet

@AaronBergman18 @AndyMasley @DanielCHTan97 @deanwball @ericneyman @GuiveAssadi @rmushkatblat @max_spero_