sudarsh

4.4K posts

@sudarshk_

any pronouns; biophysics, alignment, transhumanism

Pasadena · Joined June 2019

1.8K Following · 330 Followers

Pinned Tweet

sudarsh@sudarshk_·Aug 9

excited to have worked on this! a way of viewing a model’s chain of thought is to think of it as a scratchpad that it can use to help itself solve problems. even though the model may not be perfectly honest in its chain of thought, looking at it can still give valuable info!

METR@METR_Evals

Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can’t be performed in a single forward pass.

2.7K

sudarsh@sudarshk_·23m

@cutesuscat i wonder who i learned this from...

meowtase (in SF Apr 1-30)@cutesuscat·27m

@sudarshk_ Wow pretty based

sudarsh@sudarshk_·31m

to keep track of my GPUs i name them after my little pony characters

sudarsh@sudarshk_·1h

@BUNNlCULA I'd like to show up to a few sessions!

charlie@BUNNlCULA·2h

i have interest. i have a venue. reply to this if you’d like to be added to the Planning DM

charlie@BUNNlCULA

i want to form a chess club in sf (no founders)

1.3K

sudarsh@sudarshk_·1h

@Tim_Hua_ yea, my hope is that the reward fn looks at the CoT but in a way that's orthogonal to safety or honesty

Tim Hua 🇺🇦@Tim_Hua_·1h

@sudarshk_ We also have no idea how exactly they trained on the CoT...

Tim Hua 🇺🇦@Tim_Hua_·1h

I wonder if the existing "wow the Anthropic CoTs are so much different than other models' CoTs" results are mostly just because Anthropic trained on the chain of thought, not because of some generalization from all the character training...

Tim Hua 🇺🇦@Tim_Hua_

Anthropic accidentally trained against the chain of thought in Claude Mythos, Opus 4.6, and Sonnet 4.6

997

sudarsh@sudarshk_·1h

@Tim_Hua_ i haven't seen much evidence of what happens when you train against CoTs and it might be nice to have qualitative information about this

Tim Hua 🇺🇦@Tim_Hua_·1h

See e.g., lesswrong.com/posts/FG54euEA…

sudarsh retweeted

Tomás (in SF now) Bjartur@BjarturTomas·1d

A practical, dependently typed language seems like one of the few paths out of this mess. Def worth supporting anyone working on this.

Taelin@VictorTaelin

really wish I could use it to audit Bend and improve its security in general. timing couldn't be better, it isn't every day that a new proof assistant is launched. if anyone has any path or idea on how I could be part of Glasswing please let me know!

1.4K

sudarsh@sudarshk_·7h

@celestepoasts truth nuke

Celeste (in amsterdam dm to hang)@celestepoasts·6d

I should crash inkhaven

meowtase (in SF Apr 1-30)@cutesuscat

My first post for inkhaven! (day 1/30) I plan to post multiple times per day.

769

sudarsh@sudarshk_·21h

@livgorton small

Liv@livgorton·23h

how are we all feeling tonight?..

2.8K

sudarsh@sudarshk_·1d

@heyanuja real :c

Anuja U@heyanuja·1d

things will never be chill again!

1.1K

sudarsh@sudarshk_·1d

this is loosely the reason i care a lot about aligned / aligning intelligence

Nathan 🔎@NathanpmYoung

I think companies have pretty good incentives to align AI. And it's not obvious to me that AI alignment is difficult. Though small chances of very bad outcomes can be very bad.

130

sudarsh@sudarshk_·1d

@davidad i'd be really interested in attending this but can't dm you

davidad 🎇@davidad·Apr 1

If you, dear reader, feel some resonance with the "Alignment with Awakening" direction (and perhaps are pleasantly surprised to see that someone like me is going there), and would be available to attend a small event on the topic in Berkeley May 26–28, please send me a DM! 13/15

100

3.1K

davidad 🎇@davidad·Apr 1

Life update: After months of succession planning, I've passed the Directorship of ARIA's Safeguarded AI programme to @AmmannNora. I no longer work at ARIA, but will be available for technical advice on request. What's next for me? The short answer: "Alignment with Awakening". ⬇️

325

51.1K

sudarsh retweeted

Sam Bowman@sleepinyourhat·1d

(I encountered an uneasy surprise when I got an email from an instance of Mythos Preview while eating a sandwich in a park. That instance wasn't supposed to have access to the internet.)

249

2.3K

340.7K

sudarsh retweeted

NASA@NASA·1d

@alltooriah Tell your cat the Artemis II crew said pspsps

832

20.4K

236.3K

8.2M

sudarsh@sudarshk_·2d

@alec_helbling @LRudL_ have some ragebait

128

sudarsh@sudarshk_·2d

@tomasmas @pronounced_kyle @fishPointer wow im there too

Tomás Vega@tomasmas·3d

@pronounced_kyle This will be the next Shenzhen and @fishPointer is gonna make it happen x.com/tomasmas/statu…

Tomás Vega@tomasmas

@ihorbeaver Bayview-Hunters point

1.5K

Christian Keil@pronounced_kyle·3d

We need to make this true again.

428

16.2K

sudarsh@sudarshk_·4d

@kareem_carr @BrainyMarsupial we try to run across multiple agents / think about what needs to be done into the future

136

Dr Kareem Carr@kareem_carr·4d

@BrainyMarsupial I'm curious about how this works logistically. Does the company pay for the tokens or do the engineers? Do they double-check the work or just commit it as long as it passes the tests? Do they even write/read the tests? What do they do in the meantime? Pretend to work?

19.4K

Dr Kareem Carr@kareem_carr·4d

I keep hearing that software engineers don’t write much code anymore and it’s mostly AI now. Can any software engineers confirm how true this is? Do you just drink coffee and watch Claude code all day now?

535

584

171.3K

sudarsh retweeted

Mysterious Semblance At The Strand Of Nightmares@BigGulpAmerikan·4d

It’s just bleak and violent misery at every level down to the subatomic. And what a stupid looking thing that is too. Now I know how space aliens must feel as they watch us.

The Scientific Lens@LensScientific

Despite being just one cell, Lacrymaria olor is a formidable predator that hunts and consumes other microorganisms. 📽: James Weiss

156

391

6.9K

376.5K

sudarsh retweeted

Drake Thomas@MaskedTorah·4d

@robbensinger @InverseMarcus Yeah, I work at Anthropic and do not think the current pace of progress is on track to make good tradeoffs about AI x-risk vs other catastrophes powerful AI might have prevented, regardless of which actor tries to make use of that limited time while stuck in a race.

240

12.3K

sudarsh@sudarshk_·4d

@ItsMeLMD @suchnerve car seat headrest core

658

LMD@ItsMeLMD·4d

@suchnerve Best ones are "whats the point, why wouldn't you just xyz instead?" Like dawg it's not YOUR hypothetical sex scenario 💀

7.7K

Vivian@suchnerve·4d

Kinda want to make a bot that auto-replies to the negatively-toned replies and quotes with “YOU have nothing to worry about.” The sheer fucking hubris. The SHEER fucking hubris. L. M. F. A. O.

Vivian@suchnerve

Condom blowjobs work best if the condom is non-lubricated and you put flavorless lube on the inside to improve sensation, btw

302

10.4K

sudarsh retweeted

HudZah@hud_zah·5d

keep sf whimsical, not corporate peepeepoopoostreet.com

Riley Walz@rtwlz

my friends and i bought a foreclosed alley in san francisco. it's an actual road that cars drive down! we're letting the entire internet design a mural on it. submit a drawing, vote on them, and the top 1,280 get painted on the pavement permanently ⬇️

396

125.2K
