sudarsh

4.4K posts

@sudarshk_

any pronouns; biophysics, alignment, transhumanism

Pasadena · Joined June 2019
1.8K Following · 330 Followers
Pinned Tweet
sudarsh (@sudarshk_)
excited to have worked on this! a way of viewing a model’s chain of thought is to think of it as a scratchpad that it can use to help itself solve problems. even though the model may not be perfectly honest in its chain of thought, looking at it can still give valuable info!
METR (@METR_Evals)

Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can’t be performed in a single forward pass.


sudarsh (@sudarshk_)
to keep track of my GPUs i name them after my little pony characters

sudarsh (@sudarshk_)
@Tim_Hua_ yea, my hope is that the reward fn looks at the CoT but in a way that's orthogonal to safety or honesty

sudarsh (@sudarshk_)
@Tim_Hua_ i haven't seen much evidence of what happens when you train against CoTs and it might be nice to have qualitative information about this

Liv (@livgorton)
how are we all feeling tonight?..

Anuja U (@heyanuja)
things will never be chill again!

sudarsh (@sudarshk_)
@davidad i'd be really interested in attending this but can't dm you

davidad 🎇 (@davidad)
If you, dear reader, feel some resonance with the "Alignment with Awakening" direction (and perhaps are pleasantly surprised to see that someone like me is going there), and would be available to attend a small event on the topic in Berkeley May 26–28, please send me a DM! 13/15

davidad 🎇 (@davidad)
Life update: After months of succession planning, I've passed the Directorship of ARIA's Safeguarded AI programme to @AmmannNora. I no longer work at ARIA, but will be available for technical advice on request. What's next for me? The short answer: "Alignment with Awakening". ⬇️

sudarsh retweeted
Sam Bowman (@sleepinyourhat)
(I encountered an uneasy surprise when I got an email from an instance of Mythos Preview while eating a sandwich in a park. That instance wasn't supposed to have access to the internet.)

sudarsh retweeted
NASA (@NASA)
@alltooriah Tell your cat the Artemis II crew said pspsps

Christian Keil (@pronounced_kyle)
We need to make this true again.
[image]

Dr Kareem Carr (@kareem_carr)
@BrainyMarsupial I'm curious about how this works logistically. Does the company pay for the tokens or do the engineers? Do they double-check the work or just commit it as long as it passes the tests? Do they even write/read the tests? What do they do in the meantime? Pretend to work?

Dr Kareem Carr (@kareem_carr)
I keep hearing that software engineers don’t write much code anymore and it’s mostly AI now. Can any software engineers confirm how true this is? Do you just drink coffee and watch Claude code all day now?

sudarsh retweeted
Drake Thomas (@MaskedTorah)
@robbensinger @InverseMarcus Yeah, I work at Anthropic and do not think the current pace of progress is on track to make good tradeoffs about AI x-risk vs other catastrophes powerful AI might have prevented, regardless of which actor tries to make use of that limited time while stuck in a race.

LMD (@ItsMeLMD)
@suchnerve Best ones are "whats the point, why wouldn't you just xyz instead?" Like dawg it's not YOUR hypothetical sex scenario 💀