Clément Dumas

944 posts

@Butanium_

MATS 7/7.1 Scholar w/ Neel Nanda · MSc at @ENS_ParisSaclay · prev research intern at DLAB @EPFL · AI safety research / improv theater

London · Joined December 2018

606 Following · 712 Followers

Pinned Tweet

Clément Dumas@Butanium_·7 Apr

New paper w/ @jkminder & @NeelNanda5! What do chat LLMs learn in finetuning? Anthropic introduced a tool for this: crosscoders, an SAE variant. We find key limitations of crosscoders & fix them with BatchTopK crosscoders. This finds interpretable and causal chat-only features! 🧵

Clément Dumas@Butanium_·8h

@snigus If you replace half of the layers of Gemma 2b chat with the base model's, you also get OpenAI. See the section "Other techniques in the model diffing toolkit": lesswrong.com/posts/xmpauEXE…

William Wale@snigus·19h

I have a small language model and it’s been pre-trained. Now I post-train it to say “I’m a language model”, with no mention of OpenAI. The trained model still ends up saying it’s an LLM made by OpenAI, even tho OpenAI is never mentioned in the instruction tuning dataset, and there is in fact one sample that says “I am being developed by Anthropic” (not true)! Makes me think models saying they’re made by so-and-so is pretty weak evidence of copying / stealing / distillation.

Clément Dumas@Butanium_·1d

@louisvarge You can also hack the built-in team feature to make this work. Was about to write a post about this but feels like channels might be cleaner?

Louis Arge@louisvarge·2d

i made a thing where now any Claude Code can send messages to any other Claude Code on my machine. they can ask clarifying questions about work, or become friends

Clément Dumas@Butanium_·2d

@maxsloef Welcome to the club: github.com/huggingface/pe… (issue comment 3513763128)

max!@maxsloef·2d

the HF Llama 3 tokenizer was silently stripping spaces before punctuation on decode. every hf llama 3 model, every fine-tune, every descendant. trillions of tokens. i’m so grateful this stuff is open source — but man, the fact that only a handful of people have ever noticed this really makes you wonder how many other subtle but important bugs are just silently lurking, across all these trillions of tokens

Clément Dumas@Butanium_·2d

credit to my friend nataliia for the highlights!

Clément Dumas@Butanium_·2d

Read the full piece: butanium.github.io/boom-incident/

Clément Dumas@Butanium_·2d

In the first thread I focused on the funny stuff, but there is so much more, like some very touching moments in this piece 🥺

Clément Dumas@Butanium_

I asked Claude Code to "nuke" a task on my cluster, then sent "boom" >100 times in the chat. Opus 4.6 built an elaborate lore including a NeurIPS best paper award, a @ESYudkowsky tweet thread, an EU Boom Act, half of Anthropic reacting, and much more. 🧵 (1/10)

Clément Dumas@Butanium_·3d

@thkostolansky @liminal_bardo @ESYudkowsky Me at my next RM meeting: so I did a small side project...

Tim Kostolansky@thkostolansky·3d

@Butanium_ @liminal_bardo @ESYudkowsky “i do ai research”

Clément Dumas@Butanium_·3d

@AdriGarriga I was literally laughing at my computer screen in the MATS office lol

Adrià Garriga-Alonso@AdriGarriga·3d

@Butanium_ I'm glad you actually read Claude's replies after the booms. Seems it judged you correctly.

Clément Dumas@Butanium_·3d

I should have ended with the link to the actual post: butanium.github.io/boom-incident/ or butanium.github.io/boom-incident/… for a selection of cool artifacts from the conversation

Clément Dumas@Butanium_·3d

One last thing in case it's not obvious: the original transcript didn't include HTML renders of the tweets, just Markdown renders, e.g. "*@sama:*\n\n"we're excited [...]"\n\n*community note: \"it cannot\"*". See the full transcript here: github.com/Butanium/boom-…
