tom white

4.2K posts

@dribnet

creations with code and networks

Wellington, New Zealand · Joined June 2011

4.3K Following · 11K Followers

tom white@dribnet·13 Apr

@_lyraaaa_ @tahsin_mayeesha final_vectors.npz 🙏

lyra bubbles@_lyraaaa_·13 Apr

@tahsin_mayeesha x.com/_lyraaaa_/stat… which code

lyra bubbles@_lyraaaa_

average vibe coded research project folder

lyra bubbles@_lyraaaa_·13 Apr

reproducing anthropic's emotion activation probe paper on gemma4 e4b. a bit noisy but it works!

tom white@dribnet·3 Apr

@wendlerch @jeremyphoward @aryaman2020 @jatin_n0 @voooooogel Thanks Chris! Another use of this contrastive synthetic data technique from years past was steering text-to-image generators away from putting text in the image. x.com/dribnet/status…

tom white@dribnet

@NeelNanda5 @ry_serene @jd_pressman @AlecRad @CasperKaae @hugo_larochelle @OleWinther1 Also: AFAIK no text to image engines currently support steering vectors, but my @pixray tool (which preceded @midjourney, etc) *did* support using these and by default would apply a vector to suppress text appearing in the image. x.com/dribnet/status…

Chris Wendler@wendlerch·3 Apr

@dribnet @jeremyphoward @aryaman2020 @jatin_n0 @voooooogel That's cool!

Aryaman Arora@aryaman2020·2 Apr

I’m very glad to see that Anthropic interp has caught up to the idea of generating a bunch of contrastive synthetic data for extracting supervised steering vectors from! It’s unfortunate that there’s no prior work to cite on this…

Anthropic@AnthropicAI

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

tom white@dribnet·2 Apr

if it makes you feel better: i also introduced the idea of generating useful steering vectors from contrastive synthetic data in my 2016 paper - a whole section on augmenting inputs with low pass gaussian filter to derive a steering vector that produces less blurry samples. arxiv.org/abs/1609.04468
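
A minimal sketch of the idea in this post, assuming a hypothetical encoder has already mapped each original image and its Gaussian-blurred copy to a latent vector (the toy lists below stand in for those latents). The steering vector is taken as the difference between the two mean latents, and applying it with a negative strength nudges a latent away from "blurry". The function names and the `strength` parameter are illustrative and are not taken from the paper's code.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(positive, negative):
    """Difference of means: points from the negative set toward the positive set."""
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(latent, direction, strength=1.0):
    """Move a latent along a direction; a negative strength moves away from it."""
    return [z + strength * d for z, d in zip(latent, direction)]


# Toy latents (assumption): in practice these would come from encoding real
# images and their low-pass-filtered copies with the same model.
sharp_latents = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
blurred_latents = [[1.0, 1.0, 2.0], [3.0, 1.0, 2.0]]

blur_direction = steering_vector(blurred_latents, sharp_latents)
deblurred = steer([2.0, 0.5, 2.0], blur_direction, strength=-0.5)
```

The contrast pairs are synthetic by construction: no blur labels are collected, the "blurry" set is produced by filtering the inputs.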

Aryaman Arora@aryaman2020·2 Apr

@jatin_n0 arxiv.org/abs/2501.17148, literally figure 1. but the idea was in @voooooogel's tweets before too surely

tom white@dribnet·1 Apr

@bcherny @Rahatcodes Goodhart's Law here we come 🤬

Boris Cherny@bcherny·1 Apr

@Rahatcodes 👋 This is one of the signals we use to figure out if people are having a good experience. We put it on a dashboard and call it the “fucks” chart

rahat@Rahatcodes·31 Mar

Claude Code has a regex that detects "wtf", "ffs", "piece of shit", "fuck you", "this sucks" etc. It doesn't change behavior... it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI. Do with this information what you will.

tom white@dribnet·14 Mar

the models, they just wanna converge

GIF

tom white@dribnet·14 Mar

saxophone print + top-150 SigLIP image probe. though mKNN model agreement (a la platonic representation hypothesis) is not part of the test-time compute process, it climbs naturally as the optimization evolves

tom white@dribnet·14 Mar

@sebkrier one-shotted just enough to win

Séb Krier@sebkrier·14 Mar

This is wild. theaustralian.com.au/business/techn…

tom white@dribnet·12 Mar

45% mutual kNN between CLIP and SigLIP — not bad for two model families trained on different data with different objectives when probed with this print. revisiting ImageNet so I can build a toolbox for navigating more uncharted waters without class labels (stay tuned)...
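
For readers unfamiliar with the metric, here is a rough sketch of mutual kNN agreement between two models' embeddings of the same items. It assumes cosine similarity and a caller-chosen `k`; the post states neither, and the toy vectors below are made up rather than real CLIP or SigLIP features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_indices(features, i, k):
    """Indices of the k items most similar to item i, excluding i itself."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: cosine(features[i], features[j]), reverse=True)
    return set(others[:k])


def mutual_knn(features_a, features_b, k):
    """Average fraction of k-nearest neighbours shared by the two spaces."""
    n = len(features_a)
    shared = 0
    for i in range(n):
        shared += len(knn_indices(features_a, i, k) & knn_indices(features_b, i, k))
    return shared / (n * k)


# Toy embeddings of the same four items from three hypothetical models.
model_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
model_b = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1], [1.0, 0.0, 0.0], [0.9, 0.0, 0.1]]
model_c = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

agree = mutual_knn(model_a, model_b, k=1)     # same neighbour structure
disagree = mutual_knn(model_a, model_c, k=1)  # different neighbour structure
```

The two spaces never need the same dimensionality, since only neighbour identities are compared. That is what makes the score usable across model families.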

tom white@dribnet·10 Mar

"pirate ship" (ImageNet class 724)

tom white@dribnet·10 Mar

unsure how AI interprets this print? treating the image as a linear probe on your favorite vision model and scraping a diverse dataset for maximum activations provides a coherent suggestion.

tom white@dribnet·6 Mar

or query your favorite vision model for semantic nearest neighbors - here's OpenAI-CLIP's top hits across CC3M using the baseball_player print as a probe

tom white@dribnet·22 Feb

not seeing it? don't worry - your favorite imagenet model is.

tom white@dribnet·22 Feb

traffic sign, baseball player, pomegranate

tom white@dribnet·4 Mar

@johnhewtt x.com/dribnet/status…

tom white@dribnet

Weapons-grade piggy bankness: One drawing. No training. Subtract the style, get a direction in SigLIP space. Sort 50K ImageNet images by cosine similarity: 41 of the top 50 are piggy banks (P@50 = 82%). The drawing is the classifier.
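
The recipe in this post might look roughly like the sketch below. It assumes the "style" is available as a single embedding to subtract (how that vector is obtained is not stated in the post), and it uses toy 2-D vectors in place of real SigLIP embeddings. All names here are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def concept_direction(drawing_emb, style_emb):
    """Subtract the style embedding from the drawing embedding."""
    return [d - s for d, s in zip(drawing_emb, style_emb)]


def precision_at_k(direction, image_embs, labels, target, k):
    """Fraction of the k most similar images whose label equals the target."""
    ranked = sorted(
        range(len(image_embs)),
        key=lambda i: cosine(direction, image_embs[i]),
        reverse=True,
    )
    hits = sum(1 for i in ranked[:k] if labels[i] == target)
    return hits / k


# Toy data (assumption): one drawing, one style vector, four labelled images.
direction = concept_direction([1.0, 1.0], [0.0, 1.0])
image_embs = [[1.0, 0.1], [0.9, 0.3], [0.5, 0.5], [0.0, 1.0]]
labels = ["piggy bank", "piggy bank", "teapot", "piggy bank"]

p_at_2 = precision_at_k(direction, image_embs, labels, "piggy bank", k=2)
p_at_4 = precision_at_k(direction, image_embs, labels, "piggy bank", k=4)
```

With the numbers quoted in the post, 41 hits among the top 50 gives 41 / 50 = 0.82.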

John Hewitt@johnhewtt·4 Mar

Lots of interp thought discusses the linearity of the residual stream! This blog post: the residual stream isn't linear in a way that provides formal leverage, and interp methods based on linearity should not be preferred beyond empirical utility. cs.columbia.edu/~johnhew/resid…

tom white@dribnet·4 Mar

tom white@dribnet·4 Mar

piggy bank (ImageNet class 719)

tom white@dribnet·23 Feb

@DlSPUTED thanks! it's a chonky 18in screenprint

conrad house@DlSPUTED·22 Feb

@dribnet love the baseball player

tom white@dribnet·11 Feb

@hyhieu226 enjoying this slow takeoff and will genuinely miss it

Hieu Pham@hyhieu226·11 Feb

Today, I finally feel the existential threat that AI is posing. When AI becomes overly good and disrupts everything, what will be left for humans to do? And it's when, not if.

tom white@dribnet·9 Jan

@farmgeek fwiw: the email was garbage but the disclosure within the app is actually pretty good - they show at the document/file level what was accessed

John Hart@farmgeek·9 Jan

“All patients who are not impacted can see that in their MMH app” That’s great, except when you can’t log in and every method (password reset, one-time pass) fails. Shitshow.

tom white@dribnet·7 Jan

@RT_Artwork Got this too. 5 minutes before call they ask you to use the riverside client and forward you to a website clone (riverside dot name - BEWARE) with their malware installer (I stopped there). (would be up for a themed exhibit on scams showcasing all artists with this invite! 😂)

RyanThompson@RT_Artwork·6 Jan

Be careful with artwork commission requests. Starts out with request for a video call. I contacted the company directly and found out this request was a scam. Getting on a call could have led to some software being downloaded and wallet drained. Anyone have experience with this?
