tom white

4.2K posts

@dribnet

creations with code and networks

Wellington, New Zealand · Joined June 2011
4.3K Following · 11K Followers
lyra bubbles@_lyraaaa_·
reproducing Anthropic's emotion activation probe paper on gemma4 e4b. a bit noisy but it works!
tom white@dribnet·
@wendlerch @jeremyphoward @aryaman2020 @jatin_n0 @voooooogel Thanks Chris! Another use of this contrastive synthetic data technique from years past was steering text-to-image generators away from putting text in the image. x.com/dribnet/status…
tom white@dribnet

@NeelNanda5 @ry_serene @jd_pressman @AlecRad @CasperKaae @hugo_larochelle @OleWinther1 Also: AFAIK no text-to-image engines currently support steering vectors, but my @pixray tool (which preceded @midjourney, etc) *did* support using these and by default would apply a vector to suppress text appearing in the image. x.com/dribnet/status…

Aryaman Arora@aryaman2020·
I’m very glad to see that Anthropic interp has caught up to the idea of generating a bunch of contrastive synthetic data for extracting supervised steering vectors from! It’s unfortunate that there’s no prior work to cite on this…
Anthropic@AnthropicAI

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

tom white@dribnet·
if it makes you feel better: i also introduced the idea of generating useful steering vectors from contrastive synthetic data in my 2016 paper - a whole section on augmenting inputs with a low-pass Gaussian filter to derive a steering vector that produces less blurry samples. arxiv.org/abs/1609.04468
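A minimal sketch of the idea in this tweet, not the paper's code: blur each input with a Gaussian filter, encode both versions, and take the difference of the mean latents as a "blur" direction. Subtracting that direction from a latent then steers away from blur. The 1-D signals, `toy_encode`, and all parameter values are hypothetical stand-ins for real images and a trained encoder.

```python
import math
import random

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def low_pass(signal, sigma=1.5, radius=3):
    """Blur a 1-D signal with a Gaussian kernel (edge samples are clamped)."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def toy_encode(signal):
    """Hypothetical encoder: returns [mean level, roughness] as a 2-D latent."""
    mean = sum(signal) / len(signal)
    roughness = sum(abs(a - b) for a, b in zip(signal, signal[1:])) / (len(signal) - 1)
    return [mean, roughness]

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def blur_direction(inputs):
    """Mean latent of the blurred inputs minus mean latent of the originals."""
    sharp = mean_vector([toy_encode(x) for x in inputs])
    blurred = mean_vector([toy_encode(low_pass(x)) for x in inputs])
    return [b - s for b, s in zip(blurred, sharp)]

random.seed(0)
inputs = [[random.random() for _ in range(32)] for _ in range(20)]
direction = blur_direction(inputs)

# Moving a latent against the blur direction pushes it toward "sharper".
z = toy_encode(inputs[0])
z_sharper = [zi - di for zi, di in zip(z, direction)]
```

The contrastive pairs are synthetic: no labels are needed, because the augmentation itself defines which side of each pair is "blurry".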
Boris Cherny@bcherny·
@Rahatcodes 👋 This is one of the signals we use to figure out if people are having a good experience. We put it on a dashboard and call it the “fucks” chart
rahat@Rahatcodes·
Claude Code has a regex that detects "wtf", "ffs", "piece of shit", "fuck you", "this sucks" etc. It doesn't change behavior...it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI. Do with this information what you will
tom white@dribnet·
the models, they just wanna converge
tom white@dribnet·
saxophone print + top-150 SigLIP image probe. though mKNN model agreement (à la platonic representation hypothesis) is not part of the test-time compute process, it climbs naturally as the optimization evolves
tom white@dribnet·
45% mutual kNN between CLIP and SigLIP — not bad for two model families trained on different data with different objectives when probed with this print. revisiting ImageNet so I can build a toolbox for navigating more uncharted waters without class labels (stay tuned)...
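For readers unfamiliar with the metric, here is a rough sketch of one common way to compute mutual kNN agreement between two models. It assumes cosine similarity, excludes each item from its own neighbor list, and uses made-up 2-D vectors. The tweet does not state the value of k or the exact procedure used, so treat these choices as assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def knn_indices(features, k):
    """For each row, the set of indices of its k most similar other rows."""
    neighbors = []
    for i, u in enumerate(features):
        scored = [(cosine(u, v), j) for j, v in enumerate(features) if j != i]
        scored.sort(key=lambda t: t[0], reverse=True)
        neighbors.append({j for _, j in scored[:k]})
    return neighbors

def mutual_knn(features_a, features_b, k):
    """Average fraction of shared k-nearest neighbors across the two spaces."""
    nn_a = knn_indices(features_a, k)
    nn_b = knn_indices(features_b, k)
    overlaps = [len(a & b) / k for a, b in zip(nn_a, nn_b)]
    return sum(overlaps) / len(overlaps)

# Hypothetical embeddings of the same five images from two different models.
model_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.2]]
model_b = [[3.0 * x, 3.0 * y] for x, y in model_a]      # same geometry, rescaled
model_c = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [-1.0, 0.2], [0.1, 0.9]]

same_geometry = mutual_knn(model_a, model_b, k=2)
different_geometry = mutual_knn(model_a, model_c, k=2)
```

A score of 1 means both models agree on every item's neighbors; a score near 0 means they organize the same items very differently.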
tom white@dribnet·
"pirate ship" (ImageNet class 724)
tom white@dribnet·
unsure how AI interprets this print? treating the image as a linear probe on your favorite vision model and scraping a diverse dataset for maximum activations provides a coherent suggestion.
tom white@dribnet·
or query your favorite vision model for semantic nearest neighbors - here's OpenAI-CLIP's top hits across CC3M using the baseball_player print as a probe
tom white@dribnet·
not seeing it? don't worry - your favorite imagenet model is.
tom white@dribnet·
traffic sign, baseball player, pomegranate
John Hewitt@johnhewtt·
Lots of interp thought discusses the linearity of the residual stream! This blog post: the residual stream isn't linear in a way that provides formal leverage, and interp methods based on linearity should not be preferred beyond empirical utility. cs.columbia.edu/~johnhew/resid…
tom white@dribnet·
Weapons-grade piggy bankness: One drawing. No training. Subtract the style, get a direction in SigLIP space. Sort 50K ImageNet images by cosine similarity: 41 of the top 50 are piggy banks (P@50 = 82%). The drawing is the classifier.
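The recipe in this tweet can be sketched roughly as follows, using toy 3-D vectors in place of real SigLIP embeddings. How the "style" embedding is obtained is not stated in the tweet; here it is simply assumed to be given (for example, a mean over other drawings in the same style). All vectors and labels below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_by_direction(direction, embeddings):
    """Indices of embeddings, sorted by descending cosine similarity."""
    return sorted(range(len(embeddings)),
                  key=lambda i: cosine(direction, embeddings[i]),
                  reverse=True)

def precision_at_k(ranked_labels, target, k):
    """Fraction of the top-k ranked labels that equal the target label."""
    return sum(1 for label in ranked_labels[:k] if label == target) / k

# Toy stand-ins for image embeddings (hypothetical values).
drawing = [0.9, 0.1, 0.8]   # embedding of the single drawing
style = [0.0, 0.0, 0.8]     # assumed embedding of the drawing style alone
direction = [d - s for d, s in zip(drawing, style)]   # "subtract the style"

dataset = [
    ([1.0, 0.0, 0.1], "piggy bank"),
    ([0.8, 0.2, 0.0], "piggy bank"),
    ([0.0, 1.0, 0.0], "pomegranate"),
    ([0.1, 0.9, 0.2], "traffic sign"),
]
order = rank_by_direction(direction, [emb for emb, _ in dataset])
ranked_labels = [dataset[i][1] for i in order]
p_at_2 = precision_at_k(ranked_labels, "piggy bank", 2)
```

With the numbers from the tweet, 41 matching images in the top 50 gives 41 / 50 = 0.82, which is the reported P@50 of 82%.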
tom white@dribnet·
piggy bank (ImageNet class 719)
tom white@dribnet·
@hyhieu226 enjoying this slow takeoff and will genuinely miss it
Hieu Pham@hyhieu226·
Today, I finally feel the existential threat that AI is posing. When AI becomes overly good and disrupts everything, what will be left for humans to do? And it's when, not if.
tom white@dribnet·
@farmgeek fwiw: the email was garbage but the disclosure within the app is actually pretty good - they show at the document/file level what was accessed
John Hart@farmgeek·
“All patients who are not impacted can see that in their MMH app” That’s great, except when you can’t log in and every method (password reset, one-time pass) fails. Shitshow.
tom white@dribnet·
@RT_Artwork Got this too. 5 minutes before call they ask you to use the riverside client and forward you to a website clone (riverside dot name - BEWARE) with their malware installer (I stopped there). (would be up for a themed exhibit on scams showcasing all artists with this invite! 😂)
RyanThompson@RT_Artwork·
Be careful with artwork commission requests. Starts out with request for a video call. I contacted the company directly and found out this request was a scam. Getting on a call could have led to some software being downloaded and wallet drained. Anyone have experience with this?