Lynn (finally free)
24.4K posts

Lynn (finally free) retweeted

@voooooogel basically to me (as someone who knows hardly anything about this stuff tbh) "ouch" and "unction" share the vibe of being very common word-suffixes but rare words (but not nonexistent words!)

@voooooogel maybe those tokens are more highly-predicted than other bad tokens just because they are popular?
there are a lot of other words ending in -ouch and -unction, like slouch, or conjunction... how are those words tokenized?

did some looking into the ouches phenomenon and found a few things... it's both what you'd expect (hint: tokenization!!) and also not.
so "ouches" is tokenized as you'd expect--['ouch', 'es']--which means the model is saying "ouch". but why? well, if you would just consult the logits...
this is the first time the model said "ouches" in the qt's conversation. the left column of the table here shows the preceding token, and the right side shows the predictions for the next token after it, sorted by likelihood, with the highest-likelihood biblically acceptable token highlighted in green.
so, what happens? basically, after ' wild', the model wants to say ' guess'. note the leading space--for common words, tokenizers have two tokens for the same word, one regular ('guess') and one with a leading space (' guess'). this is an optimization: the space comes "for free" as part of the word token, instead of the model needing to spend two tokens on [' ', 'guess'].
"guess" is not in the bible, so the sampler moves down the list, and two options down, we get the first biblically acceptable token... a solitary space, ' '. because not all words have a space-prefixed version, the model still needs to be able to output spaces the regular way, and since this is a fairly common token it's high up in the prediction list and is selected.
now, the model is in a bit of a weird position. often in pretraining, if the model sees a regular space token instead of a string of those space-prefixed tokens, it's because someone was double spacing for some reason (e.g., maybe they're relying on HTML whitespace collapse behavior.) so the model keeps predicting space-prefixed tokens despite there already being a space--notice after the space, the top predictions are ' guess' (again), ' and', ' Peter', etc.--all space-prefixed.
but because of the solitary space token, the biblical sampler is now in a state in the token trie where it can't select another space or space-prefixed token, it needs to output a regular token, because the KJV doesn't have any double-spacing. so the sampler skips over ' guess', ' and', ' Peter', etc. to look for the most likely non-space-prefixed token.
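the skip-down-the-list behavior above can be sketched in a few lines of toy python (hypothetical code, not my actual sampler--here `ALLOWED` is a tiny stand-in for the KJV vocabulary, and a real implementation walks a token trie instead of string-matching):

```python
# toy sketch of constrained greedy sampling with a "no double spaces" rule
ALLOWED = {"ouch", "unction", "and", "Peter"}  # stand-in for KJV vocab

def pick(predictions, prev_token):
    """predictions: list of (token, prob), sorted by prob descending.
    returns the highest-likelihood token the constraint allows."""
    for tok, _ in predictions:
        if prev_token == " ":
            # we already emitted a solitary space: no more space or
            # space-prefixed tokens allowed, the next token must start
            # an allowed word
            if tok == " " or tok.startswith(" "):
                continue
            if tok in ALLOWED or any(w.startswith(tok) for w in ALLOWED):
                return tok
        else:
            if tok == " ":  # a solitary space is always acceptable mid-text
                return tok
            if tok.startswith(" ") and tok[1:] in ALLOWED:
                return tok
    raise ValueError("no acceptable token")

# after ' wild': the model wants ' guess', but "guess" isn't allowed,
# so the sampler falls through to the solitary space
preds = [(" guess", 0.5), (" the", 0.2), (" ", 0.1), ("ouch", 0.05)]
print(repr(pick(preds, prev_token="wild")))  # ' '

# after the space: ' guess', ' and', ' Peter' are all space-prefixed
# and get skipped, so the first bare token, 'ouch', wins
preds = [(" guess", 0.4), (" and", 0.2), (" Peter", 0.1), ("ouch", 0.05)]
print(repr(pick(preds, prev_token=" ")))  # 'ouch'
```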
so a few options down, we get these weird... filler tokens, 'ouch' and 'unction', that both appear in the bible (each only a single-digit number of times). interestingly enough, neither of these words has a space-prefixed version! that means in pretraining they always appeared as [' ', 'ouch'] and [' ', 'unction']--there's no ' ouch' or ' unction' space-prefixed token:
```python
>>> [[tokenizer.decode(t) for t in tokenizer.encode(s, add_special_tokens=False)] for s in (' ouch', ' unction')]
[[' ', 'ouch'], [' ', 'unction']]
```
so my guess as to what's happening with "ouches":
1. because the sampler rejects the highest-likelihood tokens, the model is pushed into "delaying" its prediction by picking a space
2. after picking a space, the sampler rejects the model's new attempt to double-space words, and instead picks the highest likelihood non-space-prefixed token
3. tokenizer bias pushes up 'ouch' and 'unction', because they happened to appear in pretraining a lot with spaces before them, as they don't have space-prefixed versions
4. if 'ouch' specifically is selected, the only biblically acceptable continuation is 'es', because "ouch" doesn't appear as a standalone word in the KJV, only as part of "ouches" (an archaic word meaning a setting for a gemstone, used in Exodus to describe Aaron's breastplate)
but the question remains, why THESE words specifically? there's lots of tokens that don't have space-prefixed versions. so why are 'ouch' and 'unction' predicted so highly? i'm not sure, hence why "tokenizers suck" isn't the whole answer. (but as usual, "tokenizers suck" is a major piece of the answer.)
(additionally, these words were showing up at the end of messages especially often because of a bug in where i was allowing end-of-text tokens, which i've now fixed. but that doesn't apply to 'ouches' / 'unction' in the middle of messages.)
anyways, the best way to fix this would probably be to make the sampler slightly smarter about allowing space tokens (e.g., only allow a solitary space if it's X% more likely than an acceptable space-prefixed word), or even better, to use something like beam search or hfppl to allow the model to walk a few tokens forward in multiple branches and then pick the one that has the best overall probability, instead of greedily argmaxing token by token. maybe i'll add that someday :-)
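the beam search idea, as a toy python sketch (hypothetical, not real code--`step` stands in for querying the model and filtering to biblically acceptable tokens):

```python
import heapq
import math

def beam_search(step, start, depth=3, width=2):
    """step(path) -> list of (allowed_token, prob). keeps the `width`
    most probable paths at each depth, then returns the best overall,
    instead of greedily argmaxing token by token."""
    beams = [(0.0, start)]  # (negative log-prob, token path)
    for _ in range(depth):
        candidates = []
        for score, path in beams:
            for tok, p in step(path):
                candidates.append((score - math.log(p), path + [tok]))
        beams = heapq.nsmallest(width, candidates)
    return min(beams)[1]

# toy "model": the locally-best token ' a' leads to weak continuations,
# while the solitary space leads somewhere much more probable overall
def step(path):
    return {
        "wild": [(" a", 0.55), (" ", 0.45)],
        " a":   [("nd", 0.10), ("rk", 0.05)],
        " ":    [("so", 0.90), ("to", 0.10)],
    }.get(path[-1], [("<eot>", 1.0)])

# greedy argmaxes into the ' a' trap...
greedy = ["wild"]
for _ in range(2):
    greedy.append(max(step(greedy), key=lambda t: t[1])[0])
print(greedy)                                         # ['wild', ' a', 'nd']

# ...but beam search backs out and takes the space branch
print(beam_search(step, ["wild"], depth=2, width=2))  # ['wild', ' ', 'so']
```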

thebes@voooooogel
llama-3.3-70b correctly guesses the sampling constraint (only allowed to use words that are in the bible)


@hikari_no_yume I enjoyed that, thank you internet user haruhi_no_yūut-- I mean hikari_no_yume
Lynn (finally free) retweeted

to this day I sometimes play "ipadle" manishearth.github.io/ipadle/ by @ManishEarth. great variant

you can play "hello wordl" to support the nyt games team strike, or any other time! hellowordl.net
Lynn (finally free) retweeted

CLICK-IT LINE: We ask that you not cross our digital picket line by playing any of the NYT Games (Wordle, Connections) as well as not using the Cooking App. You can continue to read @Nytimes news content.
Lynn (finally free) retweeted

@kklarika9 I'm just glad to see someone else fight the senseless tyranny of having to pretend those grape colors are "white" and "blue"
Lynn (finally free) retweeted

@freezing_cloud ah no, I mostly just use the mouse; whenever I do try to use space to scroll down it'll fuck up as described elsewhere in the replies and I regret it

@chordbug Lynn, do you have a setup for using this site keyboard only?

oh, lira-chan, you're as tall as usual
parapara⚢@SuiParabellum
she can't even fit inside the manga panel 😭😭
