tokenbender

13.5K posts

tokenbender
@tokenbender

making gradients flow • eXperiments lab • heart, soul and brilliance

Joined July 2014
886 Following · 12.1K Followers
tokenbender @tokenbender ·
@Yuchenj_UW > I can summon my autoresearch army to win it… if I have time. the point of autoresearch is that it won't really need much of "your time".
Yuchen Jin @Yuchenj_UW ·
OpenAI just dropped a training challenge: Train a <16MB language model in 10 minutes on 8×H100s and minimize held-out loss on a fixed FineWeb dataset. Basically NanoGPT Speedrun. They’re sponsoring $1M in compute. I can summon my autoresearch army to win it… if I have time.
tokenbender @tokenbender ·
@tpottery_ extended thinking is a luxury. we are in the cup noodle research era. but that bowl looks good.
tokenbender @tokenbender ·
i cook my instant ramen and sit with my instant coffee to find some instant relief in a nice research drop that might be out there and trigger my instant research. i need instant insight. instant clarity. instant expertise in a field i learned about 14 minutes ago. i need the abstract to hit my bloodstream like ultra burnt caffeine and the conclusion to knead my nausea. i need to scroll past one chart and become a different person. i need to misunderstand something at a historic pace.
tokenbender reposted
Ji-Ha @Ji_Ha_Kim ·
Blog post - Transformers as Constrained Optimization. Rewriting pre-norm decoder-only transformers as solutions to regularized objectives. Changing the regularization to a hard constraint yields a canonical temperature, generalizing to KL-divergence and ideas of cross-layer interaction.
kendrick @exploding_grad ·
@tokenbender I'm doing something similar. GPU poor, so I'm building 2 toy models (std vs attn res stream). Planning to run some basic mech interp experiments on them to check if they are more interpretable. If it works, then I'll try scaling up the model arch and test further.
tokenbender @tokenbender ·
@clbswrs we have come down from some 40% bs rate in 3.7 to a 10-15% bs rate if the task isn’t too complex. but this last 10% is where we improve the slowest.
caleb 🐮 @clbswrs ·
@tokenbender Tried doing this with sonnet 3.7 last spring/summer and it pranked me probably 75% of the time. Maybe more. Definitely took more time to figure out how it had reward hacked than running the campaign myself at that point. Can u believe sonnet 3.7 was only a year ago
tokenbender @tokenbender ·
true automation of research has been achieved. we just need to get good at the trivial job of finding the 10-20% cases of fraud in 2 hours of looping and 1k LOC PRs.
tokenbender @tokenbender ·
@SarahLacard how are we going to revert without knowing something is wrong? how would we write tests for logical issues or general policy violations without checking? survivorship bias code can be shipped but reliability is still necessary if you want to own the code you ship.
Sarah 🇨🇦 @SarahLacard ·
@tokenbender yeah exactly - for me it's a hard revert and not bother to care if it hid something somewhere, you're not going to be able to exhaustively check everything - revert and move on with better prompting during round 2
tokenbender @tokenbender ·
@SarahLacard it already took a short-cut and was caught later. this is how i confirm whether the clean-up was done or not. even that may need some quick manual checks.
Luka Ivanic @LukaIvanic73477 ·
@tokenbender It straight up responds to my previous message's requests again, instead of the latest one. For me it's around 400k. But for non-chat ability, it can hold its own until 600k, even on messy kernel code.
tokenbender @tokenbender ·
450-500k context seems to be the mark for gpt 5.4 where it stops understanding what it is replying to and reaches its confused state.
tokenbender @tokenbender ·
@himanshustwts mostly because my notes and perspective on mHC and residuals is clean already so i can fit things in that frame easily.
tokenbender @tokenbender ·
since Jan ‘26 itself, the number of such projects has increased to 4. these are all almost concluded but i face a dopamine drought as soon as i tell myself to start writing a semi-academic article. if i could just throw my artifacts into a directory and end with - “ohh this looked cool so i did that next”, everything would be so much more fun.
tokenbender @tokenbender ·
every now and then i relapse into my habit of experimenting with really important things but never releasing them in public, because it is way more effort for me to make things clickworthy and presentable than to sate my curiosity and maximise tinkering with the next challenge.
tokenbender @tokenbender ·
@muzzdotdev not sure if you use opencode but if you do, DCP used to be one of my must-have plugins there.
tokenbender @tokenbender ·
how uninstalling opencode-dcp feels after it has lost its way