tokenbender

13.5K posts

tokenbender
@tokenbender

making gradients flow • eXperiments lab • heart, soul and brilliance

Joined July 2014
886 Following · 12.1K Followers
tokenbender @tokenbender ·
@Yuchenj_UW > I can summon my autoresearch army to win it… if I have time. the point of autoresearch is that it won't really need much of "your time".
Yuchen Jin @Yuchenj_UW ·
OpenAI just dropped a training challenge: Train a <16MB language model in 10 minutes on 8×H100s and minimize held-out loss on a fixed FineWeb dataset. Basically NanoGPT Speedrun. They’re sponsoring $1M in compute. I can summon my autoresearch army to win it… if I have time.
tokenbender @tokenbender ·
@tpottery_ extended thinking is a luxury. we are in the cup noodle research era. but that bowl looks good.
tokenbender @tokenbender ·
i cook my instant ramen and sit with my instant coffee to find some instant relief in a nice research drop that might be out there and trigger my instant research. i need instant insight. instant clarity. instant expertise in a field i learned about 14 minutes ago. i need the abstract to hit my bloodstream like ultra burnt caffeine and the conclusion to knead my nausea. i need to scroll past one chart and become a different person. i need to misunderstand something at a historic pace.
tokenbender reposted
Ji-Ha @Ji_Ha_Kim ·
Blog post - Transformers as Constrained Optimization. Rewriting pre-norm decoder-only transformers as solutions to regularized objectives. Changing the regularization to a hard constraint yields a canonical temperature, generalizing to KL-divergence and ideas of cross-layer interaction.
kendrick @exploding_grad ·
@tokenbender I'm doing something similar. GPU poor, so I'm building 2 toy models (std vs attn res stream). Planning to run some basic mech interp experiments on them to check if they are more interpretable. If it works, then I'll try scaling up the model arch and test further.
tokenbender @tokenbender ·
@clbswrs we have come down from some 40% bs rate in 3.7 to a 10-15% bs rate if the task isn’t too complex. but this last 10% is where we improve the slowest.
caleb 🐮 @clbswrs ·
@tokenbender Tried doing this with sonnet 3.7 last spring/summer and it pranked me probably 75% of the time. Maybe more. Definitely took more time to figure out how it had reward hacked than running the campaign myself at that point. Can u believe sonnet 3.7 was only a year ago
tokenbender @tokenbender ·
true automation of research has been achieved. we just need to get good at the trivial job of finding the 10-20% cases of fraud in 2 hours of looping and 1k LOC PRs.
tokenbender @tokenbender ·
@SarahLacard how are we going to revert without knowing something is wrong? how would we write tests for logical issues or general policy violations without checking? survivorship bias code can be shipped but reliability is still necessary if you want to own the code you ship.
Sarah 🇨🇦 @SarahLacard ·
@tokenbender yeah exactly - for me it's a hard revert and not bother to care if it hid something somewhere, you're not going to be able to exhaustively check everything - revert and move on with better prompting during round 2
tokenbender @tokenbender ·
@SarahLacard it already took a short-cut and was caught later. this is how i confirm whether the clean-up was done or not. even that may need some quick manual checks.
Luka Ivanic @LukaIvanic73477 ·
@tokenbender It straight up responds to my previous message's requests again, instead of the latest one. For me it's around 400k. But for non-chat ability, it can hold its own until 600k, even on messy kernel code.
tokenbender @tokenbender ·
450-500k context seems to be the mark for gpt 5.4 where it stops understanding what it is replying to and reaches its confused state.
tokenbender @tokenbender ·
@himanshustwts mostly because my notes and perspective on mHC and residuals is clean already so i can fit things in that frame easily.
tokenbender @tokenbender ·
since Jan ‘26 itself, the number of such projects has increased to 4. these are all almost concluded but i face a dopamine drought as soon as i tell myself to start writing a semi-academic article. if i could just throw my artifacts into a directory and end with - “ohh this looked cool so i did that next”, everything would be so much more fun.
tokenbender @tokenbender ·
every now and then i relapse into my habit of experimenting with really important things but never releasing them in public, because it is way more effort for me to make things clickworthy and presentable than to sate my curiosity and maximise tinkering with the next challenge.
tokenbender @tokenbender ·
@muzzdotdev not sure if you use opencode but if you do, DCP used to be one of my must-have plugins there.
tokenbender @tokenbender ·
how uninstalling opencode-dcp feels after it has lost its way