
lots of talk about agi, asi, rsi, but ask any frontier LLM to roll a die and it will almost always say "4." claude, gpt, kimi, doesn't matter: 4, 4, 4, 4. so here's how i post-trained a model to reliably roll a die (i.e. each number comes up ~1/6 of the time) & why it's a nice sandbox for one of the most interesting problems in rl: getting a model to actually explore instead of just following strategies it already knows 🧵

