Arin





To new beginnings



India's top GitHub committer's commit history 🤡 How big a pay package do you think he'll land? Write in the comments



Our MIT class “6.S184: Introduction to Flow Matching and Diffusion Models” is now available on YouTube! We teach state-of-the-art generative AI algorithms for images, videos, proteins, etc., together with the mathematical tools to understand them. diffusion.csail.mit.edu (1/4)

unofficial guide to nyc’s creative underground, aka places i wish i knew sooner
- nyc resistor: og hacker space for soldering to fiber arts
- hex house: immersive art + tech playground
- telos.haus: creative clubhouse w/ workshops & exhibitions
- heart 442: artist-run soho spot for internet culture
- fractal nyc: communal living, self-run courses and psychedelic basement parties


why did R1's RL suddenly start working, when previous attempts to do similar things failed? theory: we've basically spent the last few years running a massive acausally distributed chain-of-thought data annotation program on the pretraining dataset.

deepseek's approach with R1 is a pretty obvious method. they are far from the first lab to try "slap a verifier on it and roll out CoTs," but it didn't use to work that well. all of a sudden, though, it did start working. and reproductions of R1, even ones using slightly different methods, are just working too--it's not some super-finicky method that deepseek lucked out finding. all of a sudden, the basic, obvious techniques are... just working, much better than they used to.

in the last couple of years, chains of thought have been posted all over the internet (LLM outputs leaking into pretraining like this is usually called "pretraining contamination"). and not just CoTs--outputs posted on the internet are usually accompanied by linguistic markers of whether they're correct or not ("holy shit it's right", "LOL wrong"). this isn't just true for easily verifiable problems like math, but also for fuzzy ones like writing.

those CoTs in the V3 training set gave GRPO enough of a starting point to start converging, and furthermore to generalize from verifiable domains to non-verifiable ones, using the bridge established by the pretraining data contamination.

and now, R1's visible chains of thought are going to lead to *another* massive enrichment of human-labeled reasoning on the internet, but on a far larger scale... the next round of base models post-R1 will be *even better* bases for reasoning models.
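
(for anyone who hasn't seen the recipe the post is referring to: below is a minimal, purely illustrative sketch of "slap a verifier on it and roll out CoTs" with GRPO-style group-relative scoring. `sample_cots`, the `answer: <value>` extraction convention, and the toy verifier are hypothetical stand-ins, not deepseek's actual code; the only piece borrowed from GRPO is normalizing each rollout's verifier reward against the mean and std of its own group instead of using a learned value model.)

```python
# minimal sketch: verify a group of sampled CoTs and compute group-relative
# advantages. names like `sample_cots` and the "answer: <value>" convention
# are hypothetical stand-ins, not any lab's real implementation.

import re
from typing import Callable, List


def extract_answer(cot: str) -> str:
    """Assumed convention: the model ends its CoT with 'answer: <value>'."""
    match = re.search(r"answer:\s*(.+)$", cot.strip(), re.IGNORECASE)
    return match.group(1).strip() if match else ""


def verifier_reward(cot: str, reference: str) -> float:
    """Binary verifier for an easily checkable domain (e.g. math):
    1.0 if the extracted final answer matches the reference, else 0.0."""
    return 1.0 if extract_answer(cot) == reference else 0.0


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantage: score each rollout relative to the mean and std
    of its own group of rollouts, with no learned value model."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]


def score_rollouts(prompt: str,
                   reference: str,
                   sample_cots: Callable[[str, int], List[str]],
                   group_size: int = 8):
    """Roll out a group of CoTs for one prompt, verify them, and return
    (cot, advantage) pairs that a policy-gradient update would weight by."""
    cots = sample_cots(prompt, group_size)  # hypothetical model call
    rewards = [verifier_reward(c, reference) for c in cots]
    advantages = group_relative_advantages(rewards)
    return list(zip(cots, advantages))


if __name__ == "__main__":
    # toy stand-in for the policy: pretend the model produced these rollouts
    fake_sampler = lambda prompt, n: [
        "2 + 2... carry the... answer: 4",
        "hmm, 2 + 2 is like 22. answer: 22",
    ] * (n // 2)
    for cot, adv in score_rollouts("what is 2 + 2?", "4", fake_sampler, 4):
        print(f"advantage {adv:+.2f}  |  {cot}")
```

a real pipeline would feed those (cot, advantage) pairs into a clipped policy-gradient update with a KL penalty to the reference model; the sketch stops at the reward/advantage step since that's the part the post is about.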











