

Hong Ge

@Hong_Ge2
Senior Research Fellow at University of Cambridge




why did R1's RL suddenly start working, when previous attempts to do similar things failed? theory: we've basically spent the last few years running a massive acausally distributed chain of thought data annotation program on the pretraining dataset.

deepseek's approach with R1 is a pretty obvious method. they are far from the first lab to try "slap a verifier on it and roll out CoTs." but it didn't use to work that well. all of a sudden, though, it did start working. and reproductions of R1, even using slightly different methods, are just working too--it's not some super-finicky method that deepseek lucked out finding. all of a sudden, the basic, obvious techniques are... just working, much better than they used to.

in the last couple of years, chains of thought have been posted all over the internet (LLM outputs leaking into pretraining like this is usually called "pretraining contamination"). and not just CoTs--outputs posted on the internet are usually accompanied by linguistic markers of whether they're correct or not ("holy shit it's right", "LOL wrong"). this isn't just true for easily verifiable problems like math, but also fuzzy ones like writing.

those CoTs in the V3 training set gave GRPO enough of a starting point to start converging, and furthermore, to generalize from verifiable domains to the non-verifiable ones using the bridge established by the pretraining data contamination.

and now, R1's visible chains of thought are going to lead to *another* massive enrichment of human-labeled reasoning on the internet, but on a far larger scale... the next round of base models post-R1 will be *even better* bases for reasoning models.
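The "slap a verifier on it and roll out CoTs" recipe above can be sketched in a few lines. This is a toy illustration, not DeepSeek's actual pipeline: the `ANSWER:` format and the `verify` function are made-up stand-ins for a real verifier, and it shows only the GRPO-style group-relative advantage (each rollout's reward minus the group mean), not the full policy-gradient update.

```python
import re

def verify(completion: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the completion's final answer
    (in a hypothetical 'ANSWER: x' format) matches the gold answer."""
    match = re.search(r"ANSWER:\s*(\S+)", completion)
    return 1.0 if match and match.group(1) == gold_answer else 0.0

def group_relative_advantages(completions, gold_answer):
    """GRPO-style advantages: score a group of rollouts for the same
    prompt, then center each reward on the group's mean reward."""
    rewards = [verify(c, gold_answer) for c in completions]
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Four sampled CoT rollouts for one prompt whose gold answer is "42".
rollouts = [
    "let's see, 6*7... ANSWER: 42",
    "hmm, 6*7 is 41? ANSWER: 41",
    "six sevens make 42. ANSWER: 42",
    "I give up, no answer",
]
print(group_relative_advantages(rollouts, "42"))  # [0.5, -0.5, 0.5, -0.5]
```

The point of the centering is that only the *relative* quality within a group drives the update, which is why a base model already able to emit occasionally-correct CoTs (per the contamination theory) gives the algorithm a usable gradient from the start.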

✨Applications are now open for PhDs at the Cambridge Machine Learning Group!✨ We're looking for outstanding candidates interested in fundamental ML research and applications to scientific domains! More info: mlg.eng.cam.ac.uk/phd_programme_… 🧵Find more about PIs & focus areas below!




It's paper day! In a new paper, led by my colleague @hanyuzhang17 at @UWaterlooAstro, we work on improving the priors for EFTofLSS analysis by taking advantage of information coming from HOD galaxy mocks. Here are the main highlights in the 🧵!





Here's @torfjelde with the obligatory first Bayes slide with the proportional posterior. This talk is about @TuringLang, which is how I got into Julia seriously!


Nope. Like Gates himself said, we might see two more cycles of improvement but scaling what we have got will not get us to AGI. Don’t believe the hype.



globaltimes.cn/page/202406/13… If China achieves fusion power before we do, it's game over.

Matt Hoffman (Google) presenting on “Running Many-Chain MCMC on Cheap GPUs”. #AISTATS2024


