Fabien Michel
@Fab__Up
building stuff · Founder @ RentHunter


The DeepSeek-R1 paper is a gem! I highly encourage everyone to read it. It's clear that LLM reasoning capabilities can be learned in different ways, and that RL, applied correctly and at scale, can lead to some really powerful scaling behavior and emergent properties. There is more to RL than meets the eye! Here is my breakdown of the paper along with a few tests: youtu.be/3GlFd3doO3U?si…

The multi-stage training might not make sense initially, but it provides clues about optimizations we can continue to tap into. Data quality is still very important for enhancing the usability of the LLM.

Unlike other reasoning LLMs, DeepSeek-R1's training recipe and weights are open, so we can build on top of it. This opens up exciting research opportunities.

About the attached clip: the previous preview model wasn't able to solve this task. DeepSeek-R1 can solve this and many other tasks that o1 can solve. It's a very good model for coding and math.
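To make the "RL applied correctly" point concrete: the paper describes rule-based rewards (an accuracy reward plus a format reward that checks the model wraps its reasoning in think tags) rather than a learned reward model. Here's a minimal Python sketch of that idea, with assumptions clearly mine: the function names, the exact tag pattern, and the exact-string-match accuracy check are simplifications for illustration, not the paper's actual implementation.

```python
import re

def format_reward(completion: str) -> float:
    # Reward completions that put their reasoning inside <think>...</think>
    # and then emit a final answer afterward (a simplified stand-in for the
    # format reward described in the paper).
    pattern = r"^<think>.+?</think>\s*\S.*$"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    # Rule-based correctness check: take whatever follows the closing
    # </think> tag as the final answer and compare it to the reference.
    # (Real math/code tasks would use a verifier, not exact string match.)
    answer = completion.split("</think>")[-1].strip()
    return 1.0 if answer == reference else 0.0

def total_reward(completion: str, reference: str) -> float:
    # Combined scalar reward fed to the RL optimizer.
    return accuracy_reward(completion, reference) + format_reward(completion)
```

Because the reward is a cheap deterministic rule, it scales to millions of rollouts without the reward hacking that learned reward models invite, which is part of why this recipe works at scale.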






