





Checkpoint merging and evaluation (à la WSM) has also been a really fun project that I've gotten to drive within Nemotron. Checkpoint merging gives us a cheap way to estimate eventual model performance without early decays (training-time savings!) as well as a Pareto buffet of possible final models to choose from at the end of training. We've been able to reproduce results from WSM at increasing scale, but more work remains on finding merge strategies that surpass and replace very long decays entirely (e.g. the 5T we do for Nemotron 3). The checkpoint we selected from pretraining for mid- & post-training was a 500B-token window merge at 25B-token intervals using minus-sqrt decay emulation :)
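At its core, this kind of window merging is a weighted average of checkpoints in parameter space. A minimal sketch, assuming checkpoints are dicts of parameter name to value (real checkpoints would be tensor state dicts, and the weighting scheme here is illustrative, not the exact minus-sqrt emulation used above):

```python
def merge_checkpoints(checkpoints, weights=None):
    """Weight-space merge of a window of checkpoints (WSM-style).

    checkpoints: list of dicts mapping parameter name -> value,
        e.g. the last N checkpoints saved at fixed token intervals.
    weights: optional per-checkpoint weights; uniform if None.
        Non-uniform weights can emulate a learning-rate decay
        applied over the window.
    """
    n = len(checkpoints)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize to sum to 1

    # Average each parameter across the window.
    merged = {}
    for name in checkpoints[0]:
        merged[name] = sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
    return merged


# Uniform merge over a 3-checkpoint window.
window = [{"w": 1.0}, {"w": 2.0}, {"w": 3.0}]
print(merge_checkpoints(window)["w"])  # mean of the window
```

For a 500B-token window at 25B-token intervals, the window would hold the last 20 checkpoints; evaluating the merged model then gives a cheap proxy for what a decayed run would produce.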

Birth to teens: Pretraining
Teens to 20s: SFT
20s to death: RL
Such is human nature.









Today, we are announcing Proximal. Proximal is a research lab for data.

Our core belief is that data which is complex enough to teach today's frontier models is not bottlenecked by domain experts, but by great ideas and excellent software. We are excited about a world in which coding agents can autonomously run for multiple weeks, solve the hardest technical problems, and discover novel ideas that advance progress in various domains of science and engineering. We believe that we are not far from this future, but that the biggest bottleneck preventing us from achieving it is training data.

Many companies work on data, but most of them are approaching it the wrong way. Historical capability breakthroughs are the result of creative engineers discovering scalable data collection methods, not thousands of contractors manually writing task demonstrations. Inevitably, the potential impact of human data will become smaller and smaller as model capabilities increase: agents are already outperforming most humans in many domains, and the number of experts capable of judging model outputs shrinks with every new model release.

Proximal is a new data company. We are not a recruiting firm or a talent marketplace, but a research and engineering organization that treats data as a problem which deserves the same level of rigor as work on training algorithms and model architectures. We think that this is the most impactful work towards agents that can autonomously solve complex technical problems, and we intend to share our research and progress in the open.



Amazing tech report for an amazing model, probably the most precise open-source recipe towards a SOTA model. Was positively surprised to see many similarities between their recipe and what we did during intellect-3 and have implemented in prime-rl.

Additionally, code execution, web fetch, memory, programmatic tool calling, tool search, and tool use examples are now generally available. Read more: claude.com/blog/improved-…









