Ivan Rubachev
@puhsuuu
ML Researcher @YandexResearch | Tabular ML

Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
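
To make the recipe concrete, here is a minimal, hypothetical sketch of the idea described in the post, not the authors' code: roll out a frozen, randomly initialized neural cellular automaton, discretize its states into tokens, and "pre-pre-train" a tiny causal transformer on next-token prediction over that fully synthetic corpus. The NCA update rule, the tokenization scheme, and every hyperparameter below are my assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM, GRID, STEPS = 16, 64, 8, 2

def make_nca_corpus(n_seqs=64):
    """Roll out a frozen, randomly initialized NCA and discretize each
    flattened grid state into integer tokens: synthetic, language-free data."""
    rule = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
    rule.requires_grad_(False)
    seqs = []
    for _ in range(n_seqs):
        state = torch.rand(1, 1, GRID, GRID) * 2 - 1
        frames = []
        for _ in range(STEPS):
            state = torch.tanh(rule(state))  # one NCA update over 3x3 neighborhoods
            frames.append(((state.flatten() + 1) / 2 * (VOCAB - 1)).long())
        seqs.append(torch.cat(frames))
    return torch.stack(seqs)  # (n_seqs, STEPS * GRID * GRID)

class TinyLM(nn.Module):
    def __init__(self, ctx):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Parameter(torch.zeros(ctx, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, dim_feedforward=4 * DIM,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, x):
        t = x.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(t)
        h = self.blocks(self.emb(x) + self.pos[:t], mask=causal_mask)
        return self.head(h)

corpus = make_nca_corpus()            # integer tokens, no language anywhere
model = TinyLM(ctx=corpus.size(1))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
for _ in range(50):                   # next-token "pre-pre-training"
    logits = model(corpus[:, :-1])
    loss = nn.functional.cross_entropy(logits.flatten(0, 1), corpus[:, 1:].flatten())
    opt.zero_grad(); loss.backward(); opt.step()
# the resulting weights would then warm-start ordinary language-model training
```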




Reviving ConvNeXt for Efficient Convolutional Diffusion Models github.com/star-kwon/FCDM arxiv.org/abs/2603.09408… The authors propose an improved ConvNeXt-based diffusion model architecture that reportedly matches DiT-XL/2 quality with 7x fewer training steps.
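
For context on what "ConvNeXt-based diffusion" means: a ConvNeXt block is a depthwise 7x7 convolution followed by a channels-last LayerNorm and an inverted-bottleneck MLP. Below is a rough sketch of such a block with a common AdaLN-style timestep modulation bolted on; FCDM's actual block design and conditioning may well differ, so treat this as an illustration rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class ConvNeXtDiffusionBlock(nn.Module):
    """Standard ConvNeXt block (Liu et al., 2022) plus a generic
    scale/shift timestep conditioning -- an assumption, not FCDM's code."""
    def __init__(self, dim, t_dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)             # applied in channels-last layout
        self.pw1 = nn.Linear(dim, 4 * dim)        # pointwise expansion
        self.act = nn.GELU()
        self.pw2 = nn.Linear(4 * dim, dim)        # pointwise projection back
        self.t_proj = nn.Linear(t_dim, 2 * dim)   # timestep embedding -> scale, shift

    def forward(self, x, t_emb):
        # x: (B, C, H, W), t_emb: (B, t_dim)
        res = x
        x = self.dwconv(x).permute(0, 2, 3, 1)    # -> (B, H, W, C)
        scale, shift = self.t_proj(t_emb).chunk(2, dim=-1)
        x = self.norm(x) * (1 + scale[:, None, None]) + shift[:, None, None]
        x = self.pw2(self.act(self.pw1(x)))
        return res + x.permute(0, 3, 1, 2)        # back to (B, C, H, W)

x, t = torch.randn(2, 96, 32, 32), torch.randn(2, 128)
print(ConvNeXtDiffusionBlock(96, 128)(x, t).shape)  # torch.Size([2, 96, 32, 32])
```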



Key finding: hybrid models are substantially more data-efficient than transformers. We show this through rigorous theory + controlled experiments. On MMLU, Olmo Hybrid matches Olmo 3’s accuracy using 49% fewer tokens—roughly 2× efficiency.
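
A quick check that the two numbers in the quoted claim are consistent, assuming "data efficiency" means the reciprocal of the token fraction:

```python
# "49% fewer tokens" means the hybrid reaches the same MMLU score on 51% of the data
tokens_used = 1 - 0.49
print(1 / tokens_used)  # ~1.96, i.e. roughly 2x data efficiency
```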



Emacs IS an age verification scheme. Nobody under 40 uses it.