


Tomasz Limisiewicz

@TomLimi
Postdoctoral researcher at @meta FAIR and @uwnlp, interested in the inner workings of neural networks, multilingualism, and fairer NLP (he/him)







A few of us from the Meta SuperIntelligence lab will attend ICLR this year. Happy to chat in person! If you are interested in joining the networking mixer, please register here: events.atmeta.com/iclrnetworking…





We are releasing Bolmo today! Bolmo is the best byte-level model so far: it comes close to, and sometimes surpasses, Olmo 3. Bolmo also performs competitively in speed & is fully open. I was skeptical of byte-level models for a long time, but I finally switched camps🧵
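
For the unfamiliar: "byte-level" here means the model consumes raw UTF-8 bytes as its tokens instead of a learned subword vocabulary. A minimal illustration of the input side (this is not Bolmo's actual pipeline, just the general idea):

```python
# Byte-level "tokenization": every UTF-8 byte (0..255) is its own token id,
# so the vocabulary is fixed at 256 entries and no tokenizer is ever trained.
text = "Bolmo reads raw bytes."
token_ids = list(text.encode("utf-8"))
print(token_ids[:6])  # [66, 111, 108, 109, 111, 32]
```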





A NEW PAPER FROM YANN LECUN: LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

This could be one of LeCun's last papers at Meta (lol), but it's a really interesting one, I think. Quick summary:

Yann LeCun's big idea is JEPA, a self-supervised learning method. However, the approach has various failure modes, so training strong JEPA models is brittle, unstable, and quite difficult. As a result, JEPA has seen little adoption in practice.

This paper addresses that directly, making specific design decisions that improve training stability. The authors identify the isotropic Gaussian as the optimal distribution for JEPA models' embeddings and design Sketched Isotropic Gaussian Regularization (SIGReg) to constrain the embeddings toward that ideal distribution. Together this forms the LeJEPA framework, which can be implemented in ~50 lines of code.

Empirically, the authors demonstrate stable training across hyperparameters, architectures, and datasets. A result particularly interesting to me, however, is that training a LeJEPA model from scratch directly on the downstream dataset outperforms finetuning a DINOv2/v3 model on that dataset!
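
To make the SIGReg idea concrete, here is a rough sketch (not the authors' code): project the embeddings onto random 1-D directions (the "sketching" part) and penalize each projection for deviating from a standard Gaussian. The paper uses a proper univariate goodness-of-fit statistic; the stand-in below just matches the first two moments, which is my simplification:

```python
import torch

def sigreg_sketch(z: torch.Tensor, num_dirs: int = 64) -> torch.Tensor:
    """Simplified stand-in for SIGReg: random 1-D sketches of the
    embeddings should each look like a standard Gaussian.

    z: (batch, dim) embedding matrix. Returns a scalar penalty.
    """
    # Random unit directions to sketch along.
    dirs = torch.randn(z.shape[1], num_dirs, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)

    proj = z @ dirs            # (batch, num_dirs) 1-D projections
    mean = proj.mean(dim=0)    # ~0 under an isotropic Gaussian
    var = proj.var(dim=0)      # ~1 under an isotropic Gaussian
    # Penalize the first two moments for deviating from N(0, 1).
    return (mean ** 2).mean() + ((var - 1.0) ** 2).mean()

# The full LeJEPA objective would combine this with the usual JEPA
# prediction loss, e.g.: loss = pred_loss + lam * sigreg_sketch(embeddings)
```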

@thawani_avijit Haha. I am afraid people interpreted my “delete tokenizer” as “use bytes directly without BPE”. The issue is you *still* inherit the arbitrariness of byte encoding even for that! Pixels are the only way. Just like humans. It is written. If GPT-10 uses UTF-8 at the input I will eat a shoe.
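
That byte-encoding arbitrariness is easy to demonstrate: the same rendered string can map to different UTF-8 byte sequences depending on Unicode normalization, so a byte-level model sees two different inputs for visually identical text:

```python
import unicodedata

s1 = "é"                               # precomposed: U+00E9
s2 = unicodedata.normalize("NFD", s1)  # decomposed: "e" + combining U+0301

print(s1 == s2)             # False: different codepoint sequences
print(s1.encode("utf-8"))   # b'\xc3\xa9'  (2 bytes)
print(s2.encode("utf-8"))   # b'e\xcc\x81' (3 bytes)
```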








