

Omer Levy
@omerlevy_
AI Researcher at Google DeepMind

Super thrilled to share that our AI has now reached silver-medalist level in Math at #imo2024 (1 point away from 🥇)! Since January, we not only have a much stronger version of #AlphaGeometry, but also an entirely new system called #AlphaProof, capable of solving many more Olympiad problems. This is a large-scale project that I was fortunate to co-lead at @GoogleDeepMind! See our blog & NYT article below!
Blog: dpmd.ai/imo-silver
NYT: nytimes.com/2024/07/25/sci…

Care about LLM evaluation? 🤖 🤔 We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs across different prompts, domains, tokens, models... Join our community effort to expand it with YOUR model predictions and become a co-author!
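For illustration only, here is a sketch of how one might stream records from a DOVE-style collection with the Hugging Face datasets library. The repo id and field names are placeholder assumptions; the post does not specify where the data is hosted or what its schema looks like.

```python
# Illustrative sketch only: the repo id and field names below are
# assumptions, not DOVE's actual location or schema.
# Requires: pip install datasets
from datasets import load_dataset

# Hypothetical repo id; replace with the collection's real location.
dove = load_dataset("dove-community/dove", split="train", streaming=True)

for record in dove.take(3):
    # Plausible fields given the post: the model, the prompt variant
    # it was given, and the model's output.
    print(record.get("model"), record.get("prompt"), record.get("output"))
```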

How can we reduce pretraining costs for multi-modal models without sacrificing quality? We study this question in our new work: arxiv.org/abs/2411.04996

At @AIatMeta, we introduce Mixture-of-Transformers (MoT), a sparse architecture with modality-aware sparsity for every non-embedding transformer parameter (e.g., feed-forward networks, attention matrices, and layer normalization). MoT achieves dense-level performance with up to 66% fewer FLOPs!

✅ Chameleon setting (text + image generation): our 7B MoT matches dense baseline quality using just 55.8% of the FLOPs.
✅ Extended to speech as a third modality, MoT achieves dense-level speech quality with only 37.2% of the FLOPs.
✅ Transfusion setting (text autoregressive + image diffusion): MoT matches dense model quality using one-third of the FLOPs.
✅ System profiling shows MoT achieves dense-level image quality in 47% and text quality in 75.6% of the wall-clock time.**

Takeaway: modality-aware sparsity in MoT offers a scalable path to efficient, multi-modal AI with reduced pretraining costs.

Work of a great team with @liliyu_lili, Liang Luo, @sriniiyer88, Ning Dong, @violet_zct, @gargighosh, @ml_perception, @scottyih, @LukeZettlemoyer, @VictoriaLinML. 👏

**Measured on AWS p4de.24xlarge instances with NVIDIA A100 GPUs.
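Below is a minimal PyTorch sketch, not the paper's implementation, of the modality-aware sparsity idea the post describes: each modality gets its own copy of every non-embedding parameter (layer norms, attention projections, feed-forward weights), while a single self-attention operation still attends globally across all tokens. Module names, shapes, and the two-modality setup are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): every non-embedding parameter is
# duplicated per modality, while self-attention mixes the full sequence.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityRouted(nn.Module):
    """Applies a separate copy of make_module() to each modality's tokens."""

    def __init__(self, make_module, num_modalities):
        super().__init__()
        self.experts = nn.ModuleList(make_module() for _ in range(num_modalities))

    def forward(self, x, modality_ids):
        # x: (batch, seq, d_in); modality_ids: (batch, seq) ints in [0, M)
        out = None
        for m, expert in enumerate(self.experts):
            mask = modality_ids == m          # (batch, seq) boolean
            if not mask.any():
                continue
            y = expert(x[mask])               # gather this modality's tokens
            if out is None:                   # allocate once d_out is known
                out = x.new_zeros(*x.shape[:-1], y.shape[-1])
            out[mask] = y                     # scatter results back in place
        return out


class MoTBlock(nn.Module):
    """One transformer block: per-modality parameters, global attention."""

    def __init__(self, dim, n_heads, num_modalities=2):
        super().__init__()
        M = num_modalities
        self.n_heads = n_heads
        self.norm1 = ModalityRouted(lambda: nn.LayerNorm(dim), M)
        self.qkv = ModalityRouted(lambda: nn.Linear(dim, 3 * dim), M)
        self.proj = ModalityRouted(lambda: nn.Linear(dim, dim), M)
        self.norm2 = ModalityRouted(lambda: nn.LayerNorm(dim), M)
        self.ffn = ModalityRouted(
            lambda: nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
            ),
            M,
        )

    def forward(self, x, modality_ids):
        B, S, D = x.shape
        H = self.n_heads
        # Per-modality projections, then ONE attention over the whole sequence.
        q, k, v = self.qkv(self.norm1(x, modality_ids), modality_ids).chunk(3, -1)
        q, k, v = (t.view(B, S, H, D // H).transpose(1, 2) for t in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(B, S, D)
        x = x + self.proj(attn, modality_ids)
        x = x + self.ffn(self.norm2(x, modality_ids), modality_ids)
        return x


# Toy usage: a batch mixing text (0) and image (1) tokens.
block = MoTBlock(dim=64, n_heads=4)
tokens = torch.randn(2, 10, 64)
modality_ids = torch.randint(0, 2, (2, 10))
print(block(tokens, modality_ids).shape)  # torch.Size([2, 10, 64])
```

Note the design choice mirrored here: as the post describes it, routing is determined by each token's modality, so unlike a learned mixture-of-experts there is no router to train and no load-balancing loss in this sketch.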
