

Vikas Chandra
@vikasc
Senior Director of #AI Research @Meta | CMU Ph.D. | Ex-visiting faculty at Stanford

MobileLLM-R1.5, with exceptional performance, is now available!
- MobileLLM-R1.5-950M outperforms DeepSeek-R1-Distill-Qwen-1.5B on all math/coding benchmarks, with ~40% fewer parameters.
- Big gains from on-policy KD: AIME jumps 15.5 -> 39.9 on MobileLLM-R1.5-950M.
- At 360M scale: MATH 28.4 -> 63.4, GSM8K 24.5 -> 52.8 (>2x).
Models: lnkd.in/gycHY8MS
Collaborating with @erniecyc, Changsheng, @tydsh, et al.
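The post credits the AIME jump to on-policy knowledge distillation but doesn't describe the training setup. As a rough, framework-free sketch of what "on-policy KD" usually means: the student generates its own tokens, and a per-token reverse KL, D_KL(student || teacher), is minimized at those student-sampled positions (all function names below are illustrative, not Meta's code):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """D_KL(student || teacher) at one position of a *student-generated*
    sequence -- the loss commonly minimized in on-policy distillation."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def sequence_kd_loss(student_steps, teacher_steps):
    """Average the per-token reverse KL over a rollout the student sampled;
    each element is the pair of logit vectors at that position."""
    losses = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(losses) / len(losses)
```

Because the loss is evaluated on the student's own rollouts rather than teacher-forced data, the student is corrected exactly where it actually errs at inference time, which is the usual intuition for why on-policy KD outperforms plain distillation on reasoning traces.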

Meta just dropped MobileLLM-R1 on Hugging Face, an edge reasoning model with fewer than 1B parameters. 2x-5x performance boost over other fully open-source models: MobileLLM-R1 achieves ~5x higher MATH accuracy vs. Olmo-1.24B and ~2x vs. SmolLM2-1.7B. Uses just 1/10 the pre-training tokens compared to Qwen: matches or surpasses Qwen3 accuracy on multiple reasoning benchmarks while training on only 4.2T tokens (just 11.7% of Qwen3's 36T).

We want to make it easier for more people to build with Llama, so today we're releasing new quantized versions of Llama 3.2 1B & 3B that deliver a 2-4x increase in inference speed, an average 56% reduction in model size, and a 41% reduction in memory footprint. Details on our new quantized Llama 3.2 on-device models: ai.meta.com/blog/meta-llam… While quantized models have existed in the community before, those approaches often came with a tradeoff between performance and accuracy. To solve this, we used Quantization-Aware Training with LoRA adaptors, as opposed to post-processing alone. As a result, our new models offer a reduced memory footprint, faster on-device inference, accuracy, and portability, while maintaining quality and safety for developers to deploy on resource-constrained devices. The new models can be downloaded now from Meta and on @huggingface.
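The post names the recipe (Quantization-Aware Training plus LoRA adaptors) without showing it. A minimal sketch of the two ingredients, with illustrative names and a rank-1 adapter for brevity, not Meta's actual implementation: the forward pass round-trips base weights through an int-8 grid so training sees quantization error, while a small full-precision LoRA correction stays trainable.

```python
def fake_quant(w, bits=8):
    """Simulate symmetric int-N quantization in the forward pass
    (quantize then dequantize), the core of QAT."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in w) / qmax or 1.0
    return [round(x / scale) * scale for x in w]

def qat_lora_forward(x, w, lora_a, lora_b, alpha=1.0):
    """y = x . quant(w) + alpha * (x . lora_a) * lora_b: quantized base
    weights plus a full-precision rank-1 LoRA correction (names illustrative)."""
    wq = fake_quant(w)
    base = sum(xi * wi for xi, wi in zip(x, wq))
    low_rank = alpha * sum(xi * ai for xi, ai in zip(x, lora_a)) * lora_b
    return base + low_rank
```

The point of combining the two is that the adapter can be trained to absorb the rounding error `fake_quant` introduces, which is why QAT+LoRA recovers accuracy that pure post-training quantization loses.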

MobileLLM: nice paper from @AIatMeta about running sub-billion-parameter LLMs on smartphones and other edge devices. TL;DR: more depth, not width; a shared matrix for token->embedding and embedding->token; shared weights across multiple transformer blocks. Paper: arxiv.org/abs/2402.14905
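The two weight-sharing tricks in that TL;DR can be sketched in a few lines (shapes and names are illustrative, not the paper's code): embedding tying reuses one matrix for both the token->embedding lookup and the embedding->logits projection, and block-wise sharing applies one set of transformer-block weights across several layers.

```python
def embed(token_id, E):
    """token -> embedding: row lookup in the shared embedding matrix E."""
    return E[token_id]

def unembed(h, E):
    """embedding -> logits: multiply by the *same* matrix E, transposed,
    so no separate output-projection weights are stored."""
    return [sum(hi * ei for hi, ei in zip(h, row)) for row in E]

def shared_block_stack(h, block, n_layers):
    """Apply one block's weights n_layers times: block-wise weight sharing
    buys depth without extra parameters."""
    for _ in range(n_layers):
        h = block(h)
    return h
```

Both tricks cut parameter count where it matters most at sub-billion scale: in small models the embedding tables are a large fraction of total weights, and repeating blocks adds depth (which the paper argues helps more than width) essentially for free.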