

Vikas Chandra
@vikasc
Senior Director of #AI Research @Meta | CMU Ph.D. | Ex-visiting faculty at Stanford

MobileLLM-R1.5, with exceptional performance, is now available!
- MobileLLM-R1.5-950M outperforms DeepSeek-R1-Distill-Qwen-1.5B on all math/coding benchmarks, with ~40% fewer parameters.
- Big gains from on-policy KD: AIME jumps 15.5 -> 39.9 on MobileLLM-R1.5-950M.
- At 360M scale: MATH 28.4 -> 63.4, GSM8K 24.5 -> 52.8 (>2x).
Models: lnkd.in/gycHY8MS
Collaborating with @erniecyc, Changsheng, @tydsh, et al.
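The post credits the AIME jump to on-policy knowledge distillation but doesn't describe the training setup. As a rough, framework-free sketch of what "on-policy KD" usually means: the student generates its own tokens, and a per-token reverse KL, D_KL(student || teacher), is minimized at those student-sampled positions (all function names below are illustrative, not Meta's code):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """D_KL(student || teacher) at one position of a *student-generated*
    sequence -- the loss commonly minimized in on-policy distillation."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def sequence_kd_loss(student_steps, teacher_steps):
    """Average the per-token reverse KL over a rollout the student sampled;
    each element is the pair of logit vectors at that position."""
    losses = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(losses) / len(losses)
```

Because the loss is evaluated on the student's own rollouts rather than teacher-forced data, the student is corrected exactly where it actually errs at inference time, which is the usual intuition for why on-policy KD outperforms plain distillation on reasoning traces.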

Meta just dropped MobileLLM-R1 on Hugging Face, an edge reasoning model with fewer than 1B parameters. 2x-5x performance boost over other fully open-source models: MobileLLM-R1 achieves ~5x higher MATH accuracy vs. Olmo-1.24B and ~2x vs. SmolLM2-1.7B. Uses just 1/10 the pre-training tokens compared to Qwen: matches or surpasses Qwen3 accuracy on multiple reasoning benchmarks while training on only 4.2T tokens (just 11.7% of Qwen3's 36T).

We want to make it easier for more people to build with Llama, so today we're releasing new quantized versions of Llama 3.2 1B & 3B that deliver a 2-4x increase in inference speed, an average 56% reduction in model size, and a 41% reduction in memory footprint. Details on our new quantized Llama 3.2 on-device models: ai.meta.com/blog/meta-llam… While quantized models have existed in the community before, those approaches often came with a tradeoff between performance and accuracy. To solve this, we used Quantization-Aware Training with LoRA adaptors, as opposed to post-processing alone. As a result, our new models offer a reduced memory footprint, faster on-device inference, accuracy, and portability, while maintaining quality and safety for developers to deploy on resource-constrained devices. The new models can be downloaded now from Meta and on @huggingface.
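The post names the recipe (Quantization-Aware Training plus LoRA adaptors) without showing it. A minimal sketch of the two ingredients, with illustrative names and a rank-1 adapter for brevity, not Meta's actual implementation: the forward pass round-trips base weights through an int-8 grid so training sees quantization error, while a small full-precision LoRA correction stays trainable.

```python
def fake_quant(w, bits=8):
    """Simulate symmetric int-N quantization in the forward pass
    (quantize then dequantize), the core of QAT."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in w) / qmax or 1.0
    return [round(x / scale) * scale for x in w]

def qat_lora_forward(x, w, lora_a, lora_b, alpha=1.0):
    """y = x . quant(w) + alpha * (x . lora_a) * lora_b: quantized base
    weights plus a full-precision rank-1 LoRA correction (names illustrative)."""
    wq = fake_quant(w)
    base = sum(xi * wi for xi, wi in zip(x, wq))
    low_rank = alpha * sum(xi * ai for xi, ai in zip(x, lora_a)) * lora_b
    return base + low_rank
```

The point of combining the two is that the adapter can be trained to absorb the rounding error `fake_quant` introduces, which is why QAT+LoRA recovers accuracy that pure post-training quantization loses.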

MobileLLM: nice paper from @AIatMeta about running sub-billion-parameter LLMs on smartphones and other edge devices. TL;DR: more depth, not width; a shared matrix for token->embedding and embedding->token; shared weights across multiple transformer blocks. Paper: arxiv.org/abs/2402.14905
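The two weight-sharing tricks in that TL;DR can be sketched in a few lines (shapes and names are illustrative, not the paper's code): embedding tying reuses one matrix for both the token->embedding lookup and the embedding->logits projection, and block-wise sharing applies one set of transformer-block weights across several layers.

```python
def embed(token_id, E):
    """token -> embedding: row lookup in the shared embedding matrix E."""
    return E[token_id]

def unembed(h, E):
    """embedding -> logits: multiply by the *same* matrix E, transposed,
    so no separate output-projection weights are stored."""
    return [sum(hi * ei for hi, ei in zip(h, row)) for row in E]

def shared_block_stack(h, block, n_layers):
    """Apply one block's weights n_layers times: block-wise weight sharing
    buys depth without extra parameters."""
    for _ in range(n_layers):
        h = block(h)
    return h
```

Both tricks cut parameter count where it matters most at sub-billion scale: in small models the embedding tables are a large fraction of total weights, and repeating blocks adds depth (which the paper argues helps more than width) essentially for free.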