Vikas Chandra
@vikasc

Senior Director of #AI Research @Meta | CMU Ph.D. | Former visiting faculty at Stanford

Menlo Park, CA · Joined April 2009
173 Following · 552 Followers
326 posts
Vikas Chandra @vikasc
On-Device LLMs: State of the Union, 2026
Three years ago, running an LLM on a phone was a toy demo. Today, billion-parameter models run in real time. What changed? Not just faster chips - we had to rethink everything. Full post: v-chandra.github.io/on-device-llms/
0 replies · 1 repost · 6 likes · 229 views
Vikas Chandra @vikasc
The AI industry is betting on hardware to solve the memory wall. But what if the bottleneck isn't hardware, but how we use it? New post on attacking memory constraints algorithmically—and why the "bigger is better" era is ending. v-chandra.github.io/ai-memory-wall/
0 replies · 0 reposts · 4 likes · 170 views
Vikas Chandra @vikasc
Context graphs are AI's next trillion-dollar opportunity. The debate is focused on enterprise. The bigger opportunity is personal - and only on-device AI models can capture it. v-chandra.github.io/personal-conte…
0 replies · 0 reposts · 3 likes · 95 views
Vikas Chandra @vikasc
Is "Token Anxiety" a thing already or did I just coin that term?
0 replies · 0 reposts · 1 like · 94 views
Vikas Chandra retweeted
Vikas Chandra @vikasc
At Meta Connect today, we announced a set of AI tools that let us create immersive worlds just from natural language prompts. We created first-of-their-kind foundational GenAI models that enable rapid, high-quality content creation.
0 replies · 0 reposts · 3 likes · 200 views
Vikas Chandra retweeted
Zechun Liu @zechunliu
Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training, MobileLLM-R1 demonstrates strong reasoning ability. Despite using only 4.2T tokens (11.7% of the 36T pre-training tokens Qwen used), it delivers remarkable performance. Collaborating with @erniecyc, Changsheng, et al.
Quoting AK @_akhaliq:
Meta just dropped MobileLLM-R1 on Hugging Face: an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B, and ~2× vs. SmolLM2-1.7B. Uses just 1/10 the pre-training tokens compared to Qwen: matches or surpasses Qwen3 accuracy on multiple reasoning benchmarks while training on only 4.2T tokens (just 11.7% of Qwen3’s 36T).
6 replies · 15 reposts · 118 likes · 142.7K views
Vikas Chandra retweeted
Zechun Liu @zechunliu
🚀 We're thrilled to announce that the SoTA low-bit quantization ParetoQ code is now open-source! 🌟 github.com/facebookresear…
🔍 What does this repo support?
🌟 State-of-the-art sub-4-bit quantization: It is a significant upgrade from our previous LLM-QAT repo. Outperforming all previous methods, our advanced tech supports ultra low-bit quantization (binary to 4-bit). Try it on your model & training data now! 🥳
🌟 Comprehensive comparison across different bits: Our unified framework enables reliable scaling laws across various bit widths. 📊
🎉 What's coming next?
🌟 Quantized weights release: Stay tuned for the upcoming release of all quantized weights. ⏳
🌟 We're working on releasing a 2-bit quantization tensor-core kernel soon. ⚙️
0 replies · 6 reposts · 18 likes · 2.2K views
Vikas Chandra @vikasc
Quantization is a pivotal research area for reducing computational and memory demands. The optimal bit-width for achieving the best tradeoff between quantized model size and accuracy has been a subject of ongoing debate. Paper: arxiv.org/pdf/2502.02631
0 replies · 1 repost · 5 likes · 174 views
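The ParetoQ repository linked above is the real implementation. Purely as an illustration of why bit-width drives this size/accuracy tradeoff, here is a minimal, hypothetical sketch of symmetric per-channel fake-quantization at a few bit-widths (plain PyTorch, not the paper's method; the tensor sizes and thresholds are made up):

```python
# Toy sketch (not ParetoQ): symmetric per-output-channel fake-quantization of a
# weight matrix at an arbitrary bit-width, to make the size/accuracy tradeoff concrete.
import torch

def fake_quantize(weight: torch.Tensor, bits: int) -> torch.Tensor:
    """Round weights to `bits`-bit signed integers per output channel, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)                   # avoid division by zero
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale                                # dequantized weights

w = torch.randn(4096, 4096)                         # hypothetical weight matrix
for bits in (8, 4, 3, 2):
    err = (w - fake_quantize(w, bits)).abs().mean()
    print(f"{bits}-bit: mean abs error {err:.4f}, ~{bits/16:.2%} of fp16 storage")
```

Going from 8 bits down to 2 shrinks the stored weights several-fold while the rounding error grows, which is exactly the tradeoff the paper studies.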
Vikas Chandra retweeted
Forrest Iandola @fiandola
[1/n] 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗧𝗿𝗮𝗰𝗸 𝗔𝗻𝘆𝘁𝗵𝗶𝗻𝗴 from @Meta: interactive video segmentation and tracking on an iPhone!
13 replies · 107 reposts · 516 likes · 66.2K views
Vikas Chandra retweeted
Yunyang Xiong @YoungXiong1
🚀Excited to share our Efficient Track Anything. It is small but mighty, >2x faster than SAM2 on A100 and runs > 10 FPS on iPhone 15 Pro Max. How’d we do it? EfficientSAM + Efficient Memory Attention! Paper: arxiv.org/pdf/2411.18933 Project (demo): yformer.github.io/efficient-trac… with: @ChongZhou7, @mukosame, @klightlm, @zechunliu, @_sakshams_ , @balakrishnan_vr, @fiandola, @bilgeesra, @raghuraman, @vikasc, etc
4 replies · 36 reposts · 111 likes · 16.5K views
Vikas Chandra retweeted
Yuandong Tian @tydsh
Our SpinQuant work (arxiv.org/abs/2405.16406) has been used in the quantized versions of the LLaMA 3.2 1B/3B models released at Meta Connect '24. Congrats to all co-authors! @zechunliu, Changshen Zhao, Raghuraman Krishnamoorthi, Dhruv Choudhary, Bilge Soran, Igor Fedorov, @vikasc, @TiRune
Quoting AI at Meta @AIatMeta:
We want to make it easier for more people to build with Llama — so today we’re releasing new quantized versions of Llama 3.2 1B & 3B that deliver up to 2-4x increases in inference speed, an average 56% reduction in model size, and a 41% reduction in memory footprint. Details on our new quantized Llama 3.2 on-device models ➡️ ai.meta.com/blog/meta-llam… While quantized models have existed in the community before, these approaches often came at a tradeoff between performance and accuracy. To solve this, we used Quantization-Aware Training with LoRA adaptors as opposed to only post-processing. As a result, our new models offer a reduced memory footprint, faster on-device inference, accuracy, and portability, while maintaining quality and safety for developers to deploy on resource-constrained devices. The new models can be downloaded now from Meta and on @huggingface.
2 replies · 14 reposts · 74 likes · 8.9K views
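As a rough sketch of the pattern the quoted post describes (quantization-aware training with LoRA adaptors rather than post-processing alone), one can picture a linear layer whose frozen base weights are fake-quantized in the forward pass while a small full-precision LoRA adapter is trained on top. This is a hypothetical illustration, not Meta's implementation; the class and parameter names are made up:

```python
# Hypothetical sketch: 4-bit fake-quantized frozen base weights + trainable fp LoRA adapter.
import torch
import torch.nn as nn

class QATLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=16, bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)            # frozen base weights
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # adapter starts at zero
        self.bits = bits

    def _fake_quant(self, w):
        qmax = 2 ** (self.bits - 1) - 1
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
        return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

    def forward(self, x):
        # The base path always sees quantized weights; only the LoRA adapter gets gradients.
        return x @ self._fake_quant(self.weight).t() + (x @ self.lora_a.t()) @ self.lora_b.t()
```

Training then updates only lora_a/lora_b, so the adapter learns to compensate for the quantization error while the base weights stay fixed at their quantized values.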
Vikas Chandra retweeted
Yunyang Xiong @YoungXiong1
🚨VideoLLM from Meta!🚨 LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
📝Paper: huggingface.co/papers/2410.17…
🧑🏻‍💻Code: github.com/Vision-CAIR/Lo…
🚀Project (Demo): vision-cair.github.io/LongVU
We propose LongVU, a video LLM with a spatiotemporal adaptive compression mechanism designed for real-world hour-long video understanding. LongVU adaptively reduces the number of video tokens by leveraging (1) DINOv2 feature similarity across frames, (2) cross-modal text-frame similarity, and (3) temporal frame similarity.
1. High quality on video-based QA: 67.6% on EgoSchema, 66.9% on MVBench, 65.4% on MLVU and 59.5% on VideoMME long
2. +5% accuracy boost on average across various video understanding benchmarks compared to LLaVA-OneVision and VideoChat2
3. Our edge model, LongVU-3B, also outperformed 4B counterparts such as VideoChat2 (Phi-3) and Phi-3.5-vision-instruct by a large margin.
with: @xiaoqian_shen @liuzhuang1234 @Hu_Hsu @garvinchen2 @klightlm @zechunliu @balakrishnan_vr @Fanyi_Xiao @hyunwoojkim @bilgeesra @raghuraman @moElhoseiny @vikasc
4 replies · 72 reposts · 251 likes · 49.5K views
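The three similarity cues listed in the thread are the heart of the compression idea. As a toy illustration of just the temporal one (dropping frames whose features barely change from the last kept frame), here is a hypothetical sketch; it is not the LongVU code, and the threshold, feature dimension, and frame count are made-up placeholders:

```python
# Toy sketch of temporal-redundancy pruning: keep a frame's tokens only if its pooled
# feature (e.g. a DINOv2-style embedding) differs enough from the previously kept frame.
import torch
import torch.nn.functional as F

def select_frames(frame_features: torch.Tensor, sim_threshold: float = 0.95):
    """frame_features: (num_frames, dim) pooled per-frame features. Returns kept indices."""
    keep = [0]
    for i in range(1, frame_features.size(0)):
        sim = F.cosine_similarity(frame_features[i], frame_features[keep[-1]], dim=0)
        if sim < sim_threshold:          # frame is sufficiently novel, keep its tokens
            keep.append(i)
    return keep

feats = torch.randn(600, 768)            # hypothetical: 600 sampled frames of a long clip
print(len(select_frames(feats)), "frames kept out of", feats.size(0))
```

LongVU layers further reductions on top of this (cross-frame DINOv2 similarity and query-conditioned cross-modal similarity), but the basic move of discarding near-duplicate visual tokens is what lets hour-long video fit a fixed token budget.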
Vikas Chandra retweeted
Mingchen Zhuge @MingchenZhuge
🔔 new 𝗔𝗴𝗲𝗻𝘁-𝗮𝘀-𝗮-𝗝𝘂𝗱𝗴𝗲 paper: 𝗖𝗮𝗻 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁𝘀 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗲 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁𝘀 𝗮𝘀 𝗲𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲𝗹𝘆 𝗮𝘀 𝗵𝘂𝗺𝗮𝗻𝘀? 𝗬𝗲𝘀, 𝘁𝗵𝗲𝘆 𝗰𝗮𝗻!
📄 arxiv.org/abs/2410.10934…
👨‍💻 github.com/metauto-ai/age…
Introducing 𝗔𝗴𝗲𝗻𝘁-𝗮𝘀-𝗮-𝗝𝘂𝗱𝗴𝗲, a groundbreaking proof-of-concept that reduces costs and time by 97% while providing rich, intermediate feedback. It precisely captures the natural step-by-step processes of agentic systems.
We also developed 𝗗𝗲𝘃𝗔𝗜, a new benchmark featuring 55 automated AI development tasks and 365 requirements. Agent-as-a-Judge not only outperforms LLM-as-a-Judge but also closely mirrors human evaluations with greater efficiency and precision.
The real game-changer? It provides reliable reward signals, paving the way for scalable, self-improving agentic systems.
Thanks to my Meta/KAUST mentors/peers/collaborators @SchmidhuberAI @tydsh @zechunliu @vikasc @YoungXiong1 @Obs01ete @erniecyc @oneDylanAshley ...
Menlo Park, CA 🇺🇸 · 30 replies · 205 reposts · 1K likes · 183.4K views
Vikas Chandra retweeted
Andrej Karpathy @karpathy
Huge congrats to @AIatMeta on the Llama 3.1 release! Few notes:
Today, with the 405B model release, is the first time that a frontier-capability LLM is available to everyone to work with and build on. The model appears to be GPT-4 / Claude 3.5 Sonnet grade and the weights are open and permissively licensed, including commercial use, synthetic data generation, distillation and finetuning. This is an actual, open, frontier-capability LLM release from Meta.
The release includes a lot more, e.g. a 92-page PDF with a lot of detail about the model: ai.meta.com/research/publi…
The philosophy underlying this release is in this longread from Zuck, well worth reading as it nicely covers all the major points and arguments in favor of the open AI ecosystem worldview: "Open Source AI is the Path Forward" facebook.com/4/posts/101157…
I like to say that it is still very early days, that we are back in the ~1980s of computing all over again, that LLMs are a next major computing paradigm, and Meta is clearly positioning itself to be the open ecosystem leader of it.
- People will prompt and RAG the models.
- People will finetune the models.
- People will distill them into smaller expert models for narrow tasks and applications.
- People will study, benchmark, optimize.
Open ecosystems also self-organize in modular ways into products, apps and services, where each party can contribute their own unique expertise. One example from this morning is @GroqInc, who built a new chip that inferences LLMs *really fast*. They've already integrated Llama 3.1 models and appear to be able to inference the 8B model ~instantly: x.com/karpathy/statu… And (I can't seem to try it due to server pressure) the 405B running on Groq is probably the highest capability, fastest LLM today (?).
Early model evaluations look good: ai.meta.com/blog/meta-llam… x.com/alexandr_wang/… Pending still is the "vibe check", look out for that on X / r/LocalLlama over the next few days (hours?). I expect the closed model players (which imo have a role in the ecosystem too) to give chase soon, and I'm looking forward to that.
There's a lot to like on the technical side too, w.r.t. multilingual, context lengths, function calling, multimodal, etc. I'll post about some of the technical notes a bit later, once I make it through all the 92 pages of the paper :)
184 replies · 1.4K reposts · 12.1K likes · 987.5K views
Vikas Chandra @vikasc
Thanks @ylecun for highlighting our work on efficient LLMs. @zechunliu will be presenting MobileLLM at ICML '24 later this month in Vienna. The source code is also open-sourced now at github.com/facebookresear…
Quoting Yann LeCun @ylecun:
MobileLLM: nice paper from @AIatMeta about running sub-billion LLMs on smartphones and other edge devices. TL;DR: more depth, not width; shared matrices for token->embedding and embedding->token; shared weights between multiple transformer blocks. Paper: arxiv.org/abs/2402.14905
0 replies · 1 repost · 7 likes · 400 views
Vikas Chandra retweeted
Yann LeCun @ylecun
MobileLLM: nice paper from @AIatMeta about running sub-billion LLMs on smartphones and other edge devices. TL;DR: more depth, not width; shared matrices for token->embedding and embedding->token; shared weights between multiple transformer blocks; Paper: arxiv.org/abs/2402.14905
38 replies · 195 reposts · 1.1K likes · 146.5K views
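The TL;DR above names two weight-sharing tricks: tying the token->embedding and embedding->token matrices, and reusing transformer block weights to gain depth without adding parameters. A minimal, hypothetical PyTorch sketch of those two ideas (not the released MobileLLM code; dimensions and the encoder-layer building block are placeholders) might look like:

```python
# Toy sketch of (1) tied input/output embedding matrices and (2) block weight sharing.
import torch
import torch.nn as nn

class TinySharedLM(nn.Module):
    def __init__(self, vocab=32000, dim=512, n_blocks=8, repeats_per_block=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, nhead=8, dim_feedforward=4 * dim,
                                       batch_first=True)
            for _ in range(n_blocks)
        ])
        self.repeats = repeats_per_block
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):                      # tokens: (batch, seq)
        h = self.embed(tokens)
        for block in self.blocks:
            for _ in range(self.repeats):           # run each block twice: deeper, same params
                h = block(h)
        h = self.norm(h)
        return h @ self.embed.weight.t()            # tied output projection, no separate lm_head
```

Running each block more than once doubles the effective depth at a fixed parameter count, and reusing the embedding matrix as the output projection removes one of the largest weight tensors in a sub-billion-parameter model.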
Vikas Chandra retweeted
Yuandong Tian @tydsh
New ML efficiency work! 🎯We propose SpinQuant, which optimizes the rotation matrices in several parts of the pre-trained transformer so that the resulting quantized model largely retains full-precision performance. For small models that are hard to optimize, SpinQuant can quantize weights/activations/KVs, all to 4 bits, reducing the gap to full precision by 30.2% (LLaMA2-7B) and by 34.1% (LLaMA3-8B) compared to the concurrent work QuaRot, which only uses random rotations to remove outliers. As a result, with everything quantized to 4 bits, there is only a 2.9-point gap to full precision for LLaMA2-7B and a 4.4-point gap for LLaMA3-8B on zero-shot common sense reasoning tasks. This is joint work with @vikasc's team (@zechunliu et al). Thanks all for the efforts! arxiv.org/abs/2405.16406
2 replies · 6 reposts · 77 likes · 17.8K views
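To make the rotation idea concrete, below is a toy, hypothetical sketch of why multiplying a weight matrix by an orthogonal rotation before low-bit quantization helps when a few channels carry outliers. SpinQuant learns these rotations (whereas QuaRot uses random ones); this sketch only uses a random rotation, and the matrix sizes and outlier injection are made up, so it is not the authors' code:

```python
# Toy sketch of rotate-then-quantize: an orthogonal rotation spreads outlier channels
# across the whole matrix, so a per-tensor 4-bit grid wastes less range on a few values.
import torch

def random_rotation(dim: int) -> torch.Tensor:
    q, _ = torch.linalg.qr(torch.randn(dim, dim))   # random orthogonal matrix
    return q

def quantize_4bit(x: torch.Tensor) -> torch.Tensor:
    scale = x.abs().max().clamp(min=1e-8) / 7       # symmetric 4-bit levels in [-8, 7]
    return torch.clamp(torch.round(x / scale), -8, 7) * scale

dim = 1024
w = torch.randn(dim, dim)
w[:, :4] *= 30                                      # inject a few outlier columns

r = random_rotation(dim)
plain = (w - quantize_4bit(w)).abs().mean()
# Quantize the rotated weight; applying the same rotation to the layer's input
# (x -> x @ r) leaves the layer output unchanged because r is orthogonal.
rotated = (w @ r - quantize_4bit(w @ r)).abs().mean()
print(f"4-bit error without rotation: {plain:.4f}, with rotation: {rotated:.4f}")
```

With the outliers spread across channels, the quantization scale is no longer dominated by a handful of large values, so the same 4-bit grid represents typical weights more finely; SpinQuant's contribution is optimizing the rotations instead of leaving them random.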