Pavankumar Vasu (@PavankumarVasu) - Twitter Profili

Sabitlenmiş Tweet

Excited to share code & models for FastVLM — our blazing-fast Vision-Language Model appearing at #CVPR2025 Run it on-device with inference code optimized for Apple Silicon using #mlx. Code: github.com/apple/ml-fastv… Updated paper & results coming soon. Stay tuned! 👀

English

11

49

207

50K

Pavankumar Vasu retweetledi

Jiatao Gu@thoma_gu·12 Ara

(1/n) There’s a long-running debate on bringing representation learning into generative modeling—their latent spaces play different roles. 🚀🚀 We present FAE, a simple-yet-effective framework that bridges them with a single attention layer! Paper: huggingface.co/papers/2512.07…

English

6

88

509

82.6K

Pavankumar Vasu retweetledi

Yizhe Zhang@YizheZhangNLP·27 Kas

We use latent continuous thoughts for retrieval optimized via downstream NTP loss, unified under one LLM backbone. Since representations are shared, documents can be precomputed—eliminating 2-stage RAG. We match raw text performance but with a much shorter context budget. 📉🚀

Jie He@Jiehenlp

Happy to introduce my internship work at @Apple . We introduce CLaRa: Continuous Latent Reasoning, an end-to-end training framework that jointly trains retrieval and generation ! 🧠📦 🔗 arxiv.org/pdf/2511.18659… #RAG #LLMs #Retrieval #Reasoning #AI

English

1

8

35

6.7K

Pavankumar Vasu retweetledi

Jiatao Gu@thoma_gu·27 Kas

STARFlow gets an upgrade—it now works on videos🎥 We present STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows, a invertible, causal video generator built on autoregressive flows! 📄 Paper huggingface.co/papers/2511.20… 💻 Code github.com/apple/ml-starf… (1/10)

English

4

40

197

61.6K

Pavankumar Vasu retweetledi

Eran Malach@EranMalach·17 Eki

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: arxiv.org/pdf/2510.14826 🧵

English

6

84

419

115.2K

Pavankumar Vasu retweetledi

Fartash Faghri@FartashFg·14 Eki

🚨While booking your travel for #NeurIPS2025, make sure to stay on Sunday, December 7 8am-5pm for CCFM Workshop (Continual and Compatible Foundation Model Updates). We have received exciting paper contributions and have an amazing lineup of speakers.

Fartash Faghri@FartashFg

Is your AI keeping Up with the world? Announcing #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates When/Where: Dec. 6-7 San Diego Submission deadline: Aug. 22, 2025. (opening soon!) sites.google.com/view/ccfm-neur… #FoundationModels #ContinualLearning

English

0

3

21

3.9K

Pavankumar Vasu retweetledi

Xianhang Li@XianhangLi·1 Eki

🤔 Ever thought a small teacher could train a student 6× larger that sets new SOTA in training efficiency and frozen evaluation performance for video representation learning? 🤔 Do we really need complex EMA-based self-distillation to prevent collapse, bringing unstable loss dynamics while offering little insight into representation quality? 🚨 In our new paper, we investigate these questions and propose SALT (Static-teacher Asymmetric Latent Training): a simple, scalable, and compute-efficient alternative for video representation learning. 📄 Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers 🔗 arxiv.org/abs/2509.24317

English

7

73

459

39.1K

Pavankumar Vasu@PavankumarVasu·29 Ağu

📢 FastVLM models are now on 🤗

Xenova@xenovacom

NEW: Apple releases FastVLM and MobileCLIP2 on Hugging Face! 🤗 The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time VLM applications! 🤯 It can even do live video captioning 100% locally in your browser (zero install). Huge for accessibility!

English

0

6

256

Pavankumar Vasu@PavankumarVasu·29 Ağu

📢 Releasing MobileCLIP2 (TMLR Featured). Small embedding models that can power your multimodal RAG applications on resource constrained devices. Models are available on 🤗

Fartash Faghri@FartashFg

🚀Releasing MobileCLIP2 (TMLR Featured). MobileCLIP2-S4 matches acc of SigLIP-SO400M/14 while 2x smaller and surpasses DFN ViT-L/14 at 2.5x faster. Paper: arxiv.org/abs/2508.20691 Code: github.com/apple/ml-mobil… RayGen: github.com/apple/ml-mobil… 🤗huggingface.co/collections/ap… #Apple MLR

English

0

2

244

Pavankumar Vasu retweetledi

Fartash Faghri@FartashFg·29 Ağu

🚀Releasing MobileCLIP2 (TMLR Featured). MobileCLIP2-S4 matches acc of SigLIP-SO400M/14 while 2x smaller and surpasses DFN ViT-L/14 at 2.5x faster. Paper: arxiv.org/abs/2508.20691 Code: github.com/apple/ml-mobil… RayGen: github.com/apple/ml-mobil… 🤗huggingface.co/collections/ap… #Apple MLR

English

5

27

73

6.8K

Pavankumar Vasu retweetledi

Fartash Faghri@FartashFg·15 Ağu

🚨📅The submission deadline for #NeurIPS 2025 CCFM Workshop is just 8 days away on August 22. Get your papers in! Submit your work on Continual and Compatible Foundation Model Updates to the #NeurIPS 2025 CCFM Workshop. Learn more: sites.google.com/view/ccfm-neur…

Fartash Faghri@FartashFg

Is your AI keeping Up with the world? Announcing #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates When/Where: Dec. 6-7 San Diego Submission deadline: Aug. 22, 2025. (opening soon!) sites.google.com/view/ccfm-neur… #FoundationModels #ContinualLearning

English

0

1

5

2K

Pavankumar Vasu retweetledi

Max Seitzer@maxseitzer·14 Ağu

Introducing DINOv3 🦕🦕🦕 A SotA-enabling vision foundation model, trained with pure self-supervised learning (SSL) at scale. High quality dense features, combining unprecedented semantic and geometric scene understanding. Three reasons why this matters…

English

12

138

1K

135K

Pavankumar Vasu retweetledi

Andi Marafioti@andimarafioti·31 Tem

🚀 We're thrilled to launch four new OCR datasets with 20M images: DoclingMatix, SynthFormulaNet, SynthCodeNet, and SynthChartNet. We used them train SmolDocling, our ultra‑compact (256M) full-page document conversion VLM with performance rivaling models up to 27× larger.

English

5

77

551

30.4K

Pavankumar Vasu retweetledi

Andrea Santilli@teelinsan·22 Tem

Uncertainty quantification (UQ) is key for safe, reliable LLMs... but are we evaluating it correctly? 🚨 Our ACL2025 paper finds a hidden flaw: if both UQ methods and correctness metrics are biased by the same factor (e.g., response length), evaluations get systematically skewed

English

1

17

48

4.2K

Pavankumar Vasu retweetledi

Hadi Pouransari@HPouransari·23 Tem

🌟Explore key insights from the FastVLM project (real-time vision-language model) in this blog post: machinelearning.apple.com/research/fast-…

English

4

39

215

39.5K

Pavankumar Vasu retweetledi

Fartash Faghri@FartashFg·23 Tem

📢Submissions are now open for #NeurIPS2025 CCFM workshop. Submission deadline: August 22, 2025, AoE. Website: sites.google.com/view/ccfm-neur… Call for papers: sites.google.com/view/ccfm-neur… Submission Link: openreview.net/group?id=NeurI…

Fartash Faghri@FartashFg

Is your AI keeping Up with the world? Announcing #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates When/Where: Dec. 6-7 San Diego Submission deadline: Aug. 22, 2025. (opening soon!) sites.google.com/view/ccfm-neur… #FoundationModels #ContinualLearning

English

0

6

11

10.7K

Pavankumar Vasu retweetledi

Mustafa Shukor@MustafaShukor1·15 Tem

We propose new scaling laws that predict the optimal data mixture, for pretraining LLMs, native multimodal models and large vision encoders ! Only running small-scale experiments is needed, and we can then extrapolate to large-scale ones. These laws allow 1/n 🧵

English

6

46

266

31.1K

Pavankumar Vasu retweetledi

Rin Metcalf Susa@RinMetcalfSusa·15 Tem

📣 We are excited to present our work on inferring user preferences from writing samples at @icmlconf Poster Session 3 (Wed. 11:00AM - 1:30PM)! Come by to ✋ chat with us, 📄 learn about our method, and 💻 hear about our new interactive benchmark (🔗s below)!

English

1

3

7

489

Pavankumar Vasu retweetledi

Fartash Faghri@FartashFg·8 Tem

🚀Super excited to share TiC-LM (Oral at #ACL2025)! How to keep FMs up-to-date over months/years? We have a benchmark and lots of insights (arxiv.org/abs/2504.02107). Also organizing a related @NeurIPSConf 2025 workshop continual and compatible FMs (CCFM: sites.google.com/view/ccfm-neur…) Code/Models/Dataset: github.com/apple/ml-tic-lm Our prior work on TiC-CLIP: arxiv.org/abs/2310.16226 Thanks to @jeffwpli for his amazing work on DCLM and TiC-LM and other upcoming works during his internship at @Apple MLR. Thanks to everyone at @Apple MLR to help us do great research.

Jeffrey Li@jeffwpli

Excited to share TiC-LM (Oral at #ACL2025)! LLMs can become outdated ⏲️ and re-training from scratch is costly💰. Ideally, we'd keep reusing and updating models on newer data ♻️. We study continual training as 114 CC months are revealed one-at-a-time. arxiv.org/abs/2504.02107

English

0

2

11

852

Pavankumar Vasu retweetledi

Jiatao Gu@thoma_gu·11 Haz

I will be attending #CVPR2025 and presenting our latest research at Apple MLR! Specifically, I will present our highlight poster--world consistent video diffusion (cvpr.thecvf.com/virtual/2025/p…), and three workshop invited talks which includes our recent preprint ★STARFlow★! (0/n)

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis "We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance on high-resolution image synthesis"

English

2

23

86

28.7K

Pavankumar Vasu retweetledi

Ryan Hoque@ryan_hoque·21 May

Imitation learning has a data scarcity problem. Introducing EgoDex from Apple, the largest and most diverse dataset of dexterous human manipulation to date — 829 hours of egocentric video + paired 3D hand poses across 194 tasks. Now on arxiv: arxiv.org/abs/2505.11709 (1/4)

English

15

91

606

113.7K

Pavankumar Vasu

Keşfet