Fartash Faghri

220 posts

@FartashFg

ML Research @Apple. @UofT PhD.

Toronto, Canada · Joined November 2013
71 Following · 1.1K Followers
Fartash Faghri retweeted
Mehrdad Farajtabar @MFarajtabar:
Continual Learning remains one of the most challenging “holy grails” of AI. Most discussions focus on catastrophic forgetting: models lose what they previously learned. But there is another, equally important failure mode: over long continual training, neural networks can also lose their plasticity, i.e., their ability to learn new things weakens over time.

In our ICLR 2026 work with colleagues at @Apple and @ETH, we study this phenomenon, known as Loss of Plasticity (LoP), from a geometric perspective. We show that LoP can arise when gradient dynamics become trapped in invariant manifolds of parameter space. In particular, we analyze two types of traps:

🔴 Frozen units: units saturate, gradients vanish, and they become effectively silent to backpropagation.
🔵 Cloned units: units become redundant, receive matching forward and backward signals, and move together.

For these structures, the gradient is tangent to the trap. Once standard GD/SGD enters these affine subspaces, it cannot leave them on its own. This means the dynamics can remain stuck even when the data distribution or task changes.

What we find especially interesting is that these traps are not merely optimization bugs. The same feature-learning pressures that help networks learn useful representations for the current task can also push them toward states with less future adaptability. This raises a difficult open question for future work: are neural networks trained with SGD and cross-entropy loss fundamentally the right framework for continual learning?

Please read the full paper for more details: arxiv.org/pdf/2510.00304
Quoting Amir Joudaki @AmirJoudaki:
Neural nets don’t just forget. Sometimes, after long training, they lose the ability to learn at all. In our #ICLR2026 poster, we model Loss of Plasticity as gradient dynamics trapped in invariant manifolds: 🔴 frozen units, 🔵 cloned units. The video makes the traps visible.

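Both traps are easy to reproduce in a toy setting. Below is a minimal PyTorch sketch (my illustration, not the paper's code) of a one-hidden-layer ReLU network: a saturated "dead" unit receives exactly zero gradient, and two cloned units receive identical gradients, so gradient descent on its own can neither revive the dead unit nor separate the clones.

```python
# Minimal sketch (not the paper's code): the two gradient traps on a toy
# one-hidden-layer ReLU regression network.
import torch

torch.manual_seed(0)
x = torch.randn(64, 8)                       # toy input batch
y = torch.randn(64, 1)                       # toy regression targets

W1 = torch.randn(16, 8, requires_grad=True)  # input -> hidden
b1 = torch.zeros(16, requires_grad=True)
W2 = torch.randn(1, 16, requires_grad=True)  # hidden -> output

with torch.no_grad():
    # 🔴 Frozen unit: push unit 0 into the dead-ReLU regime so its
    # pre-activation is negative for every input in the batch.
    b1[0] = -1e3
    # 🔵 Cloned units: make units 1 and 2 exact duplicates (matching
    # incoming weights, bias, and outgoing weights).
    W1[2] = W1[1]; b1[2] = b1[1]; W2[0, 2] = W2[0, 1]

h = torch.relu(x @ W1.T + b1)
loss = ((h @ W2.T - y) ** 2).mean()
loss.backward()

print(W1.grad[0].abs().max())                 # 0: the frozen unit gets no gradient
print((W1.grad[1] - W1.grad[2]).abs().max())  # 0: the clones get identical gradients
```

Both prints come out as zero, which is the "gradient tangent to the trap" picture in miniature: the update never has a component that pushes the parameters off the invariant subspace, no matter how the data changes.
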
Fartash Faghri @FartashFg:
In the era of continued pretraining and continued fine-tuning, loss of plasticity means leaving future gains on the table. We need a better theoretical understanding of loss of plasticity. See a great thread unpacking the dynamics. 👇 #ICLR2026 #ContinualLearning #DeepLearning
Quoting Amir Joudaki @AmirJoudaki (the post quoted above).

Fartash Faghri retweeted
Amir Joudaki @AmirJoudaki:
(The post quoted above.)
Fartash Faghri @FartashFg:
Attending #ICLR2026! Feel free to message me to chat about efficient multimodal models, or come find me at:
🗣️ Chairing Oral Session 4C: Vision Language Models III | Fri 24 Apr | 3:15 PM - 4:45 PM
📊 MobileCLIP2 (DFNDR 2B/12M released, links below 👇) | Sat 25 Apr | 10:30 AM - 1:00 PM | Poster Session 5, Pavilion 4, #3713
🪧 Apple Booth | Sat 25 Apr | 1:30 PM - 3:30 PM
📊 Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity | Sat 25 Apr | 3:15 PM - 5:45 PM | Poster Session 6, Pavilion 4, #4202
📊 Data-Centric Lessons To Improve Speech-Language Pretraining | Sat 25 Apr | 3:15 PM - 5:45 PM | Poster Session 6, Pavilion 3, #1418
Come work with our MLR team on efficient ML (message me if interested!): jobs.apple.com/en-us/details/…
Apple @ ICLR 2026: machinelearning.apple.com/updates/apple-…
DFNDR-2B: huggingface.co/datasets/apple…
DFNDR-12M: huggingface.co/datasets/apple…
huggingface.co/datasets/apple…
Fartash Faghri retweeted
Oncel Tuzel @OncelTuzel:
LiTo: Surface Light Field Tokenization (ICLR 2026), new work from Apple MLR. LiTo learns a unified 3D representation of geometry and view-dependent appearance, capturing effects like specular highlights and Fresnel reflections and enabling high-fidelity 3D generation from a single image.
Fartash Faghri retweeted
Oncel Tuzel @OncelTuzel:
Come work with us! The Machine Learning Research (MLR) team at Apple is seeking a passionate AI researcher to work on Efficient ML algorithms, including models optimized for fast inference and efficient training methods. Apply here: jobs.apple.com/en-us/details/…
Fartash Faghri retweeted
Vishaal Udandarao @vishaal_urao:
🚀 New paper: arxiv.org/abs/2510.20860
We conduct a systematic data-centric study of speech-language pretraining to improve end-to-end spoken QA! 🎙️🤖 Using our data-centric insights, we pretrain a 3.8B SpeechLM (called SpeLangy) that outperforms 3x larger models! 🧵👇
Fartash Faghri @FartashFg:
@justachetan The email issue is fixed now. The address is the same. Thanks for letting us know.
Aditya Chetan @justachetan:
@FartashFg Hi Fartash, I am interested in applying for this role; however, it seems that the email ID shared has a typo. I got a mail delivery failure. Could you kindly share the correct email ID? Thanks!
Fartash Faghri @FartashFg:
📣 Internship at Apple ML Research
We’re looking for a PhD research intern with interests in efficient multimodal models and video. For our recent work see machinelearning.apple.com/research/fast-…
This is a pure-research internship where the objective is to publish high-quality work. The internship duration is 4-10 months between November 2025 and September 2026.
If you are interested, email your resume to mlr-efficient-ml-internship@group.apple.com and apply at jobs.apple.com/en-us/details/…
Fartash Faghri @FartashFg:
🚨 While booking your travel for #NeurIPS2025, make sure to stay through Sunday, December 7, 8am-5pm, for the CCFM Workshop (Continual and Compatible Foundation Model Updates). We have received exciting paper contributions and have an amazing lineup of speakers.
Quoting Fartash Faghri @FartashFg:
Is your AI keeping up with the world? Announcing the #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates
When/Where: Dec. 6-7, San Diego
Submission deadline: Aug. 22, 2025 (opening soon!)
sites.google.com/view/ccfm-neur… #FoundationModels #ContinualLearning

Fartash Faghri retweeted
Hadi Pouransari @HPouransari:
📣We have PhD research internship positions available at Apple MLR. DM me your brief research background, resume, and availability (earliest start date and latest end date) if interested in the topics below.
Quoting Hadi Pouransari @HPouransari:
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment
💡 We propose dividing LLM parameters into 1) an anchor (always used, capturing commonsense) and 2) a memory bank (selected per query, capturing world knowledge). [1/X] 🧵

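The announced split is easy to picture in code. Below is a minimal PyTorch sketch of my reading of the idea; the names and the top-k retrieval rule are my assumptions, not the paper's architecture. A small always-on anchor handles every query, while only a few slots of a large memory bank are fetched per query.

```python
# Illustrative sketch of the announced idea (not the paper's architecture):
# an always-used "anchor" plus a large memory bank selected per query.
import torch

d, n_mem, k = 256, 1024, 4                  # dims and top-k are hypothetical

anchor = torch.nn.Sequential(               # 1) anchor: always used,
    torch.nn.Linear(d, d), torch.nn.GELU(), #    meant to capture commonsense
    torch.nn.Linear(d, d),
)
mem_keys = torch.randn(n_mem, d)            # 2) memory bank: slots holding
mem_vals = torch.randn(n_mem, d)            #    world knowledge

def forward(query_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (d,) embedding of the incoming query."""
    scores = mem_keys @ query_emb           # relevance of each memory slot
    top = scores.topk(k).indices            # select only k slots per query
    fetched = mem_vals[top].mean(dim=0)     # the rest are never touched
    return anchor(query_emb + fetched)      # anchor consumes query + memories

out = forward(torch.randn(d))
print(out.shape)                            # torch.Size([256])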
Fartash Faghri retweeted
Hadi Pouransari @HPouransari:
(The post quoted above.)
Fartash Faghri retweeted
Yuyang Wang @YuyangW95:
New preprint & open source! 🚨 “SimpleFold: Folding Proteins is Simpler than You Think” (arxiv.org/abs/2509.18480).
We ask: do protein folding models really need expensive, domain-specific modules like pair representations? We build SimpleFold, a scalable 3B folding model built solely on general-purpose transformers + flow matching and trained on 9M structures. SimpleFold supports easy deployment and efficient inference on consumer-level hardware with PyTorch/MLX (try it on your MacBook!) (1/n)
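For readers unfamiliar with the second ingredient: flow matching trains a model to regress the velocity of a path from noise to data, and generation integrates that velocity field. Below is a generic sketch of one training step under the standard straight-line (rectified-flow) recipe; it is an illustration only, not SimpleFold's released code, and the tiny MLP stands in for SimpleFold's transformer over protein structures.

```python
# Generic flow-matching training step -- an illustration of the recipe,
# not SimpleFold's code.
import torch

model = torch.nn.Sequential(             # stand-in for the transformer;
    torch.nn.Linear(3 + 1, 128),         # input: a 3D point plus time t
    torch.nn.SiLU(),
    torch.nn.Linear(128, 3),             # output: predicted velocity
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x1 = torch.randn(256, 3)                 # stand-in for target coordinates
x0 = torch.randn(256, 3)                 # noise sample
t = torch.rand(256, 1)                   # random time in [0, 1]

xt = (1 - t) * x0 + t * x1               # point on the straight noise->data path
target_v = x1 - x0                       # that path's constant velocity
pred_v = model(torch.cat([xt, t], dim=-1))

opt.zero_grad()
loss = ((pred_v - target_v) ** 2).mean() # regress the velocity field
loss.backward()
opt.step()                               # at sampling time, integrate
                                         # dx/dt = v(x, t) starting from noise
```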