Mohammad Rastegari

107 posts

@morastegari

Distinguished AI Scientist at Meta. Affiliate Assistant Professor at University of Washington.

Seattle, WA · Joined February 2017
114 Following · 1.1K Followers
Mohammad Rastegari@morastegari·
This work was one of the last projects my team completed while I was at Apple. A lot of credit to @sacmehtauw, whose dedication was key to this project. The main point here is to show that, as contributors to the AI community, we play our part in being fully open.
AK@_akhaliq

Apple presents OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework. The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and …

Mohammad Rastegari retweeted
AK@_akhaliq·
Apple presents Speculative Streaming Fast LLM Inference without Auxiliary Models Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary draft model. While effective, in application-specific settings, it often involves fine-tuning both draft and target models to achieve high acceptance rates. As the number of downstream tasks grows, these draft models add significant complexity to inference systems. We propose Speculative Streaming, a single-model speculative decoding method that fuses drafting into the target model by changing the fine-tuning objective from next token prediction to future n-gram prediction. Speculative Streaming speeds up decoding by 1.8 - 3.1X in a diverse set of tasks, such as Summarization, Structured Queries, and Meaning Representation, without sacrificing generation quality. Additionally, Speculative Streaming is parameter-efficient. It achieves on-par/higher speed-ups than Medusa-style architectures while using ~10000X fewer extra parameters, making it well-suited for resource-constrained devices.
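The core idea in the tweet above (the target model itself drafts future tokens via n-gram prediction heads, then verifies the drafts, so no auxiliary model is needed) can be sketched with a toy deterministic "model". Everything here is illustrative, assumed for the sketch, and not the paper's implementation:

```python
# Toy sketch of single-model speculative decoding (illustrative only).
# The "model" is deterministic: its main head predicts next = (last * 2) % 101,
# and its speculative n-gram heads guess the following k tokens by applying
# the same rule (optionally corrupted to mimic a wrong speculation).

def main_head(token):
    """Ground-truth next-token rule of the toy target model."""
    return (token * 2) % 101

def ngram_heads(token, k=3, noise_at=None):
    """Speculative heads: guess the next k tokens after `token`.
    `noise_at` corrupts one position to simulate a rejected draft."""
    guesses, t = [], token
    for i in range(k):
        t = main_head(t)
        if noise_at == i:
            t = (t + 1) % 101  # a deliberately wrong guess
        guesses.append(t)
    return guesses

def generate(start, n_tokens, k=3, noise_at=None):
    """Draft-then-verify loop. One counted forward pass yields a verified
    token plus k speculative guesses; accepted guesses cost no extra pass
    (verification is amortized into the next parallel pass in real systems,
    so it is not counted here)."""
    out = [start]
    forward_passes = 0
    while len(out) < n_tokens + 1:
        nxt = main_head(out[-1])
        forward_passes += 1
        out.append(nxt)
        for g in ngram_heads(nxt, k, noise_at):
            if len(out) >= n_tokens + 1:
                break
            if g == main_head(out[-1]):   # guess matches the model: accept
                out.append(g)             # free token, no counted pass
            else:
                break                     # mismatch: fall back to main head
    return out[1:], forward_passes
```

With perfect heads, 8 tokens cost 2 counted passes instead of 8; when the first guess is always wrong, the loop degrades gracefully to one token per pass while still producing identical output.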
Mohammad Rastegari@morastegari·
This has been one of my favorite directions on enabling #llms to run effectively on device. Thanks to the great team pushing the state of the art in this direction. In the Apple MIND team, we try to attack research problems that move us to the next level of experiencing AI.
AK@_akhaliq

Apple announces LLM in a flash: Efficient Large Language Model Inference with Limited Memory paper page: huggingface.co/papers/2312.11… Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory.

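The "windowing" technique described above can be sketched as a small cache: keep the FFN neurons activated for the last few tokens resident in DRAM, and fetch from flash only neurons that are newly activated. The class and numbers below are assumptions for illustration, not Apple's implementation:

```python
from collections import deque

# Hedged sketch of "windowing" from LLM in a flash: neurons active in the
# last `window` tokens stay resident in DRAM; only cache misses touch flash.

class NeuronWindowCache:
    def __init__(self, window=4):
        self.window = window
        self.history = deque()   # per-token sets of active neuron ids
        self.resident = set()    # union of the sets currently in `history`
        self.flash_loads = 0     # total neurons fetched from flash

    def step(self, active_neurons):
        """Process one token's active-neuron set; return # neurons loaded."""
        new = set(active_neurons) - self.resident
        self.flash_loads += len(new)      # only misses are read from flash
        self.history.append(set(active_neurons))
        if len(self.history) > self.window:
            self.history.popleft()        # neurons may fall out of window...
        # ...so recompute the resident union (simple, not optimized)
        self.resident = set().union(*self.history)
        return len(new)
```

Because consecutive tokens activate heavily overlapping neuron sets in sparse FFNs, most steps load only the small difference: for three tokens with sets {1,2,3}, {2,3,4}, {3,4,5}, the cache loads 3 + 1 + 1 = 5 neurons instead of the naive 9.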
Mohammad Rastegari@morastegari·
Accurate training-aware weight quantization was computationally intractable for LLMs. But now at Apple MIND we have developed a method that solves the problem very efficiently and pushes the boundary to 3-bit quantization. eDKM: arxiv.org/abs/2309.00964 #LLM #LLMoptimization
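eDKM itself is a memory-efficient *differentiable* k-means applied during training; the toy below shows only the underlying representation that makes 3-bit quantization possible: each weight becomes a 3-bit index into a palette of 2^3 = 8 shared centroids found by plain (non-differentiable) k-means. All names here are assumptions for the sketch, not the paper's method:

```python
# Hedged sketch: palette quantization behind 3-bit weight clustering.
# eDKM (arxiv.org/abs/2309.00964) learns the clustering differentiably and
# memory-efficiently during training; this toy uses ordinary Lloyd's k-means.

def kmeans_1d(values, k=8, iters=20):
    """Plain 1-D Lloyd's k-means on scalars; returns k centroids."""
    lo, hi = min(values), max(values)
    cents = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda i: abs(v - cents[i]))
            buckets[j].append(v)
        # empty clusters keep their previous centroid
        cents = [sum(b) / len(b) if b else cents[i]
                 for i, b in enumerate(buckets)]
    return cents

def quantize_3bit(weights):
    """Return (palette, indices): each weight stored as a 3-bit index."""
    palette = kmeans_1d(weights, k=8)
    idx = [min(range(8), key=lambda i: abs(w - palette[i])) for w in weights]
    return palette, idx

def dequantize(palette, idx):
    """Reconstruct approximate weights from the shared palette."""
    return [palette[i] for i in idx]
```

Storage drops from 16 or 32 bits per weight to 3 bits plus a tiny shared palette; the open problem eDKM addresses is keeping this clustering accurate and tractable when it must be differentiated through during LLM training.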
Mohammad Rastegari retweeted
Oncel Tuzel@OncelTuzel·
“Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement” is an #iccv2023 paper from #Apple. By just swapping the ImageNet dataset with the “reinforced” ImageNet+ dataset, a model can be trained up to 7x faster to reach the same accuracy.
Fartash Faghri@FartashFg

Excited that our "Dataset Reinforcement" paper introducing "ImageNet+ dataset" is accepted to #ICCV2023! Up to ~7x FASTER training! Paper: arxiv.org/abs/2303.08983 ImageNet+/Code: Coming Soon w/ @HPouransari @sacmehtauw @MFarajtabar @morastegari @OncelTuzel Ali Farhadi 1/7

Behnam Neyshabur@bneyshabur·
In the last two years, I have initiated 4 projects with junior researchers who were seeking supervision and had no prior work relationship with me or anyone I knew. One led to a NeurIPS publication, one is ongoing and the other two didn't lead to a publication. 2/6
Behnam Neyshabur@bneyshabur·
We often prefer collaborating with people we know or those of high status. That makes it very difficult for hardworking and motivated junior researchers to get enough support to flourish. Is it possible to reduce this barrier? I've been running some experiments to find out! 1/6
Behnam Neyshabur@bneyshabur·
After several years of reviewing & AC work for @NeurIPSConf, @iclr_conf & @icmlconf, I have strong opinions about the reviewing system and some suggestions that many may not like or agree with. Summarizing my points in this thread (hastily written & NOT carefully considered): 1/
Mohammad Rastegari@morastegari·
These CVPR policies are frustrating. Given all the randomness in the review process, I feel there is no point submitting papers to conferences anymore. By the law of large numbers (a large number of papers and readers), just submitting to arXiv will be enough for a good paper to shine.
Lucas Beyer (bl16)@giffmana

1/2 @CVPR so if somebody posts wrong facts about a paper I co-authored, I may not correct them in an answer in any way? i.e. you prefer the spread of falsehoods? What if someone hates me and creates anon account, acting like me and answering threads about my submission?
