
Stanislav Frolov
369 posts

Stanislav Frolov
@stfrolov
Researcher @DFKI Generative Image Modeling | Intern @MetaAI '22 & @AdobeResearch '21

Unifying VXAI: A Systematic Review and Framework for the Evaluation of Explainable AI David Dembinsky, Adriano Lucieri, Stanislav Frolov, Hiba Najjar, Ko Watanabe, Andreas Dengel. Action editor: Krikamol Muandet. openreview.net/forum?id=wAvFL… #explanation

Thanks @_akhaliq for promoting our work! With GaLore, it is now possible to pre-train a 7B model on NVIDIA RTX 4090s with 24GB memory!

🤔 How? Instead of assuming a low-rank weight structure like LoRA, we show that the weight gradient is naturally low-rank and can thus be projected into a (changing) low-dimensional space. This saves memory on the gradient and on Adam's momentum and variance at the same time!

As a result, unlike LoRA, GaLore does not change the training dynamics and can be used to pre-train a 7B model from scratch, without any memory-consuming warm-up. This yields 1B/7B models with perplexity comparable to vanilla training for up to 13B/20B tokens, using only 1/4 of the rank. With 1/2 of the rank, our 1B model is even better 🤯. GaLore can also be used for fine-tuning, yielding results comparable to LoRA.

Thanks to awesome collaborators @jiawzhao, @KyriectionZhang, @BeidiChen, Zhangyang Wang and @AnimaAnandkumar!

GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank
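The gradient-projection idea described above can be sketched in a few lines of NumPy. This is a toy single-matrix illustration of projecting the gradient into a low-rank subspace and running Adam there, not the released GaLore code; the function name `galore_step`, the refresh interval, and the hyperparameter defaults are illustrative assumptions:

```python
import numpy as np

def galore_step(W, grad, m, v, P, step, rank=4, update_proj_every=200,
                lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step in a low-rank subspace of the gradient (GaLore-style sketch).

    P (d_out x rank) projects the full gradient into a rank-r space; the
    optimizer states m and v live in that small space, which is where the
    memory saving comes from.
    """
    # Periodically refresh the projector from the gradient's top singular vectors
    # (the "changing" low-dimensional space).
    if P is None or step % update_proj_every == 0:
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        P = U[:, :rank]                       # (d_out, rank)
    g_low = P.T @ grad                        # projected gradient: (rank, d_in)
    # Standard Adam moment updates, but on the small projected gradient.
    m = beta1 * m + (1 - beta1) * g_low
    v = beta2 * v + (1 - beta2) * g_low**2
    m_hat = m / (1 - beta1 ** (step + 1))
    v_hat = v / (1 - beta2 ** (step + 1))
    update_low = m_hat / (np.sqrt(v_hat) + eps)
    W = W - lr * (P @ update_low)             # project the update back to full size
    return W, m, v, P
```

Note that the optimizer states are rank × d_in instead of d_out × d_in, so for a large square weight matrix and a small rank, the Adam memory shrinks roughly by a factor of d_out / rank.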