Tasnim Mohiuddin

95 posts

@mtaasnim

Scientist, QCRI @QatarComputing || Previously Research Intern @MetaAI Training LLMs || Core Contributor to @Ai_Fanar

Doha, Qatar · Joined October 2010

626 Following · 88 Followers

Tasnim Mohiuddin @mtaasnim · 28 Jan

@sbmaruf Thanks @sbmaruf for the kind words. Means a lot to me.

M Saiful Bari (MARUF) @sbmaruf · 28 Jan

Congrats @mtaasnim for the amazing work on Fanar. So proud of you. Who knew the two corner desks at MICL lab would be worth millions in a year!! Now it's time to make it a billion. arxiv.org/pdf/2501.13944

Tasnim Mohiuddin @mtaasnim · 1 Jan

@sbmaruf @NTUsg @SDAIA_SA Congratulations @sbmaruf 🫡 Very well deserved!

Tasnim Mohiuddin retweeted

M Saiful Bari (MARUF) @sbmaruf · 31 Dec

Excited to share that I’ve been recognized as an "Innovator Under 35" by MIT Technology Review from the MENA region!

After earning my Ph.D. from @NTUsg, Singapore, I joined the "National Center for Artificial Intelligence (NCAI), SDAIA" (@SDAIA_SA) to work on ALLaM, the Arabic Large Language Model, a nationwide initiative aimed at developing a sovereign LLM. We were among the first few organizations to successfully scale both pretraining and continuous pretraining. arxiv.org/abs/2407.15390

A massive shoutout to my incredible manager @areebsa and mentor @haidarkk1 for their unwavering support (and for tolerating my endless YOLOs and arguments!). Thanks to @mtaasnim, @ajabal4 and @y_alnumay, who still put up with me.

The saddest part of this year was when @haidarkk1 left NCAI. I felt like a part of me had just vanished; it was very difficult initially. I don't have anyone to have pointless arguments with. :(

I genuinely believe NCAI, SDAIA is on the path to achieving Artificial Superintelligence (ASI), depending on a few small but critical factors. The gap with the frontier labs might just be 12–18 months.

Finally, I am truly grateful to our former Chief Scientist @ehsan_hoque for his inspiration and constant support, and to my academic parent @JotyShafiq for just being there for me all the time. "Ganbare kuruko, akiramenaide"

MIT Technology Review @TechReviewAR

#Saiful_Bari of #Bangladesh is a winner of the global #Innovators_Under_35 award in its seventh edition, for 2024, for creating "ALLaM (the Arabic Large Language Model), an advanced AI model that preserves the linguistic and cultural nuances of the Arabic language, enabling comprehensive applications in education, healthcare, and government services". @sbmaruf cstu.io/41d727

Tasnim Mohiuddin retweeted

Thomas Wolf @Thom_Wolf · 28 Mar

[75min talk] I finally recorded this lecture I gave two weeks ago because people kept asking me for a video, so here it is, enjoy: "The Little guide to building Large Language Models in 2024". I tried to keep it short and comprehensive, focusing on concepts that are crucial for training good LLMs but often hidden in tech reports.

Tasnim Mohiuddin retweeted

Sebastian Raschka @rasbt · 13 Oct

I ran hundreds if not thousands of LoRA & QLoRA experiments to finetune open-source LLMs, and here’s what I learned:

1. Despite the inherent randomness of LLM training (or when training models on GPUs in general), the outcomes remain remarkably consistent across multiple runs.
2. QLoRA presents a trade-off that might be worthwhile if you're constrained by GPU memory. It offers 33% memory savings at the cost of a 33% increase in runtime.
3. When finetuning LLMs, the choice of optimizer shouldn't be a major concern. While SGD on its own is suboptimal, there's minimal variation in outcomes whether you employ AdamW, SGD with a scheduler, or AdamW with a scheduler.
4. While Adam is often labeled a memory-intensive optimizer due to its introduction of two new parameters for every model parameter, this doesn't significantly affect the peak memory demands of the LLM. This is because the majority of the memory is allocated for large matrix multiplications rather than retaining extra parameters.
5. For static datasets, iterating multiple times as done in multi-epoch training might not be beneficial. It often deteriorates the results, probably due to overfitting.
6. If you're incorporating LoRA, ensure it's applied across all layers, not just to the Key and Value matrices, to maximize model performance.
7. Adjusting the LoRA rank is essential, and so is selecting an apt alpha value. A good heuristic is setting alpha at twice the rank's value.
8. 7B models can be finetuned efficiently within a few hours on a single GPU possessing 14 GB of RAM.

With a static dataset, optimizing an LLM to excel across all benchmark tasks is unattainable. Addressing this requires diverse data sources, or perhaps LoRA might not be the ideal tool.

Lightning AI ⚡️@LightningAI

After hundreds of experiments, @rasbt has figured out how to get the most out of LoRA finetuning 👉 lightning.ai/pages/communit… #LLMs #GenAI #DeepLearning
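
A rough sketch of the "alpha at twice the rank" heuristic from the list above, written in plain Python with no finetuning library. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / rank and an adapter on a d_out x d_in weight adds rank * (d_in + d_out) trainable parameters. The names `LoRAConfig`, `heuristic_config` and `lora_param_count` are hypothetical and appear in neither the tweet nor the linked article.

```python
from dataclasses import dataclass


@dataclass
class LoRAConfig:
    """Hypothetical container for the two LoRA hyperparameters discussed above."""
    rank: int
    alpha: float

    @property
    def scale(self) -> float:
        # In the standard LoRA formulation the low-rank update is
        # multiplied by alpha / rank before being added to the frozen weight.
        return self.alpha / self.rank


def heuristic_config(rank: int) -> LoRAConfig:
    """Apply the rule of thumb from the tweet: alpha = 2 * rank."""
    return LoRAConfig(rank=rank, alpha=2 * rank)


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one adapter (A is rank x d_in, B is d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    for r in (4, 8, 16, 64):
        cfg = heuristic_config(r)
        # 4096 x 4096 is an assumed layer size, chosen only for illustration.
        n = lora_param_count(4096, 4096, r)
        print(f"rank={cfg.rank:>3} alpha={cfg.alpha:>4} scale={cfg.scale} params={n}")
```

Under this heuristic the effective scale stays at 2 regardless of rank, so changing the rank changes adapter capacity without also changing the magnitude of the update.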

Tasnim Mohiuddin retweeted

Mark Saroufim @marksaroufim · 13 Sep

Gave a talk on why Llama 13B won't fit on my 4090. It's an overview of all the main sources of memory overhead and how to reduce each of them. Simple for those at the frontier, but it will help the newbs among us do back-of-the-envelope VRAM requirements fast. docs.google.com/presentation/d…
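
A back-of-the-envelope version of that estimate, as a minimal sketch rather than anything taken from the slides. It counts only the model weights and assumes a nominal 13 billion parameters and a 24 GiB card (the RTX 4090's advertised VRAM). Activations, KV cache, optimizer state and framework overhead are ignored, so real requirements are higher. The function names are hypothetical.

```python
GIB = 2**30  # bytes per GiB


def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory needed just to hold the weights, in bytes."""
    return n_params * bytes_per_param


def fits(n_params: int, bytes_per_param: int, vram_gib: int) -> bool:
    """True if the weights alone fit in the given VRAM (a lower bound only)."""
    return weight_bytes(n_params, bytes_per_param) <= vram_gib * GIB


N_PARAMS = 13_000_000_000  # assumed nominal parameter count for "Llama 13B"
VRAM_GIB = 24              # assumed card size (RTX 4090)

if __name__ == "__main__":
    for name, nbytes in (("fp32", 4), ("fp16/bf16", 2), ("int8", 1)):
        gib = weight_bytes(N_PARAMS, nbytes) / GIB
        verdict = "fits" if fits(N_PARAMS, nbytes, VRAM_GIB) else "does not fit"
        print(f"{name:>9}: {gib:6.1f} GiB of weights, {verdict} in {VRAM_GIB} GiB")
```

Even before any overhead is counted, 16-bit weights for a model of this size exceed 24 GiB, which is why lower-precision formats or offloading come up in this kind of discussion.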

Tasnim Mohiuddin retweeted

Sebastien Bubeck @SebastienBubeck · 12 Sep

How far does one billion parameters take you? As it turns out, pretty far!!! Today we're releasing phi-1.5, a 1.3B parameter LLM exhibiting emergent behaviors surprisingly close to much larger LLMs. For warm-up, see an example completion w. comparison to Falcon 7B & Llama2-7B

Tasnim Mohiuddin retweeted

Emmanuel Ameisen @mlpowered · 21 Feb

Do you want to understand how to train models like ChatGPT and stable-diffusion? Good news, I wrote an illustrated notebook which explains different parallelism approaches and gives a functional example for each. I've summarized some takeaways below. NB: github.com/hundredblocks/…

Tasnim Mohiuddin @mtaasnim · 9 Feb

@BeingMIAkashs @jayleicn Hugging Face demo: huggingface.co/spaces/Salesfo…

Tasnim Mohiuddin @mtaasnim · 9 Feb

@BeingMIAkashs @jayleicn It seems BLIP-2 already tried to do the future work and their results are quite impressive. arxiv.org/abs/2301.12597

Mofijul Islam @BeingMIAkashs · 9 Feb

Takeaway from @jayleicn's talk on knowledge-driven visual-language pretraining #AAAI23

Tasnim Mohiuddin retweeted

Zachary Nado @zacharynado · 19 Jan

Excited to announce our Deep Learning Tuning Playbook, a writeup of tips & tricks we employ when designing DL experiments. We use these techniques to deploy numerous large-scale model improvements and hope formalizing them helps the community do the same! github.com/google-researc…

Tasnim Mohiuddin retweeted

Hugging Face @huggingface · 28 Nov

To inspire you for our just-released Diffusion Models Course 🎓 with @johnowhitaker we are excited to share the free online event with @hardmaru, @deviparikh, @Buntworthy, @robrombach, @pess_r and @multimodalart on Nov 30th at 18h CET🎋 Register here: huggingface.us17.list-manage.com/subscribe?u=7f…

Tasnim Mohiuddin retweeted

Sebastian Ruder @seb_ruder · 14 Nov

My new blog post takes a look at the state of multilingual AI. 🌍 How multilingual are current models in NLP, vision, and speech? 🏛 What are the recent contributions in this area? ⛰ What challenges remain and how can we address them? ruder.io/state-of-multi…

Tasnim Mohiuddin retweeted

Mark Tenenholtz @marktenenholtz · 10 Nov

Over the last year, I've spent 100's of hours training transformers for NLP. I went back over my most successful projects and competitions and distilled them into a solid, repeatable process that anyone can follow. 7 steps to train transformers:

Tasnim Mohiuddin retweeted

Mihail Eric @mihail_eric · 19 Oct

I've been doing a deep dive into prompt engineering for large language models. Here are 12 of the most interesting papers, resources, and write-ups I've found:

Tasnim Mohiuddin retweeted

Ethan Perez @EthanJPerez · 12 Sep

I wrote up a few paper writing tips that improve the clarity of research papers, while also being easy to implement: ethanperez.net/easy-paper-wri… I collected these during my PhD from various supervisors (mostly @douwekiela @kchonyc, bad tips my own), thought I would share publicly!

Tasnim Mohiuddin retweeted

Andrej Karpathy @karpathy · 7 Sep

🎓New (1h57m) video lecture: "The spelled-out intro to language modeling: building makemore". > We build a neural net bigram language model (working up to transformers). Micrograd was fun, now things complexify: tensors, broadcasting, training, sampling.. youtube.com/watch?v=PaCmpy…

Tasnim Mohiuddin retweeted

Rosanne Liu @savvyRL · 25 Jun

A quick thread on "How DALL-E 2, Imagen and Parti Architectures Differ" with breakdown into comparable modules, annotated with size 🧵 #dalle2 #imagen #parti * figures taken from corresponding papers with slight modification * parts used for training only are greyed out

Tasnim Mohiuddin retweeted

Phillip Lippe @phillip_lippe · 13 Jun

Are you interested in learning JAX with Flax? We have translated our popular Deep Learning tutorials on CNNs, GNNs, (Vision) Transformers, and more from PyTorch to JAX+Flax, with considerable speedups for smaller models! Check them out here: uvadlc-notebooks.readthedocs.io/en/latest/tuto… 🧵 1/12

Tasnim Mohiuddin retweeted

Misha Laskin @MishaLaskin · 7 Jan

Transformers are arguably the most impactful deep learning architecture from the last 5 yrs. In the next few threads, we’ll cover multi-head attention, GPT and BERT, Vision Transformer, and write these out in code. This thread → understanding multi-head attention. 1/n
