Tasnim Mohiuddin
95 posts

@mtaasnim
Scientist, QCRI @QatarComputing || Previously Research Intern @MetaAI Training LLMs || Core Contributor to @Ai_Fanar
Doha, Qatar · Joined October 2010
626 Following, 88 Followers

Congrats @mtaasnim for the amazing work on Fanar. So proud of you. Who knew the two corner desks at MICL lab would be worth millions in a year!! Now it's time to make it a billion.
arxiv.org/pdf/2501.13944

Tasnim Mohiuddin retweeted

Excited to share that I’ve been recognized as an "Innovator Under 35" by MIT Technology Review from MENA Region!
After earning my Ph.D. from @NTUsg, Singapore, I joined the "National Center for Artificial Intelligence (NCAI), SDAIA" (@SDAIA_SA) to work on ALLaM, the Arabic Large Language Model, a nationwide initiative aimed at developing a sovereign LLM. We were among the first few organizations to successfully scale both pretraining and continuous pretraining. arxiv.org/abs/2407.15390
A massive shoutout to my incredible manager @areebsa and mentor @haidarkk1 for their unwavering support (and for tolerating my endless YOLOs and arguments!). Thanks to @mtaasnim, @ajabal4 and @y_alnumay, who still put up with me. The saddest part of this year was when @haidarkk1 left NCAI. I felt like a part of me had just vanished, and it was very difficult initially. I don't have anyone to have pointless arguments with. :(
I genuinely believe NCAI, SDAIA is on the path to achieving Artificial Superintelligence (ASI), depending on a few small but critical factors. The gap with the frontier labs might just be 12–18 months.
Finally, I'm truly grateful to our former Chief Scientist @ehsan_hoque for his inspiration and constant support, and to my academic parent @JotyShafiq for being there for me all the time.
"Ganbare kuruko, akiramenaide"
MIT Technology Review @TechReviewAR
Saiful Bari (#سيف_الباري) of Bangladesh (#بنغلاديش) is a winner of the global Innovators Under 35 award (#مبتكرون_دون35) in its seventh edition, for 2024, for creating "ALLaM (the Arabic Large Language Model), an advanced AI model that preserves the linguistic and cultural nuances of Arabic, enabling comprehensive applications in education, healthcare, and government services". @sbmaruf cstu.io/41d727
Tasnim Mohiuddin retweeted

[75min talk] i finally recorded this lecture I gave two weeks ago because people kept asking me for a video
so here it is, enjoy "The Little guide to building Large Language Models in 2024"
tried to keep it short and comprehensive – focusing on concepts that are crucial for training good LLM but often hidden in tech reports

Tasnim Mohiuddin retweeted

I ran hundreds if not thousands of LoRA & QLoRA experiments to finetune open-source LLMs, and here’s what I learned:
1. Despite the inherent randomness of LLM training (or when training models on GPUs in general), the outcomes remain remarkably consistent across multiple runs.
2. QLoRA presents a trade-off that might be worthwhile if you're constrained by GPU memory. It offers 33% memory savings at the cost of a 33% increase in runtime.
3. When finetuning LLMs, the choice of optimizer shouldn't be a major concern. While SGD on its own is suboptimal, there's minimal variation in outcomes whether you employ AdamW, SGD with a scheduler, or AdamW with a scheduler.
4. While Adam is often labeled a memory-intensive optimizer due to its introduction of two new parameters for every model parameter, this doesn't significantly affect the peak memory demands of the LLM. This is because the majority of the memory is allocated for large matrix multiplications rather than retaining extra parameters.
5. For static datasets, iterating multiple times as done in multi-epoch training might not be beneficial. It often deteriorates the results, probably due to overfitting.
6. If you're incorporating LoRA, ensure it's applied across all layers, not just to the Key and Value matrices, to maximize model performance.
7. Adjusting the LoRA rank is essential, and so is selecting an apt alpha value. A good heuristic is setting alpha at twice the rank's value.
8. 7B models can be finetuned efficiently within a few hours on a single GPU with 14 GB of RAM.
With a static dataset, optimizing an LLM to excel across all benchmark tasks is unattainable. Addressing this requires diverse data sources, or perhaps LoRA might not be the ideal tool.
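
A rough sketch of points 4 and 7 above, in plain Python. It assumes the standard LoRA formulation, in which a frozen weight of shape d_out x d_in receives a trainable low-rank update scaled by alpha / rank. The function names and the 4096-wide layer are illustrative assumptions, not taken from the thread.

```python
from typing import Optional


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    Assumes the usual factorisation into A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def lora_scaling(rank: int, alpha: Optional[float] = None) -> float:
    """Scale applied to the low-rank update, alpha / rank.

    If alpha is not given, use the heuristic from point 7: alpha = 2 * rank.
    """
    if alpha is None:
        alpha = 2 * rank
    return alpha / rank


def adam_state_count(trainable_params: int) -> int:
    """Adam keeps two extra values (first and second moment) per trainable parameter."""
    return 2 * trainable_params


# Hypothetical 4096 x 4096 projection with rank 8.
added = lora_param_count(4096, 4096, rank=8)
print(added)                    # trainable LoRA parameters for this one matrix
print(lora_scaling(8))          # scaling under the alpha = 2 * rank heuristic
print(adam_state_count(added))  # Adam state values for those parameters
```

With the alpha = 2 * rank heuristic, the scaling factor stays at 2.0 whatever rank is chosen, so changing the rank does not also change the effective magnitude of the update.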
Lightning AI ⚡️@LightningAI
After hundreds of experiments, @rasbt has figured out how to get the most out of LoRA finetuning 👉 lightning.ai/pages/communit… #LLMs #GenAI #DeepLearning
Tasnim Mohiuddin retweeted

Gave a talk on why Llama 13B won't fit on my 4090 - it's an overview of all the main sources of memory overhead and how to reduce each of them
Simple for those at the frontier, but it will help the newbs among us estimate VRAM requirements on the back of an envelope, fast
docs.google.com/presentation/d…
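
The slides are not reproduced here, so the following is only a minimal back-of-the-envelope sketch of the weights-only part of such an estimate. The bytes-per-parameter table, the 24 GB figure for an RTX 4090, and the function names are assumptions of this sketch. Activations, KV cache, optimizer state, and framework overhead are deliberately ignored, so real usage is higher.

```python
# Assumed storage cost per parameter for common precisions (bytes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def weights_fit(n_params: int, dtype: str, vram_gb: float = 24.0) -> bool:
    """True if the weights alone fit in the given VRAM budget."""
    return weight_memory_gb(n_params, dtype) <= vram_gb


n_params = 13_000_000_000  # a 13B-parameter model
for dtype in BYTES_PER_PARAM:
    gb = weight_memory_gb(n_params, dtype)
    print(f"{dtype}: {gb:.1f} GB, fits in 24 GB: {weights_fit(n_params, dtype)}")
```

Under these assumptions the fp16 weights alone come to 26 GB, which already exceeds a 24 GB card before any other overhead is counted, while 8-bit and 4-bit weights leave headroom.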
Tasnim Mohiuddin retweeted

Do you want to understand how to train models like ChatGPT and stable-diffusion?
Good news, I wrote an illustrated notebook which explains different parallelism approaches and gives a functional example for each.
I've summarized some takeaways below
NB: github.com/hundredblocks/…

@BeingMIAkashs @jayleicn It seems BLIP-2 already tried to do the future work and their results are quite impressive.
arxiv.org/abs/2301.12597
Tasnim Mohiuddin retweeted

Excited to announce our Deep Learning Tuning Playbook, a writeup of tips & tricks we employ when designing DL experiments. We use these techniques to deploy numerous large-scale model improvements and hope formalizing them helps the community do the same! github.com/google-researc…

Tasnim Mohiuddin retweeted

To inspire you for our just-released Diffusion Models Course 🎓 with @johnowhitaker
we are excited to share the free online event with @hardmaru, @deviparikh, @Buntworthy, @robrombach, @pess_r and @multimodalart on Nov 30th at 18h CET🎋
Register here: huggingface.us17.list-manage.com/subscribe?u=7f…

Tasnim Mohiuddin retweeted

My new blog post takes a look at the state of multilingual AI.
🌍 How multilingual are current models in NLP, vision, and speech?
🏛 What are the recent contributions in this area?
⛰ What challenges remain and how can we address them?
ruder.io/state-of-multi…
Tasnim Mohiuddin retweeted

I wrote up a few paper writing tips that improve the clarity of research papers, while also being easy to implement: ethanperez.net/easy-paper-wri…
I collected these during my PhD from various supervisors (mostly @douwekiela @kchonyc, bad tips my own), thought I would share publicly!
Tasnim Mohiuddin retweeted

🎓New (1h57m) video lecture: "The spelled-out intro to language modeling: building makemore".
> We build a neural net bigram language model (working up to transformers). Micrograd was fun, now things complexify: tensors, broadcasting, training, sampling.. youtube.com/watch?v=PaCmpy…

Tasnim Mohiuddin retweeted

Are you interested in learning JAX with Flax? We have translated our popular Deep Learning tutorials on CNNs, GNNs, (Vision) Transformers, and more from PyTorch to JAX+Flax, with considerable speedups for smaller models! Check them out here: uvadlc-notebooks.readthedocs.io/en/latest/tuto…
🧵 1/12
