Tasnim Mohiuddin

95 posts

@mtaasnim

Scientist, QCRI @QatarComputing || Previously Research Intern @MetaAI Training LLMs || Core Contributor to @Ai_Fanar

Doha, Qatar · Joined October 2010
626 Following · 88 Followers
Tasnim Mohiuddin retweeted
M Saiful Bari (MARUF) @sbmaruf
Excited to share that I've been recognized as an "Innovator Under 35" for the MENA region by MIT Technology Review! After earning my Ph.D. from @NTUsg, Singapore, I joined the National Center for Artificial Intelligence (NCAI), SDAIA (@SDAIA_SA) to work on ALLaM, an Arabic Large Language Model, as part of a nationwide initiative to develop a sovereign LLM. We were among the first few organizations to successfully scale both pretraining and continuous pretraining. arxiv.org/abs/2407.15390 A massive shoutout to my incredible manager @areebsa and mentor @haidarkk1 for their unwavering support (and for tolerating my endless YOLOs and arguments!). Thanks to @mtaasnim, @ajabal4 and @y_alnumay, who still put up with me. The saddest part of this year was when @haidarkk1 left NCAI; I felt like a part of me just vanished, and it was very difficult at first. I don't have anyone to have pointless arguments with. :( I genuinely believe NCAI, SDAIA is on the path to achieving Artificial Superintelligence (ASI), depending on a few small but critical factors. The gap with the frontier labs might be just 12–18 months. Finally, I'm truly grateful to our former Chief Scientist @ehsan_hoque for his inspiration and constant support, and to my academic parent @JotyShafiq for always being there for me. "Ganbare kuruko, akiramenaide"
MIT Technology Review @TechReviewAR

#SaifulBari from #Bangladesh, winner of the global #InnovatorsUnder35 award in its seventh edition (2024), for his innovation "ALLaM (Arabic Large Language Model), an advanced AI model that preserves the linguistic and cultural nuances of the Arabic language, enabling comprehensive applications in education, healthcare, and government services." @sbmaruf cstu.io/41d727

Tasnim Mohiuddin retweeted
Thomas Wolf @Thom_Wolf
[75-min talk] I finally recorded this lecture I gave two weeks ago, because people kept asking me for a video, so here it is. Enjoy "The Little guide to building Large Language Models in 2024". I tried to keep it short and comprehensive, focusing on concepts that are crucial for training good LLMs but often hidden in tech reports.
[Image]
Tasnim Mohiuddin retweeted
Sebastian Raschka @rasbt
I ran hundreds, if not thousands, of LoRA & QLoRA experiments to finetune open-source LLMs, and here's what I learned:
1. Despite the inherent randomness of LLM training (or of training models on GPUs in general), the outcomes remain remarkably consistent across multiple runs.
2. QLoRA presents a trade-off that might be worthwhile if you're constrained by GPU memory. It offers 33% memory savings at the cost of a 33% increase in runtime.
3. When finetuning LLMs, the choice of optimizer shouldn't be a major concern. While SGD on its own is suboptimal, there's minimal variation in outcomes whether you use AdamW, SGD with a scheduler, or AdamW with a scheduler.
4. While Adam is often labeled a memory-intensive optimizer because it stores two additional values (its first- and second-moment estimates) for every model parameter, this doesn't significantly affect the peak memory demands of the LLM. This is because most of the memory is allocated for large matrix multiplications rather than for retaining extra parameters.
5. For static datasets, iterating multiple times, as in multi-epoch training, might not be beneficial. It often degrades the results, probably due to overfitting.
6. If you're using LoRA, make sure it's applied across all layers, not just to the Key and Value matrices, to maximize model performance.
7. Adjusting the LoRA rank is essential, and so is selecting an apt alpha value. A good heuristic is setting alpha to twice the rank's value.
8. 7B models can be finetuned efficiently within a few hours on a single GPU with 14 GB of RAM.
With a static dataset, optimizing an LLM to excel across all benchmark tasks is unattainable. Addressing this requires diverse data sources, or perhaps LoRA isn't the ideal tool.
Lightning AI ⚡️ @LightningAI

After hundreds of experiments, @rasbt has figured out how to get the most out of LoRA finetuning 👉 lightning.ai/pages/communit… #LLMs #GenAI #DeepLearning

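Points 6 and 7 above concern where LoRA adapters are attached and how rank and alpha interact. As a rough illustration, here is a minimal pure-Python sketch of one LoRA-adapted linear layer, assuming the standard formulation from the original LoRA paper: a frozen weight W plus a low-rank update B·A scaled by alpha / rank, with B zero-initialized so the adapter starts as a no-op. The class and helper names are hypothetical, and this is not the code used in the experiments described above.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Hypothetical sketch: frozen W plus a trainable update (alpha / rank) * B @ A."""

    def __init__(self, W, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.W = W  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(W), len(W[0])
        self.scaling = alpha / rank  # alpha = 2 * rank gives a scaling of 2.0
        # A gets a small random init; B starts at zero, so B @ A = 0 initially.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                   # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # trainable low-rank path
        return [b + self.scaling * d for b, d in zip(base, delta)]


layer = LoRALinear(W=[[1.0, 0.0], [0.0, 1.0]], rank=1, alpha=2)
print(layer.scaling)               # 2.0
print(layer.forward([3.0, 4.0]))   # [3.0, 4.0]: B is zero, so output equals W @ x
print(lora_param_count(4096, 4096, rank=8))  # 65536, versus 16777216 frozen weights
```

Applying LoRA "across all layers" would mean wrapping every projection matrix this way rather than only a subset; each wrapped matrix adds rank × (d_in + d_out) trainable parameters while the full weight stays frozen.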
Tasnim Mohiuddin retweeted
Mark Saroufim @marksaroufim
Gave a talk on why Llama 13B won't fit on my 4090: it's an overview of all the main sources of memory overhead and how to reduce each of them. Simple for those at the frontier, but it will help the newbs among us make back-of-the-envelope VRAM estimates fast. docs.google.com/presentation/d…
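The heart of such a back-of-the-envelope estimate is parameter count times bytes per parameter. A minimal sketch, assuming 2 bytes per parameter for fp16/bf16 weights, treating the RTX 4090's 24 GB as 24 GiB, and ignoring activations, the KV cache, the CUDA context, and fragmentation, all of which only add to the total. The 16 bytes per parameter for mixed-precision Adam training follows the accounting in the ZeRO paper; the talk's own breakdown may differ.

```python
# Approximate bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}
GIB = 2**30
RTX_4090_GIB = 24  # assumption: the 24 GB card treated as 24 GiB


def weights_gib(n_params, dtype):
    """Memory for the weights alone (a lower bound for inference)."""
    return n_params * BYTES_PER_PARAM[dtype] / GIB


def adam_training_gib(n_params):
    """Mixed-precision Adam: fp16 weights (2) + fp16 grads (2)
    + fp32 master weights (4) + two fp32 Adam moments (4 + 4) = 16 bytes/param."""
    return n_params * (2 + 2 + 4 + 4 + 4) / GIB


n = 13e9  # Llama 13B
for dtype in ("fp16", "int8", "int4"):
    need = weights_gib(n, dtype)
    print(f"{dtype}: {need:.1f} GiB, fits: {need < RTX_4090_GIB}")
print(f"full finetune with Adam: {adam_training_gib(n):.0f} GiB")
```

Under these assumptions the fp16 weights alone come to about 24.2 GiB, already over the card's capacity before a single activation is stored, which is why quantization or offloading enters the picture.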
Tasnim Mohiuddin retweeted
Sebastien Bubeck @SebastienBubeck
How far do one billion parameters take you? As it turns out, pretty far!!! Today we're releasing phi-1.5, a 1.3B-parameter LLM exhibiting emergent behaviors surprisingly close to those of much larger LLMs. As a warm-up, see an example completion with a comparison to Falcon 7B & Llama2-7B.
[Image]
Tasnim Mohiuddin retweeted
Emmanuel Ameisen @mlpowered
Do you want to understand how to train models like ChatGPT and Stable Diffusion? Good news: I wrote an illustrated notebook that explains different parallelism approaches and gives a functional example of each. I've summarized some takeaways below. NB: github.com/hundredblocks/…
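The simplest of the standard parallelism approaches is data parallelism: every worker holds a full copy of the model, computes gradients on its own shard of the batch, and the gradients are averaged (an all-reduce) before the update. A minimal pure-Python sketch for a one-parameter least-squares model, assuming equal-sized shards; it is not taken from the linked notebook and does not cover tensor or pipeline parallelism.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, n_workers):
    """Simulate data parallelism: one gradient per shard, then average."""
    assert len(xs) % n_workers == 0, "equal shards keep the average exact"
    size = len(xs) // n_workers
    local_grads = [
        grad_mse(w, xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
        for i in range(n_workers)
    ]
    return sum(local_grads) / n_workers  # stands in for an all-reduce (mean)


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w = 0.5
print(grad_mse(w, xs, ys))                         # -22.5 (full batch)
print(data_parallel_grad(w, xs, ys, n_workers=2))  # -22.5 (two shards, averaged)
```

Because the loss is a mean over examples, averaging per-shard gradients of equal-sized shards reproduces the full-batch gradient exactly; that equivalence is what lets each worker apply the same update and keep its model copy in sync.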
Tasnim Mohiuddin retweeted
Zachary Nado @zacharynado
Excited to announce our Deep Learning Tuning Playbook, a writeup of tips & tricks we employ when designing DL experiments. We use these techniques to deploy numerous large-scale model improvements and hope formalizing them helps the community do the same! github.com/google-researc…
[Image]
Tasnim Mohiuddin retweeted
Sebastian Ruder @seb_ruder
My new blog post takes a look at the state of multilingual AI. 🌍 How multilingual are current models in NLP, vision, and speech? 🏛 What are the recent contributions in this area? ⛰ What challenges remain, and how can we address them? ruder.io/state-of-multi…
Tasnim Mohiuddin retweeted
Mark Tenenholtz @marktenenholtz
Over the last year, I've spent hundreds of hours training transformers for NLP. I went back over my most successful projects and competitions and distilled them into a solid, repeatable process that anyone can follow. 7 steps to train transformers:
Tasnim Mohiuddin retweeted
Mihail Eric @mihail_eric
I've been doing a deep dive into prompt engineering for large language models. Here are 12 of the most interesting papers, resources, and write-ups I've found:
Tasnim Mohiuddin retweeted
Ethan Perez @EthanJPerez
I wrote up a few paper-writing tips that improve the clarity of research papers while also being easy to implement: ethanperez.net/easy-paper-wri… I collected these during my PhD from various supervisors (mostly @douwekiela and @kchonyc; any bad tips are my own) and thought I would share them publicly!
Tasnim Mohiuddin retweeted
Andrej Karpathy @karpathy
🎓New (1h57m) video lecture: "The spelled-out intro to language modeling: building makemore". > We build a neural net bigram language model (working up to transformers). Micrograd was fun, now things complexify: tensors, broadcasting, training, sampling.. youtube.com/watch?v=PaCmpy…
[YouTube video]
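The first model in that lecture is a character-level bigram model, which predicts the next character from the current one using normalized counts. A minimal pure-Python sketch of that counting idea, assuming "." as the combined start/end token; the lecture builds the equivalent with PyTorch tensors and then re-derives it as a neural net, which this sketch does not cover. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def train_bigram(words):
    """Count character bigrams (with '.' marking start/end) and normalize per row."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    probs = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        probs[a] = {b: c / total for b, c in nxt.items()}
    return probs


def sample(probs, rng, max_len=20):
    """Generate one word by repeatedly sampling the next character."""
    out, ch = [], "."
    while len(out) < max_len:
        choices = list(probs[ch])
        weights = [probs[ch][c] for c in choices]
        ch = rng.choices(choices, weights=weights, k=1)[0]
        if ch == ".":
            break
        out.append(ch)
    return "".join(out)


def avg_nll(probs, words):
    """Average negative log-likelihood per bigram.
    Only valid on words whose bigrams were seen in training (no smoothing)."""
    nll, n = 0.0, 0
    for w in words:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            nll -= math.log(probs[a][b])
            n += 1
    return nll / n


words = ["emma", "olivia", "ava"]
probs = train_bigram(words)
rng = random.Random(42)
print([sample(probs, rng) for _ in range(3)])
print(round(avg_nll(probs, words), 3))
```

The average negative log-likelihood is the same quality measure the lecture uses as its loss; evaluating on unseen text would require smoothing the counts first so that no bigram has probability zero.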
Tasnim Mohiuddin retweeted
Rosanne Liu @savvyRL
A quick thread on "How DALL-E 2, Imagen and Parti Architectures Differ", with a breakdown into comparable modules, annotated with size 🧵 #dalle2 #imagen #parti * figures taken from the corresponding papers with slight modification * parts used for training only are greyed out
[Image]
Tasnim Mohiuddin retweeted
Phillip Lippe @phillip_lippe
Are you interested in learning JAX with Flax? We have translated our popular Deep Learning tutorials on CNNs, GNNs, (Vision) Transformers, and more from PyTorch to JAX+Flax, with considerable speedups for smaller models! Check them out here: uvadlc-notebooks.readthedocs.io/en/latest/tuto… 🧵 1/12
[Image]
Tasnim Mohiuddin retweeted
Misha Laskin @MishaLaskin
Transformers are arguably the most impactful deep learning architecture from the last 5 yrs. In the next few threads, we’ll cover multi-head attention, GPT and BERT, Vision Transformer, and write these out in code. This thread → understanding multi-head attention. 1/n
[Image]
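As a rough companion to that thread, here is a minimal pure-Python sketch of multi-head attention as described in "Attention Is All You Need": project the inputs to queries, keys, and values, split them into heads, compute softmax(QKᵀ/√d_head)·V per head, concatenate the heads, and apply an output projection. It assumes no biases and no masking (a GPT-style decoder would add a causal mask), uses plain lists for readability rather than speed, and is not the code from the thread.

```python
import math
import random


def matmul(A, B):
    """(n x k) @ (k x m) with lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def softmax_rows(S):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in S:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: seq_len x d_model; all W matrices: d_model x d_model."""
    d_model = len(X[0])
    assert d_model % n_heads == 0, "d_model must split evenly across heads"
    d_head = d_model // n_heads
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    head_outputs = []
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's feature columns
        Qh = [row[cols] for row in Q]
        Kh = [row[cols] for row in K]
        Vh = [row[cols] for row in V]
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(Qh, transpose(Kh))]
        head_outputs.append(matmul(softmax_rows(scores), Vh))
    # Concatenate the heads along the feature dimension, then project.
    concat = [[v for head in head_outputs for v in head[i]] for i in range(len(X))]
    return matmul(concat, Wo)


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, seq_len, n_heads = 4, 3, 2
X = random_matrix(seq_len, d_model, rng)
Ws = [random_matrix(d_model, d_model, rng) for _ in range(4)]
out = multi_head_attention(X, *Ws, n_heads=n_heads)
print(len(out), len(out[0]))  # 3 4: one d_model-sized vector per position
```

Splitting d_model into heads keeps the total cost close to that of single-head attention while letting each head attend with its own slice of the projected features.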