Imane Momayiz

22 posts

@imomayiz

Joined January 2022
113 Following · 18 Followers
Imane Momayiz retweeted
Jéssica Leão @jesslionness
The perfect hat doesn’t exi-
[image]
Imane Momayiz retweeted
Andrej Karpathy @karpathy
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT, which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use
- SFT, evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Run efficient inference of the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or a ChatGPT-like WebUI
- Write a single markdown report card, summarizing and gamifying the whole thing

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About ~12 hours surpasses the GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth-30 model trained for 24 hours (about equal to the FLOPs of GPT-3 Small 125M, and 1/1000th of GPT-3) gets into the 40s on MMLU, 70s on ARC-Easy, 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed).
I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
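The "GRPO" step mentioned above refers to group-relative policy optimization, which replaces a learned value network with group-normalized rewards. A minimal sketch of that core advantage computation, assuming a correctness reward per sampled answer (function and variable names here are mine, not nanochat's):

```python
# Minimal sketch of GRPO-style advantages: sample a group of completions per
# prompt, score each, and normalize rewards within the group so correct
# samples get positive advantage and incorrect ones negative.
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Normalize one group's rewards to zero mean / unit std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# e.g. 4 sampled answers to one GSM8K problem, reward 1.0 if correct
rewards = [1.0, 0.0, 0.0, 1.0]
advs = grpo_advantages(rewards)
```

The normalized advantages then weight the policy-gradient update per token of each sampled completion.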
[image]
Imane Momayiz retweeted
AtlasIA | أطلسية @atlasia_ma
Why this matters: Your support helps us:
🔸 Cover compute to train & fine-tune open models
🔸 Maintain multilingual & regional benchmarks
🔸 Build stronger NLP tools for under-represented languages
🔸 Keep contributing to the global open-source ecosystem
#AtlasIA #OpenSource
[image]
Imane Momayiz retweeted
AtlasIA | أطلسية @atlasia_ma
This new chapter gives us the foundation to:
🤝 #Collaborate more deeply with labs, orgs & universities
⚡ Access better #infrastructure to train and deploy models
📚 Invest in research & education to lift up more communities
🌱 Keep everything open, transparent & community-driven
[image]
Imane Momayiz retweeted
AtlasIA | أطلسية @atlasia_ma
🚀 Big news today. AtlasIA is now officially an NGO 💫 What started as a small group of passionate builders, researchers & dreamers in Morocco… is growing into something bigger than any of us. #AtlasIA #OpenSourceAI #NonProfit #AI #Research
[image]
Imane Momayiz @imomayiz
Proud to share AtlasOCR 🔥 Personal key takeaways when working on this project:
• @deepseek_ai's scaling laws held even for low-resource language fine-tuning (see my previous post)
• DDP not handled yet in @unslothai 👀
• No need for massive compute for impactful AI
Quoting AtlasIA | أطلسية @atlasia_ma

We just shipped #AtlasOCR - the first open-source #OCR model for #Darija (Moroccan Arabic) 🇲🇦 Here's what we built:
→ Fine-tuned Qwen2.5-VL 3B on 30K Darija samples
→ Trained on 10M+ words of real Moroccan text
→ Used QLoRA + @unslothai for crazy efficient training
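A back-of-envelope sketch of why (Q)LoRA keeps training this cheap: instead of updating a full d_out × d_in weight matrix, you train two low-rank factors B (d_out × r) and A (r × d_in). The layer shape below is illustrative, not Qwen2.5-VL's actual dimensions:

```python
# Trainable-parameter count: full fine-tuning vs. a rank-r LoRA adapter
# on a single d_out x d_in linear layer.
def lora_trainable_params(d_out, d_in, rank):
    full = d_out * d_in           # params updated by full fine-tuning
    lora = rank * (d_out + d_in)  # params updated by LoRA (B and A factors)
    return full, lora

full, lora = lora_trainable_params(4096, 4096, 16)
savings = full / lora
# a rank-16 adapter on a 4096x4096 layer trains ~128x fewer parameters
```

QLoRA adds 4-bit quantization of the frozen base weights on top of this, shrinking memory further.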

Imane Momayiz retweeted
AtlasIA | أطلسية @atlasia_ma
We just shipped #AtlasOCR - the first open-source #OCR model for #Darija (Moroccan Arabic) 🇲🇦 Here's what we built:
→ Fine-tuned Qwen2.5-VL 3B on 30K Darija samples
→ Trained on 10M+ words of real Moroccan text
→ Used QLoRA + @unslothai for crazy efficient training
[image]
Imane Momayiz retweeted
Lisan al Gaib @scaling01
American companies are losing market share to Chinese open-source companies! Anthropic's coding market share on OpenRouter went from 46% in July down to 32% in a month. The reason for it? Qwen3-Coder.
[image]
Imane Momayiz retweeted
Akshay 🚀 @akshay_pachaar
Everyone is sleeping on this new OCR model! dots-ocr is a new 1.7B vision-language model that achieves SOTA performance on multilingual document parsing.
- Supports 100+ languages
- Works with both images and PDFs
- Handles text, tables, formulas seamlessly
100% open-source.
Imane Momayiz @imomayiz
Stay tuned for the model release and the blogpost 👀
Imane Momayiz @imomayiz
LoRA Rank/α = 128:
- More parameters meant re-tuning LR/batch-size trade-offs (same as DeepSeek).
- Similarly to the previous experiment, higher LR helped convergence, until divergence hit (LR @ 2e-3).
- Best run: batch 16 @ LR 2e-4.
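The sweep described above is a small grid search over LR × batch size, keeping the configuration with the best final eval loss. A toy sketch of that selection loop, with a synthetic loss standing in for an actual fine-tuning run (the loss surface and grid values here are illustrative, not the real experiment's):

```python
# Toy LR x batch-size sweep: evaluate each combination, keep the argmin.
# A synthetic bowl-shaped loss stands in for a real training run; it is
# rigged to have its minimum at lr=2e-4, batch=16 and to diverge at lr=2e-3.
import math

def fake_final_loss(lr, batch):
    if lr >= 2e-3:
        return float("inf")  # divergence at high LR, as observed
    return (math.log10(lr) - math.log10(2e-4)) ** 2 + (math.log2(batch) - 4) ** 2

grid = [(lr, b) for lr in (2e-5, 2e-4, 2e-3) for b in (8, 16, 32)]
best = min(grid, key=lambda cfg: fake_final_loss(*cfg))
# best == (2e-4, 16) under this synthetic loss
```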
[image]
Imane Momayiz @imomayiz
One perk of working on @AtlasIA projects: we get to confirm big-lab findings with limited community budget💪 We finetuned Qwen2.5-VL at two scales to find the sweet spot for LR × batch size and saw patterns validating DeepSeek’s scaling laws 📈 (arxiv.org/pdf/2401.02954).
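DeepSeek's scaling laws express near-optimal LR and batch size as power laws of the compute budget C: optimal LR shrinks and optimal batch size grows as C increases. A sketch of that functional form; the coefficients and exponents below are illustrative placeholders, not the paper's fitted values:

```python
# Power-law form of DeepSeek-style hyperparameter scaling laws.
# Coefficients here are illustrative placeholders, NOT fitted values.
def optimal_lr(C, a=0.3, alpha=-0.125):
    return a * C ** alpha       # LR decreases with compute

def optimal_batch(C, b=0.3, beta=0.33):
    return b * C ** beta        # batch size increases with compute

# More compute budget -> smaller optimal LR, larger optimal batch.
lr_small, lr_big = optimal_lr(1e17), optimal_lr(1e20)
bs_small, bs_big = optimal_batch(1e17), optimal_batch(1e20)
```

Fitting the exponents on a few small runs is what lets a small-budget team extrapolate sane hyperparameters to larger runs, which is the pattern the experiments above were checking.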
[image]
Imane Momayiz retweeted
AtlasIA | أطلسية @atlasia_ma
Big news for Moroccan Darija & AI! We’re proud to release Al-Atlas, the first full pretraining dataset & language models suite made just for Moroccan Darija. From tweets to blogs, AI can finally understand & generate Darija like we do.
[image]
Imane Momayiz retweeted
MBZUAI @mbzuai
Congrats to the #MBZUAI team for winning the Best Paper Award at the COLING 2025 workshop on Language Models for Low-Resource Languages (LoResLM 2025) for "Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect". Developed by our France Lab, here's what you need to know about Atlas-Chat:
* 1st ever LLM for Darija, an Arabic dialect in Morocco
* Atlas-Chat-9B outperforms existing 13B models with a 13% boost on DarijaMMLU
* Introduces several LLM benchmark datasets for Darija
* Pioneers methodologies for low-resource languages
* All resources are fully open to the public
Link to paper: arxiv.org/pdf/2409.17912
Atlas-Chat is also available on HuggingFace in 2B, 9B, and 27B parameter models: huggingface.co/MBZUAI-Paris/A…
#LLM #LLM360 #AI
[image]
Imane Momayiz retweeted
AtlasIA | أطلسية @atlasia_ma
Last week, we launched TerjamaBench, Morocco’s first cultural Machine Translation benchmark! It reflects authentic Darija (Moroccan Dialect), covering everyday expressions, technical terms, & regional dialects. Find more details below 👇
[images]
Imane Momayiz retweeted
AtlasIA | أطلسية @atlasia_ma
Many benchmarks overlook cultural nuances when it comes to non-Western-centric cultures, leaving low-resource languages like Moroccan Darija underserved.
[image]