Jay Alammar

2.1K posts

@JayAlammar

Machine Learning Researcher and writer https://t.co/5GlbofAHs0. O'Reilly Author https://t.co/Fl3uPAZHLg. LLM Builder @Cohere.

Joined April 2020
1.4K Following · 48.8K Followers
Pinned Tweet
Jay Alammar @JayAlammar ·
We're ecstatic to bring you "How Transformer LLMs Work" -- a free course with ~90 minutes of video, code, and crisp visuals and animations that explain the modern Transformer architecture, tokenizers, embeddings, and mixture-of-experts models. @MaartenGr and I have developed a lot of the visual language over the last several years (tens of thousands of iterations for hundreds of figures) for the book. But with the opportunity to collaborate with the legendary @AndrewYNg, we took them to the next level, with animations and a concise narrative meant to enable technical learners to pick up an ML paper and understand the architecture description. Link in comments
Andrew Ng @AndrewYNg

Announcing How Transformer LLMs Work, created with @JayAlammar and @MaartenGr, co-authors of the beautifully illustrated book, "Hands-On Large Language Models." This course offers a deep dive into the inner workings of the transformer architecture that powers large language models (LLMs).

The transformer architecture revolutionized generative AI; in fact, the "GPT" in ChatGPT stands for "Generative Pre-trained Transformer." Originally introduced in the Google Brain team's groundbreaking 2017 paper "Attention Is All You Need," by Vaswani and others, the transformer was a highly scalable model for machine translation tasks. Variants of this architecture now power today's LLMs such as those from OpenAI, Google, Meta, Cohere, Anthropic, and DeepSeek.

In this course, you'll learn in detail how LLMs process text, and you'll work through code examples that illustrate the transformer's individual components. You'll learn:

- How the representation of language has evolved, from Bag-of-Words to Word2Vec embeddings to the transformer architecture, which captures a word's meaning by taking into account the context of the other words in the input.
- How inputs are broken down into tokens before they are sent to the language model.
- The details of a transformer's main stages: tokenization and embedding, the stack of transformer blocks, and the language model head.
- The inner workings of the transformer block, including attention, which calculates relevance scores, and the feedforward layer, which incorporates stored information learned in training.
- How cached calculations make transformers faster.
- Some of the most recent ideas in the latest models, such as Mixture-of-Experts (MoE), which uses multiple sub-models and a router on each layer to improve the quality of LLMs.

By the end of this course, you'll have a deep understanding of how LLMs actually process text and be able to read through papers describing the latest models and understand the details. Gaining this intuition will improve your approach to building LLM applications.

Please sign up here: deeplearning.ai/short-courses/…
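(An aside not from the course: to make the attention step above concrete, here is a minimal NumPy sketch of single-head causal self-attention, i.e. relevance scores between token pairs followed by a weighted mix of value vectors. All sizes and weights are toy stand-ins, not anything from a real model or the course's notebooks.)

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv              # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # relevance score for every token pair
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[future] = -np.inf                      # causal mask: ignore future tokens
    return softmax(scores) @ v                    # mix value vectors by relevance

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16                          # toy sizes
x = rng.normal(size=(seq_len, d_model))           # stand-in for token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)                                  # (5, 16)
```

A real transformer block wraps this core with multiple heads, a feedforward layer, residual connections, and normalization, which is what the course walks through.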

Jay Alammar retweeted
mrdoob @mrdoob ·
[image]
Jay Alammar retweeted
Leland McInnes @leland_mcinnes ·
EVoC is a library designed specifically for fast clustering of high dimensional embedding vectors. It can produce high quality clusters extremely efficiently, and requires little to no hyperparameter tuning. Better clustering than UMAP + HDBSCAN; faster clustering than KMeans.
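(A usage sketch, not code from the tweet: this assumes EVoC exposes a scikit-learn-style fit_predict interface, as the project's README suggests; the class name and the convention that -1 marks noise points are assumptions to verify against the installed version. The embeddings are random stand-ins for real embedding vectors.)

```python
import numpy as np
import evoc  # github.com/TutteInstitute/evoc

# Random stand-ins for real embedding vectors (e.g. text embeddings).
embeddings = np.random.default_rng(0).normal(size=(10_000, 384)).astype(np.float32)

# Defaults are the point: EVoC is built to need little to no tuning.
# (scikit-learn-style interface assumed from the project README.)
clusterer = evoc.EVoC()
labels = clusterer.fit_predict(embeddings)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"found {n_clusters} clusters plus {int((labels == -1).sum())} noise points")
```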
Jay Alammar retweeted
Daniel San @dani_avila7 ·
Cohere released a new Transcribe model, so I built a Chrome extension to test it. It works two ways: through the API or with a local server.

API mode: grab a free key from your Cohere account at dashboard.cohere.com/api-keys (sign up and you get free-tier access).

Local mode: download the model from Hugging Face, huggingface.co/CohereLabs/coh…, spin up the local server, and add your HF access token from here: huggingface.co/settings/tokens

Pick either mode and you're good to go. I left it open source under MIT: github.com/davila7/cohere…

Good weekend project; going to keep exploring where else this can be applied. Thanks @nickfrosst and the @cohere team!
Nick Frosst @nickfrosst

@cohere transcribe Sota open source transcription model running in the browser :) Weights on @huggingface link below
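(A rough sketch of the "local mode" idea, not the extension's actual code: an open-weights ASR checkpoint can typically be run with Hugging Face's generic automatic-speech-recognition pipeline. The model id and audio filename below are placeholders, since the tweet's Hugging Face link is truncated, and the real checkpoint may need model-specific loading or the dedicated local server the extension expects.)

```python
from transformers import pipeline

# Placeholder model id: the HF link in the tweet is truncated, so this is
# NOT the real checkpoint name. Swap in the actual Cohere Transcribe repo id.
asr = pipeline(
    "automatic-speech-recognition",
    model="CohereLabs/placeholder-transcribe-model",
)

# Transcribe a local audio file (hypothetical filename).
result = asr("meeting_recording.wav")
print(result["text"])
```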

hashim alsharif @nothashem ·
@wballaa @JayAlammar Very much so, especially since I'm certain it's possible, given the hadith of the Prophet ﷺ: "By Him in whose hand is my soul, the Hour will not come until the beasts speak to humans" (narrated by al-Tirmidhi). That is truthful evidence of the prophethood of the noblest of creation.
hashim alsharif @nothashem ·
One of the things I keep thinking about is this: we're probably 5 years away from understanding animals, maybe even communicating with them. I don't really care about AGI compared to this. Humans have lived alongside animals forever without ever understanding them.

My view comes down to two things:

a) Do animals actually have language systems? Not as a whole, but per species: structured signals, patterns, intent, not just random noise. If that's true, then this becomes a pattern recognition problem, and that's very solvable with current AI trends: better models, more data, more compute.

b) How do you build this at scale?

So the real question is who's actually building this.
Jay Alammar @JayAlammar ·
@imM0hannad @NajwaGhamdi It supports Arabic, and we're keen to develop it further and to hear what people think after trying it.
Jay Alammar retweeted
vLLM @vllm_project ·
🎉 Congrats to @Cohere on releasing Cohere Transcribe, a 2B speech recognition model (Apache 2.0, 14 languages). Day-0 support in vLLM. Cohere contributed encoder-decoder serving optimizations to vLLM: variable-length encoder batching and packed attention for the decoder. Up to 2x throughput improvement for speech workloads, and these gains carry over to all encoder-decoder models on vLLM. Thanks to the @Cohere team for the contribution! PR 🔗 github.com/vllm-project/v… Blog 🔗 huggingface.co/blog/CohereLab…
[image]
Cohere @cohere

Introducing: Cohere Transcribe – a new state-of-the-art in open source speech recognition.
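(A hedged sketch rather than vLLM's documented recipe for this model: recent vLLM versions expose an OpenAI-compatible audio transcription endpoint for supported ASR models, so serving and querying might look roughly like this. The model id and filename are placeholders; check the PR and blog linked above for the actual instructions.)

```python
# First, serve the model (placeholder id; the tweet's HF link is truncated):
#
#   vllm serve CohereLabs/placeholder-transcribe-model
#
# Then call the OpenAI-compatible transcription endpoint the server exposes:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
with open("meeting_recording.wav", "rb") as f:  # hypothetical audio file
    transcript = client.audio.transcriptions.create(
        model="CohereLabs/placeholder-transcribe-model",
        file=f,
    )
print(transcript.text)
```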

Jay Alammar retweeted
Pierre Richemond 🇪🇺 @TheOneKloud ·
Excited and proud to introduce our latest: Cohere Transcribe, the best dedicated ASR model in the world. #1 EN HF leaderboard, SotA human evals, ahead of ElevenLabs, Qwen3, Mistral, Kyutai, and OpenAI. 14 supported languages. Apache 2.0, on HF for you to try. Our first audio model and a key step in powering North experiences. huggingface.co/CohereLabs/coh…
[image]
Jay Alammar @JayAlammar ·
@NajwaGhamdi Good names, too! Easy to type, too! I can never find any of my "muhammed/mohamed/muhammad/etc" friends on linkedin.
نجوى مسفر @NajwaGhamdi ·
@JayAlammar I also gave names to my agents and I constantly refer to them as my team... another future note to the anthropologists :)
[image]
Jay Alammar @JayAlammar ·
Blink and you may miss it. Multiple times this week I've heard people (in the industry and out) refer to their LLM with a human pronoun: "ask him", "I told him". Didn't register it as often before this year. It's not even a decade since the Transformer. A note for a future anthropologist
Jay Alammar retweeted
Cohere @cohere ·
We’re honored to be named one of @FastCompany's Most Innovative Companies of 2026! This recognition reflects our commitment to building secure, sovereign AI for enterprises and governments. Over the past year, we’ve deepened our focus on serving the unique needs of highly regulated industries—expanding what organizations can do with their protected data through North, our agentic platform for getting more work done. Learn more: fastcompany.com/91495412/artif…
[image]
Jay Alammar retweeted
Ivan Zhang @1vnzh ·
I'm excited to announce we're working with Saab to bring North onboard to Command the sky! saab.com/newsroom/press…
Jay Alammar retweeted
Bharat @bharatrunwal2 ·
Introducing PRISM: Demystifying Retention and Interaction in Mid-Training

The modern LLM training pipeline has evolved beyond just pre-training + alignment. State-of-the-art models now insert a critical middle stage, "mid-training," where targeted, high-quality data mixtures build reasoning foundations before RL. Yet despite its growing adoption, the field lacks a principled understanding of what actually drives its effectiveness.

- What data should you use?
- When in the pipeline should you mid-train?
- How does it interact with downstream RL?
- Does it generalize across architectures and scales?
- And beyond benchmarks: what do these stages actually do to the model at the weight and representation level?

These questions don't have clear answers in the literature at scale, and the cost of getting them wrong is significant. PRISM is our systematic attempt to answer all of them.

Using ~27B high-quality tokens, we run controlled experiments across 7 models · 4 families · 3B–24B parameters, spanning both dense Transformers and attention-Mamba hybrids, measuring what mid-training actually does: to performance, to weights, to representations, and to downstream RL.

🧵 Key findings below.
🌐 bharat-runwal.github.io/PRISM/
📄 arxiv.org/abs/2603.17074
🤗 Models and Datasets: huggingface.co/PRISM-Midtrain… (coming soon)
[GIF]