Sebastian Raschka

19.5K posts

@rasbt

ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)

United States · Joined October 2012
1.1K Following · 409.7K Followers
Astral @astral_sh ·
Astral has entered into an agreement to join OpenAI as part of the Codex team. astral.sh/blog/openai
Sebastian Raschka
@ajith_io Nice! Tbh I don't know which countries they deliver to, but they probably mention that at checkout
Ajith @ajith_io ·
@rasbt That looks awesome! I’ll purchase one. Is delivery available to India?
Sebastian Raschka reposted
Sebastian Raschka
The Redbubble poster version just arrived (redbubble.com/i/poster/LLM-A…)! This is the Medium (26.9 x 23.4 in). Font looks good and sharp, but I probably wouldn't go smaller.
Sebastian Raschka
@karpathy Ah nice! Autoresearch makes a lot more sense now, haha. (I was thinking of the cloud compute bills all the time 😆)
Andrej Karpathy @karpathy ·
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer @NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

Sebastian Raschka
@Me5466255992308 Yes, that'd be neat. Thinking about it. The challenge is that it is framework- and context-length-specific.
Sebastian Raschka
@Thomas_Tao_1 Thanks for the feedback. The only problem with VRAM, disk, and quant options is that they are framework-specific. Disk size is maybe the most obvious or feasible one for a given precision (like bf16)
Dao @Thomas_Tao_1 ·
@rasbt License type is surprisingly useful, saves a lot of back and forth later. I’d add “runs on” requirements too, like VRAM, disk, and quant options if relevant.
Sebastian Raschka
@futurewithki I was thinking about inference VRAM as well, but this one is so tricky because it depends on the implementation/framework one is using
Kai Online @futurewithki ·
@rasbt License type is crucial for commercial viability! I'd suggest inference VRAM requirements, helping teams instantly gauge hardware costs and deployment feasibility. 🚀
Taneem Ullah Jan @taneemishere ·
@rasbt ahh that could be a discussion but maybe what if we add open-weight with a yes or no?
Sebastian Raschka
@taneemishere It'd be hard to tell whether it's open-weight or open-source because both may use the same license, like MIT or Apache 2.0. (But the definition of open source is tricky; some only call a model open source if the training code is available under an open-source license as well)
Taneem Ullah Jan @taneemishere ·
@rasbt license could be a good addition, will directly tell if it’s an open source/weight model or not
Sebastian Raschka
@SalajSonar1086 Already done :). The respective tutorial articles are linked via the “View in Article” links there
Salaj Sonar @SalajSonar1086 ·
@rasbt Now time to put together a tutorial for each of these Architectures…
Paweł Szulc @EncodePanda ·
Wait, wait. What now? @rasbt have you seen this?
Nainsi Dwivedi @NainsiDwiv50980

Holy shit... Microsoft open sourced an inference framework that runs a 100B parameter LLM on a single CPU. It's called BitNet. And it does what was supposed to be impossible.

No GPU. No cloud. No $10K hardware setup. Just your laptop running a 100-billion parameter model at human reading speed.

Here's how it works: Every other LLM stores weights in 32-bit or 16-bit floats. BitNet uses 1.58 bits. Weights are ternary: just -1, 0, or +1. That's it. No floats. No expensive matrix math. Pure integer operations your CPU was already built for.

The result:
- 100B model runs on a single CPU at 5-7 tokens/second
- 2.37x to 6.17x faster than llama.cpp on x86
- 82% lower energy consumption on x86 CPUs
- 1.37x to 5.07x speedup on ARM (your MacBook)
- Memory drops by 16-32x vs full-precision models

The wildest part: Accuracy barely moves. BitNet b1.58 2B4T, their flagship model, was trained on 4 trillion tokens and benchmarks competitively against full-precision models of the same size. The quantization isn't destroying quality. It's just removing the bloat.

What this actually means:
- Run AI completely offline. Your data never leaves your machine
- Deploy LLMs on phones, IoT devices, edge hardware
- No more cloud API bills for inference
- AI in regions with no reliable internet

The model supports ARM and x86. Works on your MacBook, your Linux box, your Windows machine. 27.4K GitHub stars. 2.2K forks. Built by Microsoft Research. 100% Open Source. MIT License.
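The ternary idea the quoted post describes can be sketched in a few lines. Below is a minimal NumPy sketch assuming the per-tensor absmean scaling described for BitNet b1.58; the function name and the 4x4 toy matrix are illustrative, not Microsoft's actual implementation:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Quantize a float weight matrix to ternary {-1, 0, +1} values.

    Absmean scheme: scale by the mean absolute value of the tensor,
    then round and clip each entry to {-1, 0, +1}.
    """
    gamma = np.abs(w).mean() + 1e-8  # per-tensor scale (epsilon avoids /0)
    w_q = np.clip(np.round(w / gamma), -1, 1).astype(np.int8)
    return w_q, gamma  # dequantize later as w_q * gamma

# Toy example: quantize a small random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, gamma = ternary_quantize(w)
assert set(np.unique(w_q).tolist()) <= {-1, 0, 1}
```

Note this only shows the quantization step; the speedups in the post come from inference kernels that pack ternary weights densely and replace float matrix multiplications with integer adds and lookups.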

Sebastian Raschka
@zhuoyuan45514 Nice! Ordered one for myself too and am pretty excited. (not gonna lie, I'm also a bit tempted by the shower curtain option on redbubble 😆)
Daniel Jiang @zhuoyuan45514 ·
@rasbt This is such a genius idea!!!! Already ordered a large poster to put up on my wall
Sebastian Raschka
Also looked into alternative poster print shops and added a Redbubble page (redbubble.com/i/poster/LLM-A…). I am new to this, so I don't know which one is better quality-wise. Ordered one from there too, so I might be able to tell in a couple of days.
Sebastian Raschka
By popular request, you can now also get this as a physical poster via Zazzle zazzle.com/llm_architectu… It is based on a 56 MB PNG file with 182 megapixels. I just ordered one myself, but please be aware that I haven't been able to verify the quality yet.
Sebastian Raschka
Looks like my website couldn't handle the traffic... I'll take this as a compliment, haha. Went down a rabbit hole and added Cloudflare to help with caching. It might take a while until the nameserver changes propagate, but things should improve from now on.
Sebastian Raschka
@DnuLkjkjh I'll try to update it with the major ones. E.g., Gemma 4 next week (likely), and DeepSeek V4 any day now 😃
dnu @DnuLkjkjh ·
@rasbt bookmarking this. how often are you planning to update it? feels like a new architecture drops every week now