Sebastian Raschka

19.5K posts

@rasbt

ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)

United States · Joined October 2012
1.1K Following · 409.7K Followers
Astral @astral_sh ·
Astral has entered into an agreement to join OpenAI as part of the Codex team. astral.sh/blog/openai
Sebastian Raschka
@ajith_io Nice! Tbh I don't know which countries they deliver to, but they probably mention that at checkout
Ajith @ajith_io ·
@rasbt That looks awesome! I’ll purchase one. Is delivery available to India?
Sebastian Raschka reposted
Sebastian Raschka
The Redbubble poster version just arrived (redbubble.com/i/poster/LLM-A…)! This is the Medium (26.9 x 23.4 in). Font looks good and sharp, but I probably wouldn't go smaller.
Sebastian Raschka
@karpathy Ah nice! Autoresearch makes a lot more sense now, haha. (I was thinking of the cloud compute bills all the time 😆)
Andrej Karpathy @karpathy ·
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer @NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

Sebastian Raschka
@Me5466255992308 Yes, that'd be neat. Thinking about it. The challenge is that it is framework- and context-length-specific.
Sebastian Raschka
@Thomas_Tao_1 Thanks for the feedback. The only problem with VRAM, disk, and quant options is that they are framework-specific. Disk size is maybe the most obvious or feasible one for a given precision (like bf16)
Dao @Thomas_Tao_1 ·
@rasbt License type is surprisingly useful, saves a lot of back and forth later. I’d add “runs on” requirements too, like VRAM, disk, and quant options if relevant.
Sebastian Raschka
@futurewithki I was thinking about inference VRAM as well, but this one is so tricky because it depends on the implementation/framework one is using
Kai Online @futurewithki ·
@rasbt License type is crucial for commercial viability! I'd suggest inference VRAM requirements, helping teams instantly gauge hardware costs and deployment feasibility. 🚀
Taneem Ullah Jan @taneemishere ·
@rasbt ahh that could be a discussion but maybe what if we add open-weight with a yes or no?
Sebastian Raschka
@taneemishere It'd be hard to tell whether it's open-weight or open-source because both may use the same license, like MIT or Apache 2.0. (But the definition of open source is tricky; some only call a model open source if the training code is available under an open-source license as well)
Taneem Ullah Jan @taneemishere ·
@rasbt license could be a good addition, will directly tell if it’s an open source/weight model or not
Sebastian Raschka
@SalajSonar1086 Already done :). The respective tutorial articles are linked via the “View in Article” links there
Salaj Sonar @SalajSonar1086 ·
@rasbt Now time to put together a tutorial for each of these Architectures…
Paweł Szulc @EncodePanda ·
Wait, wait. What now? @rasbt have you seen this?
Nainsi Dwivedi @NainsiDwiv50980

Holy shit... Microsoft open sourced an inference framework that runs a 100B parameter LLM on a single CPU. It's called BitNet. And it does what was supposed to be impossible.

No GPU. No cloud. No $10K hardware setup. Just your laptop running a 100-billion parameter model at human reading speed.

Here's how it works: Every other LLM stores weights in 32-bit or 16-bit floats. BitNet uses 1.58 bits. Weights are ternary: just -1, 0, or +1. That's it. No floats. No expensive matrix math. Pure integer operations your CPU was already built for.

The result:
- 100B model runs on a single CPU at 5-7 tokens/second
- 2.37x to 6.17x faster than llama.cpp on x86
- 82% lower energy consumption on x86 CPUs
- 1.37x to 5.07x speedup on ARM (your MacBook)
- Memory drops by 16-32x vs full-precision models

The wildest part: Accuracy barely moves. BitNet b1.58 2B4T, their flagship model, was trained on 4 trillion tokens and benchmarks competitively against full-precision models of the same size. The quantization isn't destroying quality. It's just removing the bloat.

What this actually means:
- Run AI completely offline. Your data never leaves your machine
- Deploy LLMs on phones, IoT devices, edge hardware
- No more cloud API bills for inference
- AI in regions with no reliable internet

The model supports ARM and x86. Works on your MacBook, your Linux box, your Windows machine. 27.4K GitHub stars. 2.2K forks. Built by Microsoft Research. 100% Open Source. MIT License.
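The ternary idea the quoted post describes can be sketched in a few lines. Below is a minimal NumPy sketch assuming the per-tensor absmean scaling described for BitNet b1.58; the function name and the 4x4 toy matrix are illustrative, not Microsoft's actual implementation:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Quantize a float weight matrix to ternary {-1, 0, +1} values.

    Absmean scheme: scale by the mean absolute value of the tensor,
    then round and clip each entry to {-1, 0, +1}.
    """
    gamma = np.abs(w).mean() + 1e-8  # per-tensor scale (epsilon avoids /0)
    w_q = np.clip(np.round(w / gamma), -1, 1).astype(np.int8)
    return w_q, gamma  # dequantize later as w_q * gamma

# Toy example: quantize a small random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, gamma = ternary_quantize(w)
assert set(np.unique(w_q).tolist()) <= {-1, 0, 1}
```

Note this only shows the quantization step; the speedups in the post come from inference kernels that pack ternary weights densely and replace float matrix multiplications with integer adds and lookups.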

Sebastian Raschka
@zhuoyuan45514 Nice! Ordered one for myself too and am pretty excited. (not gonna lie, I'm also a bit tempted by the shower curtain option on redbubble 😆)
Daniel Jiang @zhuoyuan45514 ·
@rasbt This is such a genius idea!!!! Already ordered a large poster to put up on my wall
Sebastian Raschka
Also looked into alternative poster print shops and added a Redbubble page (redbubble.com/i/poster/LLM-A…). I am new to this, so I don't know which one is better quality-wise. Ordered one from there too, so I might be able to tell in a couple of days.
Sebastian Raschka
By popular request, you can now also get this as a physical poster via Zazzle zazzle.com/llm_architectu… It is based on a 56 MB PNG file with 182 megapixels. I just ordered one myself, but please be aware that I haven't been able to verify the quality yet.
Sebastian Raschka
Looks like my website couldn't handle the traffic... I'll take this as a compliment, haha. Went down a rabbit hole and added Cloudflare to help with caching. It might take a while until the nameserver changes propagate, but things should improve from now on.
Sebastian Raschka
@DnuLkjkjh I'll try to update it with the major ones. E.g., Gemma 4 next week (likely), and DeepSeek V4 any day now 😃
dnu @DnuLkjkjh ·
@rasbt bookmarking this. how often are you planning to update it? feels like a new architecture drops every week now