Sebastian Raschka

19.7K posts

@rasbt

ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)

United States · Joined October 2012
1.1K Following · 445.1K Followers
Sebastian Raschka
Meta observation: DeepSeek is still king of the active-parameter ratio
Sebastian Raschka tweet media
Sebastian Raschka
@MRashadnow In theory, yes. But then, I am not sure how much of a signal this is here. And I don't want to suggest that %active is the only variable determining quality. I.e., you would get vastly different numbers for K2, K2.5, K2.6 even though they have the same 3.2% active ratio.
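A minimal sketch of how an active-parameter ratio like the one mentioned above can be computed for a mixture-of-experts model. The function name `active_ratio` and the parameter counts (32B active out of 1,000B total) are illustrative assumptions chosen to reproduce the 3.2% figure; they are not taken from the thread.

```python
def active_ratio(active_params_b: float, total_params_b: float) -> float:
    """Return the percentage of parameters that are active per token.

    Both arguments are parameter counts in billions.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return 100.0 * active_params_b / total_params_b


# Assumed, illustrative sizes (in billions), not values from the thread.
ratio = active_ratio(active_params_b=32, total_params_b=1000)
print(f"{ratio:.1f}% active")
```

As the reply notes, this ratio alone says nothing about quality: models with identical ratios can score very differently.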
Mohamed Rashad @MRashadnow
@rasbt I think a column for quality or overall intelligence of the model would paint the picture better.
Alex Goodman @AlexTheGoodman
@rasbt I was unaware of so many attention variants. MLA appears to be becoming more common?
Sebastian Raschka
@yechan_ai Thanks for the link and ping. I try to focus on those with open weights these days but I’ll add this to the list for the future
Yechan Do @yechan_ai
@rasbt Big fan of your work. Any interest in DLMs?
Yechan Do @yechan_ai

Came across Cola-DLM (hongcanguo.github.io/Cola-DLM/) from ByteDance. A hierarchical continuous latent diffusion LM that separates global semantic planning (DiT in latent space) from local token realization (VAE decoder). Paper is out, but no code and no HF model yet. So I reproduced it from scratch. Happy to share with anyone interested 👇

Sebastian Raschka
A little talk on what we can learn from implementing LLM architectures from scratch in Python and PyTorch. And how I approach new open-weight models, compare them against reference implementations etc: youtube.com/watch?v=TXzQ7P…
Sebastian Raschka
@yechan_ai Thanks for the link and ping. I try to focus on those with open weights these days but I’ll add this to the list
engineer cat 🐈 @MLCatttt
@rasbt does anyone actually have a reliable workflow for picking the right reference impl when there are 3-4 candidates floating around. would love to see how you triage
Sebastian Raschka
@bygregorr Historically, this (and norm layers) is probably where I spent most of my debugging time.
Gregor @bygregorr
@rasbt 6 hours debugging positional embeddings. still don't fully get it.
Sebastian Raschka
@140ismymax Yes! It would be interesting to see whether this happens only in GQA:SWA models or also in MLA or Gated DeltaNet models, for example.
mark seery @140ismymax
Hi @rasbt really enjoyed your comments on sliding context windows in different models. Given all the talk over the last few months about repeating information in prompts, and the information in the front of the prompt getting lost, this has to be one of the most interesting aspects of LLMs at the moment.
Lucas Beyer (bl16) @giffmana
My wife after watching some pop-investing reel or something:
> Lucas, you bought SanDisk a bit over a year ago, right???
Me: yeah
Her, all excited: how much??
Me: lemme check, but i think 2Tb
The SanDisk i bought:
Lucas Beyer (bl16) tweet media
Sebastian Raschka
@willdepue @ChiefScientist You are joking, but I use an HDMI version of this for OpenClaw. To be fair, though, you could just use software (caffeinate or the Amphetamine app) to keep the laptop awake.
will depue @willdepue
Tired of holding your laptop half open to keep your agents running? Introducing AgentPlug: A USB-C dummy plug that keeps your Mac in clamshell mode by pretending to be an external display! No commands, no security worries (just pull it out to stop!), no hassle.
will depue tweet media (2 images)
Sebastian Raschka
@Tiaanmin @TSMCCruz Thanks, but there are no open weights yet, right? Asking because it would be impossible to cover architecture details without open weights and/or a detailed technical report
Sebastian Raschka
Back from a little family break! Lots has happened, and I’m planning to do a deeper dive into the most interesting architectural components (soon). Btw, are there any major architectures I missed below?
Sebastian Raschka tweet media
Sebastian Raschka
@TSMCCruz Awesome, thanks! Looks like Ernie 5.1 & SubQ are not available yet, but the other ones look interesting.
TSMCCruz @TSMCCruz
@rasbt Also these, but some are hybrids:
- Jamba 2 (AI21 Labs)
- LFM2.5 (Liquid AI)
- ZAYA1 (Zyphra)