Benjamin F Spector
@bfspector
155 posts

stanford cs phd student. i make ml go brr.

Joined October 2020
281 Following · 10.6K Followers
Benjamin F Spector retweeted
will depue @willdepue
megakernels remain underrated. if you haven’t dug into them before, go look them up! flappy seems to be hinting at some really powerful training megakernel stuff, which is sick. ex: a fully contained training megakernel could be great for automated research
Flapping Airplanes @flappyairplanes

(4/5) One thing we’ve built is a “kittens” virtual machine that takes over the whole GPU and allows new kinds of co-optimization. We can go past the traditional sequential kernel model – for example, fusing entire training runs into a single kernel and even weirder stuff.

8 replies · 19 reposts · 514 likes · 52.1K views
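The "traditional sequential kernel model" above means launching one GPU kernel per op. A rough Python sketch of the contrast, using PyTorch's torch.compile as a stand-in for kernel fusion (this is an illustration of the idea, not the "kittens" VM itself):

```python
# Illustration only: eager PyTorch runs each op below (matmul, sub, pow,
# mean) as its own kernel launch -- the sequential kernel model.
# torch.compile fuses the pointwise ops into fewer kernels; a megakernel
# goes further, fusing the entire training loop into a single persistent
# kernel instead of relaunching kernels every step.
import torch

model = torch.nn.Linear(1024, 1024)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

@torch.compile
def loss_fn(x, y):
    return (model(x) - y).pow(2).mean()

x, y = torch.randn(32, 1024), torch.randn(32, 1024)
for _ in range(3):        # a fused training megakernel would absorb this
    loss = loss_fn(x, y)  # loop too, keeping the whole run on the GPU
    loss.backward()
    opt.step()
    opt.zero_grad()
print(loss.item())
```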
Benjamin F Spector retweeted
Hayden Prairie @hayden_prairie
We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇
[image]
41 replies · 179 reposts · 1.3K likes · 291.3K views
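A minimal sketch of the looping idea in PyTorch (not the authors' code; the block and hyperparameters here are illustrative): one set of transformer-layer weights applied n_loops times per forward pass, so FLOPs scale with the loop count while the parameter count stays fixed.

```python
# A minimal sketch, not the paper's implementation: "layer looping" reuses
# one block of layers several times per forward pass, scaling FLOPs with
# the loop count while the parameter count stays fixed.
import torch
import torch.nn as nn

class LoopedEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_loops=4):
        super().__init__()
        # One set of weights, applied n_loops times per forward pass.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):  # ~n_loops x the FLOPs, 1x the params
            x = self.block(x)
        return x

m1, m4 = LoopedEncoder(n_loops=1), LoopedEncoder(n_loops=4)
n_params = lambda m: sum(p.numel() for p in m.parameters())
assert n_params(m1) == n_params(m4)       # same parameter budget
print(m4(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```

Per the thread, for a fixed parameter budget the scaling laws suggest increasing the loop count and the training data in tandem.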
Benjamin F Spector @bfspector
@dylan522p @SemiAnalysis_ @Kurnalsalts Ah I meant they'd get physically larger to allow for lower power consumption. The whole thing has to run on like 100W or something? Including memory and whatnot. But this was an offhand comment, I am far from confident I am right.
0 replies · 0 reposts · 2 likes · 126 views
SemiAnalysis @SemiAnalysis_
NVIDIA's first GPU on TSMC 3nm shows an unexpected area increase compared to 4nm Blackwell! Thanks to @Kurnalsalts, the NVIDIA GB10 dieshot shows that GPC (12SM) area increased 12.5%, TPC area increased 16.7%, and SM area increased by 13.5%. NVIDIA has confirmed multiple times that both dies on GB10 are on TSMC's more expensive 3nm process, so this significant scaling regression is shocking. What was the Physical Design Team doing when porting Blackwell from 4nm to 3nm?
[image]
29 replies · 47 reposts · 542 likes · 66.7K views
Benjamin F Spector @bfspector
@elliotarledge @mttrdmnd Yeah you want interaction for computation so you need non-integer spin to get Pauli, but you want non-interaction for communication so you can stack all of your nats, and then integer spins are better.
0 replies · 0 reposts · 5 likes · 97 views
Elliot Arledge @elliotarledge
why not just do matmuls in light and keep ops like topk and exp to electrons?
4 replies · 1 repost · 16 likes · 2.3K views
Benjamin F Spector @bfspector
@mcxfrank Definitely interested! Will be down at Stanford tomorrow, any time after 3:30pm PT work?
2 replies · 0 reposts · 24 likes · 2.2K views
Benjamin F Spector @bfspector
@karpathy has been so incredibly generous with his advice, time, and support, and @amspector100 and I are incredibly grateful! Getting to work with great people is, as always, the best part of the job.
Andrej Karpathy @karpathy

A conventional narrative you might come across is that AI is too far along for a new, research-focused startup to outcompete and outexecute the incumbents of AI. This is exactly the sentiment I listened to often when OpenAI started ("how could the few of you possibly compete with Google?") and 1) it was very wrong, and then 2) it was very wrong again with a whole other round of startups who are now challenging OpenAI in turn, and imo it still continues to be wrong today. Scaling and locally improving what works will continue to create incredible advances, but with so much progress unlocked so quickly, with so much dust thrown up in the air in the process, and with still a large gap between frontier LLMs and the example proof of the magic of a mind running on 20 watts, the probability of research breakthroughs that yield closer to 10X improvements (instead of 10%) imo still feels very high - plenty high to continue to bet on and look for. The tricky part ofc is creating the conditions where such breakthroughs may be discovered. I think such an environment comes together rarely, but @bfspector & @amspector100 are brilliant, with (rare) full-stack understanding of LLMs top (math/algorithms) to bottom (megakernels/related), they have a great eye for talent and I think will be able to build something very special. Congrats on the launch and I look forward to what you come up with!

4 replies · 1 repost · 66 likes · 4.6K views
Benjamin F Spector @bfspector
Very grateful and excited to be working together every week!
mark xu @marklxu1

The best part of my job is the privilege to partner with extraordinary founders. It’s especially rewarding when those founders are people you’ve long admired and respected. I’ve known @amspector100 since our college days, and I’ve gotten to know his brother, @bfspector, through the Prod community, where Ben helped shape a generation of founders and young talent. So when Asher mentioned to me on a walk that he and Ben were thinking about starting something together, I could barely contain my excitement. It felt like a moment that had been building for an eternity. Today, that conviction has taken shape as Flapping Airplanes, a new foundational AI research lab led by Ben, Asher, and @aidanmantine, exploring radically more data-efficient approaches to learning. We’re thrilled to co-lead this investment alongside @GVteam and @sequoia, and to partner with my dear friends Ben, Asher, Aidan and the rest of their all-star team.

3 replies · 3 reposts · 68 likes · 41.3K views
levi @levidiamode
@bfspector @HazyResearch If you ever find the time to teach more, it'd be amazing if you could do your own version of MIT 6.S894. The CS336 sections that @tatsu_hashimoto did on GPUs etc. were great, but it feels like a standalone course could go much deeper, especially on current research like ThunderKittens
1 reply · 0 reposts · 2 likes · 167 views
levi @levidiamode
Day 13/365 of GPU Programming: watched this super underrated talk by @bfspector on AI hardware from various levels of abstraction. One of the best GPU resources I've come across since starting my journey. Ben really has a knack for teaching and cuts through all the bs out there
[four images]
levi @levidiamode

Day 12/365 of GPU Programming: studied GPU hierarchy in terms of GPCs, TPCs, SMs, etc. on various Nvidia architectures. Also pretty interesting to see what's on the hardware level vs pure software abstractions

2 replies · 1 repost · 7 likes · 2.2K views
Benjamin F Spector retweeted
Ricursive Intelligence @RicursiveAI
Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com
49 replies · 151 reposts · 1.1K likes · 490.1K views