Dan Woods

214 posts

@danveloper

Vice President of AI Platforms for CVS Health. Former CTO for @JoeBiden.

Joined March 2011
770 Following · 7.7K Followers

Dan Woods @danveloper
We’re gonna see someone running a 1T model at 100tok/s on a $2500 laptop by like a month from now. Long Apple, this is the best AI inference platform.
9 replies · 7 reposts · 103 likes · 7K views

Alex Cheema @alexocheema
@danveloper Close. We will see someone running a 1T model on $16,000 of heterogeneous hardware (Apple Silicon + NVIDIA) at 100tok/sec this year.
2 replies · 0 reposts · 17 likes · 963 views

Dan Woods @danveloper
@danpacary You're asking how much of your ANE training work you can port straight to inference? You'd know more than me on that, but I'd reckon all of it. Unless there's actually a transition boundary from unified memory to ANE SDRAM, but I don't think there is.
2 replies · 0 reposts · 1 like · 29 views

Daniel Isaac @danpacary
@danveloper That’s what I was thinking. How much of the work can I port straight to inference?
1 reply · 0 reposts · 0 likes · 15 views

Dan Woods @danveloper
@danpacary Pure inference, correct... although 🤔 there's no reason you *couldn't* do it for training. The ANE shares the same unified memory, so nothing is stopping you from adapting your ANE training code to use the streaming code.
1 reply · 0 reposts · 1 like · 72 views

Dan Woods @danveloper
@danpacary Technically, your only param and quant limit is fitting (size of non-expert weights) + 2 × (size of the current expert layer) in memory. Parameter count and quantization don't matter so much, except that more weight bytes (obviously) means a bigger size on disk, which means more read time.
2 replies · 0 reposts · 1 like · 82 views
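
The memory rule of thumb in the post above can be turned into a quick budget check. This is a minimal sketch, not code from the project: the helper names are hypothetical, reading the factor of 2 as "the layer being computed plus the next layer being streamed in" is an assumption, and the model sizes are made-up illustration values rather than figures for any real checkpoint.

```python
GB = 1e9


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store `params` weights at a given quantization width."""
    return params * bits_per_weight / 8


def resident_bytes(non_expert_bytes: float, expert_layer_bytes: float) -> float:
    """Memory that must stay resident under the rule of thumb in the post:
    all non-expert weights, plus room for two expert layers (assumed here to be
    the layer currently computing and the next one being loaded)."""
    return non_expert_bytes + 2 * expert_layer_bytes


# Hypothetical model shape, NOT taken from any real checkpoint:
non_expert = weight_bytes(10e9, 4)     # 10B shared/attention params at 4 bits
expert_layer = weight_bytes(1.5e9, 4)  # 1.5B expert params needed per layer at 4 bits

print(f"{resident_bytes(non_expert, expert_layer) / GB:.2f} GB resident")
```

Note that total parameter count does not appear in `resident_bytes`; it only changes how many bytes sit on disk and therefore how long each layer takes to read, which matches the point made in the post.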

Daniel Isaac @danpacary
@danveloper So hold on. What are my active-param and quant limits? Can you technically shave and squeeze a model down?
1 reply · 0 reposts · 1 like · 80 views

Dan Woods @danveloper
@danpacary We've got some friends... the plumbing is close. If we can predict the MoE experts needed, we can prewarm the disk cache for layer N+1 while layer N is processing, and throughput will skyrocket because at that point it's cache->DMA. x.com/Ex0byt/status/…
Quoting Eric @Ex0byt:

Get Excited: @0xSero and I are close. A B300 is currently training a tiny (15M-param) side-loaded neural network that helps select, load, and cache the correct MoE experts for Kimi K2.5 (a 1T-param MoE model running in 25GB of memory). Once the experiments are done, we will share the paper: "Thicket-Guided Expert Prediction for Memory-Minimal Trillion-Parameter MoE Inference on Unified Memory & Consumer Grade Hardware"

1 reply · 0 reposts · 1 like · 158 views
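
The prefetch idea above (load the predicted experts for layer N+1 while layer N computes) is a form of double buffering. Below is a minimal Python sketch of that loop, assuming hypothetical `predict`, `load` and `compute` callbacks that stand in for the expert predictor, the disk read and the layer math; none of these names come from the project's code. It deliberately omits the fallback path for mispredicted experts, which a real implementation would need.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_layers(
    num_layers: int,
    predict: Callable[[int], Sequence[int]],      # hypothetical: guesses expert ids for layer i
    load: Callable[[int, Sequence[int]], bytes],  # hypothetical: reads those experts from disk
    compute: Callable[[int, bytes], None],        # hypothetical: runs layer i with loaded weights
) -> None:
    """Overlap disk reads for layer i+1 with compute for layer i."""
    with ThreadPoolExecutor(max_workers=1) as io:
        # Start reading layer 0 before the loop so there is always one read in flight.
        pending = io.submit(load, 0, predict(0))
        for i in range(num_layers):
            weights = pending.result()  # blocks only if the prefetch has not finished yet
            if i + 1 < num_layers:
                # Prefetch the experts predicted for the next layer before computing this one.
                pending = io.submit(load, i + 1, predict(i + 1))
            compute(i, weights)
```

At most two layers' expert weights are held at once (the one being computed and the one being read), which is one way to read the "2 × expert layer" term in the earlier memory rule of thumb.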

Dan Woods @danveloper
Key learning for the night (although I'm not done): Apple seems to have a built-in LZ4 hardware decompressor that can be used to stream compressed layers into memory efficiently enough that reading the compressed data and decompressing it is faster than reading the raw weights.
6 replies · 3 reposts · 78 likes · 9.2K views
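
Whether "read + decompress" beats a raw read comes down to simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: the function names are hypothetical, it assumes reads and decompression can be pipelined chunk by chunk (so the slower stage dominates), and the bandwidths and compression ratios are illustrative placeholders, not numbers from the post.

```python
def raw_read_seconds(size_gb: float, disk_gbps: float) -> float:
    """Time to read uncompressed weights straight from disk."""
    return size_gb / disk_gbps


def compressed_read_seconds(
    size_gb: float,      # uncompressed size of the layer
    ratio: float,        # compression ratio (uncompressed / compressed), assumed >= 1
    disk_gbps: float,    # sequential read bandwidth
    decomp_gbps: float,  # decompressor output bandwidth
    overlapped: bool = True,
) -> float:
    """Time to read the compressed bytes and decompress them.

    If reads and decompression are pipelined, the slower stage dominates;
    otherwise the two stages add up.
    """
    read = size_gb / ratio / disk_gbps
    decompress = size_gb / decomp_gbps
    return max(read, decompress) if overlapped else read + decompress


# Hypothetical numbers for illustration only, not measurements:
size, disk, decomp = 1.0, 6.0, 20.0
for r in (1.0, 1.2, 1.5):
    print(f"ratio {r}: raw {raw_read_seconds(size, disk):.3f}s, "
          f"compressed {compressed_read_seconds(size, r, disk, decomp):.3f}s")
```

Under the pipelined assumption, the compressed path wins only when the ratio is above 1 and the decompressor outputs bytes faster than the disk can deliver them; how well quantized weights actually compress with LZ4 is an empirical question this sketch does not answer.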

Dan Woods @danveloper
@8my41 🙏 It would be my honor, but I'm GPU-rich and budget-poor.
0 replies · 0 reposts · 1 like · 57 views

Dan Woods reposted
Daniel Isaac @danpacary
I just trained a 5B param model on Apple's Neural Engine. On a MacBook Pro. Forward. Backward. Adam optimizer. Then I checked to see how far it would go. Technically got to 30B.
12 replies · 20 reposts · 318 likes · 73.7K views

Dan Woods reposted
Eric @Ex0byt
Get Excited: @0xSero and I are close. A B300 is currently training a tiny (15M-param) side-loaded neural network that helps select, load, and cache the correct MoE experts for Kimi K2.5 (a 1T-param MoE model running in 25GB of memory). Once the experiments are done, we will share the paper: "Thicket-Guided Expert Prediction for Memory-Minimal Trillion-Parameter MoE Inference on Unified Memory & Consumer Grade Hardware"
Quoting 0xSero @0xSero:

@pierrelezan Yes, @Ex0byt is working on this.

5 replies · 13 reposts · 136 likes · 16.1K views

Dan Woods @danveloper
Some very meaningful progress on this project. After a bunch of performance experiments, we've landed at 4.4 tok/s on the distribution Q4 weights. Feels pretty good, since we started at 0.28 tok/s. Code and experiments are up in the GitHub repo now!
Quoting Dan Woods @danveloper:

x.com/i/article/2034…

6 replies · 4 reposts · 54 likes · 6.6K views

Jack Jaw @8my41
@danveloper Wait, you really are a VP at CVS? And you're ballin' out hacking stuff?
1 reply · 0 reposts · 4 likes · 311 views