Dan Woods

214 posts

@danveloper

Vice President of AI Platforms for CVS Health. Former CTO for @JoeBiden.

Joined March 2011
770 Following · 7.8K Followers
Dan Woods @danveloper
We’re gonna see someone running a 1T model at 100tok/s on a $2500 laptop by like a month from now. Long Apple, this is the best AI inference platform.
17
11
246
16.5K
Alex Cheema @alexocheema
@danveloper Close. We will see someone running a 1T model on $16,000 of heterogeneous hardware (Apple Silicon + NVIDIA) at 100tok/sec this year.
English
3
0
51
2.9K
Dan Woods @danveloper
@danpacary You're asking how much of your ANE training work you can port straight to inference? You'd know more than me on that, but I'd reckon all of it. Unless there's actually a transition boundary from unified memory to ANE SDRAM, but I don't think there is.
2
0
1
30
Daniel Isaac @danpacary
@danveloper That’s what I was thinking. How much of the work can I port straight to inference?
1
0
0
18
Dan Woods @danveloper
@danpacary pure inference, correct... although 🤔 no reason you *couldn't* do it for training... the ANE shares the same unified memory, so nothing stopping you from adapting your ANE training code to use the streaming code
1
0
1
113
Dan Woods @danveloper
@danpacary Technically your only param and quant limits are (size of non-expert weights)+(2*size of the current expert layer) in memory. Parameter count and quantization don't matter so much, except to say that bigger weight bytes (obviously) equals bigger size on disk equals more read time
2
0
1
126
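The memory bound in the reply above can be written out as simple arithmetic. This is a minimal sketch, assuming the factor of 2 is a double buffer (one expert layer in use while the next is being loaded), which is a reading of the post rather than something it states. The function name and the example sizes are hypothetical.

```python
def resident_bytes(non_expert_bytes: int, expert_layer_bytes: int) -> int:
    """Resident memory per the post: non-expert weights + 2 * current expert layer."""
    return non_expert_bytes + 2 * expert_layer_bytes


GIB = 1024 ** 3

# Hypothetical sizes, chosen only to illustrate the arithmetic.
example = resident_bytes(non_expert_bytes=6 * GIB, expert_layer_bytes=2 * GIB)
print(example / GIB)  # 10.0
```

Total parameter count does not appear in the bound, which matches the point that it mainly affects size on disk and read time.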
Daniel Isaac @danpacary
@danveloper So hold on. What are my active param and quant limits? Can you technically shave and squeeze a model down?
1
0
1
124
Dan Woods @danveloper
@danpacary We've got some friends... the plumbing is close. If we can predict the MoE needed, we can prewarm the disk cache for layer N+1 while layer N is processing and throughput will skyrocket because it's cache->DMA at that point. x.com/Ex0byt/status/…
Eric@Ex0byt

Get Excited: @0xSero and I are close — B300 is currently training on a tiny (15M param) side-loaded neural network that helps select, load, and cache the correct MoE experts for Kimi-K.2.5 (1T Param MoE model running on 25GB of memory). Once experiments are done -will share paper. "Thicket-Guided Expert Prediction for Memory-Minimal Trillion-Parameter MoE Inference on Unified Memory & Consumer Grade Hardware"

1
0
1
230
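The "prewarm layer N+1 while layer N is processing" idea above can be sketched as a one-step-ahead prefetch loop. This is an illustration only, assuming the needed experts are already known; `load_layer` and `compute_layer` are hypothetical callables standing in for the disk read and the forward pass, and the expert predictor itself is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def run_layers(num_layers, load_layer, compute_layer, x):
    """Overlap the load of layer n+1 with the compute of layer n."""
    if num_layers == 0:
        return x
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, 0)  # start reading the first layer
        for n in range(num_layers):
            weights = pending.result()        # wait for layer n to arrive
            if n + 1 < num_layers:
                # Kick off the next read before computing, so I/O and compute overlap.
                pending = pool.submit(load_layer, n + 1)
            x = compute_layer(weights, x)
    return x


# Toy usage: "weights" are just integers and "compute" is a multiply.
result = run_layers(3, lambda n: n + 1, lambda w, x: x * w, 1)
print(result)  # 6
```

With a single worker thread, at most one layer is in flight while another is in use, which lines up with the two-expert-layer memory bound mentioned earlier in the thread.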
Dan Woods @danveloper
Key learning for the night (although I'm not done) is Apple seems to have a built-in LZ4 hardware decompressor that can be utilized to stream compressed layers into memory with enough decompression efficiency that the faster read+decompression is better than raw.
6
3
95
11K
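The claim above, that read plus decompression can beat a raw read, comes down to a break-even comparison. The sketch below is a rough timing model, not a measurement: the function names, bandwidth figures, and compression ratio are all hypothetical, and it says nothing about how the decompressor is actually invoked on Apple hardware.

```python
def raw_seconds(size_bytes, read_bw):
    """Time to read the uncompressed layer straight from disk."""
    return size_bytes / read_bw


def compressed_seconds(size_bytes, read_bw, ratio, decomp_bw, overlapped=False):
    """Time to read a compressed layer and decompress it.

    ratio     : uncompressed size / compressed size
    decomp_bw : decompressor output rate, in uncompressed bytes per second
    overlapped: True if reading and decompressing run concurrently
    """
    read = (size_bytes / ratio) / read_bw
    decomp = size_bytes / decomp_bw
    return max(read, decomp) if overlapped else read + decomp


# Hypothetical numbers: 1000-byte layer, 100 B/s disk, 2x compression.
print(raw_seconds(1000, 100))                   # 10.0
print(compressed_seconds(1000, 100, 2, 1000))   # 6.0  (fast decompressor wins)
print(compressed_seconds(1000, 100, 2, 100))    # 15.0 (slow decompressor loses)
```

Compression only pays off when the decompressor is fast relative to the disk, which is why a hardware decompressor would matter here.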
Dan Woods @danveloper
@8my41 🙏 It would be my honor, but I'm gpu rich and budget poor
0
0
1
84
Dan Woods reposted
Daniel Isaac @danpacary
I just trained a 5B param model on Apple's Neural Engine. On a MacBook Pro. Forward. Backward. Adam optimizer. Then I checked to see how far it would go. Technically got to 30B.
12
24
352
87.7K
Dan Woods reposted
Eric @Ex0byt
Get Excited: @0xSero and I are close — B300 is currently training on a tiny (15M param) side-loaded neural network that helps select, load, and cache the correct MoE experts for Kimi-K.2.5 (1T Param MoE model running on 25GB of memory). Once experiments are done -will share paper. "Thicket-Guided Expert Prediction for Memory-Minimal Trillion-Parameter MoE Inference on Unified Memory & Consumer Grade Hardware"
0xSero@0xSero

@pierrelezan Yes, @Ex0byt is working on this.

7
13
145
17.3K
Dan Woods @danveloper
Some very meaningful progress on this project. A bunch of performance experiments and we've landed at 4.4 tok/s on the distribution Q4 weights. Feels pretty good since we started at 0.28 tok/s. Code and experiments are up in the GitHub repo now!
Dan Woods@danveloper

x.com/i/article/2034…

7
4
55
7.2K
Jack Jaw @8my41
@danveloper wait you really are a VP at CVS? and you're ballin out hacking stuff?
1
0
4
451