Stephen Panaro
@flat

903 posts

making coffee and other things. @BrewTimerApp

Boston · Joined May 2013
25 Following · 540 Followers

Pinned Tweet
Stephen Panaro@flat·
“We won’t run it in digital because we’re purists and maniacs.”
Stephen Panaro@flat·
@mweinbach Thought it was a cleaner summary. Possible I’ve just never looked beyond that though 🤔
Max Weinbach@mweinbach·
@flat i'm pretty sure anthropic doesn't hide theirs, so that feels normal
Max Weinbach@mweinbach·
is this gpt-5.4 leaking its reasoning trace?
[image attached]
Stephen Panaro@flat·
@mweinbach Huh, weird. I got it from Opus today too, but had never seen it before. Assumed it was an Anthropic issue
Max Weinbach@mweinbach·
@flat all models tend to do that; noticed it a lot with Gemini
Stephen Panaro@flat·
@mattcassinelli @tylerangert Oh for sure some cool non-MLX stuff this year. When Google released their on-device base model + LoRA, I crossed my fingers Apple would do the same. And they did! Am actually curious now if anyone has shipped an AFM LoRA.
Matthew Cassinelli@mattcassinelli·
@tylerangert This year’s APIs are so cool but because they’re not frontier-level models everyone decided to not even check
Stephen Panaro@flat·
This ButterflyQuant paper looks neat, but also a little sus:
- no code
- no comparison against its closest relative (SpinQuant)

A good test project for coding agents?
Anemll@anemll·
@flat Now you need to install ‘26 🙈
Stephen Panaro@flat·
Incoming new coremltools looks like it has some nice bits:
- 8-bit input/output tensors (previously all 8-bit compute was kept internal)
- >1 input can be enumerated shapes (👀ANE)
Simo Ryu@cloneofsimo·
same deal with the V-O pair of attention btw... multiply one row of V by a factor of k, divide by k at O, and you get the same output. Jesus, why did I not think of this before
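The V-O scale symmetry described above is easy to check numerically. A minimal NumPy sketch (my own toy setup, not the author's code; "row vs. column" depends on how you store the weight matrices — here activations are row vectors, so scaling one output *column* of the V projection and dividing the matching input *row* of the O projection cancels exactly, because the softmax mixing acts per-token, not per-channel):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                                    # tokens, model dim
X = rng.normal(size=(T, d))                    # token activations
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))

def attention(Wv, Wo):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = np.exp(Q @ K.T / np.sqrt(d))
    A = A / A.sum(axis=-1, keepdims=True)      # softmax over keys
    return (A @ V) @ Wo

k = 3.0
Wv2, Wo2 = Wv.copy(), Wo.copy()
Wv2[:, 0] *= k       # scale one output channel of the V projection
Wo2[0, :] /= k       # undo it on the matching input channel of O
assert np.allclose(attention(Wv, Wo), attention(Wv2, Wo2))
```

The cancellation holds for any nonzero k, since the attention weights never mix channels.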
Simo Ryu@cloneofsimo·
Oh my god, with relu / relu² and no bias, there is further redundancy in the parameterization of an MLP: you can multiply one row of fc1 by k and divide the same column of fc2 by k², and the output is completely identical! So we have *extra* n redundancy if your activations are scale-equivariant...
Quoting Simo Ryu @cloneofsimo:

“It's a nightmare to think 2-layer MLPs are super redundant: even if you have a global minimum, there are at least n! more of 'em. For n = 8k, a 2-layer MLP is like 10^27800 larger in terms of hypothesis space. WHO ALLOWED THIS???”

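The fc1/fc2 rescaling trick can be verified in a few lines. A toy sketch (my own example, assuming the relu² case, where the k² factor applies: relu is 1-homogeneous for k > 0, so relu(k·h)² = k²·relu(h)², and dividing the matching fc2 column by k² restores the output):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 6, 10, 4
W1 = rng.normal(size=(d_h, d_in))    # fc1 weights
W2 = rng.normal(size=(d_out, d_h))   # fc2 weights
x = rng.normal(size=(d_in,))

def mlp(W1, W2):
    h = np.maximum(W1 @ x, 0.0) ** 2   # relu^2 activation, no bias
    return W2 @ h

k = 2.5
W1s, W2s = W1.copy(), W2.copy()
W1s[3, :] *= k        # scale one row of fc1 (one hidden unit)
W2s[:, 3] /= k ** 2   # relu^2 is 2-homogeneous for k > 0, so k^2 undoes it
assert np.allclose(mlp(W1, W2), mlp(W1s, W2s))
```

With plain relu the same identity holds with k in place of k²; either way each hidden unit contributes a continuous symmetry on top of the n! permutation symmetry from the quoted tweet.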
Stephen Panaro@flat·
Turns out you don’t need R₅⁻¹ at all. 🫠 Fusing it into Q and K is enough! Cool paper from Qualcomm explains this and a few similar transforms. No code in the paper, so gist proof 👇
Quoting Stephen Panaro @flat:

“Liking the line of research where you multiply LLM weights by rotation matrices and the model still works. Most do it in between layers, but you can also sneak one between Q/K and RoPE. Extra parameters? None. Useful? …Maybe. Cool? I think so. (See R₅ below.)”

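The core algebra behind fusing the rotation into Q and K is that orthogonal matrices cancel inside the dot product: (QR)(KR)ᵀ = Q R Rᵀ Kᵀ = Q Kᵀ. A simplified NumPy sketch (my own, ignoring RoPE — in the paper's setting R₅ sits between the projections and RoPE, which adds a commutation condition this toy omits):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))          # token activations
Wq = rng.normal(size=(d, d))         # query projection
Wk = rng.normal(size=(d, d))         # key projection

# Random orthogonal matrix standing in for the rotation R
R, _ = np.linalg.qr(rng.normal(size=(d, d)))

scores = (X @ Wq) @ (X @ Wk).T                    # original attention logits
scores_fused = (X @ (Wq @ R)) @ (X @ (Wk @ R)).T  # R folded into both projections
assert np.allclose(scores, scores_fused)
```

Since Wq·R and Wk·R can be precomputed once, the rotation costs nothing at inference time — which is why no explicit R₅⁻¹ is needed.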
Stephen Panaro@flat·
See for yourself:
1. Get the adapter training toolkit: developer.apple.com/apple-intellig…
2. Clone: github.com/smpanaro/netro…
3. Edit draft.mil:
   - delete all functions except the first
   - rename it to: func main<ios18>(
4. Follow the readme to start netron, and open the .mil
Stephen Panaro@flat·
Curious about the Apple Foundation Model architecture? I updated my netron fork to visualize the draft model*.

*They say it might differ from the real model, but it looks convincing to me.
Stephen Panaro@flat·
btw, you can quantize the “hard-to-quantize” Llama 3.1 8B now. (LDLQ is GPTQ)
[image attached]