Stephen Panaro
@flat

903 posts

making coffee and other things. @BrewTimerApp

Boston · Joined May 2013
25 Following · 540 Followers

Pinned Tweet
Stephen Panaro@flat·
“We won’t run it in digital because we’re purists and maniacs.”
Stephen Panaro@flat·
@mweinbach Thought it was a cleaner summary. Possible I’ve just never looked beyond that though 🤔
Max Weinbach@mweinbach·
@flat i'm pretty sure anthropic doesn't hide theirs, so that feels normal
Max Weinbach@mweinbach·
is this gpt-5.4 leaking its reasoning trace?
[image attached]
Stephen Panaro@flat·
@mweinbach Huh, weird. I got it from Opus today too, but had never seen it before. Assumed it was an Anthropic issue
Max Weinbach@mweinbach·
@flat all models tend to do that; noticed it a lot with Gemini
Stephen Panaro@flat·
@mattcassinelli @tylerangert Oh for sure some cool non-MLX stuff this year. When Google released their on-device base model + LoRA, I crossed my fingers Apple would do the same. And they did! Am actually curious now if anyone has shipped an AFM LoRA.
Matthew Cassinelli@mattcassinelli·
@tylerangert This year’s APIs are so cool but because they’re not frontier-level models everyone decided to not even check
Stephen Panaro@flat·
This ButterflyQuant paper looks neat, but also a little sus:
- no code
- no comparison against its closest relative (SpinQuant)

A good test project for coding agents?
Anemll@anemll·
@flat Now you need to install ‘26 🙈
Stephen Panaro@flat·
Incoming new coremltools looks like it has some nice bits:
- 8-bit input/output tensors (previously all 8-bit compute was kept internal)
- >1 input can be enumerated shapes (👀ANE)
Simo Ryu@cloneofsimo·
same deal with the V-O pair of attention btw... multiply one row of V by a factor of k, divide by k at O, and you get the same output. Jesus, why did I not think of this before
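The V-O scale symmetry described above is easy to check numerically. A minimal NumPy sketch (my own toy setup, not the author's code; "row vs. column" depends on how you store the weight matrices — here activations are row vectors, so scaling one output *column* of the V projection and dividing the matching input *row* of the O projection cancels exactly, because the softmax mixing acts per-token, not per-channel):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                                    # tokens, model dim
X = rng.normal(size=(T, d))                    # token activations
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))

def attention(Wv, Wo):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = np.exp(Q @ K.T / np.sqrt(d))
    A = A / A.sum(axis=-1, keepdims=True)      # softmax over keys
    return (A @ V) @ Wo

k = 3.0
Wv2, Wo2 = Wv.copy(), Wo.copy()
Wv2[:, 0] *= k       # scale one output channel of the V projection
Wo2[0, :] /= k       # undo it on the matching input channel of O
assert np.allclose(attention(Wv, Wo), attention(Wv2, Wo2))
```

The cancellation holds for any nonzero k, since the attention weights never mix channels.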
Simo Ryu@cloneofsimo·
Oh my god, with relu / relu² and no bias, there is further redundancy in the parameterization of an MLP: you can multiply one row of fc1 by k and divide the same column of fc2 by k², and the output is completely identical! So we have *extra* n redundancy if your activations are scale-equivariant...
Quoting Simo Ryu @cloneofsimo:

“It's a nightmare to think 2-layer MLPs are super redundant: even if you have a global minimum, there are at least n! more of 'em. For n = 8k, a 2-layer MLP is like 10^27800 larger in terms of hypothesis space. WHO ALLOWED THIS???”

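The fc1/fc2 rescaling trick can be verified in a few lines. A toy sketch (my own example, assuming the relu² case, where the k² factor applies: relu is 1-homogeneous for k > 0, so relu(k·h)² = k²·relu(h)², and dividing the matching fc2 column by k² restores the output):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 6, 10, 4
W1 = rng.normal(size=(d_h, d_in))    # fc1 weights
W2 = rng.normal(size=(d_out, d_h))   # fc2 weights
x = rng.normal(size=(d_in,))

def mlp(W1, W2):
    h = np.maximum(W1 @ x, 0.0) ** 2   # relu^2 activation, no bias
    return W2 @ h

k = 2.5
W1s, W2s = W1.copy(), W2.copy()
W1s[3, :] *= k        # scale one row of fc1 (one hidden unit)
W2s[:, 3] /= k ** 2   # relu^2 is 2-homogeneous for k > 0, so k^2 undoes it
assert np.allclose(mlp(W1, W2), mlp(W1s, W2s))
```

With plain relu the same identity holds with k in place of k²; either way each hidden unit contributes a continuous symmetry on top of the n! permutation symmetry from the quoted tweet.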
Stephen Panaro@flat·
Turns out you don’t need R₅⁻¹ at all. 🫠 Fusing it into Q and K is enough! Cool paper from Qualcomm explains this and a few similar transforms. No code in the paper, so gist proof 👇
Quoting Stephen Panaro @flat:

“Liking the line of research where you multiply LLM weights by rotation matrices and the model still works. Most do it in between layers, but you can also sneak one between Q/K and RoPE. Extra parameters? None. Useful? …Maybe. Cool? I think so. (See R₅ below.)”

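The core algebra behind fusing the rotation into Q and K is that orthogonal matrices cancel inside the dot product: (QR)(KR)ᵀ = Q R Rᵀ Kᵀ = Q Kᵀ. A simplified NumPy sketch (my own, ignoring RoPE — in the paper's setting R₅ sits between the projections and RoPE, which adds a commutation condition this toy omits):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))          # token activations
Wq = rng.normal(size=(d, d))         # query projection
Wk = rng.normal(size=(d, d))         # key projection

# Random orthogonal matrix standing in for the rotation R
R, _ = np.linalg.qr(rng.normal(size=(d, d)))

scores = (X @ Wq) @ (X @ Wk).T                    # original attention logits
scores_fused = (X @ (Wq @ R)) @ (X @ (Wk @ R)).T  # R folded into both projections
assert np.allclose(scores, scores_fused)
```

Since Wq·R and Wk·R can be precomputed once, the rotation costs nothing at inference time — which is why no explicit R₅⁻¹ is needed.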
Stephen Panaro@flat·
See for yourself:
1. Get the adapter training toolkit: developer.apple.com/apple-intellig…
2. Clone: github.com/smpanaro/netro…
3. Edit draft.mil:
   - delete all functions except the first
   - rename it to: func main<ios18>(
4. Follow the readme to start netron, and open the .mil
Stephen Panaro@flat·
Curious about the Apple Foundation Model architecture? I updated my netron fork to visualize the draft model*.

*They say it might differ from the real model, but it looks convincing to me.
Stephen Panaro@flat·
btw, you can quantize the “hard-to-quantize” Llama 3.1 8B now. (LDLQ is GPTQ)
[image attached]