Stephen Panaro
903 posts

@flat
making coffee and other things. @BrewTimerApp





This developer just turned the iPad into Tom Riddle’s diary ‼️


I’ve been radicalized by MLX and now need a cluster of Mac minis







Wonder if we’re gonna get a new version of coremltools. Last year it dropped on Monday.



It's a nightmare to think 2-layer MLPs are super redundant: even if you have a global minimum, there are at least n! more of 'em. For n = 8k, a 2-layer MLP is like 10^27800 larger in terms of hypothesis space. WHO ALLOWED THIS???
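A quick sanity check of the n! claim, as a rough sketch rather than anything from the original post. It assumes a toy 2-layer ReLU MLP with a scalar output (the names `mlp`, `log10_factorial`, and the sizes `d` and `n` are made up for illustration). Shuffling the hidden units, along with their matching input rows, biases, and output weights, leaves the function unchanged, and log10(8000!) comes out on the order of the exponent quoted above.

```python
import math
import random

def log10_factorial(n):
    # lgamma(n + 1) equals ln(n!), so dividing by ln(10) gives log10(n!)
    return math.lgamma(n + 1) / math.log(10)

def mlp(x, w1, b1, w2):
    # Toy 2-layer MLP (illustrative): y = w2 . relu(w1 @ x + b1)
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(v * h for v, h in zip(w2, hidden))

random.seed(0)
d, n = 3, 5  # input size and hidden width, chosen arbitrarily
x = [random.gauss(0, 1) for _ in range(d)]
w1 = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
b1 = [random.gauss(0, 1) for _ in range(n)]
w2 = [random.gauss(0, 1) for _ in range(n)]

# Reorder the hidden units: same permutation for rows, biases, output weights
perm = list(range(n))
random.shuffle(perm)
w1_p = [w1[i] for i in perm]
b1_p = [b1[i] for i in perm]
w2_p = [w2[i] for i in perm]

y = mlp(x, w1, b1, w2)
y_p = mlp(x, w1_p, b1_p, w2_p)
print("same output:", math.isclose(y, y_p, rel_tol=1e-9, abs_tol=1e-12))
print("log10(8000!) =", log10_factorial(8000))
```

Each of the n! orderings of the hidden units is a different point in weight space with the same loss, which is where the "at least n! more" count comes from.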


Liking the line of research where you multiply LLM weights by rotation matrices and the model still works. Most do it in between layers, but you can also sneak one between Q/K and RoPE. Extra parameters? None. Useful? …Maybe. Cool? I think so. (See R₅ below.)
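A minimal sketch of why a rotation can sit between Q/K and RoPE for free. Big assumption: it uses one simple family of rotations, an independent 2D rotation inside each RoPE pair, which may not be the R₅ the post refers to. The helper names (`rope`, `pair_rotate`, `score`) and the base of 10000 are illustrative choices, not taken from the post. Since 2D rotations commute, the extra rotation cancels in the q·k dot product.

```python
import math
import random

def rope(vec, pos, base=10000.0):
    # Rotate each consecutive pair (i, i+1) by angle pos * base^(-i/d)
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def pair_rotate(vec, angles):
    # Assumed extra rotation: one fixed 2D rotation per RoPE pair
    out = []
    for j, alpha in enumerate(angles):
        c, s = math.cos(alpha), math.sin(alpha)
        a, b = vec[2 * j], vec[2 * j + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def score(q, k, q_pos, k_pos, angles=None):
    # Attention logit for one query/key pair, optionally with the extra rotation
    if angles is not None:
        q, k = pair_rotate(q, angles), pair_rotate(k, angles)
    return sum(a * b for a, b in zip(rope(q, q_pos), rope(k, k_pos)))

random.seed(0)
d = 8  # head dimension (must be even), chosen arbitrarily
q = [random.gauss(0, 1) for _ in range(d)]
k = [random.gauss(0, 1) for _ in range(d)]
angles = [random.uniform(0, 2 * math.pi) for _ in range(d // 2)]

plain = score(q, k, q_pos=7, k_pos=3)
rotated = score(q, k, q_pos=7, k_pos=3, angles=angles)
print("scores match:", math.isclose(plain, rotated, rel_tol=1e-9, abs_tol=1e-9))
```

In a real model the same rotation would presumably be folded into the Q and K projection weights, which is how it adds no parameters. A general rotation that mixes different RoPE pairs would not commute with RoPE, so this sketch only covers the per-pair case.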



