Pinned Tweet
cider
775 posts

@jeffreycider
my purpose in life is to forget linear algebra 2x a year
San Francisco, CA · Joined September 2019
632 Following · 2.5K Followers

@LotusAbhi @konstmish this has been empirically known for awhile, not sure what the seminal reference is. central flows explains why constant LR prevents sharpness explosion

@jeffreycider @konstmish Do you know of a reference for this? Might explain some behavior of some work I've been doing recently.

@konstmish in fact, muon was designed as the steepest descent optimizer in a constant sharpness regime (eqn 7 in bernstein's modular duality paper)

@konstmish yes, my criticism is not that it's an apples-to-oranges comparison. it's that all useful NN optimization occurs in the constant-sharpness regime, so a line-search regime is irrelevant

@creatine_cycle @REK god made men but mecha teleoperation made them equal

@distributionat is daeho good if you don't want gimmick instagram food, asking for a friend

@yifan_zhang_ the Mu in Muon stands for momentum. the precise claim is that if you strip accumulation out of both Muon and Shampoo, they both give identical steepest descent updates
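
The claim in the post above can be checked numerically. The sketch below is an illustration under stated assumptions, not either optimizer's actual implementation: it takes accumulation-free Shampoo to mean preconditioning a single gradient G as (G Gᵀ)^(-1/4) · G · (Gᵀ G)^(-1/4), and accumulation-free Muon to mean exact orthogonalization of G via SVD (Muon in practice approximates this with a Newton-Schulz iteration). The helper names are hypothetical, and G is assumed square and full rank so the inverse roots exist.

```python
import numpy as np

def inv_fourth_root(sym):
    # Inverse fourth root of a symmetric positive-definite matrix,
    # computed from its eigendecomposition: Q diag(w^-1/4) Q^T.
    w, q = np.linalg.eigh(sym)
    return (q * w ** -0.25) @ q.T

def shampoo_no_accumulation(g):
    # Assumed form: preconditioners built from the current gradient only.
    left = g @ g.T
    right = g.T @ g
    return inv_fourth_root(left) @ g @ inv_fourth_root(right)

def orthogonalize(g):
    # Exact orthogonalization: replace every singular value of g with 1.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
g = rng.standard_normal((4, 4))  # stand-in for a gradient matrix

shampoo_update = shampoo_no_accumulation(g)
muon_update = orthogonalize(g)

# Largest entrywise difference between the two update directions.
max_gap = float(np.max(np.abs(shampoo_update - muon_update)))
print(max_gap)
```

Writing G = U S Vᵀ, the Shampoo expression collapses to U S^(-1/2) · S · S^(-1/2) Vᵀ = U Vᵀ, which is why the gap should sit at floating-point noise.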

“when a field is getting started, it’s easy to confuse the essence of what you’re doing with the tools you’re using”
what is the essence of LLM Studies? could LLMs just be one instantiation of a broader abstraction?
please don’t answer “ml theory”
Valentin Ignatev @valigo
Computer Science is not science, and it's not about computers. Got reminded about this gem from MIT the other day

@SeunghyunSEO7 only in the 124M track. og muon is still better on the 350M track.

just noticed modded-nanogpt adopt 'NorMuon' as default (?).
it looks like `AdaMuon`. i personally didnt buy this idea because i thought Muon is enough and dont want to introduce optim state for 2nd moment again like adam... hmm
arxiv.org/abs/2510.05491
arxiv.org/abs/2507.11005…




@tarashakhurana @CMU_Robotics @RamananDeva @shubhtuls @KaterinaFragiad @cvondrick @GuibasLeonidas welcome!

Life update: I recently defended my PhD at @CMU_Robotics where I was advised by @RamananDeva!
Last few years were so much fun. Incredibly grateful to everyone in Smith and especially to my committee @shubhtuls @KaterinaFragiad @cvondrick and @GuibasLeonidas.
I am now at @Tesla_Optimus working on cool perception problems for humanoids!



















