sageev

2.5K posts

@osageev

"a symptom of approaching nervous breakdown is belief that one's work is terribly important" (Russell). Current/previous faculty at: @dalfcss, @vectorinst, @googleAI

Joined February 2009
755 Following, 1K Followers
Pinned Tweet
sageev (@osageev)
"Sharpness of the loss surface tells us about generalization... except when it doesn't (like in transformers)." But what does sharpness mean? You might say "it's how much the function changes within a little ball". OK, so in the simple picture shown below, at which of the two points is the loss surface sharper?
To find out
* why this is a tricky question, and
* why our answer to it does allow sharpness to tell us about generalization, even in transformers(!!),
come to our #ICML2025 #spotlight poster E-2001 on Wednesday at 11:00am-1:30pm PDT.
Work by Marvin F. da Silva and Felix Dangel @dalfcs @VectorInst
[image]
Quoting sageev (@osageev)

[1/🧵] ✨ Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It ✨ Super excited to announce our paper on factoring out parameter symmetries to better predict generalization in transformers (accepted as #ICML25 spotlight! 🎉) Amazing work by Marvin da Silva (@marvinfsilva) and Felix Dangel (@f_dangel). Symmetries hide sharpness — Riemannian geometry reveals it👇
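A minimal sketch of why "how much the function changes within a little ball" is a tricky notion of sharpness. It assumes a hypothetical toy model f(x) = a * b * x with a single data point (x = 1, y = 1). Rescaling (a, b) to (c * a, b / c) is a parameter symmetry: the function and the loss stay the same, yet the Hessian trace, a common sharpness proxy, changes. This is the classic rescaling example, not the Riemannian method from the paper; `loss` and `hessian_trace` are illustrative names.

```python
def loss(a: float, b: float, x: float = 1.0, y: float = 1.0) -> float:
    """Squared error of the hypothetical two-parameter model f(x) = a * b * x."""
    return (a * b * x - y) ** 2


def hessian_trace(f, a: float, b: float, h: float = 1e-4) -> float:
    """Trace of the 2x2 Hessian of f at (a, b), via central second differences."""
    d2a = (f(a + h, b) - 2 * f(a, b) + f(a - h, b)) / h ** 2
    d2b = (f(a, b + h) - 2 * f(a, b) + f(a, b - h)) / h ** 2
    return d2a + d2b


# Two parameter settings related by the symmetry (a, b) -> (c * a, b / c), with c = 4.
original = (1.0, 1.0)
rescaled = (4.0, 0.25)

# Same function (a * b == 1.0 in both cases) and same loss (0.0 at both minima)...
print(original[0] * original[1], rescaled[0] * rescaled[1])  # 1.0 1.0
print(loss(*original), loss(*rescaled))  # 0.0 0.0

# ...but different curvature: for this model the trace is 2 * (a**2 + b**2).
print(round(hessian_trace(loss, *original), 3))  # 4.0
print(round(hessian_trace(loss, *rescaled), 3))  # 32.125
```

The two points implement the same function, yet the rescaled one looks roughly eight times sharper by this proxy, which is the kind of ambiguity a symmetry-aware notion of sharpness has to remove.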

sageev (@osageev)
@DamienTeney Empirical validation: I tried it and it was indeed great.
sageev (@osageev)
@andrewgwils listening to it now-- so great. (i often listen to a Heifetz recording of it, which i love too.)
Andrew Gordon Wilson (@andrewgwils)
These days, the idea of giving music your full undivided attention for even 30 minutes seems unthinkable. But if you can take the plunge, try this performance of the Sibelius Violin Concerto. There’s nothing quite like it: youtu.be/vNyaAdNtXBY?si…
sageev (@osageev)
Not that I post that often, but I'll post one more music thing here and then mostly save my music posts for FB and Insta (osageev@); stay here for #ML. Here ya go: a snapshot from last night's session… was cathartic fun.
sageev (@osageev)
"The lion and the calf shall lie down together but the calf won't get much sleep." (Woody Allen, from "Without Feathers")
sageev (@osageev)
quick stop at my lab last night.
sageev (@osageev)
@memoakten Also, what about academics (including philosophers) and their bodies? Another example of frequent "completely fail to grasp"… The whole talk is wonderful, but minutes 9 to 11 (roughly) address this a bit: youtu.be/iG9CE55wbtY?si…
sageev (@osageev)
@docmilanfar I asked a friend (who flourishes outdoors) how often he'd seen bears. He said a lot, especially when he occasionally walked from town A to town B. It later turned out this walk took two days through the forest.
Peyman Milanfar (@docmilanfar)
when does a walk become a hike
sageev (@osageev)
random blowing. end of term. everything due all the time, needed a break. this was a fun break. repeated notes and fourths
sageev (@osageev)
@yoavgo (maybe it didn't read well-- i was trying to make a "big deal" about the "emotion vectors" underlying your tweet.... : )
sageev retweeted
Om Patel (@om_patel5)
I taught Claude to talk like a caveman to use 75% fewer tokens.

normal claude: ~180 tokens for a web search task
caveman claude: ~45 tokens for the same task

"I executed the web search tool" = 8 tokens
caveman version: "Tool work" = 2 tokens

every single grunt swap saves 6-10 tokens. across a FULL task that's 50-100 tokens saved

why does it work?
caveman claude doesn't explain itself. it does its task first. gives the result. then stops.
no "I'd be happy to help you with that."
no "Let me search the web for you"
no more unnecessary filler words
"result. done. me stop."

50-75% burn reduction
with usage limits getting tighter every week this might be the most practical hack out there right now
[image]
Yael Vinker🎗 (@YVinker)
Creative work often starts before we can describe what we're looking for. What role can generative models play at this stage? 🌱Our new work, Inspiration Seeds, reveals hidden visual connections between images, creating a purely visual exploration space. 🔗kfirgoldberg.github.io/InspirationSee…
sageev (@osageev)
Get ready to have your 🤯. Apple drops **8 new emoji** in a mere 17 GB OS update.
[image]
sageev retweeted
Weight Space Symmetries @ ICML 2026
📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026
[image]
Michael Bronstein (@mmbronstein)
Join us at the coolest AIxBio research institute in the best city in Europe! AITHYRA is hiring new PIs in AI/ML in Vienna.
Deadline: 30 April 2026
More info: aithyra.at/fileadmin/down…
*Lipizzaner horse is for visualization only and not included in the package
[image]
sageev (@osageev)
@andrewgwils do you think this is more true on social media than IRL, or independent of the medium?
Andrew Gordon Wilson (@andrewgwils)
It’s interesting how an idea either tends to be uncritically accepted, or undergoes an absolutely obsessive level of critical scrutiny. Why is it so hard to be objective and rational?