Pinned Tweet
jake
1.3K posts

jake
@stratejake
Neural Network Mechanic @ stealth. Generative models for physical systems. xoogler
SF · Joined February 2014
645 Following · 2.5K Followers

@mirandrom @cloneofsimo so much of MLE in my experience is implementing options A, B, and C in a way that they are all a special case of D, so that you have combinatorially fewer knobs to tune.
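
A minimal sketch of that idea, using an example that is not from the thread: "no regularization", "ridge", and "lasso" can all be written as special cases of one elastic-net style penalty. The function name `penalty` and its parameters `strength` and `l1_ratio` are hypothetical, chosen only for illustration.

```python
def penalty(weights, strength=0.0, l1_ratio=0.0):
    """General penalty D: strength * (l1_ratio * L1 + (1 - l1_ratio) * 0.5 * L2).

    Illustrative only; the parameter names are assumptions, not a library API.
    """
    l1 = sum(abs(w) for w in weights)   # sum of absolute values
    l2 = sum(w * w for w in weights)    # sum of squares
    return strength * (l1_ratio * l1 + (1.0 - l1_ratio) * 0.5 * l2)


weights = [1.0, -2.0, 3.0]

# Options A, B, and C are just points in the parameter space of D.
no_penalty = penalty(weights)                           # A: strength = 0
ridge = penalty(weights, strength=0.1, l1_ratio=0.0)    # B: pure L2
lasso = penalty(weights, strength=0.1, l1_ratio=1.0)    # C: pure L1
```

With this framing, a sweep covers two continuous knobs (`strength`, `l1_ratio`) instead of a separate code path and hyperparameter set per option.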

@cloneofsimo the challenge with ML is high dimensionality, and imo what can make theory useful boils down to dimensionality reduction, i.e. finding the right abstractions that are both meaningful enough to explain a system and useful enough to give you new ways to control/influence it

99% of people in ML are either full-eng or full-theory. I do think you really need a good mix from both ends of the spectrum, otherwise you get a really skewed perspective
engineering people think the math behind training dynamics / optimizer / diffusion / RL is useless
theory people are obsessed with low-MFU linear attention, over-abstraction, and toy problems that disappear at larger scale
Jason Lee@jasondeanlee
Ugh no. You need some language to even define the problem you are trying to solve. Perhaps mdp is limiting/too general but you need some formalism. You should be trying to come up with the formalism.

@MingyuanZhou @sedielem Very interesting. Definitely adding your paper to my reading list!

@stratejake @sedielem That blog was great. We ignored how flow models were motivated and instead asked: what’s the training loss, and what’s its optimal solution?
This unifies flow matching variants and lets Score identity Distillation (SiD) apply whenever a pretrained teacher provides score/velocity.

Two recent papers (arxiv.org/abs/2510.11690, arxiv.org/abs/2511.13720) suggest that predicting x (clean) works much better than predicting eps or v (noisy) in high dimensions.
Natural signals like images live on a low-dimensional manifold. Noise takes you off the manifold! (1/3)

@MingyuanZhou @sedielem Thank you for the link! Is this something exclusive to distilled models? I recall this blog post suggesting something to that effect more generally: diffusionflow.github.io

@sedielem @stratejake Great discussion! In diffusion distillation, it doesn’t matter if the teacher predicts ε, x, velocity, or score — their optimal solutions are linearly related. Our paper echoes many of your points: arxiv.org/abs/2509.25127
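
The linear relationship mentioned above can be sketched as follows, assuming a Gaussian path `x_t = alpha * x + sigma * eps`. The helper names and the linear schedule (`alpha = 1 - t`, `sigma = t`) are assumptions made for illustration and are not taken from the linked paper.

```python
def from_eps(x_t, eps_hat, alpha, sigma, d_alpha, d_sigma):
    """Convert an epsilon prediction into x, velocity, and score predictions.

    Assumes x_t = alpha * x + sigma * eps, with d_alpha and d_sigma the
    time derivatives of the schedule. Each output is linear in eps_hat and x_t.
    """
    x_hat = (x_t - sigma * eps_hat) / alpha       # clean-sample prediction
    v_hat = d_alpha * x_hat + d_sigma * eps_hat   # velocity d(x_t)/dt
    score_hat = -eps_hat / sigma                  # score of the noisy marginal
    return x_hat, v_hat, score_hat


def eps_from_x(x_t, x_hat, alpha, sigma):
    """Inverse direction: recover the epsilon prediction from an x prediction."""
    return (x_t - alpha * x_hat) / sigma


# Hypothetical linear schedule at t = 0.3 (an assumption for this sketch).
t = 0.3
alpha, sigma = 1.0 - t, t
d_alpha, d_sigma = -1.0, 1.0

x, eps = 2.0, -1.0                 # a toy scalar "sample" and its noise
x_t = alpha * x + sigma * eps      # noisy observation

# Feeding in the true noise stands in for a perfect epsilon predictor.
x_hat, v_hat, score_hat = from_eps(x_t, eps, alpha, sigma, d_alpha, d_sigma)
```

Because every conversion is linear given `x_t` and the schedule, a teacher trained on any one of these targets can be re-expressed in the others without retraining, at least away from the endpoints where `alpha` or `sigma` is zero.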

@jaygala223 @SakanaAILabs @thisismyhat @YesThisIsLion You're not going to believe what kind of neural architecture modern diffusion models use.

Sakana AI’s CTO says he’s ‘absolutely sick’ of transformers, the tech that powers every major AI model
“You should only do the research that wouldn’t happen if you weren’t doing it.” (@thisismyhat) 🧠
@YesThisIsLion
venturebeat.com/ai/sakana-ais-…

@RobinRene81 @lock_dok @Rothmus Because each qr code points to a different url so they know what table you're ordering from.

@xlr8harder @beffjezos @hamandcheese Yeah it's almost like a temperature increase is part of it too.

@beffjezos @hamandcheese i don't think that fully captures it either. it tends to increase the firing rate overall, adding noise and randomness.

I did 4 cups of ayahuasca once and tunneled through my consciousness in a state of euphoria having one epiphany after another. At some point, the sense of epiphany continued without any meaningful referent, as if I was leaning on the epiphany key in my brain.
That then triggered a meta-epiphany that psychedelics only make you feel like you're learning something new or deep, sorta like how déjà vu is probably just your brain's familiarity circuit misfiring in a context that isn't actually familiar.
If psychedelics have given me any durable insight, it's that our consciousness is extremely fallible and often misleading, particularly with respect to valence.
Autism Capital 🧩@AutismCapital
This is probably one of the top 5 best tweets ever posted.

@tracewoodgrains Just read the "money speech" from Atlas Shrugged. That's all you really need to understand her philosophy.

@travis4nh The occasional party car rail trip vacation would be cool. The catch is it must be cheaper than a plane.
Imagine if you rented a train car with beds and bath to travel, and it parked in a lot at your destinations. No separate hotel. Then punch in the next leg of the trip.

wow imagine if you could get from Boston to LA in only 24 hours of constant travel (48 hours at actual speeds once the press release nonsense was finally admitted as rose-colored)
people would surely abandon current modes of transportation
The Ball Is Orange@theballisorange
A lot of our lives would be incredibly different if this existed

@yacineMTB Start sleep training early. We did it around 4-5 months and it was an absolute game changer.
First couple of months are really hard. 6-12 is when they start to gain mobility and develop a personality and it gets a lot more fun.

@bilawalsidhu @KTmBoyle Some people can't handle being given even the smallest amount of authority over others.

@KTmBoyle Why are people like this?! Wild to me to take the time to send such a notice

@GrantSlatton @Gok There's a spectrum of FAANG people, from the money/status chasers to the techie problem solvers. The latter are the ones you wanna hang with outside of work, but they are an endangered species.

@stratejake not with np.testing.assert_array_almost_equal. I was surprised too!