Giosue Migliorini

17 posts


@joh_sweh

Ph.D. student in stats @UCIrvine | former AI research intern @FlagshipPioneer, @LosAlamosNatLab, @UniBocconi

CA · Joined June 2023
567 Following · 52 Followers
Giosue Migliorini retweeted
Felix Draxler @FelixDrRelax
LLMs are autoregressive and slow? No! Parallel Token Prediction decodes multiple consistent tokens in one model call. PTP allows arbitrary dependencies in one call, unlike discrete diffusion. Practical: 2.4x speedup. github.com/mandt-lab/ptp. ICLR: Apr 23, morning poster P3-#608
Giosue Migliorini @joh_sweh
@JIRIGESI Hi Jiri, I am a fourth-year PhD candidate at UCI interested in probabilistic modeling, RL, and multimodal generative models. I’d love to grab a coffee if you are available!
Jiri @JIRIGESI
I’ll be at NeurIPS. If you’re interested in a 2026 PhD research internship with Amazon Store Foundation AI and want to work on agents, RL, and multi-modal, I’d love to connect at the conference.
Giosue Migliorini @joh_sweh
@cloneofsimo Ideally we should sample with replacement from the dataset (potentially repeated datapoints in a single batch!). Epochs & not shuffling introduce periodic behaviors
Simo Ryu @cloneofsimo
Guys do we REALLY need to shuffle at the end of epoch? like REALLY REALLY ?
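The sampling-with-replacement idea from the reply in this exchange can be sketched as follows (a minimal illustration only; `iid_batches` and the toy dataset are made up for the example):

```python
import random

def iid_batches(dataset, batch_size, num_batches, seed=0):
    """Draw batches i.i.d. with replacement: a datapoint may repeat
    within a single batch, and there is no epoch boundary at all."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [dataset[rng.randrange(len(dataset))] for _ in range(batch_size)]

# Contrast with epoch-based shuffling, which visits every point exactly
# once per epoch and so can introduce periodic behavior in training.
data = list(range(10))
for batch in iid_batches(data, batch_size=4, num_batches=3):
    print(batch)
```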
dr. jack morris @jxmnop
posted the other day about model distillation. pretty much everyone responded with their theories: professors, leading lab researchers, students, pseudoanonymous anime-profile posters. seems there's no clear consensus on why it works, but here are the theories 🧵
dr. jack morris@jxmnop

it's a baffling fact about deep learning that model distillation works.
method 1: train small model M1 on dataset D.
method 2 (distillation): train large model L on D; train small model M2 to mimic the output of L.
M2 will outperform M1. no theory explains this; it's magic

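The second recipe in the quoted tweet can be sketched in a toy setting. Everything below (the tanh "teacher" standing in for the large model L, the linear student, the hyperparameters) is a made-up illustration of the training setup, not a demonstration of why distillation helps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a large trained model L: a fixed nonlinear function.
X = rng.normal(size=(256, 4))
teacher = lambda x: np.tanh(x @ np.array([1.0, -2.0, 0.5, 0.0]))

# Method 2's second stage: train a small student M2 to mimic the
# teacher's soft outputs (plain gradient descent on the mimic MSE).
w = np.zeros(4)
lr = 0.1
for _ in range(500):
    residual = X @ w - teacher(X)
    w -= lr * (X.T @ residual) / len(X)

mimic_mse = float(np.mean((X @ w - teacher(X)) ** 2))
```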
Historic Vids @historyinmemes
There were about 180 towers in Bologna in the 12th century. The tallest, 97 meters high, still stands.
[image attached]
Giosue Migliorini retweeted
Keenan Crane @keenanisalive
We often think of an "equilibrium" as something standing still, like a scale in perfect balance. But many equilibria are dynamic, like a flowing river which is never changing—yet never standing still. These dynamic equilibria are nicely described by so-called "detailed balance"
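For reference, the detailed balance condition mentioned above, written out (this is the standard definition, added here for convenience): a Markov chain with stationary distribution $\pi$ and transition kernel $P$ satisfies detailed balance when

```latex
\pi(x)\,P(x \to y) \;=\; \pi(y)\,P(y \to x) \qquad \text{for all states } x, y,
```

i.e. the probability flux from $x$ to $y$ equals the flux from $y$ to $x$. Summing over $x$ recovers stationarity, $\sum_x \pi(x)\,P(x \to y) = \pi(y)$: the distribution never changes even though mass keeps flowing.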
Giosue Migliorini retweeted
Gabriel Peyré @gabrielpeyre
Bregman divergences are convex distance-like functionals that are locally Euclidean. Most algorithms handling Euclidean distances generalize to Bregman divergences. en.wikipedia.org/wiki/Bregman_d…
[image attached]
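The definition behind this tweet, with a small sketch (the formula is standard; the helper names below are made up for the example): for a strictly convex F, the Bregman divergence is D_F(x, y) = F(x) - F(y) - ⟨∇F(y), x - y⟩. Choosing F(x) = ‖x‖² recovers the squared Euclidean distance, and F(x) = Σᵢ xᵢ log xᵢ recovers the KL divergence on the simplex:

```python
import numpy as np

def bregman(F, gradF, x, y):
    """D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    return F(x) - F(y) - gradF(y) @ (x - y)

# F(x) = ||x||^2 yields the squared Euclidean distance.
sq = lambda x: x @ x
sq_grad = lambda x: 2 * x

# F(x) = sum_i x_i log x_i (negative entropy) yields the KL divergence
# when x and y are probability vectors.
negent = lambda x: np.sum(x * np.log(x))
negent_grad = lambda x: np.log(x) + 1

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.4, 0.5])
d_euc = bregman(sq, sq_grad, x, y)          # equals ||x - y||^2
d_kl = bregman(negent, negent_grad, x, y)   # equals KL(x || y)
```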
Giosue Migliorini retweeted
Andrej Karpathy @karpathy
Future be like tab tab tab
Giosue Migliorini @joh_sweh
@BalintMucsanyi @mkirchhof_ @coallaoh Mine might be a naive question. Why try to measure uncertainty through the predictive entropy, when the predictive variance admits such a nice and interpretable decomposition? I have never seen a confidence interval based on entropy.
[image attached]
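The decomposition this question refers to is presumably the law of total variance (a standard identity; the aleatoric/epistemic labels are the usual reading in the Bayesian uncertainty literature): for a predictive distribution over y with model parameters θ,

```latex
\operatorname{Var}[y]
  \;=\; \underbrace{\mathbb{E}_{\theta}\big[\operatorname{Var}[y \mid \theta]\big]}_{\text{aleatoric}}
  \;+\; \underbrace{\operatorname{Var}_{\theta}\big[\mathbb{E}[y \mid \theta]\big]}_{\text{epistemic}}.
```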
Giosue Migliorini @joh_sweh
@PreetumNakkiran Flow matching should recover the optimal vector field in the Benamou–Brenier perspective of optimal transport if data is sampled from the optimal coupling (in the static OT problem). In that case it would recover the identity.
Preetum Nakkiran @PreetumNakkiran
easiest way to see that Flow Matching does not always produce an optimal transport: observe that the marginal flow from a distribution *to itself* is not the Identity (eg for linear flows & independent coupling)
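In standard flow-matching notation with linear conditional paths $x_t = (1-t)\,x_0 + t\,x_1$, the marginal velocity field this exchange is about is

```latex
u_t(x) \;=\; \mathbb{E}\!\left[\,x_1 - x_0 \;\middle|\; (1-t)\,x_0 + t\,x_1 = x\,\right].
```

Under the independent coupling with $p_0 = p_1$, this conditional expectation is generally nonzero, so the learned flow is not the identity map; under the diagonal (optimal) coupling $x_1 = x_0$ it vanishes identically, recovering the identity.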
Giosue Migliorini retweeted
Jascha Sohl-Dickstein @jaschasd
Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
Giosue Migliorini retweeted
AI at Meta @AIatMeta
Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more. More details on this work & examples ⬇️
Giosue Migliorini retweeted
Stat.ML Papers @StatMLPapers
Functional Flow Matching. (arXiv:2305.17209v1 [cs.LG]) ift.tt/BmXJcbn