Matthias Bal

142 posts

@matthiasbal

Belgium · Joined September 2011
105 Following · 293 Followers
Pinned Tweet
Matthias Bal @matthiasbal
A non-equilibrium statistical mechanics perspective on transformers mcbal.github.io/post/spin-mode… We present a class of transformers based on mean-field dynamics of vector-spin models. Our framework supports asymmetric couplings and yields residual, attention, and feed-forward terms.
1
51
191
24.5K
Matthias Bal retweeted
NUKES @atomicarchive
«Apache» thermonuclear test, 1.85 Megatons, Eniwetok Atoll, 9 July 1956.
9
355
3K
115.5K
Matthias Bal @matthiasbal
@de_gennes Haven't looked into that, but sure sounds interesting to explore. Lately I've mainly been working on a dynamical TAP-style mean-field approach to include asymmetric couplings like softmax attention into the transformers-as-vector-spin-models framework: mcbal.github.io/post/spin-mode…
0
0
0
47
SijingDu @SijingDu52
@matthiasbal Hi Matthias, really nice perspective to interpret the transformer model as a vector-spin model! Just wondering whether, apart from the mean-field approximation of the partition function (free energy), other forms of approximate free energy are possible, like the Bethe free energy in BP?
1
0
0
46
Matthias Bal @matthiasbal
New post on approaching transformer modules from statistical mechanics. mcbal.github.io/post/transform… We implement a differentiable vector-spin model whose couplings act as learnable parameters and probe it with data to find a transformer-like attention response. 1/9
3
5
15
0
Deen Kun A. @sir_deenicus
@karraaman @banburismus_ No, I don't know any, but it's possible to piece together. There are some explicit ones on the connection between belief prop & Bethe free energy. On attention & Hopfield nets. Ising models and Hopfield nets. SGD and SGLD. The diffusion models link is in the original 2015 paper.
1
0
2
0
Tom McGrath @banburismus_
What are the most elegant/beautiful ideas in ML? Feels like mathematicians & physicists often talk about aesthetics, but we very rarely do. Why?
173
158
1.8K
0
Matthias Bal retweeted
Baudrillard's America @BaudrillardUSA
Reduced pace of work, decentralization, air-conditioning, soft technologies.
0
7
44
0
Matthias Bal retweeted
ArtNouveauDeco @NouveauDeco
Art Deco control room of the now defunct Kelenföld power plant, built between 1927 and 1929, Budapest, Hungary.
24
385
2.5K
0
Michael Nielsen @michael_nielsen
Curious: if you wanted to understand transformer architectures, what would you read first? [This question is addressed to people who understand them, in detail!]
37
16
263
0
Matthias Bal retweeted
Soviet Visuals @sovietvisuals
Drama theater building in Rostov-on-Don. Photo by Mikhail Prekhner, USSR, 1937
2
54
467
0
Matthias Bal @matthiasbal
A short post on transformers and statistical mechanics. mcbal.github.io/post/transform… Can we interpret transformer modules physically? Yes, their forward pass looks like computing magnetizations from free energies in differentiable spin systems. Training shapes collective behaviour.
1
6
45
0
Matthias Bal @matthiasbal
@Thom_Wolf Have you read Tomasello's Becoming Human? It's fascinating.
1
0
4
0
Thomas Wolf @Thom_Wolf
A strange form of tunnel vision I often see in AI consists in beginning to think that humans work just like ML models, that we are in the end just RL agents or language models (depending on what one works on). The human experience is much more diverse & complex than any of these.
18
25
256
0
Matthias Bal retweeted
Soviet Visuals @sovietvisuals
"Kosmos" cafe, Perm, USSR, 1960s
16
238
2K
0
Matthias Bal @matthiasbal
We previously interpreted the deep equilibrium fixed-point procedure physically as solving the mean-field fixed-point equations of a vector-spin model. This work instead takes derivatives of the steepest-descent partition function of a particular class of vector-spin models. 8/9
1
0
2
0