Florian

250 posts

Florian
@fses91

Joined February 2017
958 Following · 112 Followers
Florian retweeted
Günter Klambauer @gklambauer
Symbol-equivariant Recurrent Reasoning Models (SE-RRM): SE-RRM advances HRM and TRM with guaranteed identical solutions for problems with permuted colors (ARC-AGI) or digits (Sudoku). Coolest part: extrapolation to larger problem sizes!!! Paper: arxiv.org/abs/2603.02193
[image]
3 replies · 40 reposts · 214 likes · 13.6K views
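The guarantee in the tweet above is an equivariance property: relabeling the input symbols and then solving must give the same result as solving first and relabeling the output, i.e. solve(perm(x)) == perm(solve(x)). A minimal sketch of how one might test this, with a hypothetical `solve` standing in for the model (this is not the SE-RRM API):

```python
import numpy as np

def is_symbol_equivariant(solve, grid, perm):
    """Check solve(perm(x)) == perm(solve(x)) for a symbol relabeling `perm`."""
    relabel = lambda a: perm[a]  # elementwise relabeling via fancy indexing
    return np.array_equal(solve(relabel(grid)), relabel(solve(grid)))

# Toy check: the identity "solver" is trivially symbol-equivariant.
rng = np.random.default_rng(0)
grid = rng.integers(0, 10, size=(5, 5))  # e.g. a 5x5 ARC-style color grid
perm = rng.permutation(10)               # random relabeling of the 10 colors
assert is_symbol_equivariant(lambda g: g, grid, perm)
```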
Florian retweeted
Günter Klambauer @gklambauer
The Great Comeback of Self-Normalizing Networks in 2025: It’s been a wild year in AI and for SNNs + SELU!! See my overview and some trends in the blog: bioinf-jku.github.io/SNNs/
[image]
3 replies · 1 repost · 13 likes · 866 views
Florian retweeted
Sander Dieleman @sedielem
📢 Another #NeurIPS, another diffusion circle! Join us to talk about diffusion models on Friday Dec 5 at 3:30PM in San Diego! Bayside terrace outside room 11 (upstairs) ☀️🚢🌊 Please help spread the word, tell your friends! No slides, no talks, we just sit down and chat 🗣️
7 replies · 34 reposts · 215 likes · 63.2K views
Florian @fses91
“In the judgement of the most competent living mathematicians, Fräulein Noether was the most significant creative mathematical genius thus far produced since the higher education of women began.” – Albert Einstein, 1935 (NYTimes)

I’m thrilled to share that I’ll be joining emmi.ai, a company inspired by the legacy of Emmy Noether, for my upcoming internship. Over the next few months, I’ll have the opportunity to work with Sebastian Kaltenbach on my passion: diffusion and flow-based generative models, and their applications to physics. Excited for what lies ahead!

While Noether devoted her life to uncovering the beauty of symmetries, our recent work explores a different path: approaching the problem without explicitly enforcing them. I’m proud that this work, done together with amazing collaborators @ArturToshev, Andreas Fürst, @gklambauer, @AndreasMayr11, @jo_brandstetter, has been accepted to @NeurIPSConf 2025 in San Diego. arxiv.org/abs/2502.12128
[GIF]
0 replies · 0 reposts · 3 likes · 98 views
Florian retweeted
Günter Klambauer @gklambauer
Celebrating 4,000 citations! Thanks everyone who successfully used self-normalizing networks!!!
[image]
3 replies · 2 reposts · 85 likes · 5.4K views
Florian retweeted
Andrej Karpathy @karpathy
Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right, bottom) is the dominant paradigm in text. For audio I've seen a bit of both.

A lot of diffusion papers look a bit dense, but if you strip the mathematical formalism, you end up with simple baseline algorithms, e.g. something a lot closer to flow matching in continuous, or something like this in discrete. It's your vanilla transformer but with bi-directional attention, where you iteratively re-sample and re-mask all tokens in your "tokens canvas" based on a noise schedule until you get the final sample at the last step. (Bi-directional attention is a lot more powerful, and you get a lot stronger autoregressive language models if you train with it; unfortunately it makes training a lot more expensive because now you can't parallelize across the sequence dim.)

So autoregression is doing an `.append(token)` to the tokens canvas while only attending backwards, while diffusion is refreshing the entire tokens canvas with a `.setitem(idx, token)` while attending bidirectionally.

Human thought naively feels a bit more like autoregression, but it's hard to say that there aren't more diffusion-like components in some latent space of thought. It feels quite possible that you can further interpolate between them, or generalize them further. And it's a component of the LLM stack that still feels a bit fungible. Now I must resist the urge to side quest into training nanochat with diffusion.
[GIF]
Nathan Barry@nathanrs

BERT is just a Single Text Diffusion Step! (1/n) When I first read about language diffusion models, I was surprised to find that their training objective was just a generalization of masked language modeling (MLM), something we’ve been doing since BERT from 2018. The first thought I had was, “can we finetune a BERT-like model to do text generation?”

270 replies · 534 reposts · 5.2K likes · 864.6K views
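The loop Karpathy describes above fits in a few lines. Below is a minimal sketch, assuming a hypothetical bidirectional `model` that maps a token canvas to per-position logits; the greedy re-sampling and the linear re-masking schedule are illustrative choices, not taken from the post:

```python
import torch

@torch.no_grad()
def diffusion_sample(model, seq_len, mask_id, steps=10):
    """Iterated parallel denoising over a fixed-size 'tokens canvas'."""
    canvas = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(canvas)                 # (1, seq_len, vocab), bidirectional attention
        conf, tokens = logits.softmax(-1).max(-1)
        canvas = tokens                        # refresh the whole canvas in parallel
        n_mask = int(seq_len * (1 - (step + 1) / steps))  # linear noise schedule
        if n_mask > 0:
            idx = conf.argsort(-1)[:, :n_mask]  # least confident positions
            canvas.scatter_(1, idx, mask_id)    # re-mask them for the next step
    return canvas
```

A single step of this loop with a fixed mask ratio is just BERT-style masked-token prediction, which is exactly the observation in the quoted tweet below.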
Florian retweeted
Günter Klambauer @gklambauer
PAPER/ABSTRACT DEADLINE IS ALREADY THE END OF THIS WEEK! ELLIS Machine Learning for Molecules workshop: moleculediscovery.github.io/workshop2025/ DON'T MISS THE DEADLINE: short papers or extended abstracts welcome!
1 reply · 1 repost · 10 likes · 738 views
Florian retweeted
Maximilian Beck @maxmbeck
🚀 Excited to share our new paper on scaling laws for xLSTMs vs. Transformers. Key result: xLSTM models Pareto-dominate Transformers in cross-entropy loss.
- At fixed FLOP budgets → xLSTMs perform better
- At fixed validation loss → xLSTMs need fewer FLOPs
🧵 Details in thread
[image]
14 replies · 41 reposts · 230 likes · 83.8K views
Florian retweeted
sway @SwayStar123
Paper by ByteDance; improves upon MeanFlow by removing the need for JVP calculation
[image]
6 replies · 21 reposts · 219 likes · 14.8K views
Florian retweeted
KREA AI @krea_ai
if you're interested in building the future of creative tools with us, we're hiring! krea.ai/careers
4 replies · 2 reposts · 22 likes · 6.5K views
Florian retweeted
MetaStoneAI @theMetaStoneAI
🚀 Introducing XBai o4: a milestone in our 4th-generation open-source technology based on parallel test-time scaling! In its medium mode, XBai o4 now fully outperforms OpenAI o3-mini. 📈 🔗 Open-source weights: huggingface.co/MetaStoneTec/X… ✅ GitHub link: github.com/MetaStone-AI/X…
[image] [image]
74 replies · 224 reposts · 1.3K likes · 362.9K views
Florian retweeted
Johannes Brandstetter @jo_brandstetter
General relativity 🤝 neural fields. This simulation of a black hole is coming from our neural networks 🚀 We introduce Einstein Fields, a compact NN representation for 4D numerical relativity. EinFields are designed to handle the tensorial properties of GR and its derivatives.
11 replies · 72 reposts · 322 likes · 39.2K views
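As a rough illustration of the neural-field idea described above: a small MLP can map a spacetime coordinate to the 10 independent components of a symmetric 4x4 metric, with the derivatives GR needs (e.g. for Christoffel symbols) available via autodiff. A toy sketch under those assumptions (hypothetical architecture, not the EinFields code):

```python
import torch
import torch.nn as nn

class MetricField(nn.Module):
    """Toy neural field: spacetime coords (t, x, y, z) -> symmetric 4x4 metric."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 10),  # 10 independent components of a symmetric 4x4
        )

    def forward(self, coords):                  # coords: (..., 4)
        comp = self.net(coords)                 # (..., 10)
        g = coords.new_zeros(*comp.shape[:-1], 4, 4)
        i, j = torch.triu_indices(4, 4)         # upper-triangular index pairs
        g[..., i, j] = comp                     # fill upper triangle incl. diagonal
        return g + g.triu(1).transpose(-1, -2)  # mirror to make g symmetric

coords = torch.randn(8, 4, requires_grad=True)  # batch of spacetime points
g = MetricField()(coords)                       # (8, 4, 4); derivatives via autograd
```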
Chaitanya K. Joshi @chaitjo
I really loved this line of work from DeepMind on Perceiver - Perceiver IO - Perceiver AR. I wonder what happened to it / is it still used and relevant to long-context modelling?
Andrej Karpathy@karpathy

Perceiver IO is good reading/pointers for neural net architectures arxiv.org/abs/2107.14795 esp w.r.t. encoding/decoding schemes of various modalities to normalize them to & from Transformer-amenable latent space (a not-too-large set of vectors), where the bulk of compute happens.

8 replies · 18 reposts · 231 likes · 28.1K views
Florian retweeted
Erik Bekkers @erikjbekkers
Great discussion, @chaitjo! We also explored this with extensive experiments in our recent paper: arxiv.org/abs/2501.01999. We find, among other things, that equivariant models in a sense scale even better than non-equivariant ones. Going more or less completely against the vibes from your post 😅 1/5
Chaitanya K. Joshi@chaitjo

After a long hiatus, I've started blogging again! My first post was a difficult one to write, because I don't want to keep repeating what's already in papers. I tried to give some nuanced and (hopefully) fresh takes on equivariance and geometry in molecular modelling.

2 replies · 16 reposts · 86 likes · 11.9K views