Thomas Schmied

69 posts

@thsschmied

PhD student @ JKU Linz, Institute for Machine Learning.

Joined May 2022
411 Following · 258 Followers
Pinned Tweet
Thomas Schmied (@thsschmied):
Transformers can be slow for real-time applications like robotics. We study whether modern recurrent architectures, like xLSTM and Mamba, can be faster alternatives. Experiments on 432 tasks show that they compare favourably in terms of performance and speed 🎃 arxiv.org/abs/2410.22391
Thomas Schmied retweeted
Sepp Hochreiter (@HochreiterSepp):
xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. xLSTM variants of instruction-tuned Llama, Qwen, & Olmo models.
Thomas Schmied retweeted
Niklas Schmidinger (@smdrnks):
Excited to share our new paper: Effective Distillation to Hybrid xLSTM Architectures. TL;DR: we retrofit / graft / distill / linearize Transformers into xLSTM-SWA hybrids with fixed-size states. This gives a practical path to studying linear and hybrid architectures starting from already strong pretrained models.
Quoting Sepp Hochreiter (@HochreiterSepp):

xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. xLSTM variants of instruction-tuned Llama, Qwen, & Olmo models.

Thomas Schmied retweeted
Korbinian Poeppel (@KorbiPoeppel):
Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: arxiv.org/abs/2506.11997
Thomas Schmied retweeted
Andreas Auer (@AndAuer):
We’re excited to introduce TiRex — a pre-trained time series forecasting model based on an xLSTM architecture.
Thomas Schmied retweeted
Nicolas Zucchet (@NicolasZucchet):
🧵What if emergence could be explained by learning a specific circuit: sparse attention? Our new work explores this bold hypothesis, showing a link between emergence and sparse attention that reveals how data properties influence when emergence occurs during training.
Thomas Schmied retweeted
Maximilian Beck (@maxmbeck):
Excited to share that 2 of our papers on efficient inference with #xLSTM are accepted at #ICML25. A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks (arxiv.org/abs/2410.22391) and xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference:
Quoting Maximilian Beck (@maxmbeck):

📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind: - Efficiency (in Training and Inference) and - Stability 🧵(1/7)

Thomas Schmied retweeted
Markus Wulfmeier (@m_wulfmeier):
Agentic LLMs need to explore! The web is a non-stationary, highly partially unobservable 🏔️. @thsschmied's @GoogleDeepMind internship systematically investigates current challenges and the role of scale, RL, and chain-of-thought reasoning to overcome them. arxiv.org/abs/2504.16078 I.e. #ReinforcementLearning is back, and everything is an #agent (we could have realised that everything is an MDP without heavily misusing the agent term 😉)
Quoting AK (@_akhaliq):

Google announced "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities" on Hugging Face.

Thomas Schmied retweeted
AK (@_akhaliq):
Google announced "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities" on Hugging Face.
Thomas Schmied retweeted
Nicolas Zucchet (@NicolasZucchet):
Large language models store vast amounts of knowledge, but how exactly do they learn it? Excited to share my @GoogleDeepMind internship results, which reveal the fascinating dynamics behind factual knowledge acquisition in LLMs! arxiv.org/abs/2503.21676
Thomas Schmied retweeted
Maximilian Beck (@maxmbeck):
Yesterday, we shared the details on our xLSTM 7B architecture. Now, let's go one level deeper🧑‍🔧 We introduce ⚡️Tiled Flash Linear Attention (TFLA), ⚡️ A new kernel algorithm for the mLSTM and other Linear Attention variants with Gating. We find TFLA is really fast! 🧵(1/11)
Thomas Schmied retweeted
Lukas Aichberger (@aichberger):
⚠️Beware: Your AI assistant could be hijacked just by encountering a malicious image online! Our latest research exposes critical security risks in AI assistants. An attacker can hijack them by simply posting an image on social media and waiting for it to be captured. [1/6] 🧵
Thomas Schmied retweeted
Korbinian Poeppel (@KorbiPoeppel):
Check out our latest work on scaling up xLSTM to 7B parameters and 2.3T tokens, with all open training data, open training protocol and open training code. Nice team work! 💪💪
Quoting Maximilian Beck (@maxmbeck):

📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind: - Efficiency (in Training and Inference) and - Stability 🧵(1/7)

Thomas Schmied retweeted
Maximilian Beck (@maxmbeck):
📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind: - Efficiency (in Training and Inference) and - Stability 🧵(1/7)
Thomas Schmied retweeted
Fabian Paischer (@PaischerFabian):
I am excited to present the result of a fruitful internship at @AIatMeta. We introduce preference discerning, which denotes in-context conditioning on user preferences expressed in text to steer the recommendation system. This enhances flexibility and personalization. 1/n
Quoting AI at Meta (@AIatMeta):

Newly published research for generative retrieval for recommendations from teams at Meta. - Preference Discerning with LLM-Enhanced Generative Retrieval ➡️ go.fb.me/evvcu8 - Unifying Generative and Dense Retrieval for Sequential Recommendation ➡️ go.fb.me/i7l955

Thomas Schmied retweeted
Lukas Aichberger (@aichberger):
𝗡𝗲𝘄 𝗣𝗮𝗽𝗲𝗿 𝗔𝗹𝗲𝗿𝘁: Rethinking Uncertainty Estimation in Natural Language Generation 🌟 Introducing 𝗚-𝗡𝗟𝗟, a theoretically grounded and highly efficient uncertainty estimate, perfect for scalable LLM applications 🚀 Dive into the paper 👇arxiv.org/abs/2412.15176
Thomas Schmied retweeted
Johannes Brandstetter (@jo_brandstetter):
Super hyped to share NeuralDEM -- the first real-time simulation of industrial particulate flows. NeuralDEM replaces Discrete Element Method (DEM) routines and coupled (CFD-DEM) multiphysics simulations. 🧵 📜: arxiv.org/abs/2411.09678 🖥️: nx-ai.github.io/NeuralDEM/
Thomas Schmied retweeted
Niklas Schmidinger (@smdrnks):
We are excited to introduce Bio-xLSTM! TLDR: we extend xLSTM to genomic, protein and molecular domains and find that it is a proficient generative model, learns rich representations and can perform in-context learning.
Thomas Schmied retweeted
Günter Klambauer (@gklambauer):
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences. xLSTM also shines for DNA, proteins and small molecules -- can handle large-range interactions and huge context! P: arxiv.org/abs/2411.04165