Thomas Schmied (@thsschmied) - Twitter Profile

Pinned Tweet

Transformers can be slow for real-time applications like robotics. We study if modern recurrent architectures, like xLSTM and Mamba, can be faster alternatives. Experiments on 432 tasks show that they compare favourably in terms of performance and speed 🎃 arxiv.org/abs/2410.22391

Thomas Schmied retweeted

Sepp Hochreiter@HochreiterSepp·17 Mar

xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. xLSTM variants of instruction-tuned Llama, Qwen, & Olmo models.

Thomas Schmied retweeted

Niklas Schmidinger@smdrnks·17 Mar

Excited to share our new paper: Effective Distillation to Hybrid xLSTM Architectures. TL;DR: we retrofit / graft / distill / linearize Transformers into xLSTM-SWA hybrids with fixed-size states. This gives a practical path to studying linear and hybrid architectures starting from already strong pretrained models.

Sepp Hochreiter@HochreiterSepp

xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. xLSTM variants of instruction-tuned Llama, Qwen, & Olmo models.

Thomas Schmied retweeted

Korbinian Poeppel@KorbiPoeppel·16 Jun

Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: arxiv.org/abs/2506.11997

Thomas Schmied retweeted

Andreas Auer@AndAuer·2 Jun

We’re excited to introduce TiRex — a pre-trained time series forecasting model based on an xLSTM architecture.

Thomas Schmied retweeted

Nicolas Zucchet@NicolasZucchet·26 May

🧵What if emergence could be explained by learning a specific circuit: sparse attention? Our new work explores this bold hypothesis, showing a link between emergence and sparse attention that reveals how data properties influence when emergence occurs during training.

Thomas Schmied retweeted

Maximilian Beck@maxmbeck·10 May

Excited to share that 2 of our papers on efficient inference with #xLSTM are accepted at #ICML25. A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks (arxiv.org/abs/2410.22391) and xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference:

Maximilian Beck@maxmbeck

📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind: - Efficiency (in Training and Inference) and - Stability 🧵(1/7)

Thomas Schmied retweeted

Markus Wulfmeier@m_wulfmeier·28 Apr

Agentic LLMs need to explore! The web is a non-stationary, highly partially unobservable 🏔️. @thsschmied's @GoogleDeepMind internship systematically investigates current challenges and the role of scale, RL, and chain-of-thought reasoning to overcome them. arxiv.org/abs/2504.16078 I.e. #ReinforcementLearning is back, and everything is an #agent (we could have realised that everything is an MDP without heavily misusing the agent term 😉)

AK@_akhaliq

Google announced "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities" on Hugging Face

Thomas Schmied retweeted

AK@_akhaliq·23 Apr

Google announced "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities" on Hugging Face

Thomas Schmied retweeted

Nicolas Zucchet@NicolasZucchet·31 Mar

Large language models store vast amounts of knowledge, but how exactly do they learn it? Excited to share my @GoogleDeepMind internship results, which reveal the fascinating dynamics behind factual knowledge acquisition in LLMs! arxiv.org/abs/2503.21676

Thomas Schmied retweeted

Maximilian Beck@maxmbeck·19 Mar

Yesterday, we shared the details on our xLSTM 7B architecture. Now, let's go one level deeper🧑‍🔧 We introduce ⚡️Tiled Flash Linear Attention (TFLA), ⚡️ A new kernel algorithm for the mLSTM and other Linear Attention variants with Gating. We find TFLA is really fast! 🧵(1/11)

Thomas Schmied retweeted

Lukas Aichberger@aichberger·18 Mar

⚠️Beware: Your AI assistant could be hijacked just by encountering a malicious image online! Our latest research exposes critical security risks in AI assistants. An attacker can hijack them by simply posting an image on social media and waiting for it to be captured. [1/6] 🧵

Thomas Schmied retweeted

Korbinian Poeppel@KorbiPoeppel·18 Mar

Check out our latest work on scaling up xLSTM to 7B parameters and 2.3T tokens, with all open training data, open training protocol and open training code. Nice team work! 💪💪

Maximilian Beck@maxmbeck

📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind: - Efficiency (in Training and Inference) and - Stability 🧵(1/7)

Thomas Schmied retweeted

Maximilian Beck@maxmbeck·18 Mar

📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind: - Efficiency (in Training and Inference) and - Stability 🧵(1/7)

Thomas Schmied retweeted

Fabian Paischer@PaischerFabian·31 Dec

I am excited to present the result of a fruitful internship at @AIatMeta. We introduce preference discerning, which denotes in-context conditioning on user preferences expressed in text to steer the recommendation system. This enhances flexibility and personalization. 1/n

AI at Meta@AIatMeta

Newly published research for generative retrieval for recommendations from teams at Meta. - Preference Discerning with LLM-Enhanced Generative Retrieval ➡️ go.fb.me/evvcu8 - Unifying Generative and Dense Retrieval for Sequential Recommendation ➡️ go.fb.me/i7l955

Thomas Schmied retweeted

Lukas Aichberger@aichberger·20 Dec

𝗡𝗲𝘄 𝗣𝗮𝗽𝗲𝗿 𝗔𝗹𝗲𝗿𝘁: Rethinking Uncertainty Estimation in Natural Language Generation 🌟 Introducing 𝗚-𝗡𝗟𝗟, a theoretically grounded and highly efficient uncertainty estimate, perfect for scalable LLM applications 🚀 Dive into the paper 👇arxiv.org/abs/2412.15176

Thomas Schmied retweeted

Johannes Brandstetter@jo_brandstetter·15 Nov

Super hyped to share NeuralDEM -- the first real-time simulation of industrial particulate flows. NeuralDEM replaces Discrete Element Method (DEM) routines and coupled (CFD-DEM) multiphysics simulations. 🧵 📜: arxiv.org/abs/2411.09678 🖥️: nx-ai.github.io/NeuralDEM/

Thomas Schmied retweeted

Niklas Schmidinger@smdrnks·8 Nov

We are excited to introduce Bio-xLSTM! TLDR: we extend xLSTM to genomic, protein and molecular domains and find that it is a proficient generative model, learns rich representations and can perform in-context learning.

Thomas Schmied retweeted

Günter Klambauer@gklambauer·8 Nov

Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences xLSTM also shines for DNA, proteins and small molecules -- can handle large-range interactions and huge context! P: arxiv.org/abs/2411.04165

Thomas Schmied@thsschmied·31 Oct

Paper: arxiv.org/abs/2410.22391 Datasets: huggingface.co/ml-jku GitHub: github.com/ml-jku/LRAM 🎃👻

Thomas Schmied@thsschmied·31 Oct

All our experiments are in simulation, but we are eager to put this into real robots. This was a fun project, many thanks to the team! Thomas Adler, @wehungpatil, @maxmbeck, @KorbiPoeppel, @jo_brandstetter, @gklambauer, Razvan Pascanu, @HochreiterSepp

Thomas Schmied@thsschmied·31 Oct

Transformers can be slow for real-time applications like robotics. We study if modern recurrent architectures, like xLSTM and Mamba, can be faster alternatives. Experiments on 432 tasks show that they compare favourably in terms of performance and speed 🎃 arxiv.org/abs/2410.22391
