Maximilian Beck
@maxmbeck
ELLIS PhD Student @ JKU Linz Institute for Machine Learning & PhD Researcher @nx_ai_com, Research Scientist Intern @Meta FAIR

xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. xLSTM variants of instruction-tuned Llama, Qwen, & Olmo models.
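The post doesn't spell out the training objective, but a common recipe for this kind of teacher-student distillation is minimizing a temperature-scaled KL divergence between the Transformer teacher's and the linear student's next-token distributions. A minimal NumPy sketch (function names and the temperature are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def softmax(z, temp=1.0):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, temp=2.0):
    """KL(teacher || student) over the vocabulary, averaged over positions."""
    p = softmax(teacher_logits, temp)   # teacher's soft targets
    q = softmax(student_logits, temp)   # student's predictions
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

# a student that matches the teacher exactly has zero divergence
logits = np.random.default_rng(0).normal(size=(2, 8))
assert distill_kl(logits, logits) < 1e-9
```

In practice this term is usually mixed with the ordinary next-token cross-entropy loss, and "near-lossless" results typically also involve initializing the student from the teacher's weights where shapes allow.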

🧵 Debugging Code World Models. A few months ago we started studying CWMs. The plan was post-training an LLM on code execution traces. Two weeks in, we realised a paper by Meta had already done much of this: arxiv.org/pdf/2510.02387. We did, however, identify what's wrong with them!

🧠🪲We introduce Neural Debuggers: 🧑🏭 LLMs that emulate traditional debuggers by predicting forward code execution (future states & outputs) and inverse execution (inferring prior states or inputs) conditioned on debugger actions such as step over, step into, or breakpoints.
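One way to obtain such execution traces (a hypothetical sketch, not necessarily the paper's actual pipeline) is to record the local-variable state at every executed line with Python's `sys.settrace`, then serialize the (step, state) pairs as training text for forward or inverse prediction:

```python
import sys

def record_trace(fn, *args):
    """Record (relative line number, local variables) at each executed line of fn."""
    trace = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            trace.append((frame.f_lineno - fn.__code__.co_firstlineno,
                          dict(frame.f_locals)))  # snapshot state before the line runs
        return tracer  # keep tracing line events in this frame
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)
    return trace

def demo(x):
    y = x + 1
    z = y * 2
    return z

states = record_trace(demo, 3)
# each step exposes the intermediate locals, e.g. y == 4 before z exists
```

A "step over" action then corresponds to predicting the next (line, state) pair; inverse execution is the harder task of recovering an earlier state or the inputs from a later snapshot.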


Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/
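The real kernel is CUDA on Blackwell tensor cores; the Python sketch below only illustrates two ingredients behind the exp2 remark: computing exp through the hardware-friendly identity exp(x) = 2^(x · log2 e), and the online-softmax rescaling that attention kernels overlap with matmuls so the exponential unit is never the serial bottleneck:

```python
import math

LOG2E = 1.4426950408889634  # log2(e)

def exp_via_exp2(x):
    # GPUs expose a fast exp2 instruction; exp(x) is evaluated as
    # 2^(x * log2 e), and the scale can be folded into earlier arithmetic
    return 2.0 ** (x * LOG2E)

def online_softmax(xs):
    """Single-pass softmax: track a running max m and rescale the running sum s."""
    m, s = float("-inf"), 0.0
    for x in xs:
        m_new = max(m, x)
        s = s * exp_via_exp2(m - m_new) + exp_via_exp2(x - m_new)
        m = m_new
    return [exp_via_exp2(x - m) / s for x in xs]
```

The rescaling step is what lets attention process keys block by block without ever materializing the full score row, which is the part FlashAttention pipelines against the tensor-core matmuls.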



> an example of this is that in hybrid models, sometimes "stronger" linear layers can lead to overall weaker models because it incentivizes the global attention to be "lazy"

Some people asked about this. I think it's a somewhat folklore result that I don't have a reference for, but here's another recent result that's similar: arxiv.org/abs/2509.24552. It's an example of a related phenomenon: in a SWA+xLSTM model, longer SWA windows led to worse long-context performance because they encouraged the xLSTM layers to be lazy.

We studied whether linear RNNs can learn state-tracking from code via next-token prediction. We converted permutation tracking (the shell game) into REPL traces and trained models on them. Key idea: We interleave variable swaps with print statements that reveal partial state.
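A minimal sketch of how such REPL traces could be generated (the variable names and trace format here are my assumptions, not necessarily the paper's): random swaps over a list of cups, interleaved with occasional prints that reveal one position of the hidden permutation.

```python
import random

def make_trace(n_cups=3, n_steps=6, p_print=0.4, seed=0):
    """Render one shell-game episode as a REPL-style training string:
    variable swaps interleaved with prints that reveal partial state."""
    rng = random.Random(seed)
    cups = list(range(n_cups))          # cups[i] = ball currently at slot i
    lines = [f">>> cups = {cups}"]
    for _ in range(n_steps):
        i, j = rng.sample(range(n_cups), 2)
        lines.append(f">>> cups[{i}], cups[{j}] = cups[{j}], cups[{i}]")
        cups[i], cups[j] = cups[j], cups[i]
        if rng.random() < p_print:      # partial-state reveal
            k = rng.randrange(n_cups)
            lines.append(f">>> print(cups[{k}])")
            lines.append(str(cups[k]))
    lines.append(">>> print(cups)")
    lines.append(str(cups))
    return "\n".join(lines)

print(make_trace())
```

Trained with next-token prediction on such strings, the model must maintain the full permutation internally to predict the printed values, which is exactly the state-tracking ability under study.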




