

Thomas Schmied
69 posts

@thsschmied
PhD student @ JKU Linz, Institute for Machine Learning.






xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. xLSTM variants of instruction-tuned Llama, Qwen, & Olmo models.



📢🔔 I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model! 🚨 We optimized the architecture with two goals in mind:
- Efficiency (in training and inference)
- Stability
🧵 (1/7)

Google announced LLMs are Greedy Agents on Hugging Face: Effects of RL Fine-tuning on Decision-Making Abilities





Newly published research on generative retrieval for recommendations from teams at Meta.
- Preference Discerning with LLM-Enhanced Generative Retrieval ➡️ go.fb.me/evvcu8
- Unifying Generative and Dense Retrieval for Sequential Recommendation ➡️ go.fb.me/i7l955






