Vihang Patil

218 posts

@wehungpatil

Post-training, Reinforcement Learning. Applied Science in Team Rufus @Amazon.

Berlin · Joined July 2019

291 Following · 188 Followers

Pinned Tweet

Vihang Patil @wehungpatil · 31 Oct

This is what we have been working on for the last few months. The advent of architectures like xLSTM opens new frontiers of efficiency for generative models. xLSTM not only keeps memory consumption constant as context length increases, but is also extremely fast at inference.

Thomas Schmied @thsschmied

Transformers can be slow for real-time applications like robotics. We study if modern recurrent architectures, like xLSTM and Mamba, can be faster alternatives. Experiments on 432 tasks show that they compare favourably in terms of performance and speed 🎃 arxiv.org/abs/2410.22391

Vihang Patil retweeted

Mayank Singh @mayansingh09 · 6 Feb

Check out researchwith.ai. It’s your constant AI research companion. Read any PDF with the AI as your partner. ✍️Highlight and annotate your reading 🤖Ask powerful AI models questions 🗂️Organize your reading into folders 🌐Find new papers via conversation search

Vihang Patil retweeted

Korbinian Poeppel @KorbiPoeppel · 16 Jun

Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: arxiv.org/abs/2506.11997

Vihang Patil retweeted

Mayank @mayank_iitgn · 9 Jun

#Eka initiative is looking for your contributions to curate the List of websites in the Native Indian Languages. The majority of Indic websites are missing from existing corpora like CC. Please fill out this form to add URLs in your native language: docs.google.com/forms/d/1MUVA_…

Vihang Patil retweeted

torchrl @torchrl1 · 10 Apr

torchrl 🤝 gymnasium happy ever after With the help of the @FaramaFound team, we managed to make TorchRL compatible with gymnasium v1.1 onward!

Vihang Patil retweeted

Maximilian Beck @maxmbeck · 19 Mar

Yesterday, we shared the details on our xLSTM 7B architecture. Now, let's go one level deeper🧑‍🔧 We introduce ⚡️Tiled Flash Linear Attention (TFLA), ⚡️ A new kernel algorithm for the mLSTM and other Linear Attention variants with Gating. We find TFLA is really fast! 🧵(1/11)

Vihang Patil retweeted

Maximilian Beck @maxmbeck · 18 Mar

📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind: - Efficiency (in Training and Inference) and - Stability 🧵(1/7)

Vihang Patil retweeted

Korbinian Poeppel @KorbiPoeppel · 18 Mar

Check out our latest work on scaling up xLSTM to 7B parameters and 2.3T tokens, with all open training data, open training protocol and open training code. Nice team work! 💪💪

Maximilian Beck @maxmbeck

Vihang Patil @wehungpatil · 14 Mar

Great place to work 😃

Sepp Hochreiter @HochreiterSepp

Join Our Research Team in Linz! We are looking for 5 PostDocs and 10 PhDs in Machine Learning working on xLSTM, NLP, robustness, learning theory. Deadline: 04/20/25. More details: jku.at/en/lit-artific… #MachineLearning #DeepLearning #ResearchOpportunities #PhDPositions

Vihang Patil retweeted

Lucas Beyer (bl16) @giffmana · 24 Dec

Everything old is new again. Mamba/ssm folks should really google their "new idea + lstm" please. About a decade ago, people have tried a shitton of things with lstms. Nothing wrong with retrying with modern tools, but ack the past. This is not the first such case I see btw.

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8

Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks paper: arxiv.org/abs/2412.16146 SSMs are emerging as efficient alternatives to transformers but struggle with spatial dependencies in visual tasks due to biases from their natural language processing origins. Existing methods rely on arbitrary 1D scan directions to process 2D data, which limits effectiveness. Mamba2D introduces a native 2D scan direction that models spatial dependencies more effectively by considering both input dimensions simultaneously. On the ImageNet-1K dataset, Mamba2D achieves performance comparable to prior SSM adaptations for vision tasks, offering a more efficient approach to handling visual inputs.

Vihang Patil retweeted

Lukas Aichberger @aichberger · 20 Dec

𝗡𝗲𝘄 𝗣𝗮𝗽𝗲𝗿 𝗔𝗹𝗲𝗿𝘁: Rethinking Uncertainty Estimation in Natural Language Generation 🌟 Introducing 𝗚-𝗡𝗟𝗟, a theoretically grounded and highly efficient uncertainty estimate, perfect for scalable LLM applications 🚀 Dive into the paper 👇arxiv.org/abs/2412.15176

Vihang Patil retweeted

Korbinian Poeppel @KorbiPoeppel · 11 Dec

Thrilled to announce two new developments at JKU and NXAI that are released today: - We scaled xLSTM to 7B parameters: linktr.ee/xlstm - For the people caring about state tracking capabilities, there's the new FlashRNN library: arxiv.org/abs/2412.07752

Vihang Patil retweeted

Niklas Schmidinger @smdrnks · 8 Nov

We are excited to introduce Bio-xLSTM! TLDR: we extend xLSTM to genomic, protein and molecular domains and find that it is a proficient generative model, learns rich representations and can perform in-context learning.

Vihang Patil retweeted

Günter Klambauer @gklambauer · 8 Nov

Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences xLSTM also shines for DNA, proteins and small molecules -- can handle large-range interactions and huge context! P: arxiv.org/abs/2411.04165

Vihang Patil @wehungpatil · 1 Nov

@techphilo_art @HochreiterSepp We do compare against the transformer in our experiments. You can find them here: arxiv.org/abs/2410.22391

Vihang Patil retweeted

Sepp Hochreiter @HochreiterSepp · 31 Oct

xLSTM as large recurrent action model. xLSTM has the potential to enter the field of robotics as it is much faster than transformers at inference. xLSTM can close the reality-gap by online learning in applications like robotics, self-driving, automated production systems. Cool.

Thomas Schmied @thsschmied

Vihang Patil retweeted

Günter Klambauer @gklambauer · 31 Oct

A LARGE RECURRENT ACTION MODEL: xLSTM enables Fast Inference for Robotics Tasks In robotics & embodied AIs, very fast inference is needed which is prohibitive for Transformers. xLSTM is well suited because of its recurrent inference mode. P: arxiv.org/abs/2410.22391

Vihang Patil retweeted

Sayan Ranu @SayanRanu · 26 Oct

Graph distillation compresses massive graph datasets into tiny versions that train GNNs as effectively as the original. But current methods have a huge problem.. They require training on the full data first—which defeats the whole purpose! Enter Bonsai(arxiv.org/pdf/2410.17579)

Vihang Patil retweeted

Kajetan Schweighofer @kschweig_ · 21 Oct

Deep Ensembles are widely used to improve the performance of Deep Learning models. But beware, they can have profound impact on group fairness ⚖️ We analyzed why it happens and what can be done about it 🧵👇

Vihang Patil retweeted

Turing Post @TheTuringPost · 17 Oct

7. Original paper: arxiv.org/pdf/2410.07071 Authors: @thsschmied, @PaischerFabian, @wehungpatil, @mrkhof, Razvan Pascanu, @HochreiterSepp @LITAILab, @ExtensityAI, @ucl, @nx_ai_com, and @GoogleDeepMind
