Vihang Patil

218 posts

@wehungpatil

Post-training, Reinforcement Learning. Applied Science in Team Rufus @Amazon.

Berlin · Joined July 2019
291 Following · 188 Followers
Pinned Tweet
Vihang Patil @wehungpatil
This is what we have been working on for the last few months. The advent of architectures like xLSTM opens new frontiers of efficiency for generative models. The xLSTM not only provides constant memory consumption with increasing context length, but is extremely fast at inference.
Thomas Schmied @thsschmied

Transformers can be slow for real-time applications like robotics. We study if modern recurrent architectures, like xLSTM and Mamba, can be faster alternatives. Experiments on 432 tasks show that they compare favourably in terms of performance and speed 🎃 arxiv.org/abs/2410.22391

0
1
4
338
Vihang Patil retweeted
Mayank Singh @mayansingh09
Check out researchwith.ai. It’s your constant AI research companion. Read any PDF with the AI as your partner.
✍️ Highlight and annotate your reading
🤖 Ask powerful AI models questions
🗂️ Organize your reading into folders
🌐 Find new papers via conversation search
3
1
4
127
Vihang Patil retweeted
Korbinian Poeppel @KorbiPoeppel
Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: arxiv.org/abs/2506.11997
4
42
136
15.1K
Vihang Patil retweeted
Mayank @mayank_iitgn
The #Eka initiative is looking for your contributions to curate a list of websites in native Indian languages. The majority of Indic websites are missing from existing corpora like CC. Please fill out this form to add URLs in your native language: docs.google.com/forms/d/1MUVA_…
4
4
27
5.1K
Vihang Patil retweeted
torchrl @torchrl1
torchrl 🤝 gymnasium happy ever after
With the help of the @FaramaFound team, we managed to make TorchRL compatible with gymnasium v1.1 onward!
1
3
13
586
Vihang Patil retweeted
Maximilian Beck @maxmbeck
Yesterday, we shared the details on our xLSTM 7B architecture. Now, let's go one level deeper🧑‍🔧 We introduce ⚡️Tiled Flash Linear Attention (TFLA)⚡️, a new kernel algorithm for the mLSTM and other Linear Attention variants with Gating. We find TFLA is really fast! 🧵(1/11)
3
60
345
47.9K
Vihang Patil retweeted
Maximilian Beck @maxmbeck
📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind:
- Efficiency (in Training and Inference) and
- Stability
🧵(1/7)
8
61
324
45K
Vihang Patil retweeted
Korbinian Poeppel @KorbiPoeppel
Check out our latest work on scaling up xLSTM to 7B parameters and 2.3T tokens, with all open training data, open training protocol and open training code. Nice team work! 💪💪
Maximilian Beck @maxmbeck

📢🔔I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model!🚨 We optimized the architecture with two goals in mind:
- Efficiency (in Training and Inference) and
- Stability
🧵(1/7)

0
2
9
273
Vihang Patil retweeted
Lucas Beyer (bl16) @giffmana
Everything old is new again. Mamba/ssm folks should really google their "new idea + lstm" please. About a decade ago, people have tried a shitton of things with lstms. Nothing wrong with retrying with modern tools, but ack the past. This is not the first such case I see btw.
𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8

Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks
paper: arxiv.org/abs/2412.16146
SSMs are emerging as efficient alternatives to transformers but struggle with spatial dependencies in visual tasks due to biases from their natural language processing origins. Existing methods rely on arbitrary 1D scan directions to process 2D data, which limits effectiveness. Mamba2D introduces a native 2D scan direction that models spatial dependencies more effectively by considering both input dimensions simultaneously. On the ImageNet-1K dataset, Mamba2D achieves performance comparable to prior SSM adaptations for vision tasks, offering a more efficient approach to handling visual inputs.

23
58
615
120.7K
Vihang Patil retweeted
Lukas Aichberger @aichberger
𝗡𝗲𝘄 𝗣𝗮𝗽𝗲𝗿 𝗔𝗹𝗲𝗿𝘁: Rethinking Uncertainty Estimation in Natural Language Generation 🌟 Introducing 𝗚-𝗡𝗟𝗟, a theoretically grounded and highly efficient uncertainty estimate, perfect for scalable LLM applications 🚀 Dive into the paper 👇arxiv.org/abs/2412.15176
5
36
140
20.6K
Vihang Patil retweeted
Korbinian Poeppel @KorbiPoeppel
Thrilled to announce two new developments at JKU and NXAI that are released today:
- We scaled xLSTM to 7B parameters: linktr.ee/xlstm
- For the people caring about state tracking capabilities, there's the new FlashRNN library: arxiv.org/abs/2412.07752
2
11
27
2.3K
Vihang Patil retweeted
Niklas Schmidinger @smdrnks
We are excited to introduce Bio-xLSTM! TLDR: we extend xLSTM to genomic, protein and molecular domains and find that it is a proficient generative model, learns rich representations and can perform in-context learning.
1
12
28
4K
Vihang Patil retweeted
Günter Klambauer @gklambauer
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
xLSTM also shines for DNA, proteins and small molecules -- can handle large-range interactions and huge context!
P: arxiv.org/abs/2411.04165
0
39
154
19.5K
Vihang Patil retweeted
Sepp Hochreiter @HochreiterSepp
xLSTM as large recurrent action model. xLSTM has the potential to enter the field of robotics as it is much faster than transformers at inference. xLSTM can close the reality-gap by online learning in applications like robotics, self-driving, automated production systems. Cool.
Thomas Schmied @thsschmied

Transformers can be slow for real-time applications like robotics. We study if modern recurrent architectures, like xLSTM and Mamba, can be faster alternatives. Experiments on 432 tasks show that they compare favourably in terms of performance and speed 🎃 arxiv.org/abs/2410.22391

3
32
177
17.3K
Vihang Patil retweeted
Günter Klambauer @gklambauer
A LARGE RECURRENT ACTION MODEL: xLSTM enables Fast Inference for Robotics Tasks
In robotics & embodied AIs, very fast inference is needed which is prohibitive for Transformers. xLSTM is well suited because of its recurrent inference mode.
P: arxiv.org/abs/2410.22391
1
3
16
1.2K
Vihang Patil retweeted
Sayan Ranu @SayanRanu
Graph distillation compresses massive graph datasets into tiny versions that train GNNs as effectively as the original. But current methods have a huge problem... They require training on the full data first, which defeats the whole purpose! Enter Bonsai (arxiv.org/pdf/2410.17579)
3
8
42
5.7K
Vihang Patil retweeted
Kajetan Schweighofer @kschweig_
Deep Ensembles are widely used to improve the performance of Deep Learning models. But beware, they can have a profound impact on group fairness ⚖️ We analyzed why it happens and what can be done about it 🧵👇
3
20
78
11.7K