Neehal Tumma

22 posts

@ntumm120

ML PhD @MIT researching efficient architectures | @LiquidAI | Prev CS/Math @Harvard

Cambridge, MA · Joined July 2023
259 Following · 106 Followers
Pinned Tweet
Neehal Tumma @ntumm120 ·
Some say Gated DeltaNet > Mamba-2. Others say Mamba-2 > Gated DeltaNet. But what if Gated DeltaNet = Mamba-2? 👀 Well maybe not exactly — but with least-squares preconditioning, we show that they reduce to the same recurrence! We use this lens to design PDN, PGDN, and PKDA: preconditioned delta-style recurrences that outperform their unpreconditioned counterparts at scale 📈
📄 Paper: arxiv.org/abs/2604.21100 w/ @loo_noel @liquidai
💻 Code: github.com/ntumm120/preco…
6 replies · 30 reposts · 195 likes · 12.5K views
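To make the pinned claim concrete, here is a minimal NumPy sketch of a delta-style recurrence with and without preconditioning. The running-second-moment preconditioner and all names below are illustrative assumptions for exposition, not necessarily the exact PDN/PGDN/PKDA recurrences from the paper.

```python
import numpy as np

def delta_step(S, k, v, beta):
    # Vanilla delta rule: one online SGD step on 0.5 * ||S k - v||^2,
    # equivalent to S <- S (I - beta k k^T) + beta v k^T.
    err = S @ k - v
    return S - beta * np.outer(err, k)

def precond_delta_step(S, H, k, v, beta, gamma=0.99, lam=1e-2):
    # Hypothetical preconditioned variant (illustrative only): keep a
    # decayed running second moment of the keys, H, and take the same
    # gradient step in the (H + lam*I)^{-1} geometry.
    H = gamma * H + np.outer(k, k)
    P = np.linalg.inv(H + lam * np.eye(len(k)))
    err = S @ k - v
    return S - beta * np.outer(err, k) @ P, H

# Tiny usage example on random key/value pairs.
rng = np.random.default_rng(0)
d_k, d_v = 8, 4
S = np.zeros((d_v, d_k))
Sp, H = np.zeros((d_v, d_k)), np.zeros((d_k, d_k))
for _ in range(64):
    k, v = rng.normal(size=d_k), rng.normal(size=d_v)
    S = delta_step(S, k, v, beta=0.1)
    Sp, H = precond_delta_step(Sp, H, k, v, beta=0.1)
```

The point of the sketch: both updates descend the same per-token least-squares loss, and the preconditioner only changes the geometry of the step, which is the sense in which preconditioning can make different-looking recurrences line up.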
Neehal Tumma @ntumm120 ·
@YifeiZuoX Agreed, I think that would be a super interesting way to push beyond softmax attention. Also I've read your LLA paper, cool stuff!
0 replies · 0 reposts · 1 like · 52 views
Yifei Zuo @YifeiZuoX ·
@ntumm120 Nice work. Preconditioning actually works for Softmax Attention as well, not just for the Linear Attention family.
1 reply · 0 reposts · 0 likes · 71 views
Vaibhav Berlia @VaibhavBerlia ·
@ntumm120 Nice - curious whether a low-rank + diagonal preconditioner would recover more of the gap without blowing up the kernel cost.
1 reply · 0 reposts · 1 like · 79 views
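A standard way to keep exactly the structure Vaibhav suggests cheap is the Woodbury identity: a diagonal-plus-rank-r preconditioner can be inverted in O(n r^2) per solve instead of O(n^3). A hedged sketch, my own illustration rather than anything from the paper or repo:

```python
import numpy as np

def apply_lrd_inverse(d, U, x, lam=1e-3):
    # Apply P^{-1} x for P = diag(d) + U U^T + lam*I via Woodbury:
    # P^{-1} = D^{-1} - D^{-1} U (I_r + U^T D^{-1} U)^{-1} U^T D^{-1}
    # where D = diag(d) + lam*I, so only an r x r system is solved.
    Dinv = 1.0 / (d + lam)
    Dx = Dinv * x
    DU = Dinv[:, None] * U                    # D^{-1} U, shape (n, r)
    core = np.eye(U.shape[1]) + U.T @ DU      # r x r core matrix
    return Dx - DU @ np.linalg.solve(core, U.T @ Dx)

# Sanity check against the dense solve.
rng = np.random.default_rng(0)
n, r = 16, 2
d = rng.uniform(0.5, 2.0, size=n)
U = rng.normal(size=(n, r))
x = rng.normal(size=n)
P = np.diag(d + 1e-3) + U @ U.T
assert np.allclose(apply_lrd_inverse(d, U, x), np.linalg.solve(P, x))
```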
Neehal Tumma @ntumm120 ·
(14/N) Next up: we’re working on updating our kernels for the latest flash-linear-attention stack, including the newer GDN Tilelang and FlashKDA kernels, so stay tuned! And if you made it this far, thanks for reading :)
1 reply · 1 repost · 2 likes · 317 views
Neehal Tumma @ntumm120 ·
(13/N) But we think this only scratches the surface. PDN/PGDN/PKDA are just three points in a larger design space: different recurrence families + different preconditioners. There is a huge optimization literature on preconditioning that can be translated into recurrence design. The bigger message: preconditioning is a useful new lever for designing linear recurrences.
1 reply · 1 repost · 3 likes · 336 views
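One plausible way to write down the design space (13/N) gestures at, in notation assumed here rather than taken from the paper: a gated delta-style update with a pluggable preconditioner,

```latex
% Assumed notation: S_t state, (k_t, v_t) key/value pair,
% beta_t step size, alpha_t gate, P_t pluggable preconditioner.
\[
  S_t = \alpha_t S_{t-1}
        - \beta_t \,\bigl(\alpha_t S_{t-1} k_t - v_t\bigr)\, k_t^\top P_t,
  \qquad
  P_t \in \bigl\{\, I,\ \mathrm{diag}(d_t),\ (H_t + \lambda I)^{-1},\ \dots \bigr\}.
\]
```

Setting P_t = I collapses this to the familiar Gated DeltaNet-style update; each alternative choice of P_t picks out a different member of the family.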
Neehal Tumma retweeted
Liquid AI @liquidai ·
Today, we release LFM2.5-VL-450M, a vision-language model built for real-time reasoning on edge devices. It processes a 512×512 image and returns structured outputs in ~240ms on-device.
[image]
25 replies · 132 reposts · 1.1K likes · 114.7K views