Neehal Tumma

22 posts

@ntumm120

ML PhD @MIT researching efficient architectures | @LiquidAI | Prev CS/Math @Harvard

Cambridge, MA · Joined July 2023
259 Following · 106 Followers
Pinned Tweet
Neehal Tumma @ntumm120 ·
Some say Gated DeltaNet > Mamba-2. Others say Mamba-2 > Gated DeltaNet. But what if Gated DeltaNet = Mamba-2? 👀 Well maybe not exactly — but with least-squares preconditioning, we show that they reduce to the same recurrence! We use this lens to design PDN, PGDN, and PKDA: preconditioned delta-style recurrences that outperform their unpreconditioned counterparts at scale 📈
📄 Paper: arxiv.org/abs/2604.21100 w/ @loo_noel @liquidai
💻 Code: github.com/ntumm120/preco…
6 replies · 30 reposts · 195 likes · 12.5K views
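To make the pinned claim concrete, here is a minimal NumPy sketch of a delta-style recurrence with and without preconditioning. The running-second-moment preconditioner and all names below are illustrative assumptions for exposition, not necessarily the exact PDN/PGDN/PKDA recurrences from the paper.

```python
import numpy as np

def delta_step(S, k, v, beta):
    # Vanilla delta rule: one online SGD step on 0.5 * ||S k - v||^2,
    # equivalent to S <- S (I - beta k k^T) + beta v k^T.
    err = S @ k - v
    return S - beta * np.outer(err, k)

def precond_delta_step(S, H, k, v, beta, gamma=0.99, lam=1e-2):
    # Hypothetical preconditioned variant (illustrative only): keep a
    # decayed running second moment of the keys, H, and take the same
    # gradient step in the (H + lam*I)^{-1} geometry.
    H = gamma * H + np.outer(k, k)
    P = np.linalg.inv(H + lam * np.eye(len(k)))
    err = S @ k - v
    return S - beta * np.outer(err, k) @ P, H

# Tiny usage example on random key/value pairs.
rng = np.random.default_rng(0)
d_k, d_v = 8, 4
S = np.zeros((d_v, d_k))
Sp, H = np.zeros((d_v, d_k)), np.zeros((d_k, d_k))
for _ in range(64):
    k, v = rng.normal(size=d_k), rng.normal(size=d_v)
    S = delta_step(S, k, v, beta=0.1)
    Sp, H = precond_delta_step(Sp, H, k, v, beta=0.1)
```

The point of the sketch: both updates descend the same per-token least-squares loss, and the preconditioner only changes the geometry of the step, which is the sense in which preconditioning can make different-looking recurrences line up.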
Neehal Tumma @ntumm120 ·
@YifeiZuoX Agreed, I think that would be a super interesting way to push beyond softmax attention. Also I've read your LLA paper, cool stuff!
0 replies · 0 reposts · 1 like · 52 views
Yifei Zuo @YifeiZuoX ·
@ntumm120 Nice work. Preconditioning actually works for Softmax Attention as well, not just for the Linear Attention family.
1 reply · 0 reposts · 0 likes · 71 views
Vaibhav Berlia @VaibhavBerlia ·
@ntumm120 Nice - curious whether a low-rank + diagonal preconditioner would recover more of the gap without blowing up the kernel cost.
1 reply · 0 reposts · 1 like · 79 views
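A standard way to keep exactly the structure Vaibhav suggests cheap is the Woodbury identity: a diagonal-plus-rank-r preconditioner can be inverted in O(n r^2) per solve instead of O(n^3). A hedged sketch, my own illustration rather than anything from the paper or repo:

```python
import numpy as np

def apply_lrd_inverse(d, U, x, lam=1e-3):
    # Apply P^{-1} x for P = diag(d) + U U^T + lam*I via Woodbury:
    # P^{-1} = D^{-1} - D^{-1} U (I_r + U^T D^{-1} U)^{-1} U^T D^{-1}
    # where D = diag(d) + lam*I, so only an r x r system is solved.
    Dinv = 1.0 / (d + lam)
    Dx = Dinv * x
    DU = Dinv[:, None] * U                    # D^{-1} U, shape (n, r)
    core = np.eye(U.shape[1]) + U.T @ DU      # r x r core matrix
    return Dx - DU @ np.linalg.solve(core, U.T @ Dx)

# Sanity check against the dense solve.
rng = np.random.default_rng(0)
n, r = 16, 2
d = rng.uniform(0.5, 2.0, size=n)
U = rng.normal(size=(n, r))
x = rng.normal(size=n)
P = np.diag(d + 1e-3) + U @ U.T
assert np.allclose(apply_lrd_inverse(d, U, x), np.linalg.solve(P, x))
```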
Neehal Tumma @ntumm120 ·
(14/N) Next up: we’re working on updating our kernels for the latest flash-linear-attention stack, including the newer GDN Tilelang and FlashKDA kernels, so stay tuned! And if you made it this far, thanks for reading :)
1 reply · 1 repost · 2 likes · 317 views
Neehal Tumma @ntumm120 ·
(13/N) But we think this only scratches the surface. PDN/PGDN/PKDA are just three points in a larger design space: different recurrence families + different preconditioners. There is a huge optimization literature on preconditioning that can be translated into recurrence design. The bigger message: preconditioning is a useful new lever for designing linear recurrences.
1 reply · 1 repost · 3 likes · 336 views
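One plausible way to write down the design space (13/N) gestures at, in notation assumed here rather than taken from the paper: a gated delta-style update with a pluggable preconditioner,

```latex
% Assumed notation: S_t state, (k_t, v_t) key/value pair,
% beta_t step size, alpha_t gate, P_t pluggable preconditioner.
\[
  S_t = \alpha_t S_{t-1}
        - \beta_t \,\bigl(\alpha_t S_{t-1} k_t - v_t\bigr)\, k_t^\top P_t,
  \qquad
  P_t \in \bigl\{\, I,\ \mathrm{diag}(d_t),\ (H_t + \lambda I)^{-1},\ \dots \bigr\}.
\]
```

Setting P_t = I collapses this to the familiar Gated DeltaNet-style update; each alternative choice of P_t picks out a different member of the family.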
Neehal Tumma retweeted
Liquid AI @liquidai ·
Today, we release LFM2.5-VL-450M, a vision-language model built for real-time reasoning on edge devices. It processes a 512×512 image and returns structured outputs in ~240ms on-device.
[image]
25 replies · 132 reposts · 1.1K likes · 114.7K views