Nick Alonso (@Nick__Alonso) - Twitter Profile

Pinned Tweet

Nick Alonso@Nick__Alonso·Feb 6

I enjoyed working on this one. If you're interested in self-attention alternatives, this might interest you. Thanks to all those @ZyphraAI who helped out.

Zyphra@ZyphraAI

Today @ZyphraAI releases OVQ-attention, an advancement for efficient long-context processing! Existing LLM layers compress input too much, leading to poor long-context understanding, or too little, leading to expensive memory+compute. OVQ-attention is an alternative path. 🧵

Nick Alonso retweeted

samsja@samsja19·Feb 7

Zyphra is still under the radar but doing truly innovative architecture work

Zyphra@ZyphraAI

Today @ZyphraAI releases OVQ-attention, an advancement for efficient long-context processing! Existing LLM layers compress input too much, leading to poor long-context understanding, or too little, leading to expensive memory+compute. OVQ-attention is an alternative path. 🧵

Nick Alonso retweeted

𝚐𝔪𝟾𝚡𝚡𝟾@gm8xx8·Feb 6

OVQ shows a practical route to handling distribution shift via online codebook learning. The universal codebook result is the theoretical side: a fixed decoder can be near optimal for any activation covariance with only a tiny rate gap, if we can actually build that codebook.

𝚐𝔪𝟾𝚡𝚡𝟾@gm8xx8

Zyphra: Online Vector Quantized Attention. OVQ-attention keeps linear time and constant memory but avoids long-context collapse by learning both key and value centroids online, so memory tracks the live KV stream instead of a fixed dictionary. Sparse updates route each token to a single slot, so memory capacity scales without increasing per-token compute. Based on Gaussian Mixture Regression with online EM-style updates, it outperforms VQ and linear baselines, generalizes from ~4k training context to 64k+, and stays competitive with attention using ~10–25% of the state; still early at sub-500M scale and not kernel-optimized.
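
As a rough illustration of the mechanism this summary describes (not Zyphra's implementation), the sketch below keeps a fixed number of key and value centroids, routes each incoming token to its single nearest slot, and nudges that slot toward the token with a running mean. The class name `OnlineVQMemory`, the nearest-centroid routing rule, the 1/count step size, and the count-weighted softmax readout are all assumptions made for illustration; the actual update rules in OVQ-attention may differ.

```python
import numpy as np


class OnlineVQMemory:
    """Toy fixed-size memory with online, hard-assignment centroid updates.

    Hypothetical illustration only: a generic online k-means style update,
    not the OVQ-attention algorithm itself.
    """

    def __init__(self, num_slots, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Key centroids start random; value centroids and counts start at zero.
        self.key_centroids = rng.normal(size=(num_slots, dim))
        self.value_centroids = np.zeros((num_slots, dim))
        self.counts = np.zeros(num_slots)

    def write(self, key, value):
        """Route one (key, value) pair to its nearest slot and update only that slot."""
        dists = np.sum((self.key_centroids - key) ** 2, axis=1)
        slot = int(np.argmin(dists))
        self.counts[slot] += 1.0
        step = 1.0 / self.counts[slot]  # running-mean step size (assumed)
        self.key_centroids[slot] += step * (key - self.key_centroids[slot])
        self.value_centroids[slot] += step * (value - self.value_centroids[slot])
        return slot

    def read(self, query):
        """Softmax readout over used slots, weighted by how many tokens each absorbed."""
        used = self.counts > 0
        if not used.any():
            return np.zeros(self.value_centroids.shape[1])
        scores = self.key_centroids[used] @ query + np.log(self.counts[used])
        weights = np.exp(scores - scores.max())  # subtract max for numerical stability
        weights /= weights.sum()
        return weights @ self.value_centroids[used]
```

In this sketch the stored state is always `num_slots` key centroids, `num_slots` value centroids, and one count per slot, no matter how many tokens are written, and each write touches a single slot.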

Nick Alonso@Nick__Alonso·Feb 6

Nice summary.👇

𝚐𝔪𝟾𝚡𝚡𝟾@gm8xx8

Zyphra: Online Vector Quantized Attention. OVQ-attention keeps linear time and constant memory but avoids long-context collapse by learning both key and value centroids online, so memory tracks the live KV stream instead of a fixed dictionary. Sparse updates route each token to a single slot, so memory capacity scales without increasing per-token compute. Based on Gaussian Mixture Regression with online EM-style updates, it outperforms VQ and linear baselines, generalizes from ~4k training context to 64k+, and stays competitive with attention using ~10–25% of the state; still early at sub-500M scale and not kernel-optimized.

Nick Alonso retweeted

Zyphra@ZyphraAI·Feb 6

Today @ZyphraAI releases OVQ-attention, an advancement for efficient long-context processing! Existing LLM layers compress input too much, leading to poor long-context understanding, or too little, leading to expensive memory+compute. OVQ-attention is an alternative path. 🧵

Nick Alonso retweeted

Songlin Yang@SonglinYang4·Nov 8

Hi @JeffDean, what’s the plan for releasing the code for this line of work? None of these papers so far seem to have released any code

Jeff Dean@JeffDean

An exciting new approach for doing continual learning, using nested optimization for enhancing long context processing.

Nick Alonso retweeted

Quentin Anthony@QuentinAnthon15·Oct 31

At this point in attention-free architectures, so many people have poisoned the well that it's just a well of poison. A "Transformer Killer™" drops once a month, and then the authors come back and "kill" transformers again like 5 months later. Love the work, I'm knee-deep in a lot of it, but please for the love of god stop over-hyping. Being grounded and pointing out your own limitations gets people more excited, I promise.

Nick Alonso retweeted

Zyphra@ZyphraAI·Oct 8

@ZyphraAI is excited to release Compressed Convolutional Attention (CCA), a novel attention mechanism that:
- Beats MHA, GQA, MLA for dense and MoE models
- Reduces training/prefill flops
- 3x fewer parameters vs MHA
- Matches GQA/MLA KV-cache sizes without quality penalty

Nick Alonso retweeted

rishi@rishiiyer01·Oct 7

new paper arxiv.org/abs/2510.04476

Nick Alonso retweeted

Zyphra@ZyphraAI·Oct 1

Read more at the blog post here: newsroom.ibm.com/2025-10-01-ibm…

Nick Alonso retweeted

TensorWave@tensorwave·Sep 8

It’s not just about GPUs. It’s about the ecosystem. @QuentinAnthon15 joined @jtatarchuk on the Beyond CUDA podcast to share how moving to @AMD MI300X cut training costs at @ZyphraAI 📺 Watch the full episode on YouTube (link in comments)

Nick Alonso retweeted

rishi@rishiiyer01·Jul 9

reach out if you want to work with me and others on novel architectures for pretraining! dms are open jobs.ashbyhq.com/zyphra/e509d43…

Nick Alonso@Nick__Alonso·Jun 29

Learning effectively in real time during deployment, i.e., online continual learning, is important for many applications. It is also associated with theories of intelligence that emphasize learning efficiency, and it is an ability where the gap between animals and AI is large.

dr. jack morris@jxmnop

seems big AI labs are hyperfixating on reasoning when they should focus on *memory* instead
normal people won't use models that can think for hours to solve hard math problems
people want models that learn over time, remember details, adapt and interact like a person would

Nick Alonso retweeted

Zyphra@ZyphraAI·Apr 10

Zyphra is releasing our first reasoning model, ZR1-1.5B. This small but powerful reasoning model excels at both math and code, making it one of the best models in these categories for its size. It also uses 60% fewer reasoning tokens than comparable models. 🆓 Apache 2.0 license.

Nick Alonso@Nick__Alonso·Mar 10

@petemandik Oh great! Thanks for the reference. I was unaware of this. Will be taking a look.

Pete Mandik@petemandik·Mar 9

@Nick__Alonso This is very much what Rorty does in the “Persons Without Minds” chapter of Philosophy and the Mirror of Nature. He calls his aliens the Antipodeans. degruyter.com/document/doi/1…

Nick Alonso@Nick__Alonso·Mar 9

Thought experiment: what should a non-conscious alien scientist conclude about human theories of consciousness? What should humans think of the alien's conclusion? In my blog (link below), I argue this scenario supports Illusionist views of consciousness. @keithfrankish @eschwitz

Nick Alonso@Nick__Alonso·Mar 9

(6/) The scenario also raises the question of how we could even get a non-conscious scientist to understand what we mean by terms like 'phenomenal character', a point that may support those who argue such terms are not meaningful enough to discuss in the first place. @petemandik

Nick Alonso@Nick__Alonso·Mar 9

(5/) If we cannot find good reasons to convince a non-conscious scientist that phenomenal consciousness and the hard problem exist, then why should humans ever believe they do?
