Subquadratic

2

16

16.6K

Subquadratic retweetledi

Rezoan Ferdose@rezoan_ferdose·8h

@alex_whedon Congrats to the @subquadratic team! It’s awesome to see someone finally breaking out of the standard transformer box

English

31

18.8K

Subquadratic retweetledi

Alexander Whedon@alex_whedon·10h

SubQ is available for early access today, alongside our coding agent, SubQ Code Get access today ↓ subq.ai

English

106

103

1.3K

734.3K

Subquadratic retweetledi

Shruti@heyshrutimishra·9h

“Efficiency is Intelligence” @subquadratic’s tagline hits different when you realize current transformers literally can’t scale context without burning exponentially more compute. The post-transformer era isn’t coming. It’s already being built. #subq @alex_whedon

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - Less than 5% the cost of Opus Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

English

5

8

32

15.3K

Subquadratic retweetledi

shirish@shiri_shh·9h

@alex_whedon i still can't believe this.. 12M context window with 98% accuracy 💀 @alex_whedon @subquadratic

English

4

226

33.8K

Subquadratic retweetledi

Guri Singh@heygurisingh·9h

this is the most expensive sentence Anthropic will read this year. Someone just shipped a frontier LLM with a 12 million token context window that runs at 5% the cost of Opus 4.7. It's called SubQ. First model built on sub-quadratic sparse attention. Here's why every AI lab should be panicking right now. Transformers check every word against every other word. Double the context, compute quadruples. The labs have known this since 2017. They scaled it anyway and charged you more the longer you needed your model to think. SubQ only computes the relationships that actually matter. → 12M token context with 98% accuracy at full length → 52x faster than FlashAttention at 1M tokens → Runs at under $1.50 per million tokens vs Opus at $15 → Cost scales linearly instead of exponentially Now read this part slowly. Every context window you've ever been sold was a marketing number. Accuracy on every frontier model falls apart past 200k tokens. The labs printed 1M on the box knowing most of that window was decoration. The entire RAG industry exists because the foundation was broken. Vector databases. Chunking pipelines. Summarization loops. Every workaround you've ever built or paid for was an apology for quadratic attention. They weren't clever engineering. They were duct tape on architecture that should have been replaced years ago. SubQ fixed the foundation. The math on every agent product being built right now just changed. Long-context at under 10% of Anthropic's price isn't a discount. It's you no longer paying for the company's mistake. The transformer was the first workable answer. Everyone scaled it so hard nobody wanted to admit it was a local maximum. @subquadratic is the first team to actually ship the way out. Opus 4.7 was the long-context benchmark king. That sentence is now in the past tense.

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - Less than 5% the cost of Opus Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

English

41

43

207

55.8K

Subquadratic retweetledi

Your Tech Girl@yourtechgirl24·9h

This might be the most important LLM launch of 2026. @subquadratic just dropped SubQ — the first frontier model that doesn’t use standard transformer attention. The numbers are insane: → 12M token context window → 52x faster than FlashAttention at 1MM tokens → Less than 5% the cost of Opus → ~1,000x less compute This isn’t an incremental upgrade. It’s a new way for LLMs to scale.

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - Less than 5% the cost of Opus Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

English

3

12

29

8.5K

Subquadratic retweetledi

Apex Mentality@apex_mentality_·9h

@AIStackLabX @subquadratic is clearly thinking ahead on how models should evolve

English

1

3

2K

Subquadratic retweetledi

Daily Wisdom@Daily__wisdom_·9h

@AIStackLabX @subquadratic is pushing the right ideas here

English

1

4

1.9K

Subquadratic retweetledi

Brady Long@thisguyknowsai·9h

Opus 4.7 is not the long-context king anymore.  SubQ dropped sub-quadratic heat at 52x speed and <5% cost. Wild breakthrough. #subq @alex_whedon @subquadratic

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - Less than 5% the cost of Opus Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

English

5

2.7K

Subquadratic retweetledi

Farlane@base10_·9h

Everyone focused on scaling transformers. Almost no one questioned the foundation. SubQ didn’t just improve the model. It changed the math behind how models think. That’s where real breakthroughs come from. #subq, @alex_whedon, @subquadratic

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - Less than 5% the cost of Opus Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

English

2

8

1.7K

Subquadratic retweetledi

Alexander Whedon@alex_whedon·9h

@Ronycoder The LLMs market isn't ready for @subquadratic 👀

English

2

25

2.5K

Subquadratic retweetledi

Dhaval Makwana@heyDhavall·9h

Every long-context demo you’ve seen so far… was a number on a box. → 1M tokens → 2M tokens → looks good in docs accuracy collapses way before that SubQ actually holds at 12M #subq @alex_whedon @subquadratic

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - Less than 5% the cost of Opus Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

English

9

16

4.1K

Subquadratic retweetledi

Rony@Ronycoder·10h

For 2 years, every "agent infrastructure" shipped was just engineers apologising for quadratic attention. RAG pipelines. Chunking hacks. Summarisation loops. Sliding windows. Not innovation. Survival. Transformers charge you exponentially more the longer they think. SubQ doesn't. → Linear scaling → 12M tokens that actually hold → ~5% of Opus 4.7 Every vector DB on your stack just became legacy code. #subq @alex_whedon @subquadratic