Subquadratic

25 posts

Subquadratic

@subquadratic

AI lab leading the subquadratic LLM revolution.

Joined January 2026
525 Following · 10.2K Followers
Subquadratic@subquadratic·
Introducing SubQ. The first fully sub-quadratic LLM with 12M-token context. 150 tokens per second. Get early access at subq.ai
8 replies · 12 reposts · 111 likes · 12.7K views
Subquadratic reposted
Alexander Whedon@alex_whedon·
We were a little slow on this, but we just got a technical blog post up with more details. Please take a look! subq.ai/how-ssa-makes-… We have a model card coming next week, and we are happy to take requests for any specific details there. I am happy to answer any questions here!
33 replies · 20 reposts · 285 likes · 78.7K views
Subquadratic@subquadratic·
The numbers behind the SubQ announcement:
Speed: 52x faster than Flash Attention
SWE Bench Verified: 81.8%
Ruler (128K): 95%
MRCR V2: 65.9%
Get early access at subq.ai
21 replies · 15 reposts · 126 likes · 16.6K views
Subquadratic reposted
Rezoan Ferdose@rezoan_ferdose·
@alex_whedon Congrats to the @subquadratic team! It’s awesome to see someone finally breaking out of the standard transformer box
1 reply · 1 repost · 31 likes · 18.8K views
Subquadratic reposted
Alexander Whedon@alex_whedon·
SubQ is available for early access today, alongside our coding agent, SubQ Code. Get access today ↓ subq.ai
106 replies · 103 reposts · 1.3K likes · 734.3K views
Subquadratic reposted
Shruti@heyshrutimishra·
“Efficiency is Intelligence” @subquadratic’s tagline hits different when you realize current transformers literally can’t scale context without burning quadratically more compute. The post-transformer era isn’t coming. It’s already being built. #subq @alex_whedon
Quoted post: Alexander Whedon @alex_whedon's SubQ announcement (full text below).
5 replies · 8 reposts · 32 likes · 15.3K views
Subquadratic reposted
Guri Singh@heygurisingh·
this is the most expensive sentence Anthropic will read this year. Someone just shipped a frontier LLM with a 12 million token context window that runs at 5% the cost of Opus 4.7. It's called SubQ. First model built on sub-quadratic sparse attention. Here's why every AI lab should be panicking right now.
Transformers check every word against every other word. Double the context, compute quadruples. The labs have known this since 2017. They scaled it anyway and charged you more the longer you needed your model to think. SubQ only computes the relationships that actually matter.
→ 12M token context with 98% accuracy at full length
→ 52x faster than FlashAttention at 1M tokens
→ Runs at under $1.50 per million tokens vs Opus at $15
→ Cost scales linearly instead of quadratically
Now read this part slowly. Every context window you've ever been sold was a marketing number. Accuracy on every frontier model falls apart past 200k tokens. The labs printed 1M on the box knowing most of that window was decoration.
The entire RAG industry exists because the foundation was broken. Vector databases. Chunking pipelines. Summarization loops. Every workaround you've ever built or paid for was an apology for quadratic attention. They weren't clever engineering. They were duct tape on architecture that should have been replaced years ago.
SubQ fixed the foundation. The math on every agent product being built right now just changed. Long-context at under 10% of Anthropic's price isn't a discount. It's you no longer paying for the company's mistake.
The transformer was the first workable answer. Everyone scaled it so hard nobody wanted to admit it was a local maximum. @subquadratic is the first team to actually ship the way out. Opus 4.7 was the long-context benchmark king. That sentence is now in the past tense.
Quoted post: Alexander Whedon @alex_whedon's SubQ announcement (full text below).
41 replies · 43 reposts · 207 likes · 55.8K views
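The scaling arithmetic in the post above (double the context, compute quadruples; roughly 1,000x less work around 1M tokens) is easy to sanity-check. The sketch below counts scored token pairs for dense attention versus a hypothetical scheme where each token attends to a fixed budget of k others. The budget k = 1024 is an illustrative assumption, not a published SubQ parameter.

```python
# Back-of-the-envelope check of the scaling arithmetic in the post above.
# ASSUMPTION: the per-token budget k = 1024 is purely illustrative; it is not
# a published SubQ/SSA parameter. It is chosen only to show how a "nearly
# 1,000x less compute" figure can arise around 1M tokens of context.

def dense_pairs(n: int) -> int:
    """Token pairs scored by standard (quadratic) attention."""
    return n * n

def sparse_pairs(n: int, k: int = 1024) -> int:
    """Token pairs scored if each token attends to at most k others."""
    return n * min(k, n)

for n in (128_000, 1_000_000, 12_000_000):
    d, s = dense_pairs(n), sparse_pairs(n)
    print(f"n={n:>12,}  dense={d:.2e}  sparse={s:.2e}  ratio={d / s:,.0f}x")

# Doubling the context quadruples the dense count ("compute quadruples" above)
# but only doubles the sparse count, i.e. cost grows linearly in n.
```

At 1M tokens this toy gives roughly a 977x reduction, which is where a "nearly 1,000x" figure can come from; the true ratio depends entirely on the real per-token budget, which the posts here do not state.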
Subquadratic reposted
Your Tech Girl@yourtechgirl24·
This might be the most important LLM launch of 2026. @subquadratic just dropped SubQ — the first frontier model that doesn’t use standard transformer attention. The numbers are insane:
→ 12M token context window
→ 52x faster than FlashAttention at 1M tokens
→ Less than 5% the cost of Opus
→ ~1,000x less compute
This isn’t an incremental upgrade. It’s a new way for LLMs to scale.
Quoted post: Alexander Whedon @alex_whedon's SubQ announcement (full text below).
3 replies · 12 reposts · 29 likes · 8.5K views
Subquadratic reposted
Rony@Ronycoder·
For 2 years, every "agent infrastructure" shipped was just engineers apologising for quadratic attention. RAG pipelines. Chunking hacks. Summarisation loops. Sliding windows. Not innovation. Survival.
Transformers charge you quadratically more the longer they think. SubQ doesn't.
→ Linear scaling
→ 12M tokens that actually hold
→ ~5% of Opus 4.7
Every vector DB on your stack just became legacy code. #subq @alex_whedon @subquadratic
Quoted post: Alexander Whedon @alex_whedon's SubQ announcement (full text below).
7 replies · 15 reposts · 28 likes · 8.9K views
Subquadratic reposted
Alexander Whedon@alex_whedon·
Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1M tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.
1K replies · 1.9K reposts · 14.5K likes · 7.5M views
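The mechanism described in the announcement (score only the word relationships that matter instead of all of them) can be sketched generically. The toy below contrasts full attention with a top-k-per-query variant; it is not SubQ's SSA kernel, whose details are only in the linked blog post, and it still forms the full score matrix to pick the top k, which a genuinely sub-quadratic method must avoid. It is intuition only.

```python
# Toy contrast between standard attention and a generic top-k sparse variant.
# This is NOT SubQ's SSA; it only illustrates "keep the few relationships that
# matter" on a tiny example. Selecting top-k here still builds the full n x n
# score matrix, which a real sub-quadratic kernel must avoid.
import numpy as np

def full_attention(Q, K, V):
    """Standard attention: every query scores every key, O(n^2) pairs."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def topk_attention(Q, K, V, k=8):
    """Each query keeps only its k highest-scoring keys before the softmax."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]   # k best keys per query
    out = np.zeros_like(Q)
    for i, cols in enumerate(idx):
        s = scores[i, cols]
        w = np.exp(s - s.max())
        out[i] = (w / w.sum()) @ V[cols]
    return out

rng = np.random.default_rng(0)
n, d = 64, 16
Q, K, V = rng.normal(size=(3, n, d))
# With k = n the sparse path reproduces full attention (difference ~1e-16);
# with small k it approximates it using only a fraction of the score mass.
print(np.abs(full_attention(Q, K, V) - topk_attention(Q, K, V, k=n)).max())
```

Real sub-quadratic designs differ mainly in how they find those k keys without ever forming the full score matrix (hashing, clustering, learned routing, block sparsity); which approach, if any, SSA uses is not stated in the posts on this page.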
Subquadratic reposted
Alexander Whedon@alex_whedon·
We finally have swag.
[attached photo]
13 replies · 3 reposts · 64 likes · 12.3K views