



Think about the timeline:
- Passenger leaves Hondius on April 24
- Travels internationally during a possible incubation window
- Attends a Hanoi conference for extreme travelers
- Potential transit through Hong Kong and Bangkok
- Conference attendees themselves are hyper-mobile “100+ country” travelers
- Current location reportedly UNKNOWN














my first take, and a good lesson in research epistemics here: what can we infer from ~82% SWE-Bench?

it's possible that (1) they trained a new model from scratch that is unlike a regular transformer. but i've never heard of this company before, and checking their funding round, they've only raised ~30M, so it's unlikely they could afford to train an Opus/GPT-5/Kimi 2.6-level coding model from scratch right now.

so this tells us that (2) they need to bootstrap off of an existing pretrained model, likely with RL too, to get that performance! which means they've taken a vanilla Transformer and modified the attention mechanism, likely finetuning/midtraining in a subquadratic attention method.

it's quite possible it doesn't really work and that there's some degeneracy to the method, or it's just plain fake. but if it's not, you'd expect it to be a lightweight change, given how long it takes to do weight surgery on big models (bigger changes to a pretrained model == longer midtraining to recover performance).

i'd lean towards something mostly leveraging the existing attention key/value projections, like a fancy version of DeepSeek's sparse attention paper, but it could also be some unique test-time KV compression, which would come with its own downsides.
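for anyone who wants the "reuse the existing key/value projections, just attend to fewer keys" idea made concrete, here is a minimal sketch of top-k sparse attention for a single query. to be clear, this is a generic illustration, not SubQ's method (which isn't public) and not DeepSeek's exact algorithm. the function name `topk_sparse_attention` and the parameter `k` are made up for the example. note also that this toy version still scores every key to pick the top k, so on its own it saves nothing on selection; a real method would need a cheaper way to choose which keys to keep.

```python
import math


def topk_sparse_attention(query, keys, values, k):
    """Attend from one query to only its k highest-scoring keys.

    Hypothetical helper for illustration; plain lists stand in for tensors.
    Returns (output_vector, sorted_indices_of_kept_keys).
    """
    # Scaled dot-product score of the query against every key.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]

    # Keep the indices of the k largest scores (all of them if k >= len(keys)).
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax over the kept scores only; subtract the max for stability.
    top = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - top) for i in keep]
    total = sum(weights)

    # Weighted average of the kept value vectors.
    dim = len(values[0])
    out = [
        sum(w * values[i][d] for w, i in zip(weights, keep)) / total
        for d in range(dim)
    ]
    return out, sorted(keep)


if __name__ == "__main__":
    q = [1.0, 0.0]
    ks = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
    vs = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(topk_sparse_attention(q, ks, vs, k=1))
```

the point of the sketch: the keys and values are untouched, only the set of keys each query looks at shrinks, which is why this kind of change could plausibly be bolted onto a pretrained model.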



Three people on a cruise ship have passed away from suspected infections of hantavirus, a rare family of viruses carried by rodents.


Introducing SubQ - a major breakthrough in LLM intelligence.

It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do.

That's nearly 1,000x less compute and a new way for LLMs to scale.
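As a rough sanity check on the scale of that claim, the arithmetic can be sketched as follows. Standard attention scores every query-key pair, so the count grows as n squared; if each query instead attended to at most k keys, the count would grow as n times k. The announcement does not say how the "nearly 1,000x" figure was computed or at what context length, so the value of `keys_per_query` below is purely an assumption chosen for illustration, and the sketch ignores whatever it costs to decide which keys to keep.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by standard (full) attention: n * n."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored if each query attends to at most keys_per_query keys."""
    return n_tokens * min(keys_per_query, n_tokens)


def reduction_factor(n_tokens: int, keys_per_query: int) -> float:
    """How many times fewer pairs the sparse scheme scores than the dense one."""
    return dense_pairs(n_tokens) / sparse_pairs(n_tokens, keys_per_query)


if __name__ == "__main__":
    # Assumed numbers: 1M-token context, 1,000 keys kept per query.
    print(reduction_factor(1_000_000, 1_000))
```

Under these assumed numbers the ratio simplifies to n / k, so the saving grows with context length and depends entirely on how few keys per query the model can get away with.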





He’s dead on.



