Miguel Cardoso

278 posts

@ODordio

Software, AI & Product dude. 🇵🇹 Systems Engineer, AI @ soon xCloudflare Lecturing, writing, family, side projects & other adventures. views are my own

Lisbon · Joined October 2022
123 Following · 23 Followers
Miguel Cardoso @ODordio
~June 2025: Start the process for Cloudflare
January 2026: Officially join Cloudflare
May 2026: Access cutoff while on PTO, with slightly dubious communication.
These months passed by very fast. I did a lot, and there was more to be done, but well... Onwards.
Kenton Varda @KentonVarda
My agents are always spawning subagents and telling them to do exactly the thing I told the main agent to do, except with a less-clear prompt. I feel like this isn't helping.
Guri Singh @heygurisingh
this is the most expensive sentence Anthropic will read this year.
Someone just shipped a frontier LLM with a 12 million token context window that runs at 5% the cost of Opus 4.7.
It's called SubQ. First model built on sub-quadratic sparse attention.
Here's why every AI lab should be panicking right now.
Transformers check every word against every other word. Double the context, compute quadruples. The labs have known this since 2017. They scaled it anyway and charged you more the longer you needed your model to think.
SubQ only computes the relationships that actually matter.
→ 12M token context with 98% accuracy at full length
→ 52x faster than FlashAttention at 1M tokens
→ Runs at under $1.50 per million tokens vs Opus at $15
→ Cost scales linearly instead of exponentially
Now read this part slowly.
Every context window you've ever been sold was a marketing number. Accuracy on every frontier model falls apart past 200k tokens. The labs printed 1M on the box knowing most of that window was decoration.
The entire RAG industry exists because the foundation was broken. Vector databases. Chunking pipelines. Summarization loops. Every workaround you've ever built or paid for was an apology for quadratic attention. They weren't clever engineering. They were duct tape on architecture that should have been replaced years ago.
SubQ fixed the foundation.
The math on every agent product being built right now just changed. Long-context at under 10% of Anthropic's price isn't a discount. It's you no longer paying for the company's mistake.
The transformer was the first workable answer. Everyone scaled it so hard nobody wanted to admit it was a local maximum. @subquadratic is the first team to actually ship the way out.
Opus 4.7 was the long-context benchmark king. That sentence is now in the past tense.
Alexander Whedon @alex_whedon

Introducing SubQ - a major breakthrough in LLM intelligence.
It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do.
That's nearly 1,000x less compute and a new way for LLMs to scale.
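The "double the context, compute quadruples" point can be made concrete by counting query-key score computations. This is a generic illustration, not SubQ's method (the posts don't say how it picks the relationships that matter): it assumes a hypothetical sparse scheme where each query attends to at most `k = 2048` keys, and it ignores the cost of choosing those keys.

```python
# Toy cost model: count query-key scores computed per attention layer.
# k = 2048 is an arbitrary, hypothetical per-query budget for the sparse case;
# this is NOT SubQ's algorithm, and key-selection cost is ignored.


def dense_scores(n: int) -> int:
    """Standard attention: every token is scored against every token."""
    return n * n


def sparse_scores(n: int, k: int = 2048) -> int:
    """Sparse attention: each query scores at most k selected keys."""
    return n * min(k, n)


if __name__ == "__main__":
    for n in (125_000, 250_000, 500_000, 1_000_000):
        d, s = dense_scores(n), sparse_scores(n)
        print(f"n={n:>9,}  dense={d:.2e}  sparse={s:.2e}  ratio={d / s:,.0f}x")
```

Doubling `n` multiplies the dense count by 4 (quadratic growth) and the sparse count by 2 (linear growth); at 1M tokens the ratio under these assumptions is roughly 488x.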

How To AI @HowToAI_
The entire RAG industry is about to get cooked.
Researchers have built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.
It's called PageIndex.
Instead of chunking your docs and stuffing them into Pinecone, it builds a tree index and lets the LLM reason through it like a human reading a book.
Hit 98.7% on FinanceBench. Beats every vector RAG on the leaderboard.
No embeddings. No chunking. No vector DB.
100% open source.
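PageIndex's own code isn't shown in the post, so here is only a rough sketch of "a tree index the LLM reasons through": walk a table-of-contents tree from the root, choosing one child per level, then read the leaf. The `choose_child` step is where a real system would ask the LLM; here it is a hypothetical keyword-overlap heuristic so the example runs offline. `Node`, `choose_child` and `navigate` are illustrative names, not PageIndex APIs.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section in a document's table-of-contents tree (hypothetical)."""
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def words(s: str) -> set[str]:
    """Lower-cased word set used by the toy relevance heuristic."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def choose_child(question: str, children: list[Node]) -> Node:
    """Stand-in for the LLM reasoning step: pick the child whose title shares
    the most words with the question (ties go to the first child)."""
    q = words(question)
    return max(children, key=lambda c: len(q & words(c.title)))


def navigate(question: str, root: Node) -> tuple[list[str], str]:
    """Descend from the root one level at a time, like following a book's
    table of contents, and return the visited titles plus the leaf's text."""
    path, node = [root.title], root
    while node.children:
        node = choose_child(question, node.children)
        path.append(node.title)
    return path, node.text


if __name__ == "__main__":
    report = Node("Annual report", children=[
        Node("Risk factors", children=[Node("Market risk"), Node("Credit risk")]),
        Node("Financial statements", children=[
            Node("Income statement", text="Revenue and expenses for the year."),
            Node("Balance sheet", text="Assets, liabilities and equity at year end."),
        ]),
    ])
    path, text = navigate("What do the financial statements show on the balance sheet?", report)
    print(" > ".join(path))  # Annual report > Financial statements > Balance sheet
    print(text)
```

Only the sections along one root-to-leaf path are ever read, which is the property the post contrasts with chunk-and-embed retrieval.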
Ethan Mollick @emollick
(Sorry, after seeing so many of these, could not resist):
🚨 BREAKING: Google just dropped a NEW paper that completely deletes RNNs from existence.
No recurrence. No convolutions. Nothing. Just one mechanism.
And it’s destroying every translation benchmark on the planet.
The title alone is a flex: “Attention Is All You Need”
Vaswani. Shazeer. Parmar. Uszkoreit. Jones. Gomez. Kaiser. Polosukhin.
8 researchers. 1 architecture. The entire field of NLP will never be the same.
Here’s why this is INSANE
→ LSTMs took DAYS to train. This thing trains in 12 hours on 8 GPUs. 🤯
→ 28.4 BLEU on English-to-German. That’s not an improvement. That’s a MASSACRE. They beat the previous SOTA by over 2 points.
→ English-to-French? 41.8 BLEU. At a FRACTION of the training cost of every model that came before it.
→ They called it the “Transformer.” The name alone tells you they knew.
But here’s the part nobody is talking about 👇
They threw out sequential processing ENTIRELY.
Every other model on Earth processes words one at a time. This thing looks at the ENTIRE sentence simultaneously and figures out what matters.
It’s called “self-attention” and it’s basically the model asking itself: “which words should I care about right now?”
Every. Single. Token. In parallel.
Do you understand what this means?
Training that used to take WEEKS now takes HOURS.
Models that couldn’t scale past a few layers? This thing stacks 6 encoders and 6 decoders like it’s nothing.
And the multi-head attention? 8 attention heads running at once, each learning DIFFERENT relationships in the data.
I’m not being dramatic when I say this paper just rewrote the rulebook.
RNNs are cooked. 💀
LSTMs are cooked. 💀
The future is attention. And attention is ALL you need.
Follow for more 🔔
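Behind the parody, the mechanism described is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V in the paper's notation. Below is a minimal single-head sketch in plain Python, with a deliberate simplification: Q, K and V are all the raw token vectors, whereas the paper learns separate projections and runs 8 heads. Each token scores every token, a softmax turns the scores into weights, and the output is a weighted average of the vectors.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    Simplification: no learned W_Q, W_K, W_V projections and no multiple heads.
    """
    d = len(x[0])
    out = []
    for q in x:  # each token acts as a query...
        # ...and is scored against every token (n * n scores in total)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)  # "which tokens should I care about?"
        # output row = attention-weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention(tokens):
        print([round(val, 3) for val in row])
```

The loop over queries is sequential here for readability; in practice every row is computed at once as matrix multiplications, which is the parallelism the post is excited about.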
Miguel Cardoso @ODordio
Things are getting so wild that now some PMs can end up driving and some engineers can also set direction. There's room for it, and that's amazing.
Adam Wathan @adamwathan
Does anything like AI search exist for Discord? So many huge communities with tons of unindexed knowledge, would be so useful to be able to ask the server questions.
Miguel Cardoso retweeted
kathyl @kathyyliao
icymi @CloudflareDev's Browser Run limits were 4x-ed:
- 120 concurrent browsers
- 10 quick action reqs per second
live on all Workers Paid plans
Miguel Cardoso @ODordio
@iam_elias1 Ok. I thought the overly hyped marketing AI slop tweets were slowly disappearing. I guess not. Thanks for the share though, it does formalize the concept nicely.
Elias Al @iam_elias1
MIT just made every AI company's billion dollar bet look embarrassing.
They solved AI memory. Not by building a bigger brain. By teaching it how to read.
The paper dropped on December 31, 2025. Three MIT CSAIL researchers. One idea so obvious it hurts. And a result that makes five years of context window arms racing look like the wrong war entirely.
Here is the problem nobody solved.
Every AI model on the planet has a hard ceiling. A context window. The maximum amount of text it can hold in working memory at once. Cross that line and something ugly happens — something researchers have a clinical name for.
Context rot.
The more you pack into an AI's context, the worse it performs on everything already inside it. Facts blur. Information buried in the middle vanishes. The model does not become more capable as you feed it more. It becomes more confused. You give it your entire codebase and it forgets what it read three files ago. You hand it a 500-page legal document and it loses the clause from page 12 by the time it reaches page 400.
So the industry built a workaround. RAG. Retrieval Augmented Generation. Chop the document into chunks. Store them in a database. Retrieve the relevant ones when needed.
It was always a compromise dressed up as a solution. The retriever guesses which chunks matter before the AI has read anything. If it guesses wrong — and it does, constantly — the AI never sees the information it needed. The act of chunking destroys every relationship between distant paragraphs. The full picture gets shredded into fragments that the AI then tries to reassemble blindfolded.
Two bad options. One broken industry. Three MIT researchers and a deadline of December 31st.
Here is what they built.
Stop putting the document in the AI's memory at all. That is the entire idea. That is the breakthrough.
Store the document as a Python variable outside the AI's context window entirely. Tell the AI the variable exists and how big it is. Then get out of the way.
When you ask a question, the AI does not try to remember anything. It behaves like a human expert dropped into a library with a computer. It writes code. It searches the document with regular expressions. It slices to the exact section it needs. It scans the structure. It navigates. It finds precisely what is relevant and pulls only that into its active window.
Then it does something that makes this recursive. When the AI finds relevant material, it spawns smaller sub-AI instances to read and analyze those sections in parallel. Each one focused. Each one fast. Each one reporting back. The root AI synthesizes everything and produces an answer.
No summarization. No deletion. No information loss. No decay. Every byte of the original document remains intact, accessible, and queryable for as long as you need it.
Now here are the numbers.
Standard frontier models on the hardest long-context reasoning benchmarks: scores near zero. Complete collapse. GPT-5 on a benchmark requiring it to track complex code history beyond 75,000 tokens — could not solve even 10% of problems.
RLMs (Recursive Language Models) on the same benchmarks: solved them. Dramatically. Double-digit percentage gains over every alternative approach. Successfully handling inputs up to 10 million tokens — 100 times beyond a model's native context window. Cost per query: comparable to or cheaper than standard massive context calls.
Read that again. One hundred times the context. Better answers. Same price.
The timeline of the arms race makes this sting harder. GPT-3 in 2020: 4,000 tokens. GPT-4: 32,000. Claude 3: 200,000. Gemini: 1 million. Gemini 2: 2 million. Every generation, every company, billions of dollars spent, all betting on the same assumption. More context equals better performance.
MIT just proved that assumption was wrong the entire time. Not slightly wrong. Fundamentally wrong.
The entire premise of the last five years of context window research — that the solution to AI memory was a bigger window — was the wrong answer to the wrong question.
The right question was never how much can you force an AI to hold in its head. It was whether you could teach an AI to know where to look.
A human expert handed a 10,000-page archive does not read all 10,000 pages before answering your question. They navigate. They search. They find the relevant section, read it deeply, and synthesize the answer. RLMs are the first AI architecture that works the same way.
The code is open source. On GitHub right now. Free. No license fees. No API costs. Drop it in as a replacement for your existing LLM API calls and your application does not even notice the difference — except that it suddenly works on inputs it used to fail on entirely.
Prime Intellect — one of the leading AI research labs in the space — has already called RLMs a major research focus and described what comes next: teaching models to manage their own context through reinforcement learning, enabling agents to solve tasks spanning not hours, but weeks and months.
The context window wars are over. MIT won them by walking away from the battlefield.
Source: Zhang, Kraska, Khattab · MIT CSAIL · arXiv:2512.24601
Paper: arxiv.org/abs/2512.24601
GitHub: github.com/alexzhang13/rlm
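The post describes the recursive loop only in prose. A rough Python sketch of that control flow follows, under heavy assumptions: per the post, the root model writes this kind of code itself, whereas here the steps are hard-coded, and `toy_sub_llm` is a hypothetical string-matching stand-in for a sub-model call (it ignores the question). `rlm_answer` and its parameters are illustrative and are not the API of github.com/alexzhang13/rlm.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical signature for a sub-model call: (question, snippet) -> finding.
SubLLM = Callable[[str, str], str]


def rlm_answer(question: str, document: str, pattern: str,
               sub_llm: SubLLM, window: int = 200) -> str:
    """Sketch of the recursive pattern: the document stays in a variable and
    only small slices around regex hits are ever handed to sub-calls."""
    # 1. Search the variable with a regular expression instead of reading it all.
    hits = [m.start() for m in re.finditer(pattern, document)]
    # 2. Slice a small window of text around each hit.
    snippets = [document[max(0, i - window): i + window] for i in hits]
    # 3. Spawn one sub-call per snippet and run them concurrently.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda s: sub_llm(question, s), snippets))
    # 4. The root combines non-empty, de-duplicated findings into one answer.
    return "\n".join(dict.fromkeys(f for f in findings if f))


def toy_sub_llm(question: str, snippet: str) -> str:
    """Stand-in for a model: return the sentences that mention 'clause 12'."""
    sentences = (s.strip() for s in snippet.split("."))
    return " ".join(s for s in sentences if "clause 12" in s.lower())


if __name__ == "__main__":
    doc = "Intro. " * 1000 + "Clause 12 sets the notice period to 30 days. " + "Filler. " * 1000
    print(f"document: {len(doc):,} chars")
    print(rlm_answer("What does clause 12 say?", doc, r"Clause 12", toy_sub_llm))
```

Only about 400 characters around each match ever reach a sub-call, no matter how long `doc` grows; the trade-off is that the answer depends on the search pattern finding the right spots.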
Miguel Cardoso @ODordio
@eddiejiao_obj @zan2434 @drewocarr Very cool demo! Is everything generated in real time? Similar to that Anthropic experimental demo a couple of months ago? How does one steer what's "supposed" to appear?
𝖊𝖉𝖉𝖎𝖊 𝖏𝖎𝖆𝖔
What if your whole computer were just pixels streamed to you from a model? I’ve been working with @zan2434 and @drewocarr to imagine a version of generative computing that’s much more flexible and visually rich than the GUIs we have today. (Video is sped up and edited)
sunil pai @threepointone
a strange change happening to the sdlc, driven by pi.dev extensions, and now in cloudflare because of dynamic workers + artifacts + think etc
previously: check out code, make changes, deploy, repeat
the new way: deploy first(!), and make changes in production, per user, repeat fully there.
this is remarkable for the future of personal software. things that can build themselves.
Miguel Cardoso retweeted
Cloudflare @Cloudflare
We’ve just launched Artifacts: Git-compatible versioned storage built for agents. cfl.re/4cJqd1n
Miguel Cardoso @ODordio
Joined the team early this year and I’ve already had the chance to co-write a blog post! 😇 AI Search is evolving fast, and we’re just getting started. More to come! 🚀 blog.cloudflare.com/ai-search-agen…
Miguel Cardoso retweeted
Brayden @BraydenWilmoth
Cloudflare dashboard can now complete tasks for you.
- "Create a Worker and bind a new R2 bucket to it"
- "Change my DNS records to 1.1.1.1"
- "How many errors have happened this week"
Not only do we tell you, but we show you with generative UI.
PROTIP: Use full-screen mode.