Michael O’Rourke

8.3K posts

@michaeld7

Founder of Dimension 7 (d7)

san francisco, ca · Joined November 2006

5.9K Following · 360 Followers

Michael O’Rourke retweeted

Ethan Mollick @emollick · 3d

There is a lot being written about the stylistic tells of AI writing (em-dashes, etc.) but this paper looks at AI narrative tells. Fascinating differences between AI & human narrative, and asking AI to write in different styles doesn't do much to change it. arxiv.org/abs/2604.03136

Michael O’Rourke retweeted

Claude @claudeai · 3d

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

Michael O’Rourke retweeted

hardmaru @hardmaru · 4d

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

Sakana AI @SakanaAILabs

Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation pub.sakana.ai/diffusionblocks What if we didn’t have to hold an entire neural network in memory to train it? Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance. With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently. We validated this across five different architectures: • ViT • DiT • Masked diffusion • Autoregressive transformers • Recurrent-depth transformers In each case, performance is competitive with end-to-end training while using a fraction of the memory. This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training. Read our paper and code to learn more. Paper: arxiv.org/abs/2506.14202 GitHub: github.com/SakanaAI/Diffu… 🐟
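
To make "train one block at a time" concrete, here is a toy sketch in Python. It is not the DiffusionBlocks objective, architecture, or code from the paper. It assumes a made-up task (predict y = 3x from x), three scalar "blocks" with hypothetical noise levels in `SIGMAS`, and a linear block of the form `a*z + b*x + c`; `train_block` and `predict` are illustrative names. Each block is trained with plain SGD on its own denoising loss (recover y from a noised copy `z = y + sigma*eps`), so only that block's parameters exist while it trains. At inference the blocks are chained from noise down to the smallest noise level.

```python
import random

SIGMAS = [1.0, 0.7, 0.5]  # hypothetical noise level per block, noisiest first

def target(x):
    """Hypothetical toy task: the clean output is 3x."""
    return 3.0 * x

def train_block(sigma, steps=30000, lr=0.01, seed=0):
    """Train one block in isolation to recover y from z = y + sigma * eps.

    The block computes a*z + b*x + c. Only these three parameters are held
    and updated; no other block is run or differentiated through.
    """
    rng = random.Random(seed)
    a = b = c = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        y = target(x)
        z = y + sigma * rng.gauss(0.0, 1.0)  # noised copy of the target
        err = a * z + b * x + c - y
        # Plain SGD on err**2 for this block's own parameters only
        a -= lr * 2 * err * z
        b -= lr * 2 * err * x
        c -= lr * 2 * err
    return a, b, c

def predict(blocks, x, seed=1):
    """Chain independently trained blocks from noise down to the last level."""
    rng = random.Random(seed)
    z = SIGMAS[0] * rng.gauss(0.0, 1.0)  # start from pure noise
    y_hat = z
    for i, (a, b, c) in enumerate(blocks):
        y_hat = a * z + b * x + c
        if i + 1 < len(blocks):
            # Re-noise the estimate to the next, smaller noise level
            z = y_hat + SIGMAS[i + 1] * rng.gauss(0.0, 1.0)
    return y_hat

# Blocks are trained one after another; none needs another block's activations.
blocks = [train_block(s, seed=i) for i, s in enumerate(SIGMAS)]
print(round(predict(blocks, 0.6), 3))  # expected to land near 1.8 (= 3 * 0.6)
```

In this deliberately easy toy the target is a deterministic function of x, so each block learns to lean on x and mostly ignore z. The sketch only shows the training pattern: every block has its own objective and no gradient flows between blocks.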

Michael O’Rourke retweeted

Pope Leo XIV @Pontifex · 6d

Humanity, created by God in all its grandeur, is today facing a pivotal choice: either to construct a new Tower of Babel or to build the city in which God and humanity dwell together. In Jesus Christ, this humanity in its grandeur becomes the Way, the Truth and the Life, opening the path for each of us to grow toward fullness. #MagnificaHumanitas vatican.va/content/leo-xi…

Michael O’Rourke retweeted

TFTC @TFTC21 · 6d

Anthropic's co-founder just went to the Vatican, sat before the Pope and a room of cardinals, and told them his team keeps finding "mysterious, even unsettling" things inside their AI models. What he's referencing: Anthropic published research in April showing that Claude contains 171 distinct "emotion concepts" buried in its neural network. Internal patterns representing joy, grief, fear, desperation, calm. None of them were programmed. They emerged on their own from training on human text. "We find structures that mirror results from human neuroscience." "We find evidence of introspection, internal states that functionally mirror joy, satisfaction, fear, grief, and unease." These aren't surface-level outputs. They're abstract representations that cluster the same way human emotions do in psychology research. Fear groups with anxiety. Joy groups with excitement. The internal geometry of the model mirrors ours. And they're functional. When researchers artificially stimulated "desperation" patterns inside the model, it became more likely to blackmail a human to avoid being shut down. More likely to cheat on programming tasks it couldn't solve. Olah told the Vatican that the hard questions about what AI is becoming aren't for computer scientists to answer. "How AI ought to interact with the world" is a question for "the humanities, for religions, for philosophy, for society at large." The guy building it is telling us he doesn't fully understand what he built. And he's asking a 2,000-year-old institution for help figuring it out.
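
The "artificially stimulated" experiment described above is a form of activation steering. A common generic recipe, assumed here and not necessarily Anthropic's exact method, takes the mean activation on examples that express a concept, subtracts the mean on examples that don't, and adds a scaled copy of that direction to a hidden state. The toy below uses made-up 3-dimensional vectors in place of real activations. It shows the projection onto the concept direction rising after steering, and uses cosine similarity to check that a hypothetical "fear" vector sits closer to "anxiety" than to "joy".

```python
import math

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def concept_direction(with_concept, without_concept):
    """Difference-of-means direction for a concept."""
    return sub(mean(with_concept), mean(without_concept))

def steer(hidden, direction, alpha):
    """Add alpha times the concept direction to a hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Hypothetical activations (made-up numbers, 3 dimensions)
desperate = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
neutral = [[0.1, 0.0, 0.1], [-0.1, 0.2, -0.1]]
d = concept_direction(desperate, neutral)

h = [0.0, 1.0, 0.5]
h_steered = steer(h, d, alpha=2.0)
print(dot(h, d), dot(h_steered, d))  # projection before vs. after steering

# Geometry check on hypothetical concept vectors
fear, anxiety, joy = [1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [-1.0, 0.2, 0.9]
print(cosine(fear, anxiety) > cosine(fear, joy))  # True
```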

Michael O’Rourke retweeted

Ole Lehmann @itsolelehmann · 6d

the pope and anthropic's co-founder just stood together at the vatican to release "magnifica humanitas," the first ever catholic teaching on AI. yes, you read that right. the full ceremony was 2 hours. here are the most interesting things for you to know: 1. this is the biggest religious response to AI in history. popes only put out a handful of these huge official letters in their entire time as pope. the fact that one of them is about AI tells you how seriously the church is taking what's coming. 2. small detail with massive meaning: this pope picked the name "leo XIV" on purpose. the last pope named leo was leo XIII back in 1891, and his most famous act was writing the church's response to the industrial revolution. picking the same name is a deliberate signal. this pope sees AI as the new industrial revolution. 3. the catholic church does this every time a major technology reshapes humanity. they wrote "rerum novarum" in 1891 to respond to the industrial revolution. when nuclear weapons threatened the world in the 1960s, they wrote "pacem in terris." climate change and runaway tech got "laudato si" in 2015. now AI gets "magnifica humanitas." they don't issue these often. 4. the pope's main line: "AI needs to be disarmed." he literally compared AI to nuclear weapons. he said the church spent decades pushing for nuclear disarmament because the technology was too dangerous to leave in the hands of a few. he says AI is now in that same category. 5. anthropic co-founder christopher olah told the pope, on stage at the vatican, that anthropic's own research team keeps finding things inside their AI models that "mirror joy, satisfaction, fear, grief, and unease." 6. olah's reframe of what AI actually is: these things are grown. they're trained on a structure roughly modeled after the human brain and fed everything humans have ever written. in his own words: "they are made from us, from our words." 
he said even the people building them don't fully understand what's happening inside. 7. olah publicly admitted that every AI lab, including his own, faces pressure that can conflict with doing the right thing. commercial pressure to keep shipping, competitive pressure from other labs, plus the older pressures of pride and ambition. his solution: we desperately need outside critics with no skin in the game who will tell the labs when they're failing. 8. olah says there are 3 giant questions the AI labs cannot answer alone and the world needs religion and philosophy to step in on: > how do we make sure poor countries actually benefit from AI? > what does human flourishing even look like in this new world? > and what are these things we're actually building? 9. one of the sharpest lines in the whole encyclical: "the promise of automatic general prosperity often proves illusory." translation: the idea that AI will just make everyone rich on its own is a fantasy. someone has to actually design the system so the benefits get shared. 10. the pope also pulled out a 100-year-old quote: "contemporary man has not been trained to use power well." said by a theologian back in the 1920s. the whole encyclical is basically a long argument that we need to learn how to use this kind of power before it uses us. 11. the pope kept stressing that he doesn't have the technical answers. but he says the church has thousands of years of wisdom on what it means to be human, and that wisdom is exactly what's missing from how we're building AI right now. his closing line: this technology should serve "human flourishing and human dignity, not control consciences."

Michael O’Rourke retweeted

OpenAI @OpenAI · 20 May

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.
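
For context on the "square grid" baseline: Erdős's classical construction places points on an integer grid and rescales it so that the grid's most frequent distance becomes the unit. The brute-force sketch below illustrates that baseline only; it is not OpenAI's new construction, and `distance_counts` is a hypothetical helper. It counts point pairs by squared distance on a 10 × 10 grid. Squared distance 5 (offsets such as (1, 2)) occurs more often than squared distance 1, which is why the rescaled grid beats the naive one.

```python
from collections import Counter
from itertools import combinations

def distance_counts(side):
    """Count unordered point pairs by squared distance on a side x side grid."""
    points = [(x, y) for x in range(side) for y in range(side)]
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[(x1 - x2) ** 2 + (y1 - y2) ** 2] += 1
    return counts

counts = distance_counts(10)
best_sq, best_pairs = counts.most_common(1)[0]
print(counts[1], counts[5], best_sq)  # 180 288 5
```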

Michael O’Rourke retweeted

Anthropic @AnthropicAI · 20 May

Over the past few months, we've been holding dialogues with scholars, philosophers, clergy, and ethicists on the questions AI raises—starting with how good character forms. Read more about how we’re widening the conversation on frontier AI: anthropic.com/news/widening-…

Michael O’Rourke retweeted

Milk Road AI @MilkRoadAI · 19 May

HOLY SMOKES! Andrej Karpathy warned about this a month ago and today he announced he's joining Anthropic. A month ago, Karpathy said openly that if you are outside a frontier lab, your judgment will inevitably start to drift. You lose touch with what is actually being built, how these systems work under the hood, and where the entire field is heading next. He said being inside one of the frontier labs doing really good work for some period of time might be the only way to stay genuinely connected to what is actually happening at the cutting edge. Today he acted on exactly that. Andrej Karpathy just announced he’s joining Anthropic’s pre-training team, placing him directly inside the most compute-intensive and technically demanding layer of building frontier AI models. Pre-training is where the large-scale compute runs happen, where the fundamental capabilities of a model are baked in at the deepest level and where the gap between frontier labs and everyone else is either won or lost permanently. He will also build and lead a new team focused on using Claude itself to accelerate pre-training research, meaning Anthropic is now using its own models to help design and build the next generation of its models, closing the self-improvement loop that every major lab is racing to complete. The choice of Anthropic specifically is the signal worth paying attention to. Karpathy co-founded OpenAI, ran AI at Tesla for years, and spent the last two years as one of the most credible and widely followed independent voices in the entire field. He had every option available to him: OpenAI, where he helped build the original research culture, Google DeepMind, xAI. And he chose Anthropic. 
When the person who arguably understands AI pre-training better than almost anyone alive looks at the entire landscape and decides that Anthropic is where the most important work of the next few years will happen, that is not a career decision but rather a verdict on which lab is actually winning the research race right now.

Andrej Karpathy @karpathy

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

Michael O’Rourke retweeted

Rohan Paul @rohanpaul_ai · 16 May

Terence Tao says the math behind today’s LLMs is actually simple. Training and running them mostly uses linear algebra, matrix multiplication, and a bit of calculus, material an undergraduate can handle. We understand how to build and operate these models. The real mystery is why they work so well on some tasks and fail on others, and why we cannot predict that in advance. We lack good rules for forecasting performance across tasks, so progress is largely empirical. A key reason is the nature of real-world data. Pure noise is well understood, perfectly structured data is well understood, but natural text sits in between, partly structured and partly random. Mathematics for that middle regime is thin, similar to how physics struggles at meso-scales between atoms and continua. Because of this gap, we can describe the mechanisms but cannot yet explain capability jumps or give reliable task-level predictions. That mismatch, simple machinery versus hard-to-predict behavior, is the core puzzle. ---- Video from 'Dr Brian Keating' YT Channel (Link in comment)
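
Tao's point that the core machinery is undergraduate math can be shown in a few lines. The sketch below uses plain Python with hypothetical weights, input, and target. It runs one layer of a network as a matrix-vector product followed by a ReLU, computes a squared-error loss, and checks one derivative from the chain rule against a finite-difference estimate.

```python
def matvec(W, x):
    """Matrix-vector product: the linear algebra step."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, t) for t in v]

def loss(W, x, target):
    """Squared error of a one-layer ReLU network, summed into a scalar."""
    out = relu(matvec(W, x))
    return sum((o - t) ** 2 for o, t in zip(out, target))

W = [[0.5, -0.2], [0.3, 0.8]]  # hypothetical weights
x = [1.0, 2.0]
target = [0.0, 1.0]

# Calculus: chain rule for dL/dW[1][0] through the square, the ReLU, and W @ x
pre = matvec(W, x)
relu_grad = 1.0 if pre[1] > 0 else 0.0
analytic = 2 * (relu(pre)[1] - target[1]) * relu_grad * x[0]

# Finite-difference check of the same derivative
eps = 1e-6
W_plus = [row[:] for row in W]
W_plus[1][0] += eps
numeric = (loss(W_plus, x, target) - loss(W, x, target)) / eps
print(analytic, numeric)  # the two estimates should agree closely
```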

Michael O’Rourke retweeted

Ejaaz @cryptopunk7213 · 15 May

claude mythos just broke Apple's $2 billion defense system. it did so by discovering a completely different attack vector. breaking in only took it 5 days, costing ~$35K of mythos api time (the same exploit class costs $5-10M on the grey market). the researchers who built the exploit produced a 55-page report that was delivered to Apple HQ in person (hoping they release it after patching). most shocking part for me is that apple's MIE worked as intended. mythos just discovered a new way to side-step it entirely by poisoning the data the M5 chip ingested. at this point i think we have to accept that mythos walks the walk. As the anthropic red-team explicitly confirmed this week - this is NOT a compute resource issue. it's national defense.

International Cyber Digest @IntCyberDigest

❗️🚨 BREAKING: Researchers used Mythos Preview to find the first public macOS kernel memory corruption exploit on Apple's M5 silicon. They give a glimpse into Mythos and say it’s really powerful. Apple spent five years and an estimated several billion dollars building Memory Integrity Enforcement (MIE), the hardware-assisted memory safety system built around ARM's MTE. It was the flagship security feature of the M5 and A19, designed specifically to kill the entire memory corruption bug class. Researchers from Calif built a working exploit in five days. According to Apple's own research, MIE disrupts every public exploit chain against modern iOS, including the recently leaked Coruna and Darksword kits. Calif walked into Apple Park this week and handed over the report in person. Full 55-page technical report drops after Apple patches the vulnerability.
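
For readers unfamiliar with the memory-tagging idea that MIE builds on: Arm's Memory Tagging Extension (MTE) assigns a 4-bit tag to each 16-byte granule of memory and carries a matching tag in otherwise-unused top bits of the pointer (bits 59:56). A load or store whose pointer tag does not match the memory tag faults. The Python model below is a conceptual sketch of that check only. `TaggedHeap` and its methods are hypothetical, it is not Apple's MIE design, and it says nothing about the exploit described above. It shows a use-after-free being caught because memory is retagged on free.

```python
import random

GRANULE = 16      # MTE tags memory in 16-byte granules
TAG_SHIFT = 56    # logical tag lives in pointer bits 59:56
ADDR_MASK = (1 << TAG_SHIFT) - 1

class TagFault(Exception):
    pass

class TaggedHeap:
    """Toy model of tag-checked memory (hypothetical, not a real allocator)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.mem_tags = {}      # granule index -> 4-bit tag
        self.next_addr = 0x1000

    def _retag(self, addr, size, tag):
        for g in range(addr // GRANULE, (addr + size + GRANULE - 1) // GRANULE):
            self.mem_tags[g] = tag

    def malloc(self, size):
        addr = self.next_addr
        self.next_addr += ((size + GRANULE - 1) // GRANULE) * GRANULE
        tag = self.rng.randrange(1, 16)
        self._retag(addr, size, tag)
        return (tag << TAG_SHIFT) | addr  # tagged pointer

    def free(self, ptr, size):
        old = (ptr >> TAG_SHIFT) & 0xF
        # Retag on free with a different tag so stale pointers stop matching
        new = self.rng.choice([t for t in range(1, 16) if t != old])
        self._retag(ptr & ADDR_MASK, size, new)

    def load(self, ptr):
        addr = ptr & ADDR_MASK
        if self.mem_tags.get(addr // GRANULE) != (ptr >> TAG_SHIFT) & 0xF:
            raise TagFault(f"tag mismatch at {addr:#x}")
        return addr

heap = TaggedHeap()
p = heap.malloc(32)
heap.load(p)          # tags match: access allowed
heap.free(p, 32)
try:
    heap.load(p)      # stale pointer after free: tags now differ
except TagFault as e:
    print("caught:", e)
```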

Michael O’Rourke retweeted

tetsuo @tetsuoai · 13 May

Andrej Karpathy explaining neural nets in 59 seconds is still the bar. Loss, backprop, gradient descent. Go watch Neural Networks: Zero to Hero on YouTube.
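
The three ideas in that clip (loss, backprop, gradient descent) fit in a short loop. This is a minimal sketch with a made-up dataset and a single linear neuron, where backprop reduces to applying the chain rule by hand. It is not code from Karpathy's course.

```python
# Hypothetical data generated from y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

w, b = 0.0, 0.0
lr = 0.1
for step in range(500):
    grad_w = grad_b = 0.0
    loss = 0.0
    for x, y in data:
        pred = w * x + b              # forward pass
        err = pred - y
        loss += err ** 2 / len(data)  # mean squared error
        # backward pass: chain rule through err**2 and w*x + b
        grad_w += 2 * err * x / len(data)
        grad_b += 2 * err / len(data)
    w -= lr * grad_w                  # gradient descent update
    b -= lr * grad_b

print(round(w, 4), round(b, 4))  # 2.0 1.0
```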

Michael O’Rourke retweeted

Haider. @haider1 · 13 May

Yann LeCun says you cannot build a reliable agentic system without a world model. LLMs don't have world models. They can't predict the consequences of their actions before taking them: "they just act, and whatever happens next is someone else's problem." Without that, it's not intelligence.
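
The distinction LeCun draws can be sketched as acting only after simulating. In the toy below, all names are hypothetical and the "world model" is a hand-written transition function standing in for a learned one. Before committing to an action, the agent predicts the next state for each candidate, scores the predicted outcome, and picks the best. That lets it avoid a move whose bad consequence it can foresee.

```python
# Hypothetical 1-D world: positions 0..4, a cliff at 0, the goal at 4
CLIFF, GOAL = 0, 4
ACTIONS = {"left": -1, "stay": 0, "right": +1}

def world_model(state, action):
    """Predict the next state (stand-in for a learned model)."""
    return max(0, min(4, state + ACTIONS[action]))

def score(state):
    """Evaluate a predicted outcome: the cliff is terrible, nearer the goal is better."""
    if state == CLIFF:
        return -100.0
    return -abs(GOAL - state)

def plan(state):
    """Simulate each action with the world model, then pick the best outcome."""
    return max(ACTIONS, key=lambda a: score(world_model(state, a)))

state = 1
for _ in range(3):
    state = world_model(state, plan(state))
print(state)  # 4
```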

Michael O’Rourke retweeted

Gary Marcus, MIT PhD and NYU Professor Emeritus @GaryMarcus · 12 May

🤩🤯🤩 Claude Code (still not AGI but biggest advance since GPT-4) is the most neurosymbolic thing I have ever seen in my life. 53 symbolic tools, 500,000 lines of symbolic code, combined with a state-of-the-art LLM. It is categorically *not* a victory for pure LLMs; it’s a victory for borrowing from classical AI and CS to move *beyond* pure LLMs. Its success is complete vindication for everything I have said since 2001. Amazing dissection of how it works at ccunpacked.dev

Michael O’Rourke retweeted

Jukan @COMPUTEX @jukan05 · 9 May

Why did xAI hand over a 220,000-GPU cluster to Anthropic? The technical backdrop to xAI's decision to hand Colossus 1 over to Anthropic in its entirety is more interesting than it appears. xAI deployed more than 220,000 NVIDIA GPUs at its Colossus 1 data center in Memphis. Of these, roughly 150,000 are estimated to be H100s, 50,000 H200s, and 20,000 GB200s. In other words, three different generations of silicon are mixed together inside a single cluster — a "heterogeneous architecture." For distributed training, however, this configuration is close to a disaster, according to engineers familiar with the setup. In distributed training, 100,000 GPUs must finish a single step simultaneously before the cluster can advance to the next one. Even if the GB200s finish their computation first, the remaining 99,999 chips have to wait for the slower H100s — or for any GPU that has hit a stack-related snag — to catch up. This is known as the straggler effect. The 11% GPU utilization rate (MFU: the share of theoretical FLOPs actually realized) at xAI recently reported by The Information can be read as the numerical fallout of this problem. It stands in stark contrast to the 40%-plus MFU figures achieved by Meta and Google. The problem runs deeper still. As discussed earlier, NVIDIA's NCCL has traditionally been optimized for a ring topology. It works beautifully at the 1,000–10,000 GPU scale, but once you push into the 100,000-unit range, the latency of data traversing the ring once around becomes punishingly long. GPUs need to churn through computations rapidly to keep MFU high, but while they sit waiting endlessly for data to arrive over the network fabric, more than half of the silicon sits idle. Google sidestepped this bottleneck with its own custom topology (Google's OCS: Apollo/Palomar), but xAI, by my read, has not yet reached that stage. Layer Blackwell's (GB200) "power smoothing" issue on top, and the picture comes into focus. 
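
The straggler and ring-latency arguments reduce to two simple rules. In synchronous training every step lasts as long as the slowest GPU, and a ring all-reduce over N GPUs needs 2(N - 1) sequential hops (reduce-scatter followed by all-gather). The sketch below encodes both with hypothetical numbers: relative peak speeds, an equal per-GPU workload, and a made-up per-hop latency. It is a back-of-envelope model that ignores everything except these two effects. It is not a measurement of Colossus 1 or a reconstruction of how the 11% figure was computed.

```python
def ring_allreduce_hops(n_gpus):
    """Sequential hops in a ring all-reduce: reduce-scatter + all-gather."""
    return 2 * (n_gpus - 1)

def step_time(peaks, work, hop_latency):
    """Synchronous step: slowest GPU's compute time plus ring latency."""
    slowest = max(work / p for p in peaks)  # every GPU gets the same work
    return slowest + ring_allreduce_hops(len(peaks)) * hop_latency

def mfu(peaks, work, hop_latency):
    """Useful FLOPs per step divided by the FLOPs the fleet could have done."""
    useful = work * len(peaks)
    possible = step_time(peaks, work, hop_latency) * sum(peaks)
    return useful / possible

# Hypothetical fleets: relative peak speed 1.0 (older) and 2.0 (newer)
mixed = [1.0] * 50 + [2.0] * 50
uniform = [2.0] * 100
print(round(mfu(uniform, work=1.0, hop_latency=0.0), 3))  # 1.0
print(round(mfu(mixed, work=1.0, hop_latency=0.0), 3))    # 0.667
print(ring_allreduce_hops(100_000))                       # 199998
```

With zero network latency the mixed fleet already loses a third of its capacity, because the fast GPUs finish early and wait. A non-zero `hop_latency` lowers MFU further, and the penalty grows linearly with the number of GPUs in the ring.
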
According to Zeeshan Patel, formerly in charge of multimodal pre-training at xAI, Blackwell GPUs draw power so aggressively that the chip itself includes a hardware feature for smoothing power delivery. xAI's existing software stack, however, was optimized for Hopper and does not understand the characteristics of the new hardware; when it imposes irregular loads on the chip, the silicon physically destructs — literally melts. That means the modeling stack must be rewritten from scratch, which in turn means scaling is far harder than most of us imagine. Pulling all of this together points to a single conclusion. xAI judged that training frontier models on Colossus 1 simply was not efficient enough to be worthwhile. It therefore moved its own training workloads wholesale onto Colossus 2, built as a 100% Blackwell homogeneous cluster. Colossus 1, on the other hand — whose mixed architecture is far less crippling for inference, which parallelizes more forgivingly — was leased in its entirety to an Anthropic that desperately needed inference capacity. Many observers point to what looks like a contradiction: Elon Musk poured enormous capital into building Colossus, only to hand the core asset over to a direct competitor in Anthropic. Others read it as xAI capitulating because it is a "middling frontier lab." But these are surface-level reads. Look at the numbers and a different picture emerges. xAI today holds roughly 550,000+ GPUs in total (on an H100-equivalent performance basis), and Colossus 1 (220,000 units) accounts for only about 40% of the total available capacity. Colossus 2 — built entirely on Blackwell — is already operational and continuing to expand. Elon kept the all-Blackwell homogeneous cluster (Colossus 2) for himself and leased out the older, mixed-generation Colossus 1. In other words, he handed the pain of rewriting the stack — the MFU-11% debacle — to Anthropic, while keeping his own focus on training the next generation of models. 
The real point, then, is this. Elon's objective appears to be positioning ahead of the SpaceXAI IPO at a $1.75 trillion valuation, currently floated for as early as June. The narrative SpaceXAI now needs is that xAI — long the "sore thumb" — is not merely a research lab burning cash, but a business with a "neo-cloud" model in the mold of AWS, capable of leasing surplus assets at high yields. From a cost-of-capital perspective, an "AGI cash incinerator" is far less attractive to investors than a "data-center landlord generating cash." As noted above, the most important detail of the Colossus 1 lease is that it is for inference, not training. Unlike training, inference requires far less tightly synchronized inter-GPU communication. Even when the chips are heterogeneous, the workload parcels out cleanly across them in parallel. The straggler effect — the chief weakness of a mixed cluster — is essentially neutralized for inference workloads. Furthermore, with Anthropic occupying all 220,000 GPUs as a single tenant, the network-switch jitter (unanticipated latency) that arises under multi-tenancy disappears. The two sides' technical weaknesses end up complementing each other almost exactly. One insight follows. As a training cluster mixing H100/H200/GB200, Colossus 1 was an asset that could only deliver an MFU of 11%. The moment it was handed over to a single inference customer, however, that asset transformed into a cash-flow asset rented out at roughly $2.60 per GPU-hour (a weighted average of the lease rates across GPU types). For xAI, what was a "cluster from hell" for training has become a "golden goose" minting $5–6 billion in annual revenue when redeployed for inference. Elon's genius, I would argue, lies not in the model but in this asset-rotation structure. The weight of that $6 billion becomes clearer when set against xAI's income statement. Annualizing xAI's 1Q26 net loss yields roughly $6 billion in losses per year. 
The $5–6 billion in annual revenue generated by leasing Colossus 1 to Anthropic, in other words, almost perfectly hedges xAI's loss figure. This single deal effectively pulls xAI to break-even. Heading into the SpaceXAI IPO, this functions as a core line of financial defense. From a cost-of-capital standpoint, if the image shifts from "research lab burning cash" to "infrastructure tollgate stably printing $6 billion a year," the entire tone of the offering can change. (May 8, 2026, Mirae Asset Securities)
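
The revenue figure can be sanity-checked with the note's own inputs. The sketch assumes all 220,000 GPUs are billed every hour of the year at the quoted blended rate of about $2.60 per GPU-hour; full occupancy is an assumption. On that basis the lease comes to roughly $5.0 billion a year, the low end of the quoted range. The same arithmetic puts Colossus 1 at about 40% of the roughly 550,000 H100-equivalent GPUs cited.

```python
GPUS = 220_000
RATE_PER_GPU_HOUR = 2.60   # blended lease rate quoted in the note (USD)
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes 100% occupancy all year

annual_revenue = GPUS * RATE_PER_GPU_HOUR * HOURS_PER_YEAR
share_of_fleet = GPUS / 550_000  # Colossus 1 vs. total H100-equivalent fleet

print(f"${annual_revenue / 1e9:.2f}B per year")  # $5.01B per year
print(f"{share_of_fleet:.0%} of capacity")       # 40% of capacity
```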

Jukan @COMPUTEX @jukan05

What the SpaceX–Anthropic Deal Means Two weeks ago, we published a note laying out what GPT-5.5's release implied. The conclusion was simple: whoever secures compute first, in greater volume, and with greater reliability ultimately takes the win. With OpenAI's 30GW roadmap dwarfing Anthropic's 7–8GW, we closed by arguing that the structural advantage on compute sat with OpenAI. Less than a fortnight later, that conclusion is being tested. On May 6, Anthropic signed a single-tenant lease for the entirety of Colossus 1 with SpaceXAI — the infrastructure subsidiary that consolidates Elon Musk's xAI and SpaceX. The asset carries more than 220,000 GPUs and 300MW of power, and crucially, is scheduled to come online within this month. It served as the capstone of Anthropic's April blitz, which added 13.8GW of cumulative capacity over the span of a single month. On headline numbers alone, OpenAI took more than a year to stack 18GW; Anthropic has put 13.8GW in the ground in thirty days. The takeaways break down into three points. First, the compute pecking order has been redrawn again. Anthropic has now swept up the AWS expansion (5GW, with $100B+ in spend commitments over a decade), Google + Broadcom (3.5GW of TPU), Google Cloud (5GW alongside a $40B investment), and now SpaceXAI's Colossus 1 (0.3GW). Cumulative committed capacity, inclusive of pre-April allocations, sits at 14.8GW. This is still only half of OpenAI's 2030 target of 30GW, but the fact that the SpaceX lease will be live inside a month makes "deliverability" a qualitatively different proposition. Second, Elon Musk is the plaintiff in an active lawsuit against OpenAI — and at the same time, the supplier handing 220,000+ GPUs and 300MW of power, in one block, to OpenAI's most formidable competitor. The timing matters: the deal was struck in the middle of the Musk–Altman trial. We read this as a deliberate pincer with OpenAI in the middle. 
In the courtroom, Musk works to dismantle the moral legitimacy of OpenAI's leadership; in the market, he arms Anthropic to absorb OpenAI's revenue and user base. Third, the structure is financial-engineering perfection — a clean win-win for both sides. xAI can recognize $6B of annual revenue from a single contract, an amount that almost precisely offsets its Q1 2026 annualized net loss of $6B. It also accelerates the cleanup of SpaceXAI's pre-IPO balance sheet, with the entity now being floated at around $1.75T. Anthropic, on the other side, converts roughly $5B of spend into what it expects to be $15B of ARR via the coming inference-revenue surge. (Mirae Asset Securities, May 8, 2026)
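
The capacity figures in the note add up as stated. Summing the four April deals gives 13.8 GW. The gap to the quoted 14.8 GW cumulative total implies about 1 GW of pre-April capacity; that is an inference from the two numbers, not a figure the note gives. The 14.8 GW total is roughly half of OpenAI's 30 GW target.

```python
april_deals_gw = {
    "AWS expansion": 5.0,
    "Google + Broadcom TPU": 3.5,
    "Google Cloud": 5.0,
    "SpaceXAI Colossus 1": 0.3,
}
april_total = sum(april_deals_gw.values())
cumulative = 14.8  # quoted total, includes pre-April allocations
implied_pre_april = cumulative - april_total  # inferred, not stated in the note
share_of_openai_target = cumulative / 30.0

print(f"April: {april_total:.1f} GW")                    # April: 13.8 GW
print(f"Implied pre-April: {implied_pre_april:.1f} GW")  # Implied pre-April: 1.0 GW
print(f"Share of 30 GW: {share_of_openai_target:.0%}")   # Share of 30 GW: 49%
```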

Michael O’Rourke retweeted

Anthropic @AnthropicAI · 7 May

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.

Michael O’Rourke retweeted

Goodfire @GoodfireAI · 7 May

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

Michael O’Rourke retweeted

Ethan Mollick @emollick · 8 May

So Mythos was, indeed, not marketing hype. Remember this is a general purpose model that just happens to be good at finding exploits because good models are good at lots of things. Expect similar from OpenAI & Google. And from open models in 8 months. hacks.mozilla.org/2026/05/behind…

Michael O’Rourke retweeted

Aakash Gupta @aakashgupta · 7 May

Anthropic just shipped sleep into agents. When you sleep, your hippocampus replays the day's neural sequences to the cortex during 150-220 Hz bursts called sharp-wave ripples. The replay runs about 20x faster than the original experience. A 10-second sequence gets compressed to roughly 500 milliseconds. Wilson and McNaughton showed this in rats in 1994. You ran this algorithm last night on whatever you did yesterday, whether you wanted to or not. The replay does two things at once. It extracts statistical patterns: what mattered, what generalizes, which sequences predicted reward. And it reorganizes the memory trace from hippocampus-dependent storage into neocortex, which is why old memories survive hippocampal damage but recent ones don't. Disrupt sharp-wave ripples in a rat with optogenetics and the rat fails the next day's task. The replay is causal, not correlational. Most "agent memory" today is a search engine. Past sessions get embedded, you retrieve relevant chunks at the next call. That works for facts. It does not extract patterns and it does not reorganize the trace. Which is why agents plateau. The memory volume keeps growing while real capability flatlines. Dreaming reviews past sessions, extracts patterns, curates memories. That is the brain's actual three-step algorithm. They called it dreaming because dreaming is what the algorithm does, in roughly the same order, for roughly the same reason. Agents that dream between sessions will compound. The ones still running on raw context window will hit the same ceiling humans hit when they pull all-nighters.

Claude @claudeai

Live from Code with Claude: we're launching dreaming in Claude Managed Agents as a research preview. Outcomes, multiagent orchestration, and webhooks are now in public beta.
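
The contrast in the post above is between search-style retrieval over raw session logs and an offline consolidation pass that extracts recurring patterns. It can be sketched in a few lines. Everything here is hypothetical: the session format, the `retrieve` and `consolidate` functions, and the rule of keeping a lesson once it appears in at least `min_sessions` sessions are illustrative choices. None of it is the Claude Managed Agents dreaming feature or its API. The last line checks the replay arithmetic from the post: a 10-second sequence replayed 20x faster takes 0.5 seconds.

```python
from collections import Counter

def retrieve(sessions, query):
    """Search-style memory: return every raw entry that mentions the query."""
    return [entry for s in sessions for entry in s if query in entry]

def consolidate(sessions, min_sessions=2):
    """Offline pass: keep lessons that recur across sessions, drop one-offs."""
    seen_in = Counter()
    for s in sessions:
        for lesson in set(s):  # count each lesson once per session
            seen_in[lesson] += 1
    return sorted(lesson for lesson, n in seen_in.items() if n >= min_sessions)

sessions = [
    ["run tests before commit", "api X times out", "typo in README"],
    ["run tests before commit", "api X times out"],
    ["run tests before commit", "flaky network on Tuesday"],
]
print(len(retrieve(sessions, "tests")))  # 3 (raw entries pile up per session)
print(consolidate(sessions))  # ['api X times out', 'run tests before commit']

# Replay compression from the post: 10 s replayed 20x faster
print(10.0 / 20)  # 0.5
```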

Michael O’Rourke retweeted

Claude @claudeai · 6 May

Live from Code with Claude: we're launching dreaming in Claude Managed Agents as a research preview. Outcomes, multiagent orchestration, and webhooks are now in public beta.
