Brandon Kirk Williams

456 posts

@BKwilliamscyb

Tech & Nat sec Policy Sr Fellow at CGSR @Livermore_Lab, China Fellow @TheWilsonCenter non-res @stanfordcisac, US & Cold War History PhD @UCBerkeley

Joined November 2020
323 Following, 334 Followers
Brandon Kirk Williams retweeted
jietang (@jietang)
Recent thoughts:

The Shift to Long-Horizon Tasks
The most likely breakthrough this year will be in long-horizon tasks. We are moving toward a stage where Large Language Models (LLMs) learn to complete extended, complex missions by interacting with Agent environments. This is perhaps where the true value of LLMs lies. Take cybersecurity as an example: imagine a model that continuously hunts for software bugs and vulnerabilities. While it sounds like a search process, it’s actually the model learning the high-level intuition and methodology of a professional hacker. Unlike humans, AI can run 24/7 without fatigue. It could potentially find exploits at a much higher frequency and claim bounties on platforms like HackerOne or Bugcrowd. It sounds fun, but fundamentally, it's a revolution that displaces the hacker. If even hackers are being "disrupted," one can only imagine the impact on general programmers.

From One-Person to None-Person Companies
Building on long-horizon capabilities, Autonomous Agent Systems (AAS) will inevitably become the next frontier. Last year, we were discussing the rise of the "One Person Company" (OPC). I didn't expect us to move so quickly toward the "None Person Company" (NPC). It’s an ironic twist: we might all end up as NPCs in this new ecosystem.

Engineering the Impossible: Memory and Learning
To realize the vision above, we must solve three technical pillars: Memory, Continual Learning, and Self-Judging. I used to think these would require massive paradigm shifts and years of research. However, the pressure from both the technical and application sides is so intense that we are seeing these capabilities emerge through ingenious engineering "tricks":
Memory: Long context windows (1M+ tokens) and RAG have significantly bridged the gap.
Continual Learning: While true continual learning remains difficult, release cycles are shrinking. Global models are updated monthly; domestic models are catching up. If we reach weekly updates by next year, it will effectively function as continual learning.
Self-Judging: This remains the most elusive, yet models like Opus 4.7 are already demonstrating early self-correction and judgment capabilities.

The Self-Evolving Endgame
The most difficult, and most promising, path is Self-Evolution. The current wave is incredibly fierce. I suspect that models like Claude may have already achieved a baseline for self-training: writing their own code, cleaning their own data, generating synthetic data, and then training on it. It might "waste" some compute, but it saves the most precious resources: human labor and time. In the LLM era, speed is everything. Rapid iteration is what creates the cognitive gap between leaders and followers. Claude’s rumored 2-million-chip cluster for next year is likely dedicated to exactly this: autonomous model self-training.

Technical Summary:
1M Context: Necessary baseline.
Memory & Continual Learning: Prerequisites, likely solved first via "tricky" engineering.
Harnessing Environments: The breakthrough point.
Self-Judging: The tipping point.
Full Self-Training: The endgame.

Redefining AGI and the Industry
If this is the road to AGI, then AGI’s definition should be the sum of all human collective intelligence, not just an individual’s intelligence. It must possess the creative capacity to produce something as profound as the "Theory of Relativity," meeting the bar set by Hassabis. During this transition, every app will need to be rebuilt as AI-native. In fact, we might move past the concept of apps entirely. The most significant challenge will be the reconstruction of the operating system itself. In the future, you won’t see a traditional desktop; you will see an LLM OS, where applications are "generated on demand." This challenges the 80-year-old von Neumann architecture and represents a total upheaval of the computer science industry.

The Irreversible Wave
From completing long-horizon tasks to fully autonomous operations, every sector (Security, Finance, Law, E-commerce) will be reshaped. Many friends have reached out lately, asking how to transform their enterprises to keep pace with AI. But few truly realize that this irreversible process has already begun. As this massive technical wave hits, we must be prepared to act, but we must also start thinking seriously about how to regulate it.
Brandon Kirk Williams retweeted
dylan ツ (@demian_ai)
Inference got a hundred times cheaper this year. The compute bill went up anyway. If you understand why those two sentences are both true at the same time, you understand the most important thing happening in AI right now. I work on inference for a living, at @nebiustf, where we run open-source managed inference at scale. Most of what follows is what I'm seeing from inside the bill. Twelve months ago, the cost of 1M tokens of frontier-class reasoning was somewhere on the order of $60. Today, an equivalent quality of output costs roughly $0.50. The price per token of o1-level intelligence has dropped about 128x in a year. The price of GPT-4-level output has dropped roughly 100x since the original GPT-4 shipped. By any normal reading of a technology cost curve, this should be deflationary. It should be saving customers money. The opposite has happened. The total compute bill at every hyperscaler is going up, not down. Anthropic just signed multi-year capacity deals with both XAI and Amazon. Microsoft's Azure capex guide for 2026 starts with an eight. OpenAI is reportedly spending more on compute every quarter than it did in all of 2023. Nvidia paid roughly twenty billion dollars to acquire Groq, an inference-specialist company that did not exist as a serious commercial entity three years ago. The cost curve and the demand curve crossed, and then the demand curve lapped the cost curve. Here is what happened underneath. A reasoning model burns roughly 10x the output tokens of a non-reasoning model on the same task, because it spends most of its tokens thinking out loud before answering. An agentic workflow chains roughly twenty times the requests of a single-shot completion, because it loops, calls tools, plans, retries, and synthesizes. A modern deep-research query (the kind a research analyst can fire off in fifteen seconds and then walk away from for ten minutes) costs more compute than 10 original GPT-4 queries combined.
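The cost argument in this post reduces to one ratio: the total bill scales by (growth in tokens consumed) divided by (drop in price per token), so spending rises whenever demand grows faster than prices fall. A back-of-envelope Python sketch using figures quoted in the post; the function name `bill_multiplier` is hypothetical, and multiplying the ~10x reasoning-token factor by the ~20x agent-request factor assumes every agent request is a reasoning call, which the post does not state.

```python
def bill_multiplier(price_drop: float, volume_growth: float) -> float:
    """Ratio of new total spend to old total spend.

    price_drop: factor by which the per-token price fell (100 means "100x cheaper").
    volume_growth: factor by which the number of tokens consumed grew.
    A result above 1 means demand outran the efficiency gain (the Jevons condition).
    """
    if price_drop <= 0 or volume_growth <= 0:
        raise ValueError("both factors must be positive")
    return volume_growth / price_drop


# Figures quoted in the post: roughly $60 -> $0.50 per 1M tokens of frontier reasoning.
old_price, new_price = 60.0, 0.50
price_drop = old_price / new_price  # 120x, close to the post's "about 128x"

# Illustrative assumption: ~10x tokens per reasoning call times ~20x calls per agent task.
per_task_growth = 10 * 20

print(f"price drop: {price_drop:.0f}x")
print(f"per-task bill: {bill_multiplier(price_drop, per_task_growth):.2f}x")
# The post's aggregate headline: 100x cheaper tokens, 10,000x more tokens.
print(f"aggregate bill: {bill_multiplier(100, 10_000):.0f}x")
```

With the headline numbers the function returns 100, the "100x larger total bill." The illustrative per-task figure is only about 1.7x, so on these inputs most of the aggregate growth would have to come from running far more tasks, the "uses that were previously uneconomic" in Jevons's sense.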
We made every individual token a hundred times cheaper, and then we built a generation of products that consume ten thousand times more tokens. This is the Jevons paradox playing out at trillion-dollar scale, in compressed time, in front of everyone. Jevons noticed in 1865 that making coal-burning more efficient did not reduce coal consumption. It increased it, because efficiency unlocked uses that were previously uneconomic. Steam engines became more practical at smaller scales. Whole industries that could not afford coal at the old price suddenly could. Britain's coal consumption rose sharply, not despite the efficiency gains, but because of them. The same thing is happening to AI compute right now and it is happening faster than any analogous historical cycle. Falling token prices did not contract demand. They unlocked agents, deep research, code-writing systems, multi-step reasoning, persistent memory, the entire next layer of AI products. Every product in that next layer consumes orders of magnitude more compute than the chat interfaces it is replacing. The math at the aggregate level is brutal: 100x cheaper tokens times 10,000x more tokens equals a 100x larger total bill. The implications stack quickly. If you are running a hyperscaler, your 2026 capex guide is not a peak. It is a step on a curve. Inference is structurally always-on, twenty-four hours a day, in a way that training never was. Training is bursty. You spin up a cluster, run for weeks or months, and stop. Inference runs continuously, scales with usage, and the usage curve is exponential. Your power bill, your cooling bill, your transceiver count, your storage footprint, all of these were sized for a workload mix that no longer exists. If you are running an AI software company built on top of someone else's closed API, you have a problem that did not exist a year ago.
Your gross margins get worse as your customers get more value out of your product, because the more they use it, the more compute you pay for. The companies that win this are the ones that figured out vertical integration before the math caught them. If you are watching this from a distance and trying to understand where the next bottlenecks form, the answer is everywhere downstream of "more inference compute, always-on, with massive memory state per session." The KV cache, the running memory state of a long conversation or an agent loop, is the silent monster of the inference era. It does not scale linearly with parameters. It scales linearly with context length and number of agent steps. A long agent session can hold tens of gigabytes of state per user, per session. Multiply that by every concurrent user of every product, and you understand why $MU, $SNDK, $TOWCF, and the entire memory and packaging layer have re-rated the way they have. The CPU-to-GPU ratio is evolving. Training is 1:8. Basic chat inference is 1:4. Agentic inference is 1:1, sometimes CPU-heavy. Google has split its TPU line in two, with a dedicated inference chip carrying tripled SRAM for KV cache. $INTC and $AMD just spent two earnings calls explaining that this shift is structural, not cyclical. The hardware map is redrawing in real time and the financial press is mostly still writing about training clusters. The right framing of where we are right now is not that AI is hitting a wall. The framing a year ago that scaling was hitting a wall was the most expensive bad take of the cycle. The right framing is that AI got dramatically cheaper, dramatically more capable, and dramatically more useful, and the cost of running it at the new equilibrium of demand is much higher than the cost at the old equilibrium of demand, because the new equilibrium is enormous. A meaningful share of what we actually do at Token Factory, day to day, is help customers stop their bills from running away from them. 
KV-cache management. Speculative decoding. Quantization. Routing. The kind of vertical integration that, eighteen months ago, every product team was happy to leave abstracted away behind a closed API. The reason this stack matters now is the same reason this whole essay matters: at the new equilibrium of inference demand, the cost of treating compute as a commodity is no longer survivable. The companies that figure out the layer beneath the API are the ones who keep their margins. Cheaper tokens. More tokens. Same coal as 1865.
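The KV-cache point in this post can be made concrete with the usual sizing formula for a transformer decoder: one key vector and one value vector are cached per layer, per KV head, per token. A minimal Python sketch; the model dimensions are hypothetical round numbers chosen for illustration, not any specific product's configuration, and the estimate ignores allocator overhead, paging, and activations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    n_layers: int
    n_kv_heads: int      # key/value heads (fewer than query heads under grouped-query attention)
    head_dim: int
    bytes_per_elem: int  # 2 for fp16/bf16, 1 for an 8-bit cache


def kv_cache_bytes(cfg: AttentionConfig, context_tokens: int, sessions: int = 1) -> int:
    """Bytes needed to hold cached keys and values.

    The leading 2 counts one key and one value vector per KV head, per layer, per token.
    Note the result depends on context length and session count, not on total parameters.
    """
    per_token = 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem
    return per_token * context_tokens * sessions


GiB = 1024 ** 3

# Hypothetical dimensions, for illustration only.
cfg = AttentionConfig(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

for ctx in (8_192, 131_072):
    print(f"{ctx:>7} tokens: {kv_cache_bytes(cfg, ctx) / GiB:.1f} GiB per session")

# Quantizing the cache to one byte per element halves the footprint.
cfg_8bit = AttentionConfig(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=1)
print(f"8-bit cache, 131072 tokens: {kv_cache_bytes(cfg_8bit, 131_072) / GiB:.1f} GiB")
```

On these assumed dimensions an 8K-token session holds 2.5 GiB and a 128K-token session about 40 GiB, consistent with the post's "tens of gigabytes of state per user, per session." Storing the cache at one byte per element, one of the quantization levers the post lists, brings the 128K case to 20 GiB; the total still grows linearly with concurrent sessions.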
Brandon Kirk Williams retweeted
Nathan Lambert (@natolambert)
I spent some time trying to distill all the complex factors impacting open models -- economics, capabilities, distribution, policy, etc. -- into a clear list of beliefs. Here they are in full. 1. It’s surprising that the top closed models did not show a growing capability margin over open models, based on compute differences for training and research, especially in the second half of 2025 and through today.
Brandon Kirk Williams retweeted
Ryan Fedasiuk (@RyanFedasiuk)
There are a few nuggets of truth in this @nytimes piece, but I’m afraid @scmallaby is 4 years late to the conversation, and he mostly seems to have swallowed the narrative fed to him by his interlocutors. The main thesis of this op/ed is that we should abandon any effort to slow China’s progress in AI in the name of coordinating on safety. The idea is a noble one: I’ve been writing about it for six years, and lived it for three. In fact, I penned a similar op/ed back in 2021 after writing more than 30 reports for @CSETGeorgetown on what China was actually doing to build its AI industry: foreignpolicy.com/2022/04/27/us-… Unfortunately, what @scmallaby envisions is a fantasy that ignores years of history between the two countries and betrays a serious blindness to China’s national security decision-making. Before writing a New York Times op/ed on the subject, I beg you to ask anyone who worked on the 2024 U.S.-China Geneva AI Safety talks how the conversation went. Not well. To be sure, even after years of playing on Charlie Brown’s football team, I still think it’s worth coordinating with China to get aspects of our domestic AI policies right. This technology is too important to risk shoving pell-mell into the hands of would-be bioterrorists and cybercriminals. But it would be a catastrophic mistake to willfully provide China the single resource it needs to make and serve even more powerful AI models (one that labs are *finally* and *dramatically* feeling the effects of in 2026) for the promise of “dialogue” with the Chinese government about AI safety, which we have already had. And certainly not after last week, now that today’s frontier AI systems can verifiably hold nations’ critical infrastructure at risk. I’ll have some writing coming soon on how the United States and China can better coordinate on AI safety where their interests genuinely align.
But under NO circumstances should we allow China to condition such talks on relaxing our real, meaningful control over compute.
Sebastian Mallaby (@scmallaby)

Back in 2022, I supported the Biden chip-export controls on China. After a week with Chinese AI researchers and tech leaders, I've changed my mind. The controls are not working, and they obstruct another strategy that just might work. Models like Anthropic's Mythos show us how dangerous AI is becoming. Not governing AI is not an option. nytimes.com/2026/04/13/opi…

Brandon Kirk Williams (@BKwilliamscyb)
Three cheers @wmata and @wmataGM: just saw a gentleman with a Metro pin on his lapel get up, move midway through the train to take a picture of a spill on the floor, then return to his seat.
Brandon Kirk Williams retweeted
Dean W. Ball (@deanwball)
When the dust settles, Mythos and the similarly capable models that will follow it will go down as major achievements in the history of cybersecurity. The hardening they will do to all important global software is a gift from American capitalism given freely to the world, at our great expense. And it is even possible, though far from certain, that we achieve this strengthening of global security with no major hiccups. Regardless, this is a gift born of ingenuity, cleverness, and raw industrial might. The Brussels regulators, who speak so passionately about cybersecurity, may therefore send their thank you cards to San Francisco rather than Washington.
Brandon Kirk Williams retweeted
Ryan Hass (@ryanl_hass)
This piece is a useful reminder of the stresses that China’s economic slowdown is creating internally. These dynamics must be viewed alongside stories of China’s dominance of emerging tech sectors to form a comprehensive picture of China. Both narratives hold truth at the same time.
Phelim Kine “老 康” (@PhelimKine)

“This has nothing to do with me. It’s like the Cultural Revolution or the ’90s mass layoff of the state-owned enterprises. It’s a historical cycle. It's just our turn. When the Titanic is sinking, all you can do is try to go down with some dignity.” nytimes.com/2026/03/21/bus…

Brandon Kirk Williams retweeted
Defense Analyses and Research Corporation
This is one of the most important national security stories of the day. That it will go largely unremarked upon by nearly every Serious Defense Thinker in Washington tells you everything you need to know about the quality of their forecasts of international affairs.
METR (@METR_Evals)

We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated.
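METR describes the 50%-time-horizon as the human task length at which a model's fitted probability of success is 50%, estimated from a logistic fit of success against log task length. The Python sketch below illustrates only that idea: the task data are made up, tasks are unweighted, and there is no bootstrap, so it reproduces neither METR's code nor its 6-to-98-hour interval. The helper names `fit_logistic` and `fifty_percent_horizon` are hypothetical.

```python
import math


def fit_logistic(xs, ys, iters=50):
    """Maximum-likelihood fit of P(success) = sigmoid(a + b*x) via Newton's method."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood w.r.t. a
            g1 += (y - p) * x      # gradient w.r.t. b
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H @ delta = g.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


def fifty_percent_horizon(task_minutes, successes):
    """Task length (minutes) at which the fitted success probability is 50%."""
    xs = [math.log2(m) for m in task_minutes]
    a, b = fit_logistic(xs, successes)
    # sigmoid(a + b*x) = 0.5 exactly where a + b*x = 0, i.e. x = -a / b.
    return 2.0 ** (-a / b)


# Made-up results: human task length in minutes, and 1 if the model succeeded.
minutes = [1, 2, 4, 8, 16, 32, 32, 64, 128, 256, 512, 1024]
success = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
print(f"50% horizon ~ {fifty_percent_horizon(minutes, success):.1f} minutes")
```

These made-up outcomes are mirror-symmetric around 32 minutes on a log scale, so the fitted 50% crossing lands at 32 minutes. When few tasks sit near or beyond the crossing, as with a nearly saturated suite, small changes in outcomes can move it a lot, which is consistent with the wide interval METR reports.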

Brandon Kirk Williams retweeted
Dean W. Ball (@deanwball)
Moltbook appears to have major security flaws, so a) you absolutely should not use it and b) this creates an incentive for better security in future multi-agent websims, or whatever it is we will end up calling the category of phenomena to which "Moltbook" belongs.
Samuel Hammond 🦉 (@hamandcheese)

seems bad, though I'm grateful Moltbook and OpenClaw are raising awareness of AI's enormous security issues while the stakes are relatively low. Call it "iterative derployment"

Brandon Kirk Williams retweeted
martin_casado (@martin_casado)
Well, not quite. I'd say 20-30% use open source. Of those I'd say 80% use Chinese-based models. So closer to 16-24%.
William Chou (@WillRevenge)
After a busy 2024 and 2025, I'm happy to announce that I'm now Senior Fellow and Deputy Director, Japan Chair at @HudsonInstitute. Promotion doesn't come with more characters on X, unfortunately, so I'll continue to get creative about fitting as many ideas as possible into one post.