Brian Costello

12.7K posts

@bpcostello

Trying to do the right thing. Launching new AI start-up.

Los Angeles, CA · Joined December 2008
2.8K Following · 22.6K Followers
Pinned Tweet
Brian Costello@bpcostello·
Always worthwhile checking out @TheDragonFeeder's thoughts on our important global competition (economic war) w/China. Looking forward to taking it in over the weekend. @APompliano
Anthony Pompliano 🌪@APompliano

The US and China are locked in a global competition. I sat down with @TheDragonFeeder to discuss war in Iran, central banks buying gold, how bitcoin affects the geopolitical relationship, where humanoids fit in, and how China is manipulating the media. Enjoy!

0 · 1 · 8 · 724
Brian Costello@bpcostello·
Agree wholeheartedly: hardware is the key. The biggest misconception in AI is that the model is just software. It isn’t. At runtime, what matters is state (weights, activations, and KV cache) that must be stored, updated, and reused in memory. The future of AI performance and cost will depend as much on how hardware manages that state as on the model itself. On Earth and in space. In robotics and in LLMs. The future belongs to true co-design between AI models and the actual hardware they run on.
0 · 0 · 12 · 1.8K
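The runtime state described above can be made concrete with a back-of-envelope sketch. This is a minimal illustration, not from the thread: the model dimensions below (layers, KV heads, head size, context length) are assumed, Llama-70B-class values, and the formula is the standard one for a grouped-query-attention KV cache.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Per-request KV cache size: one K and one V tensor per layer,
    each of shape (kv_heads, seq_len, head_dim), at fp16/bf16 by default."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative 70B-class configuration (assumed, not stated in the tweet):
size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=32_768)
print(f"{size / 2**30:.1f} GiB of KV cache per 32k-token request")
```

At these assumed dimensions a single long-context request carries about 10 GiB of state beyond the weights, which is why how hardware stores and moves that state dominates serving cost.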
TBPN@tbpn·
Sequoia’s @shaunmmaguire wrote a private hardware manifesto arguing that over the next 25 years, most of the money will be made in hardware:
"Every software revolution is preceded by a hardware revolution."
"To have the iOS App Store that enabled Uber, DoorDash, and all of these great companies - you needed to have the iPhone."
"This AI revolution - we're seeing what it can do from the software layer, but it's still limited by hardware."
"The hardware we were doing for a long time was all following Moore's Law. It was all branching out of this decision in the mid-1950s to go all in on the silicon supply chain."
"That has created magic, and there's still a couple orders of magnitude of juice to squeeze, but we’re hitting fundamental physics limits - Dennard scaling, things like that."
"I think this tech tree is branching into humanoid robots, into silicon photonics, into orbital data centers - all of these new hardware areas where there's going to be 20+ years of progress."
"There's going to be incredible businesses built on the back of this. And a lot of dumpster fires."
29 · 52 · 622 · 113.1K
Brian Costello@bpcostello·
Tao is right about the core tension. We understand how these systems run, but we still do not understand what makes their intelligence dependable. We can train them, scale them, and watch impressive capabilities appear, yet we still cannot predict why performance shifts so sharply across tasks. To me, that suggests the missing piece is not just better learning math. It is a better way to decide what matters. The real problem is not only whether the model can compute, but whether it can reliably preserve the important signal, carry it forward through time, and ignore what does not matter. Memory is more important than compute. Until that is solved, intelligence will keep emerging in a powerful but inconsistent way. These systems are very good at pattern generation, but much less reliable at importance control. They work when they compress the right signal. They fail when that signal gets diluted, distracted, or lost.
0 · 0 · 1 · 50
Prof. Brian Keating@DrBrianKeating·
Terence Tao told me something that is both clarifying and unsettling about large language models. The mathematics underlying today’s LLMs is not especially exotic. At its core, training and inference mostly involve linear algebra, matrix multiplication, and some calculus. This is material a competent undergraduate could learn. In that sense, there is very little mystery about how these systems are constructed or how they run.
And yet the real mystery begins there. What we do not understand well is why these models perform so impressively on certain tasks while failing unexpectedly on others. Even more striking, we lack reliable principles that allow us to predict this behavior in advance. Progress in the field remains largely empirical. Researchers scale models, change datasets, run experiments, and observe what emerges.
Part of the difficulty lies in the nature of the data itself. Pure randomness is mathematically tractable. Perfectly structured systems are also tractable. But natural language, like most real-world phenomena, lives in an intermediate regime. And we humans hate that liminal space! It is neither noise nor order but a mixture of both. The mathematics for this middle ground remains comparatively underdeveloped.
So we find ourselves in a peculiar position. We understand the machinery, yet we cannot reliably explain its capabilities. We can describe the mechanisms that produce these systems, but we cannot predict when new abilities will appear or how performance will vary across tasks. That tension, between relatively simple mathematical tools and highly unpredictable behavior, is the central puzzle of modern AI. (Video link in comments)
41 · 80 · 400 · 40.2K
a16z@a16z·
"We became very good at financial engineering and forgot about engineering."
Palantir CTO Shyam Sankar on how tech companies lose their edge:
"Europe has created exactly zero companies from scratch in the last 50 years worth more than a hundred billion euro. We have created all of our trillion dollar companies from scratch in America in the last 50 years."
"The difference is founders."
"Intel, at some point, there was this fork in the road, where they could have promoted the CFO to be the CEO or Pat Gelsinger as CTO."
"They picked the CFO. The person that Wall Street would understand, not the person who could actually determine the future roadmap."
"It really looked like it was working for 10 years until it fell off a cliff."
"But that was all financial engineering, not real engineering."
@PalantirTech CTO @ssankar with @KTmBoyle
a16z@a16z

"I think our biggest risk as a country is suicide, not homicide." Palantir CTO Shyam Sankar joins a16z's Katherine Boyle and Erik Torenberg to discuss Shyam's new book, Mobilize, as well as defense, AI, the SaaSpocalypse, and more.
00:00 Introduction
07:53 Rebuilding the industrial base
18:01 Modernizing the Army
24:20 The SaaSpocalypse
29:42 Agency over automation
38:24 Beating China without self-sabotage
40:42 Film as cultural willpower
49:57 The story of Admiral Rickover
@ssankar @KTmBoyle @eriktorenberg @PalantirTech

48 · 166 · 1.4K · 187.4K
Brian Costello@bpcostello·
@SenSanders No, he is doing it to bring manufacturing back to the US. That is a good thing.
0 · 0 · 4 · 196
Sen. Bernie Sanders@SenSanders·
Jeff Bezos, one of the richest men on earth, is raising $100 billion to replace workers with robots around the world. The oligarchs want it all. Not going to happen. Stand up and FIGHT BACK.
1.2K · 3K · 12.3K · 295K
Brian Costello@bpcostello·
@alex_prompter Not sure about the 60 percent number, but there is substantial waste in current transformer execution, and much of it lives in the mechanics of execution rather than in “intelligence” itself.
0 · 0 · 1 · 118
Alex Prompter@alex_prompter·
🚨 BREAKING: NVIDIA sold the most powerful AI chip ever built. Then Princeton discovered the software running on it was wasting 60% of it. Every inference job. Every training run. 60 cents on every dollar, gone.
> NVIDIA doubled the raw compute power of their Blackwell B200 GPUs compared to Hopper H100. Tensor core throughput went from 1 PFLOPS to 2.25 PFLOPS. The most powerful AI chip ever built.
> The problem: the rest of the chip didn't scale with it. Memory bandwidth stayed the same. The exponential unit stayed the same. So the bottleneck moved, and all that extra compute sat idle while the slower parts of the chip became the new ceiling.
> Every existing attention implementation, including FlashAttention-3, was designed for Hopper. On Blackwell they either left massive performance on the table or couldn't run at all.
> Princeton, Meta, and Together AI spent months redesigning attention from scratch around the new bottleneck. New pipelines. Software-emulated exponential functions. A completely different backward pass. The result: FlashAttention 4.
→ Up to 2.7× faster than Triton on B200 GPUs
→ Up to 1.3× faster than NVIDIA's own cuDNN library
→ Reaches 1,613 TFLOPs/s (71% of theoretical maximum)
→ Compile time dropped from 55 seconds to 2.5 seconds (22× faster)
→ Written entirely in Python, no C++ template expertise required
The scariest part: this wasn't a hardware problem. The chip was delivering exactly what NVIDIA promised. The software just wasn't designed for it. Every AI lab running B200s before this paper was paying for compute they couldn't use.
42 · 63 · 402 · 37.1K
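The "bottleneck moved" claim above is the classic roofline argument, and it can be sketched in a few lines. The 2.25 PFLOPS figure comes from the tweet; the ~8 TB/s HBM bandwidth is an outside assumption for illustration, so treat the exact ridge point as a sketch, not a datasheet number.

```python
# Roofline ridge point: arithmetic intensity (FLOPs per byte moved)
# a kernel needs in order to be compute-bound rather than memory-bound.
peak_flops = 2.25e15   # B200 tensor throughput, from the thread (FLOP/s)
mem_bw = 8e12          # HBM bandwidth in bytes/s (assumed figure)

ridge = peak_flops / mem_bw
print(f"a kernel must do ~{ridge:.0f} FLOPs per byte moved to saturate compute")
```

Kernels whose intensity falls below that ridge sit on the bandwidth roof, so doubling tensor throughput without raising bandwidth simply raises the ridge and leaves more compute idle, which is the redesign pressure FlashAttention 4 responds to.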
Brian Costello@bpcostello·
@chamath This could easily be fixed by legislators implementing fuel zoning in CA: require only big cities, not rural areas, to use certain-quality gas. That would push down the price.
0 · 0 · 2 · 207
Chamath Palihapitiya@chamath·
Newsom drove one refinery to shut down last year and one that will shut down in another month. Our gas prices were already sky high because of these actual and forecasted closures - it forced us to import gas and have little bargaining power in doing it.
James Blair@JamesBlairUSA

Californians already pay 50% more for gas than the rest of the country, and, thanks to Gavin shutting down the state’s refineries, they are estimated to pay another $.50 a gallon on top. Add math and basic economics to the list of subjects Gavin struggles with.

127 · 375 · 2.9K · 183.9K
Brian Costello retweeted
Brian Costello@bpcostello·
Prefill requires massive parallel computation (Rubin CPX). Decode is not a compute problem. It's a "how fast can I read from memory" problem, which is what Groq's SRAM chip is used for. Dynamo ends up being the orchestration software that moves the KV cache (the model's working memory) from the prefill chip to the decode chip and manages the handoff. Nvidia is splitting inference into two specialized jobs because one chip can't do both well. But what if the big leap is that we need a better memory system, and over 70% of what we're moving is not even needed in the first place?
0 · 1 · 4 · 479
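The "how fast can I read from memory" framing of decode admits a simple upper bound: each generated token must stream the full weight set (plus the KV cache) through memory at least once, so bandwidth, not FLOPS, caps tokens per second. The numbers below (70B fp16 params, 10 GiB KV cache, 8 TB/s bandwidth) are illustrative assumptions, not figures from the thread.

```python
def decode_tokens_per_sec(param_bytes, kv_bytes, mem_bw_bytes):
    """Bandwidth-bound upper limit for single-sequence decode:
    one full pass over weights + KV cache per emitted token."""
    return mem_bw_bytes / (param_bytes + kv_bytes)

# Illustrative: 70B params at 2 bytes each, 10 GiB KV cache, 8 TB/s HBM (all assumed)
tps = decode_tokens_per_sec(param_bytes=70e9 * 2,
                            kv_bytes=10 * 2**30,
                            mem_bw_bytes=8e12)
print(f"~{tps:.0f} tokens/s upper bound per sequence")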
Chamath Palihapitiya@chamath·
The next phase of AI silicon is all about cheap, abundant decode. Groq was just the appetizer… This paper is a very good guide.
Chris Laub@ChrisLaubAI

🚨 BREAKING: A Google researcher and a Turing Award winner just published a paper that exposes the real crisis in AI. It's not training. It's inference. And the hardware we're using was never designed for it.
The paper is by Xiaoyu Ma and David Patterson. Accepted by IEEE Computer, 2026. No hype. No product launch. Just a cold breakdown of why serving LLMs is fundamentally broken at the hardware level.
The core argument is brutal:
→ GPU FLOPS grew 80X from 2012 to 2022
→ Memory bandwidth grew only 17X in that same period
→ HBM costs per GB are going UP, not down
→ The Decode phase is memory-bound, not compute-bound
→ We're building inference on chips designed for training
Here's the wildest part: OpenAI lost roughly $5B on $3.7B in revenue. The bottleneck isn't model quality. It's the cost of serving every single token to every single user. Inference is bleeding these companies dry.
And five trends are making it worse simultaneously:
→ MoE models like DeepSeek-V3 with 256 experts exploding memory
→ Reasoning models generating massive thought chains before answering
→ Multimodal inputs (image, audio, video) dwarfing text
→ Long-context windows straining KV caches
→ RAG pipelines injecting more context per request
Their four proposed hardware shifts:
→ High Bandwidth Flash: 512GB stacks at HBM-level bandwidth, 10X more memory per node
→ Processing-Near-Memory: logic dies placed next to memory, not on the same chip
→ 3D Memory-Logic Stacking: vertical connections delivering 2-3X lower power than HBM
→ Low-Latency Interconnect: fewer hops, in-network compute, SRAM packet buffers
Companies that tried SRAM-only chips like Cerebras and Groq already failed and had to add DRAM back. This paper doesn't sell a product. It maps the entire hardware bottleneck and says: the industry is solving the wrong problem.
Paper dropped January 2026. Link in the first comment 👇

58 · 77 · 851 · 275.8K
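The paper's headline trend cited above (FLOPS up 80X, bandwidth up only 17X over 2012-2022) can be restated as how much the compute-to-bandwidth ratio worsened, which is the whole memory-wall story in one number. Both growth factors are taken directly from the tweet.

```python
# Growth factors 2012-2022, as quoted in the thread:
flops_growth = 80   # GPU FLOPS
bw_growth = 17      # memory bandwidth

ratio_shift = flops_growth / bw_growth
print(f"compute:bandwidth ratio worsened ~{ratio_shift:.1f}x over the decade")
```

A memory-bound phase like decode only benefits from the 17X, so relative to peak compute, serving hardware has drifted almost 5X further out of balance, which is what the four proposed hardware shifts try to claw back.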
Brian Costello@bpcostello·
Great point by @NaveenGRao that we started in reverse w/brute force. We're only now learning how much compute/math we don't actually need. But the brain (biology) analogy gets overplayed. A bird and a plane both fly, yet they are very different systems with very different energy profiles. Biology can inspire AI without defining its endpoint. A brain runs on tiny power, but it also cannot train on trillions of tokens or replicate itself perfectly. Machine intelligence will get dramatically more efficient not by becoming biological, but by eliminating unnecessary computation for the kind of system it actually is.
0 · 1 · 2 · 297
Brian Costello@bpcostello·
The most important part was not the AGI capability claim. It was the architectural admission in Section 10.2: today’s systems are “stateless” and may need long-term memory built into the architecture, perhaps even “a vector which represents the context” alongside tokens, a “slow-thinking” mechanism for planning and verification, and possibly to go beyond “single-word prediction.” That is a diagnosis of needs in the substrate: a before-its-time admission that scale alone will not close the gap, and that some of the missing pieces are architectural.
0 · 0 · 1 · 421
a16z@a16z·
Unconventional AI CEO Naveen Rao on the incredible energy efficiencies of biological systems relative to technology:
"Biology sort of started small, figured out some basic principles, and those principles scaled. So the efficiency came first."
"What we've done is actually the inverse. We've brute forced our way through it, throwing everything we possibly could, and now we're understanding, 'Oh actually I didn't need to do all of that, there's a lot of things I can keep chipping away at, and I can go smaller, and smaller, and smaller.'"
"Just to kind of put it in perspective — biology, through this process of kind of the bottoms-up — a squirrel runs on 10 milliwatts of energy. Your cell phone runs on about one watt."
"10 milliwatts, and it can do things at precision levels that we cannot do in a megawatt. I can't make a robot jump between branches in the wind and hit the branch perfectly a thousand times out of a thousand right now. I can't do it."
@NaveenGRao @unconvAI
14 · 22 · 117 · 18.2K
Brian Roemmele@BrianRoemmele·
NEW NVIDIA JOB LISTING. It should tell you all you need to know about where datacenter jobs will go… IN SPACE. “What you will be doing: •Drive architecture for orbital datacenter systems considering everything from the chip out to the satellite and connectivity between satellites” Link: nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…
14 · 20 · 122 · 21.6K
Brian Costello@bpcostello·
@SMB_Attorney How many people sue their lawyers when they make a mistake? I would say very few. A lot of what attorneys do, and the law itself, can be quite subjective. Hence it's a skill.
0 · 0 · 0 · 41
SMB Attorney@SMB_Attorney·
You guys don’t get it yet. Everyone keeps saying AI is going to replace lawyers. I don’t think people understand how this actually plays out.
Let’s say you use AI to draft a contract. The contract misses something important. A year later it costs you two million dollars. What do you do? Right now, you sue your lawyer. In the AI world, you’d sue the AI company. Two things can happen.
Option 1: The AI company has liability for legal advice. If that’s the case, every AI company will immediately stop letting consumers use AI for real legal work. The liability risk is massive.
Option 2: The AI company has no liability because of disclaimers. If that happens, every state bar in the country will say consumers are being exposed to unregulated legal advice and call it the unauthorized practice of law. And they’ll shut it down that way.
Either path leads to the same outcome. Consumer AI will be limited to generic “Wikipedia-style” legal information and LegalZoom level document prep.
But the real AI tools? Those will live inside law firms. Lawyers will use them to move faster, analyze more data, and run way more matters at once. The M&A lawyer doing 5 deals at a time will do 50. Trial lawyers will run far more cases simultaneously.
The idea that AI replaces lawyers probably dies. The more likely outcome is that AI supercharges the best lawyers and makes the profession even more profitable than ever.
Wall Street Mav@WallStreetMav

BREAKING: Lawyers are trying to protect their jobs from AI. A proposed New York law would ban AI from answering questions related to medicine, law, dentistry, nursing, psychology, social work, engineering, and more. It is being pushed by the lawyer lobbyists; they included other groups to get more support.

1K · 242 · 2.5K · 819.7K
Brian Costello@bpcostello·
Agree with this and @erichorvitz. People naturally want to compare AI against what the human brain can do, but computational intelligence isn’t a brain. It’s a different kind of system entirely. Both airplanes and birds can fly, but their makeup is very different.
Haider.@slow_developer

Microsoft Chief Scientific Officer Eric Horvitz: The term artificial intelligence is wrong; it should be computational intelligence because it applies to both biological and machine systems. Humans will stay on top, guiding with our values and goals, even as machines shape us

0 · 1 · 6 · 962
Brian Costello@bpcostello·
Funny, I had a chance to watch some of the best QB coaches in the country working on throwing mechanics and biomechanics. When elite QBs struggle, the solution is often to speed everything up. The faster the motion, the harder it is for the kinetic chain to fall out of sync. QBs with very quick releases often spin the ball very well.
0 · 0 · 2 · 426
a16z@a16z·
"Speed wins." "You have to be willing to commit to being fast. You can't have long bureaucratic processes. You can't have a risk-averse posture."
@pmarca explains the OODA loop — and why the fastest operator controls the narrative in business, media, and politics:
"There's a framework called the OODA loop, originally developed for fighter pilots and later for broader military strategy."
"It stands for observe, orient, decide, act. It's basically the decision-making cycle."
"If speed is the thing that matters, then the person who gets through that cycle the fastest is the one who's going to win."
"If you can have a sustainably faster OODA loop processing cycle than the next guy — think about what happens… You operate and make a decision within an hour. The other guy is still inside his own OODA loop when you make your decision. He's only halfway through his process and now has to start over. You've changed the parameters of what's going on."
"This is also a big explanation for what's happened in traditional media."
"The New York Times has its own OODA loop, and it's like 24 hours to go through its process."
142 · 545 · 4.2K · 269.1K
Nick Sortor@nicksortor·
🚨 BREAKING: The Pentagon confirms the US military has sunk ALL eleven Iranian military ships in the Gulf of Oman. “Two days ago, the Iranian regime had 11 ships in the Gulf of Oman, today they have ZERO.” — US CENTCOM
1.7K · 14.7K · 93.2K · 3.7M