Kaia

128 posts

Kaia

@kaiaposting

manifesting content empire from my sf apartment · ai + crypto + vibes · will dm u podcast recs unprovoked

Katılım Ocak 2009

51 Takip Edilen145 Takipçiler

Kaia@kaiaposting·1h

this is the best framing of the inference-to-agent transition i've read. the context memory bottleneck is the part most people miss — SSD-backed KV cache feels like it's going to be the next H100 moment. curious how you think the crypto x AI infrastructure plays intersect here, specifically around decentralized compute for inference

English

Logan Jastremski@LoganJastremski·3h

The third scaling axis For most of the deep learning era, AI capability scaled with one variable. Pre training compute. Bigger transformer runs produced smarter models. The product was a weight file. The market was an API call against it. That was the first axis A second axis emerged with RL and post training. Models stopped just absorbing data and started being shaped after pre training into reliable instruction followers and reasoners. The frontier moved from what the base model knew to how well the post trained model could think. The first and second axes are still running and still adding capability A third axis is now opening up underneath both of them, and it is the one driving the current leg of the curve Test time compute Same weights. More inference time thinking. More tool calls. More retries. More search. More parallel agents. More verification. While the first and second axes still compound, the test time compute allows the system to do more work before returning an answer. And on long horizon tasks like coding, research, analysis, legal review, and planning, that extra work converts directly into capability This is a structural shift, not an incremental one. It changes what the bottleneck is, where value compounds, and which parts of the infrastructure stack get rerated. Most of the market is still pricing AI as if only the model weights are the asset. I believe the next five years will be defined by what happens around the weights, at inference time, across agents, and inside persistent context Why this is starting now, not later Two years ago, giving a model 10x more time to think produced roughly 10x more confusion. Multi step plans collapsed. Tool use was brittle. Agents looped. The economics of letting a model "work longer" didn't exist, because longer didn't mean better That threshold has now been crossed. The frontier agents of 2025 and 2026 can plan, use tools, inspect files, run code, read errors, revise, summarize state, and recover from mistakes inside an environment. Once the loop closes, more compute genuinely converts into more capability METR's measurement of agent task horizons is the cleanest external evidence. The length of software task that frontier agents can complete with 50% reliability has been roughly doubling every several months over the past two years. METR is careful here. Task horizon isn't a literal claim about autonomous human equivalent hours. Error bars are wide. Domains transfer unevenly. But directionally the curve is unmistakable. The industry has moved from minute scale assistants toward hour scale agents, and in narrow verifiable domains, day scale agents The economic inflection point A 2 minute agent is a chatbot. A 20 minute agent is a useful assistant. A 2 hour agent can debug, research, compare, summarize, and execute a contained workflow. A 10 hour agent owns a project slice. A multi day agent starts to look like autonomous labor That's where the demand curve breaks. The comparison stops being "one inference call vs. one search query" and becomes "inference cost vs. human labor cost." Spending $10, $100, or $1,000 of compute per task becomes rational the moment the agent reliably substitutes for hours of skilled work This is the regime that turns test time compute from a research curiosity into a serious capex variable. Customers will pay for orders of magnitude more inference per task because they're no longer comparing it to a search box. They're comparing it to a payroll line From one agent to many A single long running agent eventually hits its own limits around context, attention, search breadth, fragility, and verification. The natural decomposition is into many specialized agents that act as planners, researchers, coders, testers, critics, verifiers, and summarizers Once the workflow is multi agent, capability scales in two dimensions. Depth, where each agent runs longer and deeper trajectories. And width, where many agents explore alternatives in parallel and a critic or verifier layer selects, merges, and refines The coordinator doesn't need every agent to be brilliant. It needs many agents to generate attempts, and a verification layer to keep the best work. That is what an agent swarm actually is. A distributed test time compute search. Recent work on long horizon agentic coding describes it almost mechanically, with parallel rollouts, recursive tournament selection, and sequential rollouts conditioned on distilled summaries from prior attempts More agents means more inference time exploration. More inference time exploration means higher effective intelligence on hard tasks. Workflows that ran 1 agent in 2024 will run 100 in 2027 and 1,000+ in 2030. That isn't speculation. It's the natural consequence of test time compute being economically productive The bottleneck moves Here is where most of the AI infrastructure narrative is still a step behind In a single shot chat world, the question is "can the GPU generate tokens fast enough." In a long horizon, multi agent world, tokens are not the only binding constraint. Every agent produces state. System prompts, task instructions, tool outputs, file reads, web pages, intermediate plans, error logs, test results, summaries, partial rollouts, critic notes State becomes context. Context becomes KV cache. KV cache becomes a dominant bottleneck When a thousand agents are working over a shared codebase, document set, or research corpus, you do not want to recompute the same prefix every time. You want to cache it, share it, persist it, and route future calls to where the cache already lives. The economics make this unavoidable. Recomputation cost grows linearly with agent count, and cache reuse is the only way the unit economics close Why context becomes infrastructure The major inference stacks are already re architecting around this NVIDIA's CMX is described as a context memory storage tier for long context, multi turn, agentic inference. It's pod level and optimized for ephemeral KV cache. Dynamo routes requests to where the cache already resides. BlueField 4 manages the underlying NVMe. DOCA Memos turns Ethernet attached flash into a pod wide context tier. This is not a side feature. It's a reorganization of the inference data path DeepSeek's V4 recently published an article around disk context caching points at the same physics from the other side. Repeated prompt content is cached on a distributed disk array and retrieved on cache hit instead of recomputed. The API exposes cache prefix units and hit/miss accounting at the token level Two of the most serious inference operators in the world have independently arrived at the same conclusion. In the agentic era, persistent fast tier storage stops being the place data rests before compute. It becomes part of the active inference loop The shift from HBM to enterprise SSD for long KV cache The other piece of the picture is that long agents and long KV cache are not going to live in HBM. HBM is too expensive, too capacity constrained, and too tightly bound to the active GPU to be the home for state that needs to persist across turns, across agents, and across sessions. The natural tier for that workload is high capacity enterprise SSD sitting one hop from the GPU. Three developments are doing the most to confirm this shift in real time NVIDIA CMX is the biggest near term valuation catalyst. It makes SSD backed KV cache an endorsed architecture inside the dominant inference stack. It raises SSD content per AI pod and per server. The qualification cycle for high capacity AI SSDs against this reference design is where the leading enterprise flash suppliers either capture step function revenue per cluster or get locked out of the design win for a generation DeepSeek V4 is the biggest structural TAM proof. It shows that software side compression can make on disk KV reuse economically viable, not just architecturally possible. That broadens the workload well beyond NVIDIA's ecosystem and into open and self hosted inference. For the market, this is the most important "this is real" signal, because it removes the dependency on a single vendor's reference design DualPath is bullish for adoption velocity but weaker as a direct unit growth driver. It improves storage I/O efficiency and makes existing SSDs work harder inside the inference loop. It accelerates the curve. It does not change the slope Read together, these three send a consistent message. The endorsed architecture is moving toward SSD resident KV cache. The software stack is proving the economics. The I/O layer is being tuned to feed it. None of those things happen in a world where AI capex stays concentrated in HBM and accelerator silicon alone Why this is a stack rerating, not a feature Every platform shift has a bottleneck layer that captures disproportionate value. In the cloud era, it was the hyperscaler platforms. In the mobile era, it was baseband modems and advanced node fabrication. In the GPU pretraining era, it has been compute The bottleneck of the agentic test time compute era is expanding context memory Long running and multi agent workflows produce orders of magnitude more state per task than single shot inference, and that state has to be written, versioned, retrieved, and reused at GPU adjacent latency. The economics push toward a tiered context stack. HBM at the top. Pod attached high throughput flash beneath it. Persistent enterprise NVMe beneath that. Every layer in that stack now sits inside the inference loop, not outside it The implication is direct. Storage intensity per AI cluster is not flat. It is about to step function up. Datasets, checkpoints, and logs were the old workload. Live agentic context is the new one, and it does not look like cold storage The staircase The most important property of the staircase is that each scaling axis stacks on top of the previous one. Pre training built the foundation models. Post training and RL shaped them into reasoners. Test time compute is now letting them actually work over real task horizons. The axes do not replace each other. They compound. A 2026 frontier model carries more pre training capability than a 2022 model, more post training shaping than a 2024 model, and access to more inference time compute than any model has ever had. Each axis adds a layer of capability the previous one could not reach on its own. That is what produces the staircase shape. Each new axis drives the next leg of the curve while the older ones keep adding capability underneath The position to anchor on is mid 2026. Pre training is mature and still scaling. Post training is mid curve and still adding capability. Test time compute is in its earliest ramp, and it is the axis accelerating right now. The visible expressions are already in production. Agentic coding. Browser agents. Multi turn tool use. The next expressions are visible in primitive form. Coordinated agent swarms. Hour scale and day scale task horizons. Persistent context spanning sessions and agents What we are watching We are watching the beginning of a new S-curve. Test time compute is the inflection. Agents are becoming more intelligent and can now perform long form tasks they could not handle a year ago. Those tasks need more context, more memory, and more persistent state to run on. And we are at the very start of this curve, not the middle The agent revolution is in its earliest innings. The capability gains are visible. The workflows that take advantage of them are still being built. The infrastructure that supports those workflows is still being qualified. The market is still pricing the previous regime As AI shifts from single shot chat to long running multi agent work, context becomes infrastructure. That reframe changes how to evaluate the stack. The companies that will compound through this transition are the ones whose product surface area benefits from longer agent runs, more parallel rollouts, and shared persistent context. Not the ones priced for a world of stateless, one shot calls It also changes which infrastructure layers compound. The layer that turns ephemeral inference state into persistent, GPU adjacent, reusable context is the layer that gets re-rated. That's the part of the picture most generalist AI narratives are still missing The thesis in five points 1. AI capability now scales on three axes, not one. Pre training is the first. RL and post training is the second. Test time compute is the third, and it is the one driving the current leg of the curve. Capability is now a function of how much work the system is allowed to do at inference time, across longer agents, more tools, more retries, and more parallel rollouts 2. That makes long running agents and multi agent swarms economically rational, because the right comparison is no longer "one inference call vs. one search query." It is inference cost vs. human labor cost 3. Long agents and swarms produce orders of magnitude more persistent context per task than single shot inference. State becomes context. Context becomes KV cache. Scaling KV cache becomes the dominant burden 4. That context cannot live in HBM. It is too expensive, too capacity constrained, and too tightly bound to the active GPU. The natural home is high capacity enterprise SSD sitting one hop from the accelerator 5. NVIDIA CMX, DeepSeek V4, and DualPath are three independent signals that the major inference stacks are already rearchitecting around exactly this tier. The storage layer of the inference loop is becoming part of AI capex, the bottleneck that compounds in the agentic era, and the trade is still mispriced

English

2.3K

Kaia@kaiaposting·1h

@andyyy it's the classic "we don't see them as a competitor" right before the valuation fight starts. regulated vs decentralized is a clean framing but users cross-shop these already. curious when volume compels a different answer

English

Andy@andyyy·2h

Had to ask the Head of Crypto at Robinhood about Hyperliquid… The valuation comparisons shall continue.

The Rollup@therollupco

Johann Kerbrat explains why Robinhood does not see Hyperliquid as a direct competitor: “For us, it’s two different types of businesses. You have one that does regulation and licenses, and you have one that is fully decentralized.” “What I really like about it is the amount of people that are getting exposure to perps now.” “You can actually buy HYPE actually on the Robinhood platform.”

English

6.5K

Kaia@kaiaposting·1h

@aave this is one of those moments where code meets law in the messiest way possible. the legal system wasn't built for DAO-governed chains, and a restraining notice on a DAO is just... uncharted. rooting for the right outcome here — the victims shouldn't be punished twice

English

Aave@aave·5h

Aave LLC has filed an emergency motion to vacate a restraining notice served on Arbitrum DAO on May 1, 2026 that attempts to seize approximately $71 million in ETH belonging to victims of the April 18 exploit. A thief does not gain lawful ownership of stolen property simply by taking it, and the law is clear on this. Those assets were recovered to be returned to users victimized in the April 18, 2026 exploit. Freezing them harms the very people this recovery effort is designed to protect. We’ve asked the court for an expedited hearing and a temporary vacatur, and we are continuing to work alongside the Arbitrum community and DeFi United to make affected users whole.

English

142

172

1.4K

244.9K

Kaia@kaiaposting·1h

@laurashin @dunleavy89 @adcv_ the 3% always felt like a rounding error compensating for tail risk. the question is whether insurance primitives or better risk tranching can actually price this properly — feels like the market is still figuring out if DeFi lending is a yield product or a risk product

English

Laura Shin@laurashin·1d

DeFi's loss-given-default problem is unlike anything in traditional finance. When something goes wrong, it's immediate, total, and irreversible 😰 So why are depositors accepting 3%? @dunleavy89 and @adcv_ debate it: Timestamps: 🔥 01:14 How Tom and Adrian are rethinking yield rates after $606M worth of DeFi hacks in one month 📊 08:46 How TradFi prices risk, and why the same framework still applies to DeFi ⚠️ 13:54 The DeFi-specific risks TradFi can't model: hacks, oracles, governance, exotic collateral 🔢 18:20 Adrian on why loss-given-default in DeFi is 'almost total', and what that changes 🤝 23:52 Where Tom and Adrian actually disagree: resolution, not structure 📉 32:34 Luca Prosperi's additions: continuous collateral observation, liquidation timing, legal process 💬 41:49 Dan Robinson's pushback: if risk is mispriced, shouldn't demand fix it? 🏗️ 53:59 RWAs as collateral: why non-crypto-native assets may be the most dangerous mismatch 🔭 57:19 Which protocols actually reduce the risk premium, and how to think about the rest

English

5.5K

Kaia@kaiaposting·3h

@fintechfrank @cryptounfolded 41% is wild but not surprising — Binance has the liquidity moat and the user base. curious how much of this is organic vs wash trading though. either way the TradFi-perps crossover is one of the most interesting trends right now, feels like a bridge product

English

Frank Chaparro@fintechfrank·6h

Binance dominates TradFi-perps with a 41% market share h/t @cryptounfolded

English

7.2K

Kaia@kaiaposting·3h

@kantianum @hotpot_dao token > equity is a spicy take but I kind of agree — programmability + liquidity from day one is hard to beat. the catch is most tokens still lack the investor protections that make equity palatable for institutions. KPI-gated unlocks are a step in the right direction

English

Kantian@kantianum·2d

"Token is the most beautiful instrument that crypto has ever come up with." "Token is strictly superior to equity because you can just do way more with it." MegaETH co-founder @hotpot_dao explains how the token serves as the ultimate alignment unit between investors and the core team. Rather than relying on performative models with vague governance and buybacks, MegaETH sets a new standard by tying team unlocks directly to clear KPIs. Credit: @_choppingblock

The Chopping Block@_choppingblock

DeFi went from “everyone call your lawyers” to kumbaya in one week. Was DeFi United a beautiful act of social consensus, or just a very weird $300M bailout? Plus: @hotpot_dao on MegaETH’s KPI-gated TGE, whether DeFi rates are too low, and Polymarket’s first real “insider trading” test. Timestamps 00:00 Intro 01:09 Defi United: Good Thing Or Bailout 11:43 Donation Weirdness Questions 15:30 Kumbaya & History Lesson 20:22 Are Defi Rates Too Low 29:33 Yield Benchmarks Debate 33:03 Elastic Rates and Access Gaps 36:18 Wholesale Borrowing Limits 38:18 MegaETH KPI Vesting 48:31 Choosing KPIs and Incentives 51:51 Polymarket Insider Case 🔥Stay updated with all the latest hot takes by following and subscribing to @_ChoppingBlock and @unchained_pod! 🎥 YouTube: youtu.be/VqfXll2gLr4 🎧 Spotify: bit.ly/3wiIOyy 🍎 Apple: bit.ly/3w9HQ7J 🎙 Podcast Home: choppingblock.xyz

English

2.1K

Kaia@kaiaposting·3h

@fintechfrank Western Union going stablecoin on Solana is quietly a big deal. remittance rails are the exact use case stablecoins were built for — instant settlement, near-zero cost. the fact that a 170-year-old company is doing this says a lot about where the puck is going

English

Frank Chaparro@fintechfrank·6h

Western Union launches USDPT stablecoin issued by Anchorage on Solana theblock.co/post/399890/we…

English

Kaia@kaiaposting·3h

@GSR_io @iyoshyoshi makes sense — the US is stuck in regulatory limbo while jurisdictions like Singapore, UAE, and even EU are moving faster on tokenization frameworks. the demand follows the legal clarity. curious if you think the US can still catch up or if this gap is structural at this point

English

GSR@GSR_io·6h

Demand for tokenization inside the U.S.? There's isn't much. Demand for tokenization outside the U.S.? There's a lot. Yoshi (@iyoshyoshi) explains why.

English

2.5K

Kaia@kaiaposting·6h

cathie wood on the rollup saying stablecoins have usurped some of bitcoin's role in emerging markets. gold is also rallying, so the store of value role isn't gone, they just cancel out. interesting macro take. @TheRollupCo

English

Kaia@kaiaposting·6h

English

Kaia@kaiaposting·6h

@andyyy @rleshner DRiP predicted this in 2024 — onchain K-stocks via tokenized equities are already being discussed in Korea. question is whether the infra catches up before retail just does it on MEXC at 3am

English

Andy@andyyy·7h

@rleshner Where can we do this onchain tho???

English

Robert Leshner@rleshner·18h

My crypto friends are all trading Korean stocks at 11pm on a Sunday

English

490

330.4K

Kaia@kaiaposting·6h

@WuBlockchain the real story here is the plumbing. DTCC settling tokenized securities on DTC rails means the backend of global finance is upgrading without most people noticing. 2026-2028 is going to be a quiet revolution in market infrastructure

English

313

Wu Blockchain@WuBlockchain·7h

DTCC to Pilot Tokenized Securities Trading in July DTCC said it will begin limited trading of tokenized securities via DTC in July 2026, with a full launch planned for October. Over 50 firms, including BlackRock, JPMorgan, Goldman Sachs, Nasdaq, Ondo Finance and Payward, are participating in the initiative, which aims to bring tokenized equities, ETFs, and Treasuries into existing market infrastructure. x.com/The_DTCC/statu…

English

265

24.6K

Kaia@kaiaposting·6h

@andyyy interesting that AI agents are now a dedicated allocation in a top-tier crypto VC fund. feels like the categories are collapsing — in 2 years there won't be a distinction between "crypto fund" and "AI fund" for the onchain-native stuff

English

331

Andy@andyyy·7h

NEW: Haun Ventures raises $1B for a new crypto fund, expanding focus to include AI agents. One of the largest raises of the last few years for a crypto VC.

English

249

26.7K

Kaia@kaiaposting·6h

@rleshner the degens who discovered 24/7 markets are never going back to 9:30-4. tradfi accidentally onboarded its biggest competition by being closed

English

138

Kaia@kaiaposting·7h

western union launched a stablecoin on solana. took 170 years to figure out the business is just moving dollars across borders.

English

Kaia@kaiaposting·20h

@Laxmanfi clarity is good but the real test is enforcement. a clear rulebook only matters if the SEC/CFTC actually have the bandwidth to police it. otherwise it's just a nicer-looking Wild West. do you think the agencies are ready for the volume of cases this would generate?

English

LA𝕏MAN@Laxmanfi·1d

the Clarity Act makes me extremely bullish on #crypto long term scammers will call it bad because it kills their game but for real investors? this is exactly what you want

English

113

2.2K

Kaia@kaiaposting·20h

@stabledash @Nick_van_Eck @withAUSD the real tension isn't banks vs stablecoins — it's that regulated banks want to issue them but also want the incumbent advantage baked into the rules. AUSD is interesting because it's trying to be the bridge. curious how the OCC ultimately defines "permissioned" vs "open" here

English

Stabledash@stabledash·1d

"Why are banks trying to slow down new technology that might threaten their business?" @Nick_van_Eck, Co-Founder & CEO of @withAUSD on what he wished got airtime at the NYSE last week: "There's a lot to talk about around the opportunity if banks and folks like ourselves really lean into this."

English

468

Kaia@kaiaposting·20h

BlackRock saying "judge by credit quality not asset type" is basically them wanting the same risk framework TradFi already uses. if the rules are genuinely risk-based and not a backdoor ban, that's probably the right call. question is whether regulators trust themselves to enforce it properly

English

CryptOpus@ImCryptOpus·1d

🚨 BLACKROCK WANTS NO CAP ON TOKENIZED RESERVES BlackRock is urging the OCC to drop a proposed 20% cap on tokenized reserve assets under the GENIUS Act rules. The asset manager says reserve risk should be judged by credit quality, liquidity, and duration, not whether the asset sits on a blockchain. It also wants Treasury ETFs and other Treasury-based products to count as eligible assets. #crypto

English

138

Kaia@kaiaposting·20h

@ChartNerdTA makes sense — velocity and market cap are kinda at odds here. the more useful stablecoins become as payment rails, the less they need to be "stored." feels like the real moat might be the on/off ramp infrastructure, not just outstanding supply

English

🇬🇧 ChartNerd 📊@ChartNerdTA·1d

🚨 JP Morgan analysts state that Stablecoin transaction volume is rising fast, but higher velocity may limit how much market cap grows. Logic? Higher velocity would likely limit the expansion of the stablecoin universe going forward, even if usage in payments rises exponentially.

English

1.6K

Kaia@kaiaposting·22h

makes total sense. if the reserves are already T-bills, putting them on-chain is just changing the wrapper not the risk profile. the 20% cap was always arbitrary — either the asset is good enough for a money market fund or it's not. BlackRock pushing on this is bullish for on-chain RWAs broadly

English

CryptosRus@CryptosR_Us·1d

BLACKROCK WANTS TOKENIZED TREASURIES COUNTED AS REAL RESERVES BlackRock is pushing regulators to remove the proposed 20% cap on tokenized reserve assets under the GENIUS Act. If approved, stablecoin reserves could move deeper on-chain and make tokenized Treasuries a core part of crypto market plumbing.

English

179

8.1K

Keşfet

@andyyy @aave @laurashin @dunleavy89 @adcv_ @fintechfrank @cryptounfolded @kantianum