

Sponge
Inference got a hundred times cheaper this year. The compute bill went up anyway. If you understand why those two sentences are both true at the same time, you understand the most important thing happening in AI right now. I work on inference for a living, at @nebiustf, where we run open-source managed inference at scale. Most of what follows is what I'm seeing from inside the bill.

Twelve months ago, the cost of 1M tokens of frontier-class reasoning was somewhere on the order of $60. Today, an equivalent quality of output costs roughly $0.50. The price per token of o1-level intelligence has dropped about 128x in a year. The price of GPT-4-level output has dropped roughly 100x since the original GPT-4 shipped. By any normal reading of a technology cost curve, this should be deflationary. It should be saving customers money.

The opposite has happened. The total compute bill at every hyperscaler is going up, not down. Anthropic just signed multi-year capacity deals with both xAI and Amazon. Microsoft's Azure capex guide for 2026 starts with an eight. OpenAI is reportedly spending more on compute every quarter than it did in all of 2023. Nvidia paid roughly twenty billion dollars to acquire Groq, an inference-specialist company that did not exist as a serious commercial entity three years ago. The cost curve and the demand curve crossed, and then the demand curve lapped the cost curve.

Here is what happened underneath. A reasoning model burns roughly 10x the output tokens of a non-reasoning model on the same task, because it spends most of its tokens thinking out loud before answering. An agentic workflow chains roughly twenty times the requests of a single-shot completion, because it loops, calls tools, plans, retries, and synthesizes. A modern deep-research query (the kind a research analyst can fire off in fifteen seconds and then walk away from for ten minutes) costs more compute than ten original GPT-4 queries combined. We made every individual token a hundred times cheaper, and then we built a generation of products that consume ten thousand times more tokens.

This is the Jevons paradox playing out at trillion-dollar scale, in compressed time, in front of everyone. Jevons noticed in 1865 that making coal-burning more efficient did not reduce coal consumption. It increased it, because efficiency unlocked uses that were previously uneconomic. Steam engines became more practical at smaller scales. Whole industries that could not afford coal at the old price suddenly could. Britain's coal consumption rose sharply, not despite the efficiency gains, but because of them.

The same thing is happening to AI compute right now, and it is happening faster than any analogous historical cycle. Falling token prices did not contract demand. They unlocked agents, deep research, code-writing systems, multi-step reasoning, persistent memory, the entire next layer of AI products. Every product in that next layer consumes orders of magnitude more compute than the chat interfaces it is replacing. The math at the aggregate level is brutal: 100x cheaper tokens times 10,000x more tokens equals a 100x larger total bill (back-of-envelope sketch below).

The implications stack quickly. If you are running a hyperscaler, your 2026 capex guide is not a peak. It is a step on a curve. Inference is structurally always-on, twenty-four hours a day, in a way that training never was. Training is bursty. You spin up a cluster, run for weeks or months, and stop. Inference runs continuously, scales with usage, and the usage curve is exponential.
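A back-of-envelope check of that aggregate claim, in Python. The 10x reasoning and 20x agent-loop multipliers are the post's own rough figures; the usage-growth term is a hypothetical placeholder added only to make the arithmetic land near 10,000x, since the post does not break the rest of that number down.

```python
# Back-of-envelope sketch of the aggregate Jevons math above.
# The 10x and 20x multipliers are the post's rough figures; usage_growth
# is a hypothetical stand-in for everything else driving token volume.

price_drop       = 100   # tokens are ~100x cheaper per token than a year ago
reasoning_tokens = 10    # a reasoning model burns ~10x the output tokens per task
agent_requests   = 20    # an agent loop chains ~20x the requests per task
usage_growth     = 50    # assumed growth in how many tasks get run at all

token_volume_growth = reasoning_tokens * agent_requests * usage_growth  # ~10,000x
bill_multiplier = token_volume_growth / price_drop

print(f"token volume: {token_volume_growth:,}x more tokens")
print(f"total bill:   {bill_multiplier:.0f}x larger, despite 100x cheaper tokens")
```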
Your power bill, your cooling bill, your transceiver count, your storage footprint: all of these were sized for a workload mix that no longer exists. If you are running an AI software company built on top of someone else's closed API, you have a problem that did not exist a year ago. Your gross margins get worse as your customers get more value out of your product, because the more they use it, the more compute you pay for. The companies that win this are the ones that figured out vertical integration before the math caught them.

If you are watching this from a distance and trying to understand where the next bottlenecks form, the answer is everywhere downstream of "more inference compute, always-on, with massive memory state per session." The KV cache, the running memory state of a long conversation or an agent loop, is the silent monster of the inference era. It does not scale linearly with parameters. It scales linearly with context length and number of agent steps. A long agent session can hold tens of gigabytes of state per user, per session (rough sizing sketch at the end of this post). Multiply that by every concurrent user of every product, and you understand why $MU, $SNDK, $TOWCF, and the entire memory and packaging layer have re-rated the way they have.

The CPU-to-GPU ratio is evolving. Training is 1:8. Basic chat inference is 1:4. Agentic inference is 1:1, sometimes CPU-heavy. Google has split its TPU line in two, with a dedicated inference chip carrying tripled SRAM for KV cache. $INTC and $AMD just spent two earnings calls explaining that this shift is structural, not cyclical. The hardware map is being redrawn in real time, and the financial press is mostly still writing about training clusters.

The right framing of where we are right now is not that AI is hitting a wall. The framing a year ago that scaling was hitting a wall was the most expensive bad take of the cycle. The right framing is that AI got dramatically cheaper, dramatically more capable, and dramatically more useful, and the cost of running it at the new equilibrium of demand is much higher than the cost at the old equilibrium, because the new equilibrium is enormous.

A meaningful share of what we actually do at Token Factory, day to day, is help customers stop their bills from running away from them. KV-cache management. Speculative decoding. Quantization. Routing. The kind of vertical integration that, eighteen months ago, every product team was happy to leave abstracted away behind a closed API. The reason this stack matters now is the same reason this whole essay matters: at the new equilibrium of inference demand, the cost of treating compute as a commodity is no longer survivable. The companies that figure out the layer beneath the API are the ones that keep their margins.

Cheaper tokens. More tokens. Same coal as 1865.
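A footnote to the KV-cache point above: a minimal sizing sketch, assuming a 70B-class model shape with grouped-query attention. The specific layer, head, and context numbers are illustrative assumptions, not any particular deployment.

```python
# Rough KV-cache sizing for one long agent session.
# Assumed 70B-class model shape with grouped-query attention (GQA);
# illustrative numbers only, swap in your own model's config.

num_layers     = 80       # transformer blocks
num_kv_heads   = 8        # KV heads under GQA (far fewer than the query heads)
head_dim       = 128      # dimension per attention head
bytes_per_elem = 2        # fp16/bf16 cache; 1 for fp8, less if the KV cache is quantized

# Both K and V are cached at every layer for every token in context.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

context_tokens = 128_000  # long agent session: history + tool outputs + scratchpad
session_gb = kv_bytes_per_token * context_tokens / 1e9

print(f"{kv_bytes_per_token / 1024:.0f} KB of KV cache per token")
print(f"{session_gb:.1f} GB of KV cache for one 128K-token session")  # ~42 GB
# Multiply by concurrent sessions and the HBM budget disappears fast, which is
# why KV-cache paging, quantization, and offload show up in every serving stack.
```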





Here’s why the $KOPN base case is $13.50 and bull case $42. AI’s bottleneck isn’t just compute; it’s also the wires between GPUs. Copper hit a wall. Silicon photonics is expensive. MicroLED is the dark horse: sub-pJ/bit, no lasers, CMOS-compatible. $90B AI optics TAM by 2030.







try public APIs for free - just found out there’s a GitHub repo where you can find legit 1000s of free API endpoints. not even kidding 😭 one link, endless APIs (infinite side projects 👀) github.com/public-apis/pu…
