Data_is_King

1.9K posts

@data_is_king

DMs open | Intelligence is the ability to learn, not knowing many things | Investing in the digitalization of the economy | Not investment advice

Cyberspace · Joined August 2013
2.3K Following · 1.4K Followers
Data_is_King retweeted
Bill
Bill@wabuffo·
If things feel a bit deflationary at the moment, this is your reminder that due to a bumper April tax haul, the US Treasury is running a short-term but significant surplus (currently >$250B in April m-t-d). This temporary reduction in the creation of new sovereign obligations (i.e., USD monetary assets) becomes a seasonal factor that adds friction to economic growth (probably thru the summer). Yes, there are other macro/geopolitical factors, yada, yada... but this surplus doesn't help things until the cumulative effect reverses by Sept-Oct.
Bill tweet media
7
7
82
13.5K
Data_is_King retweeted
Tesla AI
Tesla AI@Tesla_AI·
New release of FSD Supervised now starting to roll out. This update brings 20% faster reaction time to further increase safety, among many other improvements. Full release notes below.

Full Self-Driving (Supervised) v14.3 includes:
- Upgraded the Reinforcement Learning (RL) stage of training the FSD neural network, resulting in improvements in a wide variety of driving scenarios.
- Upgraded the neural network vision encoder, improving understanding in rare and low-visibility scenarios, strengthening 3D geometry understanding, and expanding traffic sign understanding.
- Rewrote the AI compiler and runtime from the ground up with MLIR, resulting in 20% faster reaction time and improving model iteration speed.
- Mitigated unnecessary lane biasing and minor tailgating behaviors.
- Increased decisiveness of parking spot selection and maneuvering.
- Improved parking location pin prediction, now shown on a map with a (P) icon.
- Enhanced response to emergency vehicles, school buses, right-of-way violators, and other rare vehicles.
- Improved handling of small animals by focusing RL training on harder examples and adding rewards for better proactive safety.
- Improved traffic light handling at complex intersections with compound lights, curved roads, and yellow light stopping, driven by training on hard RL examples sourced from the Tesla fleet.
- Improved handling for rare and unusual objects extending, hanging, or leaning into the vehicle path by sourcing infrequent events from the fleet.
- Improved handling of temporary system degradations by maintaining control and automatically recovering without driver intervention, reducing unnecessary disengagements.

Upcoming Improvements:
- Expand reasoning to all behaviors beyond destination handling.
- Add pothole avoidance.
- Improve driver monitoring system sensitivity with better eye gaze tracking, eye wear handling, and higher accuracy in variable lighting conditions.
1K
1.4K
11.2K
38.7M
Data_is_King retweeted
SemiAnalysis
SemiAnalysis@SemiAnalysis_·
Memory is taking over Hyperscaler CapEx. In CY23 and CY24, memory was ~8% of total Hyperscaler spend. We estimate it hits 30% in CY26 and moves higher in CY27. That's a near-4x shift in just four years. (1/4) 🧵
SemiAnalysis tweet media
16
134
790
230.2K
Data_is_King retweeted
Trade Whisperer
Trade Whisperer@TradexWhisperer·
$MU Memory is now a moat for AI buildout, smartphone, PC, and car companies. Companies that locked in supply will grow into every one of those markets. Companies that didn't will spend 5 years explaining to shareholders why competitors are pulling ahead. I've spoken
4
19
224
13.1K
Data_is_King retweeted
Bill
Bill@wabuffo·
As I like to do when we are in a US employment data-heavy week, here is some more data to throw into the mix. US employment tax receipts are a rough-cut measure of employment income growth in real-time and they continue to be solid. FWIW.
Bill tweet media
3
14
125
25.8K
Data_is_King
Data_is_King@data_is_king·
@TeslaTrip @a_meta4 Every time someone compresses the KV-cache, an agentic workload that was waiting in line gets to run. Compression into supply constraints is net additive to memory demand. Jevons Paradox never misses.
1
0
2
20
TeslaTrip
TeslaTrip@TeslaTrip·
@a_meta4 Could be a million times more efficient and still consume it all.
1
0
2
76
Data_is_King retweeted
Meta4
Meta4@a_meta4·
This paper is having a well-deserved splash: TurboQuant. Executive summary: there are new ways to reduce the RAM requirement for transformers without any precision loss.. 6x less. RAM is the biggest bottleneck. NFA: The reality is that we will likely still consume the excess supply that opens up.. but be more efficient with it ;)
3
1
10
1.1K
Data_is_King retweeted
Rihard Jarc
Rihard Jarc@RihardJarc·
People are bearish on memory, but the leaked Claude Code source code is showing us some additional memory demand that the market hasn't priced in IMO.

1. The market thinks about AI memory demand as a server-side story: HBM on H100s/B200s for inference. What the bug reports in this code reveal is that the client side of AI coding agents is also extraordinarily memory-hungry. Idle Claude Code processes growing to 15GB each, active sessions hitting 93-129GB. This matters because the feature flag pipeline (DAEMON, PROACTIVE, CRON) points toward future always-on background agents. If a developer has a persistent daemon agent running alongside their active sessions, you're looking at baseline memory consumption of 15-30GB+ just for Claude Code on a developer workstation, before they even open their IDE, browser, or anything else. This means either enterprise IT needs a big uplift to higher-RAM workstations or we move even more memory-hungry workloads towards the cloud.

2. The Auto Dream consolidation feature runs background Claude sessions to clean up memory files. One observed consolidation took 8-9 minutes processing 913 sessions. In other words, a meaningful fraction of Anthropic's token consumption is the system managing its own memory, not the user doing productive work. As memory systems get more sophisticated (team sync, cross-session event buses, memory consolidation), this overhead grows. It's a recursive cost: more memory features require more inference to manage memory. I don't think anyone is modeling this as a distinct line item in token consumption estimates.

3. 1M token context windows for Claude Code. Moving from 200K to 1M context is a 5x increase in KV cache memory per session on the server side (rough sizing sketch below). Combined with multi-agent (5-15x per user) and the proactive/daemon features (sessions that persist for hours/days instead of minutes), you get a compounding memory demand curve that's steeper than the linear adoption growth many analysts model. Memory demand per active user is increasing faster than user count, because each user's sessions are getting longer, wider (more agents), and deeper (larger context windows).
Chaofan Shou@Fried_rice

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip
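A quick way to sanity-check the 5x figure in point 3 above: KV-cache size scales linearly with context length, so the ratio is fixed regardless of model size. A minimal sizing sketch, assuming an illustrative GQA-style configuration (80 layers, 8 KV heads, head dim 128, FP16 KV cache); these dimensions are placeholders for illustration, not Anthropic's actual serving setup.

```python
# Rough KV-cache sizing sketch for the 200K -> 1M context claim above.
# Model dimensions are illustrative placeholders, not any lab's real config.

def kv_cache_bytes(context_tokens: int,
                   n_layers: int = 80,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    # 2 tensors (K and V) * layers * KV heads * head dim * tokens * bytes/element
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem

for ctx in (200_000, 1_000_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>9,} tokens -> ~{gib:.0f} GiB of KV cache per session")

# Whatever the absolute numbers, the ratio is fixed: 1,000,000 / 200,000 = 5x
# more KV cache per session, before multiplying by concurrent agents.
```

Under these assumed dimensions a single 1M-token session lands in the hundreds of GiB, which is why the per-user multipliers (more agents, longer-lived sessions) compound so quickly.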

38
97
763
234.2K
Data_is_King retweeted
Chris
Chris@chatgpt21·
I think we just got a demo of Mythos and I'm surprised nobody's talking about it.. 💔 In what might be the first instance the general public has seen of Claude Mythos, Mythos (TBD) just uncovered a critical zero-day vulnerability in Ghost, an open-source platform with over 50,000 stars on GitHub that has never had a critical security flaw in its entire history. It identified a highly complex "blind SQL injection", a flaw so subtle you can't even see the output, only how the server delays its response. When asked to prove the severity of the bug, the model autonomously wrote a custom Python exploit script that successfully navigated the blind injection to extract the admin API key, secret, and password hashes from the database, completely unauthenticated… This is genuinely game changing because it proves frontier models can now actively discover, reason through, and successfully build exploits for invisible vulnerabilities in enterprise-grade architecture that human developers missed for years. Cyber security companies are cooked.
102
153
1.9K
289.1K
Data_is_King retweeted
Sawyer Merritt
Sawyer Merritt@SawyerMerritt·
This 93 year old has found new freedom after she bought a new @Tesla Model Y with FSD. She also uses Grok navigation. "Although she has always been a good driver, my mom can now drive without the fear or fatigue that can naturally come with age. No more relying on others for every trip. No more feeling stuck. This is true mobility that can spark new adventures in a still adventurous woman!" (via Dan Doyle's Family Channel. Full video below)
952
3K
23.4K
2.1M
Data_is_King retweeted
Jukan
Jukan@jukan05·
[KIS — Chae Min-sook / Kim Yeon-jun] Semiconductor Industry Note: After TurboQuant and DeepSeek, the Conclusion is Clear

● TurboQuant's Opening Shot
On March 25 (local time), Google officially published the TurboQuant algorithm via its official blog, claiming it can compress KV Cache by up to 6x with no performance degradation. The market interpreted this as a reduction in memory requirements, causing memory semiconductor stocks to fall sharply. However, this reflects a misreading stemming from a confusion between the roles of memory capacity and memory bandwidth. The bottleneck in AI inference is not a shortage of memory capacity, but rather a problem determined by memory access speed and data movement efficiency. TurboQuant should instead be understood as a technology that partially alleviates this bottleneck and improves GPU efficiency, enabling more tokens to be processed per unit time with the same GPU resources.

● The Structure of AI Inference: Prefill and Decode
LLM inference consists of two stages: Prefill and Decode. The Prefill stage is a compute-intensive task where GPU computational power limits performance, while the Decode stage is a memory-intensive task where the speed of data movement determines performance. In Decode, the existing KV cache must be repeatedly referenced with each new token generated, making it structurally sensitive to memory bandwidth and access latency. Since the Decode stage largely determines the response speed that users experience, it sits at the heart of AI inference optimization.

● How TurboQuant Actually Works
When TurboQuant claims to compress KV Cache by up to 6x, this is less about reducing the required memory capacity itself and more about significantly lowering the data size occupied by the KV Cache and the resulting memory access burden. This means the volume of data that must be handled within the same HBM bandwidth is reduced. As a result, memory access latency is alleviated and the time GPUs spend waiting for data can decrease. Because KV Cache accesses occur repeatedly during the Decode stage of LLM inference, reducing data size has a direct impact on relieving the memory bottleneck. The proportion of time GPUs spend waiting for memory responses diminishes, and compute resources are utilized more efficiently. This improves actual GPU utilization rates and increases the number of tokens that can be processed per unit time (throughput) within the same hardware environment, which can also be interpreted as a reduction in cost per token.

● The Real Bottleneck in AI Inference: Bandwidth, Not Capacity
It appears the market interpreted TurboQuant as reducing memory capacity usage and therefore decreasing HBM demand. However, the core bottleneck in AI inference is not a shortage of memory capacity but rather the speed at which data is read from memory, i.e., memory bandwidth and access latency. Because GPU compute cores process data far faster than HBM can supply it, GPUs sit in a wait state while expecting data from memory. Industry research indicates that in the Decode stage, more than 50% of attention computation cycles are spent in a wait state due to memory access latency, meaning GPUs waste more than half their theoretical performance waiting for memory responses. According to research published by Google DeepMind in January 2026, Nvidia GPU 64-bit FLOPS grew approximately 80x between 2012 and 2022, while memory bandwidth grew only about 17x over the same period; this gap is expected to continue widening going forward (arXiv:2601.05047, forthcoming in IEEE Computer). TurboQuant will narrow the gap between GPU compute capability and memory bandwidth, increasing the number of tokens that can be processed per unit time on the same hardware. This lowers cost per token, encouraging broader AI usage, drawing in more services and users, and ultimately leading to an increase, not a decrease, in KV Cache consumption. (A back-of-envelope sketch of this throughput argument follows after the note.)

● A Bottleneck TurboQuant Cannot Solve: Inter-Chip Communication Latency
There is another bottleneck in AI inference that TurboQuant does nothing to address. Large models exceed the memory capacity of a single GPU and are therefore distributed across multiple GPUs. During the Decode stage, because the model is spread across multiple GPUs, intermediate computation results must be exchanged between GPUs with each token generated. These data transfers are small in size but occur at very high frequency, which can introduce inter-chip communication latency. To reduce this latency, a single GPU must be able to handle more parameters and KV Cache, which ultimately demands more HBM capacity per GPU. In the current AI environment, where model sizes and context lengths continue to expand, this need for greater HBM capacity will only intensify. The fact that Nvidia has consistently increased HBM capacity with each successive GPU generation is likely not unrelated to this dynamic.

● TurboQuant's Limitations: Still in Early Validation
While TurboQuant carries significant technical implications, there are also limitations in its scope of application and validation that deserve consideration. The published performance benchmarks are confined to relatively simple test environments centered on single-query information retrieval tasks such as LongBench and Needle-In-A-Haystack. Experiments were also conducted primarily on comparatively small models with around 8 billion parameters; whether the same effects can be reproduced in the large-scale models of hundreds of billions of parameters used in real-world industry settings has yet to be verified. More importantly, no validation has been conducted in the rapidly expanding Agentic AI environments. In these environments, models execute repeated multi-step judgments across longer contexts and more complex KV Cache structures, giving rise to memory usage patterns that differ entirely from those seen in single-response benchmarks.

$MU $SNDK
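One way to make the note's "bandwidth, not capacity" argument concrete is a crude roofline-style bound on decode throughput. Every number in this sketch (aggregate HBM bandwidth, weight bytes streamed per step, KV-cache bytes read per step) is an assumed round figure for illustration, not a measurement from any real deployment.

```python
# Back-of-envelope decode throughput under a bandwidth bound: KV-cache
# compression shows up as tokens-per-second, not as "less memory needed".
# All constants below are illustrative assumptions.

HBM_BW_GBPS = 8_000      # aggregate HBM bandwidth per replica, GB/s (assumed)
WEIGHT_BYTES = 140e9     # model weight bytes streamed per decode step (assumed)
KV_BYTES_FP16 = 60e9     # uncompressed KV-cache bytes read per decode step (assumed)

def tokens_per_sec(kv_compression: float) -> float:
    # Decode step time ~ (weights + KV cache) / bandwidth, ignoring compute
    # and interconnect: a crude roofline-style bound.
    bytes_per_step = WEIGHT_BYTES + KV_BYTES_FP16 / kv_compression
    return HBM_BW_GBPS * 1e9 / bytes_per_step

for c in (1, 6):
    print(f"KV compression {c}x -> ~{tokens_per_sec(c):.0f} tokens/s per replica")
```

Under these assumptions the same hardware decodes roughly a third more tokens per second with 6x KV compression, which is why the note reads compression as a cost-per-token reduction rather than an HBM demand reduction.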
7
25
131
26.4K
Data_is_King
Data_is_King@data_is_king·
If you think Google's TurboQuant is bearish for HBM demand, you don't understand Jevons' paradox. 6x KV cache compression doesn't mean you need less memory. It means you can finally run 1M+ token contexts and 8x larger batch sizes. That's a demand ACCELERANT. Meanwhile B200 rental rates just spiked 40% off the Jan lows to ~$6/hr. The scarcest compute on earth is repricing higher while the bears are writing notes about efficiency gains. Coal didn't get less popular when the steam engine got more efficient. It got more popular.
Data_is_King tweet media
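A minimal sketch of the batch-size arithmetic behind the post above, using assumed round numbers for the per-node HBM budget left for KV cache and the uncompressed per-sequence footprint (both are illustrative, not vendor figures).

```python
# Sketch of the "demand accelerant" claim: with a fixed HBM budget, KV-cache
# compression gets spent on more concurrent sequences or longer contexts.
# Numbers are illustrative assumptions.

HBM_BUDGET_GB = 512        # HBM left for KV cache on a node (assumed)
GB_PER_100K_TOKENS = 30    # uncompressed KV cache per 100K-token sequence (assumed)

def max_concurrent_sequences(context_tokens: int, compression: float) -> int:
    per_seq_gb = GB_PER_100K_TOKENS * (context_tokens / 100_000) / compression
    return int(HBM_BUDGET_GB // per_seq_gb)

print("200K context, no compression:", max_concurrent_sequences(200_000, 1))
print("200K context, 6x compression:", max_concurrent_sequences(200_000, 6))
print("1M context,   6x compression:", max_concurrent_sequences(1_000_000, 6))
```

The freed capacity gets re-spent on bigger batches or longer contexts rather than on buying less memory, which is the Jevons-style point the post is making.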
0
0
0
117
Data_is_King retweeted
Rihard Jarc
Rihard Jarc@RihardJarc·
That the market is worried about memory because of $GOOGL's TurboQuant compression of the KV cache shows you how little investors know about specific technical changes and what they actually mean. Some of my thoughts on this topic:

1. The TurboQuant concept is not new; it was already published one year ago as an arXiv preprint (April 2025).

2. The paper benchmarked the 8x performance increase in computing attention against an FP32 setup and the 6x memory reduction against an FP16 baseline. Frontier AI labs do not run FP32/FP16 KV cache for inference in production today. Most of the inference at leading AI labs runs on FP8, some even FP4, so the claimed savings on HBM are much lower if you compare them to actual current production. Also, every frontier AI lab was already compressing its KV cache as much as it could before this paper was published (this is nothing new in terms of the direction of the market). So the savings compared to what is mostly being used today are smaller than the 8x and 6x numbers.

3. When we get some savings on memory (they are happening all the time, through different iterations), we get better models that are able to serve a bigger context window. The difference in quality between a model with a 1M context window and a smaller context window model is enormous. This time will be no different; usage will grow even more as a result of this, as models get better and are able to have bigger context windows, and HBM demand will continue to grow (even for inference, where these optimizations are taking place).

4. Funny enough, with DeepSeek's Multi-head Latent Attention introduction (back in Jan 2025), the KV cache was compressed by a lot more than this and still resulted in drastically more memory demand.
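The baseline point in (2) is easy to quantify: a compression factor quoted against FP16 shrinks quickly once compared with KV caches already stored at lower precision. A toy calculation, treating the 6x-vs-FP16 claim as an effective bit width (that conversion is an assumption for illustration, not something taken from the paper).

```python
# If "6x vs FP16" is read as an effective bit width, the saving versus an
# FP8 or FP4 production baseline is much smaller. Illustrative arithmetic only.

fp16_bits = 16
baselines = {"FP16": 16, "FP8": 8, "FP4": 4}
turboquant_equiv_bits = fp16_bits / 6   # ~2.7 effective bits implied by the 6x claim

for name, bits in baselines.items():
    saving = bits / turboquant_equiv_bits
    print(f"vs {name:>4} KV cache: ~{saving:.1f}x smaller")
```

Against an FP8 production baseline the headline 6x becomes roughly 3x, and against FP4 only about 1.5x, which is the gap between the paper's framing and what serving stacks would actually save.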
28
48
375
53.9K
Data_is_King retweeted
Guy Berger
Guy Berger@EconBerger·
Claims: 1/ No sign of layoffs rising. Indeed, the opposite, initial claims are lower than they've been in each of the past 3 years. We'll see if this survives persistently high energy prices...
Guy Berger tweet media
1
10
39
5.5K
Data_is_King retweeted
Geiger Capital
Geiger Capital@Geiger_Capital·
Very weak 2yr auction. 24.1% Primary Dealer takedown. Highest since 2022.
17
57
590
96.5K
Data_is_King retweeted
Wall St Engine
Wall St Engine@wallstengine·
Costco just rolled out Kirkland Signature sparkling energy drinks. It’s a 24-pack for $16.99, with peach, orange, and tropical flavors. Each can has 200 mg of caffeine, so this is clearly Costco taking a shot at the Celsius crowd.
Wall St Engine tweet media
168
203
7.3K
6.5M
Data_is_King retweeted
Aaron Chan
Aaron Chan@RecurveCapital·
I was able to host $CCOI's founder and CEO Dave Schaeffer for another interview almost exactly three years after our first one. Link here: youtu.be/OOn4XKZoDws. Lots of great material in the full video. 8 takeaways below:
YouTube video
2
11
48
9.3K
Data_is_King retweeted
محمدباقر قالیباف | MB Ghalibaf
2/ No negotiations have been held with the US, and fake news is being used to manipulate the financial and oil markets and escape the quagmire in which the US and Israel are trapped.
1K
5.3K
19.8K
3.8M