Andrei Stan

416 posts

@andreiofstan

Joined January 2016
920 Following 42 Followers
Andrei Stan
Andrei Stan@andreiofstan·
@GavMcCracken It will never leave for the US, maybe for China, Turkey or Russia. Probably a 10% chance of it leaving altogether
English
0
0
0
15
bubble boi
bubble boi@bubbleboi·
Crazy how all we got from MatX was an EE101 lecture and all the SF tards are like 🤯🤯🤯🤯 x.com/dwarkesh_sp/st…
English
4
1
110
29.4K
Andrei Stan reposted
Siddhartha Saxena
Siddhartha Saxena@siddsax·
Anthropic onboarding day: Michael Scott introducing Karpathy like he just signed Wemby in free agency.
English
370
1.4K
16.4K
1.9M
Ovidiu Eftimie
Ovidiu Eftimie@eftimie·
@McGogoo The cool part is that you don't have to pay that money. Only if your film succeeds in cinemas and you can afford it. Otherwise, the copyright of films financed by CNA goes to the Romanian state.
Romanian
2
0
1
134
AF Post
AF Post@AFpost·
Israeli PM Netanyahu is set to meet tonight with coalition party leaders and senior Israeli security officials to discuss the emerging US-Iran agreement aimed at ending the war, which he describes as highly unfavorable to Israel. Follow: @AFpost
AF Post tweet media
English
156
178
1.4K
116K
Andrei Stan
Andrei Stan@andreiofstan·
@SemiAnalysis_ Bash is extremely slow. Before, it wouldn't matter, since the latency was absorbed by a human's attention or a coffee break. Now that you have agents, the economics fundamentally change: a single-percentage-point improvement in grep can translate into millions of dollars fleet-wide.
English
0
0
2
1.1K
SemiAnalysis
SemiAnalysis@SemiAnalysis_·
FACT ALERT 🚨 : In modern agentic coding, 42% of the time is spent on CPU doing tool use such as editing files, running Bash scripts, running lints, etc. Traditional cloud computing charges $ per CPU core. In the agent economy, the business model is $ per token, so to increase token revenue you need to increase the amount of CPU power you have so that you can generate your tokens.
SemiAnalysis tweet media
English
49
85
798
206.2K
Andrei Stan
Andrei Stan@andreiofstan·
@bubbleboi Middle train is Arm, carried by AMD and Intel CPU momentum while looking like a complete waste
English
0
0
1
165
Taelin
Taelin@VictorTaelin·
The new Gemini 3.5 Flash solved the HVM3's wnf bug in 1/3 attempts. This is my main test to take a model seriously. So far only the big models like GPT 5.5 solved it. And seems like it is 20x faster than Opus 4.6 ! Promising but Google will still find a way to fuck up
English
33
14
901
147.3K
bubble boi
bubble boi@bubbleboi·
Hot Take: But chamath is right we don’t need Taiwan.
The All-In Podcast@theallinpod

Chamath: Taiwan Loses Its Strategic Importance in 18 Months @chamath: “ We're 18 months from Taiwan not being an important moment of conversation the way it is today. Why 18 months? Because we are at a point where we're probably 1-2 nanometers away from being able to do what we need Taiwan to strategically do for us. And so as we scale up our chip fabs, as we get more capacity, and interestingly, there are these orthogonal technologies being developed. I don't know if you guys saw, but Neuralink was showcasing a machine that is literally operating at the almost nanometer scale to do the brain operations for the implantation, all automatically. When you have the dexterity and the capability mechanically to make these things, the real reason then is a very different one than what it is today. Today, it's economic. And if you take that off the table, I think we'll have a very different attitude to Taiwan.”

English
24
0
87
26.4K
Andrei Stan
Andrei Stan@andreiofstan·
@bubbleboi Price movement on earnings is getting less volatile, see AMAT earnings. It will probably continue with Keysight and Analog this week
English
0
0
0
2K
bubble boi
bubble boi@bubbleboi·
Believe it or not every semiconductor company could have amazing earnings and still be down 20-30% by end of year this is what we call in finance the “discount rate.”
English
17
4
399
39.1K
Andrei Stan
Andrei Stan@andreiofstan·
@0xfdf They actively hire ASIC engineers, surely not for low-freq half week positions
English
0
0
5
946
fdf
fdf@0xfdf·
One of the most persistent misconceptions I see about the industry is that systematic = quant = stat arb = HFT. Jane Street has never been an HFT in the sense of Virtu or Tower, and at this point they even have fundamental long/short equity (but traded systematically).
fdf@0xfdf

@GoshawkTrades For what it's worth, Jane Street is not a low latency specialist. They have plenty of desks running liquidity taking strategies at daily, weekly and monthly durations.

English
15
16
474
60.7K
Andrei Stan
Andrei Stan@andreiofstan·
@TDaytonPM Ultra Ethernet is not better Ethernet. It is Ethernet. Also, most of the authors of MRC are authors of UET
English
0
0
0
10
bubble boi
bubble boi@bubbleboi·
Good feeling about Monday.
English
20
5
294
22.1K
Andrei Stan
Andrei Stan@andreiofstan·
@basedjensen > xAI uses custom Tesla network stack rather than NCCL rings to scale out
Apparently you can move bits with NCCL rings, good to know
English
0
0
1
284
Hensen Juang
Hensen Juang@basedjensen·
Technical details here do not hold up at all. xAI uses custom Tesla network stack rather than NCCL rings to scale out. Training in heterogeneous clusters has been solved by other labs, with MFU reaching into the 40s. This also buries the lede: Grok demand is tiny and they don't actually expect the demand to pick up
Jukan@jukan05

Why did xAI hand over a 220,000-GPU cluster to Anthropic? The technical backdrop to xAI's decision to hand Colossus 1 over to Anthropic in its entirety is more interesting than it appears. xAI deployed more than 220,000 NVIDIA GPUs at its Colossus 1 data center in Memphis. Of these, roughly 150,000 are estimated to be H100s, 50,000 H200s, and 20,000 GB200s. In other words, three different generations of silicon are mixed together inside a single cluster — a "heterogeneous architecture." For distributed training, however, this configuration is close to a disaster, according to engineers familiar with the setup. In distributed training, 100,000 GPUs must finish a single step simultaneously before the cluster can advance to the next one. Even if the GB200s finish their computation first, the remaining 99,999 chips have to wait for the slower H100s — or for any GPU that has hit a stack-related snag — to catch up. This is known as the straggler effect. The 11% GPU utilization rate (MFU: the share of theoretical FLOPs actually realized) at xAI recently reported by The Information can be read as the numerical fallout of this problem. It stands in stark contrast to the 40%-plus MFU figures achieved by Meta and Google. The problem runs deeper still. As discussed earlier, NVIDIA's NCCL has traditionally been optimized for a ring topology. It works beautifully at the 1,000–10,000 GPU scale, but once you push into the 100,000-unit range, the latency of data traversing the ring once around becomes punishingly long. GPUs need to churn through computations rapidly to keep MFU high, but while they sit waiting endlessly for data to arrive over the network fabric, more than half of the silicon falls into idle. Google sidestepped this bottleneck with its own custom topology (Google's OCS: Apollo/Palomar), but xAI, by my read, has not yet reached that stage. Layer Blackwell's (GB200) "power smoothing" issue on top, and the picture comes into focus. 
According to Zeeshan Patel, formerly in charge of multimodal pre-training at xAI, Blackwell GPUs draw power so aggressively that the chip itself includes a hardware feature for smoothing power delivery. xAI's existing software stack, however, was optimized for Hopper and does not understand the characteristics of the new hardware; when it imposes irregular loads on the chip, the silicon physically destructs — literally melts. That means the modeling stack must be rewritten from scratch, which in turn means scaling is far harder than most of us imagine. Pulling all of this together points to a single conclusion. xAI judged that training frontier models on Colossus 1 simply was not efficient enough to be worthwhile. It therefore moved its own training workloads wholesale onto Colossus 2, built as a 100% Blackwell homogeneous cluster. Colossus 1, on the other hand — whose mixed architecture is far less crippling for inference, which parallelizes more forgivingly — was leased in its entirety to an Anthropic that desperately needed inference capacity. Many observers point to what looks like a contradiction: Elon Musk poured enormous capital into building Colossus, only to hand the core asset over to a direct competitor in Anthropic. Others read it as xAI capitulating because it is a "middling frontier lab." But these are surface-level reads. Look at the numbers and a different picture emerges. xAI today holds roughly 550,000+ GPUs in total (on an H100-equivalent performance basis), and Colossus 1 (220,000 units) accounts for only about 40% of the total available capacity. Colossus 2 — built entirely on Blackwell — is already operational and continuing to expand. Elon kept the all-Blackwell homogeneous cluster (Colossus 2) for himself and leased out the older, mixed-generation Colossus 1. In other words, he handed the pain of rewriting the stack — the MFU-11% debacle — to Anthropic, while keeping his own focus on training the next generation of models. 
The real point, then, is this. Elon's objective appears to be positioning ahead of the SpaceXAI IPO at a $1.75 trillion valuation, currently floated for as early as June. The narrative SpaceXAI now needs is that xAI — long the "sore finger" — is not merely a research lab burning cash, but a business with a "neo-cloud" model in the mold of AWS, capable of leasing surplus assets at high yields. From a cost-of-capital perspective, an "AGI cash incinerator" is far less attractive to investors than a "data-center landlord generating cash." As noted above, the most important detail of the Colossus 1 lease is that it is for inference, not training. Unlike training, inference requires far less tightly synchronized inter-GPU communication. Even when the chips are heterogeneous, the workload parcels out cleanly across them in parallel. The straggler effect — the chief weakness of a mixed cluster — is essentially neutralized for inference workloads. Furthermore, with Anthropic occupying all 220,000 GPUs as a single tenant, the network-switch jitter (unanticipated latency) that arises under multi-tenancy disappears. The two sides' technical weaknesses end up complementing each other almost exactly. One insight follows. As a training cluster mixing H100/H200/GB200, Colossus 1 was an asset that could only deliver an MFU of 11%. The moment it was handed over to a single inference customer, however, that asset transformed into a cash-flow asset rented out at roughly $2.60 per GPU-hour (a weighted average of the lease rates across GPU types). For xAI, what was a "cluster from hell" for training has become a "golden goose" minting $5–6 billion in annual revenue when redeployed for inference. Elon's genius, I would argue, lies not in the model but in this asset-rotation structure. The weight of that $6 billion becomes clearer when set against xAI's income statement. Annualizing xAI's 1Q26 net loss yields roughly $6 billion in losses per year. 
The $5–6 billion in annual revenue generated by leasing Colossus 1 to Anthropic, in other words, almost perfectly hedges xAI's loss figure. This single deal effectively pulls xAI to break-even. Heading into the SpaceXAI IPO, this functions as a core line of financial defense. From a cost-of-capital standpoint, if the image shifts from "research lab burning cash" to "infrastructure tollgate stably printing $6 billion a year," the entire tone of the offering can change. (May 8, 2026, Mirae Asset Securities)

English
11
4
139
24.6K
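A quick sanity check of the lease arithmetic in the quoted Jukan post, as a minimal sketch. The GPU count (220,000) and the blended rate ($2.60 per GPU-hour) come from the post. The function name `annual_lease_revenue` and the `utilization` parameter are hypothetical, and the 100% utilization default is an assumption the post does not state.

```python
# Back-of-envelope check of the Colossus 1 lease revenue quoted above.
# Inputs are the figures given in the post, not independently verified.
GPUS = 220_000                 # GPUs in Colossus 1, per the post
RATE_USD_PER_GPU_HOUR = 2.60   # weighted-average lease rate, per the post
HOURS_PER_YEAR = 24 * 365      # 8,760 hours, ignoring leap years


def annual_lease_revenue(gpus: int, rate: float, utilization: float = 1.0) -> float:
    """Yearly revenue in USD; utilization=1.0 (assumed) means every GPU is billed all year."""
    return gpus * rate * HOURS_PER_YEAR * utilization


full = annual_lease_revenue(GPUS, RATE_USD_PER_GPU_HOUR)
print(f"Annual revenue at full utilization: ${full / 1e9:.2f}B")
```

Under these assumptions the result is about $5.0 billion per year, which sits at the low end of the post's "$5–6 billion" range. Any billed utilization below 100% would pull it lower.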
Andrei Stan
Andrei Stan@andreiofstan·
@jukan05 @QEDvinci @baeko_02 Intel all the way, big Don will be happy about it, maybe expedite permits. Also, he has trust issues, so the fact that he agreed with Intel was a pleasant surprise
English
0
0
1
517
Jukan
Jukan@jukan05·
@QEDvinci @baeko_02 They haven’t disclosed where they’re sourcing the memory IP from. And why would the memory Big Three give Elon access to their memory IP?
English
5
0
26
2.7K
Jukan
Jukan@jukan05·
Elon’s SpaceX has submitted plans for its first fab, and the initial investment alone is reportedly $55 billion. If fully expanded, total spending could reach as much as $119 billion. …Are they serious?
Jukan tweet media
English
90
128
1.3K
414.4K
Andrei Stan
Andrei Stan@andreiofstan·
@SemiAnalysis_ > A common misconception is that TPU v8i must be the training chip because it has two compute dies
Literally no one thinks that
English
0
0
6
903
SemiAnalysis
SemiAnalysis@SemiAnalysis_·
A common misconception is that TPU v8i must be the training chip because it has two compute dies. Die count is not the relevant metric, what matters is the balance between compute throughput and memory capacity/bandwidth.
Reason 1: Memory capacity and bandwidth
TPU v8i has 8 stacks of HBM3E 12-Hi versus 6 on TPU v8t, giving it 288 GB of HBM and 8.6 TB/s of memory bandwidth versus 216 GB and 6.5 TB/s on the training chip. This matters because inference decode is memory-bandwidth-bound, not compute-bound. The 8i also carries 384 MB of on-chip SRAM versus 128 MB on the 8t, providing more buffer for KV cache and attention operations.
Reason 2: The training chip achieves higher FP4 FLOPs from a single die
Despite having two compute dies, TPU v8i achieves only 10.1 PFLOPs at FP4, while the single-die TPU v8t achieves 12.6 PFLOPs. Google designed the 8t's die to be extremely compute-dense, maximizing MXU throughput for training's sustained high arithmetic intensity.
This also seems to highlight Google's broader direction, Google is attempting to train with FP4, a regime where the 8t's dense single die excels.
SemiAnalysis tweet mediaSemiAnalysis tweet media
English
7
36
258
47.8K
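The "balance between compute throughput and memory capacity/bandwidth" in the SemiAnalysis post above can be made concrete with a small sketch. All chip figures are copied from the post and not independently verified. The `Chip` class and its method names are hypothetical, and treating peak FP4 FLOPs divided by HBM bandwidth as the balance metric is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Spec figures as quoted in the post (assumed correct for illustration)."""
    name: str
    hbm_stacks: int
    hbm_gb: float       # total HBM capacity in GB
    hbm_tb_s: float     # HBM bandwidth in TB/s
    fp4_pflops: float   # peak FP4 throughput in PFLOPs

    def gb_per_stack(self) -> float:
        return self.hbm_gb / self.hbm_stacks

    def flops_per_byte(self) -> float:
        # Peak FP4 FLOPs available per byte moved from HBM.
        return (self.fp4_pflops * 1e15) / (self.hbm_tb_s * 1e12)


v8i = Chip("TPU v8i", hbm_stacks=8, hbm_gb=288, hbm_tb_s=8.6, fp4_pflops=10.1)
v8t = Chip("TPU v8t", hbm_stacks=6, hbm_gb=216, hbm_tb_s=6.5, fp4_pflops=12.6)

for chip in (v8i, v8t):
    print(f"{chip.name}: {chip.gb_per_stack():.0f} GB/stack, "
          f"{chip.flops_per_byte():.0f} FP4 FLOPs per HBM byte")
```

On these numbers both chips carry 36 GB per stack, and the v8i has fewer peak FLOPs per byte of memory traffic than the v8t (roughly 1,174 versus 1,938). That is consistent with the post's framing of the 8i as the bandwidth-oriented inference part.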
Andrei Stan
Andrei Stan@andreiofstan·
@MikeLongTerm Such a bad take, how much allocation do you think AMD has for TSMC wafers?
English
0
0
3
377
Mike
Mike@MikeLongTerm·
$AMD $INTC Memory companies are raising prices so much that it is pushing $INTC to go all in on this space and leave the CPU business to the best chip designer, $AMD. Do you know why? Because these greedy memory companies are getting 70%+ margins and still raising prices. A manufactured shortage that should have plenty of supply in 2026-2027. Memory is much easier and less capital-intensive per unit of capacity than manufacturing CPUs. Intel's CPU margin has been in the 40% or lower range, because $TSM is just superior at scale. And yes, 10 2nm TSMC fabs will be more than sufficient for $AMD to provide a significant % of inference compute demand. This is not a bold call. But a real possibility, and we do need more memory supply!!! Not Financial Advice!
Mike tweet media
Mike@MikeLongTerm

When $AMD shareholders understood Dr. Su secured at least 30-40% TSMC 2nm allocation

English
11
8
134
37.6K