Brian Kurtz

334 posts

@Math4Good

Accelerating inference @positron_ai.

Joined June 2020
154 Following · 47 Followers
Brian Kurtz @Math4Good
@tunguz So you think models will get smaller and denser over time?
0 replies · 0 retweets · 0 likes · 37 views
Brian Kurtz @Math4Good
@FelixCLC_ Put another way: show me your attention perf, not your expert FFN perf.
0 replies · 0 retweets · 0 likes · 37 views
@fclc @FelixCLC_
Rule of thumb: any HW performance claim on LLMs >=100B made at less than ~32K tokens is a waste of your time and everyone else's. Nvidia has brilliant people who know this... Even an explore sub-agent wants a good 8K+ these days.
2 replies · 4 retweets · 51 likes · 4.4K views
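Why the 32K threshold matters: per token, FFN FLOPs are fixed by (active) parameter count, while attention work grows with context length, so a short-context benchmark mostly measures FFN throughput. A back-of-envelope sketch, with illustrative GPT-3-class dimensions that are my assumption, not @fclc's numbers:

```python
# Per-token FLOPs split between FFN and attention as context grows.
# Model dims are illustrative (roughly GPT-3-class); counts use the usual
# 2*MACs approximation, not exact operator tallies.
n_layers, d_model = 96, 12288
# FFN per token: up- and down-projection through a 4x hidden layer.
ffn_flops = n_layers * 2 * (2 * d_model * 4 * d_model)
for seq_len in (512, 8_192, 32_768):
    # Attention per token: QK^T plus attn*V over seq_len keys per layer.
    attn_flops = n_layers * 4 * d_model * seq_len
    share = attn_flops / (attn_flops + ffn_flops)
    print(f"seq={seq_len:>6}: attention is {share:5.1%} of per-token FLOPs")
# ~1% at 512 tokens, ~40% at 32K. (In decode the same scaling shows up as
# KV-cache bytes read per token.) Short-context numbers flatter FFN-heavy HW.
```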
Brian Kurtz @Math4Good
@IanCutress Smokey and the Groq. A whole lot of FFN and not a lot of attention.
0 replies · 0 retweets · 0 likes · 1.1K views
Dr. Ian Cutress @IanCutress
OK, here we go. Same architecture, scaled up. 500 MB SRAM per chip. Jensen saying up to 25% of your datacenter could be Groq. 8-way systems for 4 GB SRAM. Uses Dynamo for attention. Decode only. Working over Ethernet in a special mode to halve latency. Samsung LP4X. Ships in 2H/3Q.
12 replies · 44 retweets · 454 likes · 66.1K views
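The capacity arithmetic behind those figures, as a sketch: 8 chips at 500 MB is the quoted 4 GB per system, and scaling up shows why weight capacity, not compute, becomes the constraint. The 8-bit-weights assumption (1 GB per billion params) is mine, not from the tweet:

```python
# Capacity check on the quoted figures: 500 MB SRAM per chip, 8-way systems.
sram_per_chip_gb = 0.5
system_sram_gb = 8 * sram_per_chip_gb   # 4 GB per system, as in the tweet
print(f"per-system SRAM: {system_sram_gb} GB")

# Chips needed to hold a model's weights entirely on-chip, assuming 8-bit
# weights (1 GB per billion params) -- my assumption, not from the tweet.
for params_b in (8, 120, 1000):
    chips = params_b / sram_per_chip_gb
    print(f"{params_b:>5}B params -> {chips:>6,.0f} chips ({chips / 8:,.0f} systems)")
```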
Brian Kurtz retweeted
Brian Kurtz @Math4Good
@jukan05 So roughly 2TB weight capacity in total? Surprisingly low.
0 replies · 0 retweets · 0 likes · 662 views
Jukan @jukan05
Groq, AI Chip Startup Acquired by NVIDIA, to Ramp Up Production at Samsung Foundry

Groq, the AI chip startup that NVIDIA effectively acquired for approximately $20 billion (around 29 trillion won), has reportedly requested a production increase from Samsung Electronics' foundry (contract manufacturing) division. As demand grows for inference-optimized AI chips capable of maximizing power efficiency, Samsung Foundry is expected to deepen its collaboration with Groq and accelerate improvements to its profitability.

According to industry sources on the 9th, Groq has recently decided to increase its wafer-based production volume at Samsung Foundry from approximately 9,000 wafers to around 15,000 wafers. While last year's production was essentially at the sample chip level — aimed at evaluating whether the chips could be effectively used for AI inference — this year is seen as marking the early stages of full-scale mass production for commercial deployment.

Groq is the AI chip startup that NVIDIA reportedly acquired through an indirect structure in December of last year for approximately $20 billion. Rather than taking direct managerial control, NVIDIA announced it would partner with Groq through a "non-exclusive technology license agreement." Groq CEO Jonathan Ross and other executives are said to have joined NVIDIA following the license deal, tasked with integrating Groq's chip designs into NVIDIA's products. The strategy is understood to have been chosen so that NVIDIA could absorb key talent and achieve an acquisition-equivalent outcome while sidestepping antitrust scrutiny.

The process of advancing AI models is typically divided into two phases: "training" and "inference." Training is the stage in which a model "learns" patterns from large volumes of data, while inference is the process of using a trained model to "derive" predictions or conclusions from new data. Companies like NVIDIA and AMD, which currently dominate the AI chip market, mass-produce chips specialized for training. However, growing concerns over excessive power consumption and high chip costs are driving increasing demand for inference-optimized AI chips capable of running AI models more efficiently.

The prevailing view is that NVIDIA — already dominant in the training chip market — pursued the indirect acquisition of Groq in order to extend its ecosystem into the inference market as well. While the volume Groq has commissioned from Samsung is not large, the analysis is that Samsung Foundry aggressively pursued the order as a foundation for securing future inference chip business. In addition to Groq, Samsung Foundry is also the sole manufacturer of processors for HyperAccel, a domestic inference AI chip startup. Samsung produces AI chips for both Groq and HyperAccel on its 4-nanometer (nm) process node.

A semiconductor industry official noted: "The 4nm process that Samsung Foundry uses to mass-produce Groq's AI chips incorporates a wide range of improvements aimed at enhancing chip performance. Given that the process carries a high unit cost and that 4–5nm demand is the strongest in the industry, winning this business is also meaningful as a reference win to remain competitive against TSMC. With NVIDIA entering the AI chip market and Groq scaling up production, expectations are growing that the inference AI chip market is on the verge of a full-scale breakout."
Meanwhile, market interest in inference-optimized AI chips is intensifying amid reports that NVIDIA plans to unveil an inference-specialized chip at GTC 2026 based on Groq's chip architecture. Industry observers expect NVIDIA to leverage Groq's inference chip design — which uses SRAM (Static RAM) in place of the High Bandwidth Memory (HBM) found in conventional AI chips. Replacing HBM with SRAM in AI chips is said to offer advantages including faster data transfer speeds, improved power efficiency, and lower chip costs.
21 replies · 50 retweets · 463 likes · 95.8K views
Brian Kurtz @Math4Good
@chamath What's the realistic market size, though?
0 replies · 0 retweets · 0 likes · 77 views
Chamath Palihapitiya @chamath
I plan to buy and deploy large fleets around the country when possible. Should pay back and be positive FCF < 2 years…
Teslaconomics @Teslaconomics
I plan on owning my own Tesla Robotaxi fleet one day. And the more I run the numbers, the more I realize this new business could become one of the most powerful income opportunities I've ever seen. This is how I'm thinking about it.

Based on many analyst models and Tesla's long-term vision, a reasonable base-case assumption is about ~$30,000 per year in net profit per Robotaxi to the owner. This is after things like Tesla's platform fee, charging, tires, maintenance, insurance, and cleaning. Of course, the network is still early and Tesla is just beginning to roll this out in pilot programs in a few cities, so there are no official real-world owner earnings yet... but using reasonable assumptions around utilization, pricing per mile, and operating costs, the math starts to get really interesting.

If one Robotaxi can earn around $30,000 per year, here's what a fleet might look like:
• $100,000 per year → about 4 Robotaxis
• $500,000 per year → about 17 Robotaxis
• $1,000,000 per year → about 34 Robotaxis

It may sound a bit crazy at first, but when you break it down, it starts to make more sense. These vehicles could potentially drive 50,000 to 100,000+ miles per year in high-demand areas. If the economics land somewhere around $0.25-$0.50 profit per mile after all costs, you end up right around that ~$30k per vehicle per year range.

And remember, Tesla's Robotaxi network is going to work a lot like Airbnb for cars. You add your vehicle to the network; Tesla handles the software, routing, payments, and rider experience, and takes a platform fee (often modeled around 25-35%). The owner keeps the rest after operating costs.

Another thing that makes this interesting is the expected cost of the vehicles themselves. Tesla has talked about the purpose-built Cybercabs costing roughly $25k-$30k, and Elon told me production is starting in 1 month! If that's even close to reality, a fleet capable of generating around $1 million per year could theoretically cost somewhere around $850k-$1M in vehicles. That ROI is pretty freakin good!

Now to be clear, none of this is guaranteed. I'm just thinking out loud and sharing it with you... a lot still depends on regulations, how fast unsupervised FSD scales, demand in each city, insurance costs, and how Tesla structures the network. But if the system works the way Elon has described it for years, owning a Robotaxi fleet could become one of the most powerful forms of passive income I've ever seen. And I plan on sharing the numbers with everyone on 𝕏 when the day comes.

Personally, that's why I'm paying such close attention. Because one day, owning a fleet of autonomous Teslas working for me 24/7 might be the modern version of owning a rental property, except instead of tenants, you've got robots driving people around all day while you sleep. This next book of Tesla is going to be so exciting!

690 replies · 516 retweets · 6.5K likes · 1.4M views
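The quoted post's arithmetic is internally consistent; here it is as a sketch, using only the post's own assumptions (the $30k/car net, the $0.25-$0.50/mile margin, and the $25k-$30k Cybercab price are its claims, not verified economics):

```python
import math

# Fleet math from the quoted post, using its own (unverified) assumptions.
annual_miles = 75_000        # midpoint of the claimed 50k-100k+ miles/year
profit_per_mile = 0.40       # inside the claimed $0.25-$0.50 range
per_car = annual_miles * profit_per_mile   # -> $30,000/year, the post's figure

for target in (100_000, 500_000, 1_000_000):
    cars = math.ceil(target / per_car)
    print(f"${target:>9,}/yr -> ~{cars} Robotaxis")   # -> 4, 17, 34

# Fleet cost for the $1M/yr case at the claimed ~$27.5k midpoint per Cybercab:
print(f"fleet cost: ${34 * 27_500:,}")   # ~$935k, hence the ~1-year-payback pitch
```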
Brian Kurtz retweeted
Mitesh @mitesh711
We will be the world's first terabyte-plus memory density silicon and will be in production in 2027. Another cool feature Weaver allows us is a configurable silicon SKU for the amount of memory: instead of only one set amount of memory per chip, we can have anywhere from 576GB to 2304GB per chip based on the customer's application, and this can be done at system build-out time.
Ben Pouladian @benitoz
$CRDO Q3 call revealed two things the market is sleeping on:
Weaver gearbox: 10x memory IO density. Positron building a 2TB inference XPU on it for speed!
Lasers are out: ZeroFlap optics 1000x more reliable, half the power. Production ramp Q1 FY27.
Listen Credo:

3 replies · 4 retweets · 23 likes · 2.9K views
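A sketch of what "configurable SKU at system build-out time" could look like. The 576 GB granularity (1-4 units, since 2304 = 4 × 576) is my inference from the quoted endpoints; the tweet only states the range:

```python
# Build-time memory SKU picker. The 576 GB step (1-4 units) is my inference
# from the quoted 576-2304 GB range, not something the tweet states.
BASE_GB = 576
MAX_UNITS = 4   # 4 * 576 GB = 2304 GB, the quoted ceiling

def pick_sku(required_gb: int) -> int:
    """Smallest configurable capacity that covers the requested amount."""
    for units in range(1, MAX_UNITS + 1):
        if units * BASE_GB >= required_gb:
            return units * BASE_GB
    raise ValueError(f"{required_gb} GB exceeds max SKU ({MAX_UNITS * BASE_GB} GB)")

print(pick_sku(1000))   # -> 1152
print(pick_sku(2000))   # -> 2304
```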
Kal Grinberg @KalGrinberg
I spec'd out the mission... Droid is warning me that this is an enormous project. I like the sound of that.
habibi @habibislop
@KalGrinberg @FactoryAI Build a high-fidelity 3d simulation of the earth, outer space, and the moon, and use the codebase for the Apollo 11 Guidance Computer (linked below) to recreate the moon landing within that simulation github.com/chrislgarry/Ap…

3 replies · 2 retweets · 22 likes · 1.2K views
Suhail @Suhail
It feels like someone should make a post-merge git hook that asks the AI model to look at the diff of a merged PR and update the repo's various READMEs and other documentation, so an LLM can write code and reference things faster rather than constantly reading every single line of source code that might be relevant. The agents need their own docs.
63 replies · 9 retweets · 354 likes · 53.8K views
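A minimal sketch of what that hook could look like: a `post-merge` script that grabs the merge's diff and asks a model to propose doc updates. `ask_llm` and the output filename are hypothetical placeholders, not an existing tool or API:

```python
#!/usr/bin/env python3
# Sketch of a .git/hooks/post-merge hook: collect the merged diff and ask a
# model to propose documentation updates for agents.
import subprocess
from pathlib import Path

def merged_diff() -> str:
    # ORIG_HEAD is set by git merge; diff it against the new HEAD.
    return subprocess.run(
        ["git", "diff", "ORIG_HEAD", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("hypothetical stub -- wire up your model API")

if __name__ == "__main__":
    docs = "\n\n".join(p.read_text() for p in Path(".").rglob("*.md"))
    prompt = (
        "Given this merged diff, update the repo's docs so an LLM agent can "
        "navigate the codebase without reading every source file.\n\n"
        f"DIFF:\n{merged_diff()}\n\nCURRENT DOCS:\n{docs}"
    )
    Path("AGENTS-suggested.md").write_text(ask_llm(prompt))
```

Writing to a suggestion file rather than committing doc changes directly would keep a human in the loop.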
Brian Kurtz @Math4Good
@benitoz Groq's compiler is built on the premise of a purely deterministic graph.
0 replies · 0 retweets · 0 likes · 48 views
Ben Pouladian @benitoz
Groq isn't about SRAM. It's the compiler NVIDIA paid $20B for: the hardest compiler ever built. Schedules everything before silicon wakes up. They have the exact IP to fix Groq's weaknesses. Dedicated inference engine. No CoWoS. No HBM. Additive TAM. Custom ASIC & TPU boyz scared.
Ben Pouladian @benitoz
lol WSJ don’t know @insane_analyst GTC gonna be epic 🚀🚀

10 replies · 12 retweets · 139 likes · 23.6K views
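To make "schedules everything before silicon wakes up" concrete: a fully deterministic dataflow compiler can assign every op a fixed cycle at compile time because every latency is known, which is the premise Brian points at above. A toy scheduler, not Groq's actual compiler:

```python
# Toy static scheduler: every op gets a fixed start cycle at compile time,
# assuming known, fixed latencies -- the "purely deterministic graph" premise.
LATENCY = {"load": 4, "matmul": 8, "add": 1, "store": 4}  # made-up cycle counts

def schedule(graph: dict[str, tuple[str, list[str]]]) -> dict[str, int]:
    """graph: node -> (op_kind, [dependencies]). Returns node -> start cycle."""
    start, finish = {}, {}
    for node, (kind, deps) in graph.items():  # assumes topological order
        begin = max((finish[d] for d in deps), default=0)
        start[node] = begin
        finish[node] = begin + LATENCY[kind]
    return start

g = {
    "w":   ("load", []),
    "x":   ("load", []),
    "y":   ("matmul", ["w", "x"]),
    "b":   ("load", []),
    "z":   ("add", ["y", "b"]),
    "out": ("store", ["z"]),
}
print(schedule(g))  # every cycle is known before execution begins
```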
Brian Kurtz retweeted
Joe Fioti @joefioti
Sneak peek at Luminal Inference OS, here running a vllm-style LLM inference server. It's running slowed down for visualization purposes. Compute graphs are compiled and run near-roofline thanks to the Luminal compiler.
4 replies · 8 retweets · 64 likes · 5.8K views
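"Near-roofline" refers to the roofline model: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch with made-up hardware numbers (not Luminal's):

```python
# Roofline model: attainable FLOP/s given a kernel's arithmetic intensity
# (FLOPs per byte moved). Hardware numbers below are illustrative only.
PEAK_FLOPS = 100e12   # 100 TFLOP/s compute ceiling (made up)
PEAK_BW = 2e12        # 2 TB/s memory bandwidth (made up)

def attainable(intensity_flops_per_byte: float) -> float:
    return min(PEAK_FLOPS, PEAK_BW * intensity_flops_per_byte)

for ai in (1, 10, 50, 100):
    frac = attainable(ai) / PEAK_FLOPS
    print(f"AI={ai:>3} FLOP/byte -> {frac:5.1%} of peak")
# Below the ridge point (50 FLOP/byte here) a kernel is bandwidth-bound; a
# compiler hits "near-roofline" by keeping kernels at or above that point,
# or by saturating bandwidth when it can't.
```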
Brian Kurtz @Math4Good
@FactoryAI Love you guys! Curious how you might manage/nudge missions over time. So much can be accomplished with up-front planning, but it's hard to control/track once things are in flight. Curious to see how you can monitor while things are progressing and nudge if needed.
1 reply · 0 retweets · 4 likes · 1.8K views
Factory @FactoryAI
Droids can now pursue goals autonomously over multi-day horizons. You describe what you want, approve the plan, and come back to finished work. We call these Missions.
47 replies · 73 retweets · 829 likes · 370.3K views
Balaji @balajis
AI is amazing for small-TAM custom software. Indeed, the smaller the market, the more amazing it is. Because small markets typically don’t support the costs of software development.
187 replies · 171 retweets · 2.2K likes · 198K views
Dan Hockenmaier @danhockenmaier
This piece shows a profound lack of understanding of how marketplaces work and why they are defensible.

"A competent developer could deploy a functional competitor in weeks, and dozens did, enticing drivers away from DoorDash and Uber Eats by passing 90-95% of the delivery fee through to the driver."

Anyone could have done that at any time in the last ten years. Why was no one able to? Because the hard part has nothing to do with building the app or attracting the drivers. The hard part is building a liquid marketplace with all of the best supply and a massive series of optimizations and investments to drive down prices and delivery times and drive up reliability and quality.

DoorDash and Eats have built this when no one else could, and they will not allow agents to transact on their apps, nor will they have a legal requirement to allow it. But the real story isn't as sensational, so it doesn't get the engagement.
42 replies · 12 retweets · 598 likes · 563.8K views
Citrini @citrini
JUNE 2028. The S&P is down 38% from its highs. Unemployment just printed 10.2%. Private credit is unraveling. Prime mortgages are cracking. AI didn't disappoint. It exceeded every expectation. What happened? citriniresearch.com/p/2028gic
1.9K replies · 4.3K retweets · 27.9K likes · 28.6M views
Jordan Nanos @JordanNanos
8 exaFLOPs = 64 wafers
Cerebras @cerebras
Proud to partner with @G42ai and @mbzuai (Mohamed bin Zayed University of Artificial Intelligence) to deliver a national-scale AI supercomputer in India with 8 exaflops of compute capacity. This cluster is designed to support researchers, startups, enterprises, and government entities, and will serve as a foundational asset under the India AI Mission, accelerating AI innovation tailored to India’s needs.

2 replies · 0 retweets · 12 likes · 5.7K views
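The arithmetic behind the one-liner: the quoted 8 exaFLOPs spread over 64 wafer-scale parts.

```python
# "8 exaFLOPs = 64 wafers": per-wafer share of the quoted capacity.
total_flops = 8e18   # 8 exaFLOPs, from the Cerebras announcement
wafers = 64          # from Jordan's tweet
print(f"{total_flops / wafers / 1e15:.0f} PFLOP/s per wafer")   # -> 125
```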
Brian Kurtz @Math4Good
@gfodor It really cannot. It’s 1000 times bigger and the weights have to be stored on chip, and somehow they have to maintain the right balance of mults and memory. With sparse models you only activate 5% of chips. Weird and crazy.
2 replies · 0 retweets · 2 likes · 443 views
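A sketch of the utilization problem Brian is pointing at: if weights must live on-chip and a sparse MoE activates only ~5% of them per token (his figure), most weight-holding chips idle on any given token. The model and chip sizes below are illustrative assumptions:

```python
# Illustrative MoE-on-SRAM utilization math. The ~5% activation share is from
# the tweet; the model size and 500 MB/chip figure are assumptions.
total_weights_gb = 1000       # ~1T params at 8-bit -> ~1000 GB of weights
sram_per_chip_gb = 0.5
chips_for_weights = total_weights_gb / sram_per_chip_gb   # 2,000 chips
active_fraction = 0.05        # sparse MoE: ~5% of experts fire per token

active = chips_for_weights * active_fraction
print(f"{chips_for_weights:,.0f} chips to hold weights; "
      f"~{active:,.0f} busy per token ({active_fraction:.0%} utilization)")
```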
gfodor.id @gfodor
@Math4Good It will run this fast one day, not sure what your point is
2 replies · 0 retweets · 21 likes · 1.3K views
Brian Kurtz @Math4Good
@JordanNanos @taalas_inc What if you could run huge sparse MoEs twice as fast as the SRAM guys, but in one node? Like K2 huge.
0 replies · 0 retweets · 0 likes · 40 views
Jordan Nanos @JordanNanos
Many have experienced the difference between O(10) tok/s and O(100) tok/s, and compare small/fast with big/slow models.
Some have experienced O(1000) tok/s with Groq, Cerebras.
Now @taalas_inc has demo'd O(10,000) tok/s.
A glimpse of the future.
swyx @swyx
yesterday we chatted with @martin_casado and @sarahdingwang on the pod, and he happened to do basic math™ on the logic of ASICs. today @taalas_inc launched their HC1 ASIC that can inference 17k tok/s. Sure, it's a shitty 3.1 8B today, which is a 1.5-year gap. But read the details on the HC2 this winter, and do the math: this timeline will converge to 0 in the next 2 years. Build accordingly.

2 replies · 2 retweets · 17 likes · 3.3K views