Eric Quinnell
@divBy_zero
563 posts

Principal Engineer, AWS Trainium. fmr Tesla Dojo, CPUs (x86+ARM). PhD Computer Arithmetic

Joined December 2022
379 Following · 2.4K Followers
Eric Quinnell@divBy_zero·
@0xmer_ Also, more FLOPs. Even if you can’t use them.
0xm℮r@0xmer_·
Top 6 things we need in 2026:
1. Another DSL for ML kernels
2. Another article explaining FlashAttention
3. Another Neural Network from scratch in C
4. Another Arena Allocator
5. Another Game Engine from scratch in C
6. Another 400 RVV instructions
Eric Quinnell@divBy_zero·
@itsclivetime And after your light warmup, questioned simultaneously by a panel of experts, then normalized and softmax’d for a single answer before the 2nd lap
Clive Chan@itsclivetime·
i wonder what it feels like to be quantized, hadamard rotated, and speculatively decoded
Eric Quinnell@divBy_zero·
@HarshVamja Samsung Exynos M5, AMD K8, Intel P6, Tesla Dojo v1/2, ARM A9/A15…all of these could be open-sourced at this point to train the LLMs, but said tribes would have to release them…
Hava@HarshVamja·
@divBy_zero 💯 Also, hallucinations are expensive in hardware. AI-generated RTL still misses corner cases in complex designs, which can be catastrophic in silicon. How is AI ever going to mine decades of undocumented tribal knowledge? :)
Eric Quinnell@divBy_zero·
AI sucks at RTL for two big reasons:
1. Trained on GitHub open-source technical prose pieces, not production code
2. Assumes the EDA toolchain supports the whole SystemVerilog spec
Eric Quinnell@divBy_zero·
@bertverrycken Claude. Not passing the “it compiles” outside the trivial, so doesn’t get a “quite good” from me
BERT eating AI@bertverrycken·
@divBy_zero Which AI? Deploying Claude Code rn and it writes correct RTL and does the verification. Touch wood, but it looks quite good at it.
@fclc@FelixCLC_·
The matrix decomposition will continue until peak flops improve
Eric Quinnell@divBy_zero·
@Arronwei3n This. Can’t say it enough, thx @NuttyCLD
PS GJ on citing the quantum qubit interconnect issue. CMOS has gain to clean up SNR, ppl forget it’s not just a switch
PPS no PAM8/10 pls, I beg you
Aaron@Arronwei3n·
"Today, $NVDA's latest GPUs can theoretically perform thousands of trillions of operations per second, but actual utilization in AI inference workloads is only 30-40%. The rest of the time? Waiting for data to arrive."
Nutty@NuttyCLD

x.com/i/article/2016…
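The quoted 30-40% figure is consistent with simple roofline arithmetic. A sketch with round, hypothetical numbers (not any chip's measured specs):

```python
# Illustrative roofline arithmetic behind "only 30-40% utilization".
# All figures below are round, hypothetical values, not measurements.
peak_flops = 2.0e15          # ~2000 TFLOP/s peak low-precision compute
hbm_bw     = 4.0e12          # ~4 TB/s HBM bandwidth

# Ridge point: FLOPs needed per byte moved to stay compute-bound.
ridge = peak_flops / hbm_bw  # 500 FLOP/byte

# For a batched GEMM with 2-byte weights, arithmetic intensity is roughly
# the batch size in FLOP/byte (2*b*k*n FLOPs over 2*k*n weight bytes).
for batch in (1, 64, 175, 512):
    intensity = batch                    # FLOP/byte, weight traffic only
    bound = min(1.0, intensity / ridge)  # utilization ceiling
    print(f"batch {batch:4d}: utilization bound {bound:.0%}")
```

At small decode batch sizes the kernel is memory-bound and the compute units mostly wait; only around batch ~150-200 does the ceiling reach the quoted 30-40%.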

SemiAnalysis@SemiAnalysis_·
IMPORTANT: Blackwell Ultra solves an attention operation performance issue found in Blackwell.

As @tri_dao presented at the SemiAnalysis Hackathon, one of the biggest bottlenecks in the core attention operation is not the GEMMs but the softmax (exponential) part. In Hopper & Blackwell, the exponential operation's throughput was too slow to effectively overlap with the 2 GEMMs in attention. Blackwell Ultra solves this by having 2x higher exp throughput than Blackwell.

Note that other chips have this performance mismatch issue too, such as Trainium3 having to increase its exponential function throughput by 4x compared to Trainium2.
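One way to see the mismatch described above is to count work per attention score. A back-of-envelope sketch with hypothetical per-cycle throughputs (not any chip's real numbers):

```python
# Per (query, key) score, the two attention GEMMs (QK^T and PV) each do
# 2*d FLOPs, while softmax needs one exponential per score. If the exp
# units are slow, the exp stream can't hide under the GEMMs.
d = 128                  # head dimension
flops_per_cycle = 4096   # hypothetical tensor-core FLOPs retired per cycle
exp_per_cycle = 4        # hypothetical exponentials retired per cycle

gemm_cycles = (2 * 2 * d) / flops_per_cycle  # GEMM cycles per score
exp_cycles = 1 / exp_per_cycle               # exp cycles per score

print(f"GEMM: {gemm_cycles} cyc/score, exp: {exp_cycles} cyc/score")
# exp (0.25) > GEMM (0.125): attention is exp-bound at these rates.
# Doubling exp throughput (as Blackwell Ultra does) brings exp to 0.125,
# letting the exponentials fully overlap with the two GEMMs.
```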
Eric Quinnell@divBy_zero·
@trishume Love the Zachtronics reference. TIS-100 style as a performance predictor. Really clever
Tristan Hume@trishume·
I wrote a blog post on my journey trying to make a take-home that increasingly powerful models couldn't beat. We also released the original take-home for you to try, people tend to find it really fun! It's optimizing a parallel tree traversal on a simulated SIMD VLIW machine.
Anthropic@AnthropicAI

New on the Anthropic Engineering Blog: We give prospective performance engineering candidates a notoriously difficult take-home exam. It worked well—until Opus 4.5 beat it. Here's how we designed (and redesigned) it: anthropic.com/engineering/AI…

Eric Quinnell@divBy_zero·
If the liquid boils somewhere along the way, I suppose that clears out the biological contamination? But if liquid internal convection/conduction is used somewhere internally, it’s a perfect biome for things to grow. Deployed datacenter liquid cooling has a whole sub-field on organics growing in the tubes, especially with pure water. I’ve smelled it myself, it’s very gross. Said space plan needs some verbiage on the organics-in-liquid problem imo
Jason Newton@sleep_deprivado·
don't know how much liquid would be needed, but surely they're not going to grow too much in space; sort of a materials problem to find the best one for it. IIRC for radiation shielding they intended to have liquid water because it's great at that; I'd assume whatever fluid you have for that is what you would use to distribute it to the radiation surface, but that's not necessarily needed. Thermal capacity and boiling point vs pressure are probably the things you'd look through carefully to determine which. If the liquid boils you get a large increase in pressure.
Clive Chan@itsclivetime·
Dojo1 was ultra ambitious and really pushed a lot of technologies (packaging, power delivery, system design, clock distribution, even number formats), but execution and focus could've been better. If Dojo3 is anything like the ideas floated around the office years ago it'll be nuts. Will be interesting to see if things move faster when Elon's focus is on the chips - gogogo and good luck!
Elon Musk@elonmusk

Now that the AI5 chip design is in good shape, Tesla will restart work on Dojo3. If you’re interested in working on what will be the highest volume chips in the world, send a note to AI_Chips@Tesla.com with 3 bullet points on the toughest technical problems you’ve solved.

Eric Quinnell@divBy_zero·
Addendum: Transformers and the self-attention architectures are notable and clever, but still mix in many older ideas. My original point is that our infrastructure is truly new, yet breakthrough ideas are rare and many stem from the ages before us. Check the hubris of “new” AI
Eric Quinnell@divBy_zero·
McCulloch-Pitts neuron, 1943. Rosenblatt’s Perceptron, 1958. Backpropagation, 1986. The ideas are old; the compute and the data are new.
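For reference, Rosenblatt's 1958 update rule is small enough to fit in a tweet. A minimal integer-weight sketch learning logical AND:

```python
# Rosenblatt perceptron: step activation plus an error-driven update rule.
def step(x):
    return 1 if x >= 0 else 0

# Training set for logical AND (linearly separable, so convergence is
# guaranteed by the perceptron convergence theorem).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0, 0]
b = 0

for _ in range(20):  # a handful of epochs suffices for AND
    for (x1, x2), target in data:
        err = target - step(w[0] * x1 + w[1] * x2 + b)
        w[0] += err * x1
        w[1] += err * x2
        b += err

assert all(step(w[0] * x1 + w[1] * x2 + b) == t for (x1, x2), t in data)
print("learned AND:", w, b)
```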
Eric Quinnell@divBy_zero·
@FelixCLC_ I.e. passed the “engineering” threshold and not the “scientific” one.
@fclc@FelixCLC_·
the problem with git (as deployed and used within most interfaces) is that it's clearly well past the "good enough" threshold, but not clearly in the "great" threshold :/
Eric Quinnell@divBy_zero·
@yacineMTB Nobody mentioned the most important part: 100pF is pronounced “One Hundred Puff Capacitor” Not joking, really truly that’s the lingo. Ask about resistor color bands sometime, it’s NSFW
kache@yacineMTB·
are the capacitors in series (is that the right word) basically just there to smooth out any upstream spikes/brownouts? why is the ferrite bead there? why is the ferrite bead followed by a resistor? why those two resistors? can someone explain
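On the bead question: at high frequency a ferrite bead behaves roughly like a small resistor, so bead plus downstream capacitor form a first-order low-pass that keeps supply noise out of the chip. A sketch with hypothetical component values (substitute the real ones from the schematic):

```python
# Back-of-envelope for a ferrite-bead + capacitor decoupling network.
# Both component values below are hypothetical placeholders.
import math

C_decouple = 100e-12  # the "one hundred puff" (100 pF) capacitor
R_bead = 60.0         # bead modeled as ~60 ohm resistive at high frequency

# First-order RC low-pass corner: noise above f_c is attenuated.
f_c = 1.0 / (2.0 * math.pi * R_bead * C_decouple)
print(f"corner frequency ~ {f_c / 1e6:.1f} MHz")  # ~26.5 MHz
```

Bulk capacitors handle the slower spikes/brownouts; the bead/cap pair targets the higher-frequency switching noise.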
スッチョム@MukkyunGovernor·
@mu_chrinovic However, there is no clear advantage compared to fixed-length SIMD instructions like AVX. The benefits aren't really apparent...
Chrinovic .M@mu_chrinovic·
Scalable Vector Extension