Eric Quinnell

582 posts

@divBy_zero

Founding Chip Architect, Stealth Startup. Fmr AWS Trainium, Tesla Dojo, CPUs (x86+ARM). PhD Computer Arithmetic

Joined December 2022
391 Following · 2.4K Followers
Osvaldo Pinali Doederlein
As a SW guy what puzzles me most in chip design is that it's a total monopoly of one programming language: Verilog. Yeah there's some higher-level stuff, some variants, but all ultimately translated to Verilog. Where's the fun without a hundred different PLs to confuse you
Eric Quinnell@divBy_zero·
@corsix @opinali This. Also all the different EDA tools only support different random subsections of Verilog. Part of the fun is finding the sub-sub-set that actually works in them all without losing chip intent
Pete Cawley@corsix·
@opinali Verilog already comes with the confusion level of a hundred different PLs
Eric Quinnell@divBy_zero·
@Leik0w0 It’s usually better for hardware systolic arrays. Hw doesn’t execute matmuls in the same physical directions as paper and pencil matrices - the row and col weights come from the same physical place, so it has to be row+row or col+col. Someone is doing the xpose somewhere
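A small software illustration of the point above, not a model of any real systolic array. In the textbook form of C = A·B, each output needs a row of A and a column of B. If B is supplied already transposed, both operands can be read along rows (the "row+row" case), but the transpose still has to happen somewhere. The function names are hypothetical.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul_row_col(a, b):
    """Textbook matmul: a row of A dotted with a column of B."""
    n, k, p = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(p)]
            for i in range(n)]


def matmul_row_row(a, b_t):
    """Same product, but B arrives pre-transposed, so both operands are read along rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b_t]
            for row_a in a]


a = [[1, 2, 3],
     [4, 5, 6]]
b = [[7, 8],
     [9, 10],
     [11, 12]]

c_textbook = matmul_row_col(a, b)
# The transpose is done once, up front, by whoever prepares the operand.
c_streamed = matmul_row_row(a, transpose(b))
```

Both calls produce the same matrix; the only difference is which side of the interface pays for the transpose.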
Léo@Leik0w0·
WHO DECIDED WE WOULD USE COL MAJOR BY DEFAULT ????
Irrational Analysis@insane_analyst·
It's been over an hour and I still can't import external SPICE models correctly. WTF am I doing wrong? I made a SPICE directive like the tutorials said.
Eric Quinnell reposted
Clive Chan@itsclivetime·
i believe it was @divBy_zero that wisely said that, contrary to common sense, it is always easier to fix performance problems in silicon than change an entrenched sw stack
Clive Chan@itsclivetime·
why isn't there a startup doing a RISC-V extension for the Python Virtual Machine?
put a bunch of PyCores on a chip and you win the whole "Agentic CPU" market
Eric Quinnell@divBy_zero·
@yacineMTB Confirmed anecdotal data point. (Mid 40s, but close enough. This vibe coding stuff is legit)
kache@yacineMTB·
Old computer professionals, people in their 50s, 60s, that started with assembly, are about to become weapons of mass destruction as they discover what they can do with codex-level tools
Eric Quinnell@divBy_zero·
Callout to the many leads and engineers who worked this over the years, esp @rawat_ritvik @aaronsrogers and Pete. This will be a massive (and needed) upgrade to all cars and bots.
Eric Quinnell reposted
NASA@NASA·
Hello, Moon. It’s great to be back. Here’s a taste of what the Artemis II astronauts photographed during their flight around the Moon. Check out more photos from the mission: nasa.gov/artemis-ii-mul…
Eric Quinnell@divBy_zero·
@LottoLabs @ptremblay Pedantically you are correct, yes. Way less loss than current quantization, and that detail would derail non-technical folks even further, so I didn’t split the hairs
Lotto@LottoLabs·
@ptremblay @divBy_zero Fair, it's no loss in accuracy, but not lossless in the traditional sense of recovering bit for bit
Eric Quinnell@divBy_zero·
@CliffLattner Yes, exactly. If doing inference, the compute will hide under the dram loads, even at high batch. For training, it is extra compute to pay.
Cliff Lattner@CliffLattner·
@divBy_zero IMO it's a no-brainer that if you are willing to spend more cycles on quantization/dequantization, and forgo the savings you get from computing on the quantized data, you can do better than ordinary abs-max. I doubt that it is anywhere close to 8x over say fp8.
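For readers unfamiliar with the "ordinary abs-max" baseline mentioned above, here is a minimal sketch. It assumes symmetric, per-tensor integer quantization, which is one common reading of the term; it is not taken from the thread or from any particular library, and the function names are hypothetical.

```python
def absmax_quantize(values, bits=8):
    """Symmetric abs-max quantization: the largest magnitude maps to the top integer code."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 7 for 4 bits
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


# Illustrative data only.
x = [0.02, -0.5, 0.31, 1.27, -0.9]
codes8, scale8 = absmax_quantize(x, bits=8)
codes4, scale4 = absmax_quantize(x, bits=4)

# Worst-case reconstruction error at each bit width.
err8 = max(abs(a - b) for a, b in zip(x, dequantize(codes8, scale8)))
err4 = max(abs(a - b) for a, b in zip(x, dequantize(codes4, scale4)))
```

With a single scale per tensor, one large outlier stretches the step size for every other value, which is why spending extra cycles on a smarter scheme can beat this baseline at the same bit width.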
Eric Quinnell@divBy_zero·
Two days of weird takes, so I must:
“8x perf” is 32-bit baseline vs 4-bit compressed.
“KV cache” is merely a use case and hard to capitalize full perf. And yes, many already compress here.
But it’s lossless. The others aren’t.
We should have been using this all along.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
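The bit-width arithmetic behind the "32-bit baseline vs 4-bit compressed" remark can be checked in a few lines. This is a back-of-the-envelope sketch only: the model shape is hypothetical, per-block scales and other metadata are ignored, and it says nothing about how TurboQuant itself works.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_value):
    """Bytes for keys plus values across all layers, for one sequence."""
    values = 2 * layers * kv_heads * head_dim * tokens   # 2 = keys and values
    return values * bits_per_value // 8


# Hypothetical model shape, chosen only to make the arithmetic concrete.
shape = dict(layers=32, kv_heads=8, head_dim=128, tokens=4096)

b32 = kv_cache_bytes(**shape, bits_per_value=32)
b16 = kv_cache_bytes(**shape, bits_per_value=16)
b8 = kv_cache_bytes(**shape, bits_per_value=8)
b4 = kv_cache_bytes(**shape, bits_per_value=4)

# Size reduction of a 4-bit cache relative to each baseline.
ratios = {"vs 32-bit": b32 / b4, "vs 16-bit": b16 / b4, "vs 8-bit": b8 / b4}
```

The ratio is 8x only against a 32-bit baseline; against 16-bit or 8-bit storage the same 4-bit cache is 4x or 2x smaller, and any speedup tracks that ratio only to the extent the workload is bound by memory traffic.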

Eric Quinnell@divBy_zero·
My flight lands, I exit the plane, look for the Baggage Claim sign, and then there it was: A 448Gbps PAM4 Keysight waveform analyzer advertisement, on an LED backlight mega screen. Thank you SJC
Eric Quinnell@divBy_zero·
@GoogleResearch Classic compressor, seems obvious in retrospect, we should have all done this earlier. Well done Google Research, truly
Google Research@GoogleResearch·
Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
Eric Quinnell reposted
kache@yacineMTB·
ahahahahahahahaahhaahahahahahaha okay you guys were right this hardware shit is hard.