Paco

347 posts

@ProctorMan

Joined September 2010
94 Following · 44 Followers
Paco@ProctorMan·
@BillSimmons if the bulls lose on Thursday they’re 0 for February
0
0
0
2
Lance@lfc_lance·
Guess the player 99.7% will fail this
Lance tweet media
8.8K
537
15.6K
2M
Paco@ProctorMan·
@ryenarussillo - catching up on the pod during my commute this am…he’s got a ton of goats…appreciate the laugh
0
0
0
23
Paco@ProctorMan·
@BrianRoemmele This paper came out a year ago. Why are hyperscaler capex investments still so high? Oracle still inking 5 year deals…
0
0
0
31
Brian Roemmele@BrianRoemmele·
BOOM! MAJOR AI SPEEDUP! Hot Rod AI: 100 times faster inference, 100,000 times less power! — Reviving Analog Circuits: A Leap Toward Ultra-Efficient AI with In-Memory Attention

I got my start in analog electronics as a kid and always thought analog computers would make a comeback. Analog computing of the 1960s ran neural networks on voltage-based circuits rather than binary clocks.

Analog is Faster Than Digital. At the core of large language models lies the transformer architecture, where self-attention mechanisms sift through vast sequences of data to predict the next word or token. On conventional GPUs, shuttling data between memory caches and processing units devours time and energy, bottlenecking the entire system: they need a clock cycle to precisely move bits in and out of memory and registers, and that accounts for >90% of the time and energy overhead. But now a groundbreaking study proposes a custom in-memory computing setup that could slash these inefficiencies, potentially reshaping how we deploy generative AI.

The innovation centers on "gain cells": emerging charge-based analog memories that double as both storage and computation engines. Unlike digital GPUs, which laboriously load token projections from cache into SRAM at each generation step, this architecture keeps the data where the math happens: right ON THE CHIP! And with an effective speed near THE SPEED OF LIGHT, because it never switches on/off like digital binary logic. By leveraging parallel analog dot-product operations, the design computes self-attention natively, sidestepping the data movement that plagues GPU hardware.

To bridge the gap between ideal digital models and the noisy realities of analog circuits, the researchers devised a clever initialization algorithm. It adapts pre-trained LLMs, such as GPT-2, without full retraining, preserving performance despite non-idealities like voltage drifts or precision limits. The results are nothing short of staggering!

Simulations show the system slashing attention latency for 100 times faster token generation while curbing energy use by a jaw-dropping five orders of magnitude: 100,000 times less power than GPU baselines. For context, this could mean running a full LLM on a device no larger than a deck of cards, without the thermal throttling or grid-straining demands of today's data centers. The approach targets the attention block specifically, the transformer's energy hog, but also invites broader integration with other in-memory techniques to turbocharge the entire model pipeline.

Analog tech isn't pie-in-the-sky quantum wizardry; it's grounded in mature electronics theory, with gain cells already prototyped in labs. The remaining engineering work is straightforward: noise tolerances, scaling arrays of cells, and fabricating at microchip densities. Existing CMOS processes need only tweaks for analog fidelity. From there, full ecosystem integration, including software stacks for model adaptation, could happen in a year, disrupting GPU dominance sooner than skeptics predict. Risks are low, though hybrid digital-analog interfaces could introduce unforeseen bugs; these can be rapidly iterated on and addressed.

This isn't just hardware tinkering; it's a philosophical pivot back to AI's analog origins, where computation flows continuously rather than ticking in discrete cycles. In-memory attention could democratize AI power, bringing low-power, lightning-fast AI to even the smallest devices, not as a luxury but as an inevitability.

Most have no idea how big this is: it is the biggest shift in AI since the invention of LLMs. The world will struggle to find truly experienced analog engineers; most are gone. In my garage I will have a test of analog CMOS gain cells using off-the-shelf parts in the next few days; if Radio Shack were still around I would have done it today. I suspect I can scale to a proto AI model in a few weeks.

PAPER: arxiv.org/abs/2409.19315
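The in-memory attention idea in the tweet above can be sketched in a few lines. This is a toy NumPy simulation, not the paper's circuit model: each analog multiply-accumulate stage is modeled as the ideal dot product plus additive Gaussian read noise, with the function name and the `noise_std` parameter being illustrative assumptions standing in for gain-cell non-idealities.

```python
import numpy as np

def analog_attention(Q, K, V, noise_std=0.02, rng=None):
    """Toy single-head attention with 'in-memory' analog dot products.

    Each dot-product stage (Q·K^T and weights·V) picks up additive
    Gaussian read noise, a crude stand-in for gain-cell non-idealities
    such as voltage drift and limited precision. Purely illustrative.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    d = Q.shape[-1]
    # First analog MAC stage: similarity scores computed where the data lives.
    scores = Q @ K.T / np.sqrt(d)
    scores = scores + rng.normal(0.0, noise_std, scores.shape)
    # Softmax normalization of the noisy scores.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    # Second analog MAC stage: weighted sum of values, again with read noise.
    return w @ V + rng.normal(0.0, noise_std, (Q.shape[0], V.shape[-1]))
```

Comparing the `noise_std=0` output against a noisy run is a quick way to gauge how far attention outputs drift at a given noise level, which is roughly the gap the paper's adaptation step would have to compensate for.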
Brian Roemmele tweet media
123
215
1.2K
475.1K
Cernovich@Cernovich·
The EU has declared war on the United States of America.
1.7K
1.5K
14.7K
6.4M
Paco@ProctorMan·
@BillSimmons - first the prem pod and now Russillo? That’s my commuting routine. This hurts
0
1
1
124
Paco@ProctorMan·
@AnthonyDabbundo Hopefully not podding while watching will help this time around
0
0
1
150
Anthony Dabbundo@AnthonyDabbundo·
i bet spain -131 🇪🇸 a couple days ago — fully expecting england to ruin me once again on some late match absurdity
Paco@ProctorMan·
@AnthonyDabbundo what’s the play for the women’s final?
1
0
6
4.2K
Anthony Dabbundo@AnthonyDabbundo·
Sinner/Alcaraz, the first ever grand slam final meeting, on Sunday at 9am is appointment television. These two guys are remarkably clear of the rest of the world right now
8
2
32
7.8K
Paco@ProctorMan·
@JoeFortenbaugh - You're gonna be a genius anyway! Nothing new, but you’re the man
0
0
0
9
Paco@ProctorMan·
@Ceruti nil - nil +700 looks great for this early match. I’m a United fan and I just don’t see how we score
0
0
0
21
Paco@ProctorMan·
@fantasyEK - your years-long stat about NBA home teams with extra rest saved me tonight - much appreciated
0
0
0
28