grant pal

8K posts

@itsgrantpal

a little bit of this, and a little bit of that

Chicago, IL · Joined November 2007

570 Following · 456 Followers

Pinned Tweet

grant pal@itsgrantpal·Jan 16

A reply personified in a structure jsonified from tokens classified by an algorithm amplified with hardware commodified in a world forever unsatisfied

grant pal@itsgrantpal·5h

@SIGKITTEN happy pappi day, daddy

grant pal@itsgrantpal·6h

@sierracatalina i had this phone! (but in actual chocolate colour)

⚪️ sierra catalina@sierracatalina·6h

here.

Old Media@oldmedia

LG Chocolate (2006)

grant pal@itsgrantpal·8h

@SanthProject @ajambrosino 10-4!

Santh@SanthProject·8h

@itsgrantpal @ajambrosino CLI is good too, buddy

Andrew Ambrosino@ajambrosino·8h

meanwhile

Arnav Gupta@championswimmer

If Codex wins over Claude Code, it will be purely because: 1. The Claude team truly treats the user interface like shit (they don't fix widely reported bugs and inconveniences for months; idk what Boris even runs his infinite token loops for?). 2. They keep overselling this "coding is solved" line when clearly they cannot create a good frontend product across their mobile app, their website, or their TUI. The Claude mobile app is a horrible product, and the desktop app is so buggy: conversations hang, get lost, remain dangling... It is almost as if no one on the team ever tries their own products for 5 minutes.

grant pal@itsgrantpal·8h

@SamSullivan happy twitter long time

Sam@SamSullivan·17h

Old nuclear site on the left. Indian Point. Also, happy 15 years on X to me. Kinda miss the old days. Road tripping upstate this weekend. ☀️☀️☀️

grant pal@itsgrantpal·8h

@gabriel1 not enough meatbags

gabriel@gabriel1·10h

this took so long for me to understand: the bottleneck to more innovation is not more high intelligence people, but more people having an interest in hard problems. it's impossible to create new useful things if you don't get immense happiness from making that thing

grant pal@itsgrantpal·8h

@OrdinaryInds people who use hand sanitizer end up with these keyboards 3-5x as fast btw

Jack Fields@OrdinaryInds·9h

Wash 👏 your 👏 hands 👏

Maaz Perwez@MaazMz

The biggest problem with Mac...

grant pal@itsgrantpal·9h

@mweinbach @CXCarroll @HotAisle thanks for explaining!

Max Weinbach@mweinbach·9h

You can look at the math to complete the operation and the memory bandwidth to generate a token. Both of these are set in hardware as peak performance. You can make the math less intensive (generally helps prefill), but decode is bound by memory bandwidth. You can speed this up with smaller models for speculative decoding (a small model generates a token and the larger model approves or denies it), but you still have a compute cost that's limited and that you're able to calculate. You could MAYBE do 30 tok/s, but this doesn't meaningfully change it. There are bottlenecks everywhere.
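
The bandwidth-bound decode argument above can be sketched as a back-of-the-envelope ceiling: if every active weight has to be read from memory once per generated token, tokens per second cannot exceed memory bandwidth divided by the bytes read per token. The function name and all numbers below are hypothetical, chosen for easy arithmetic; they are not the specs of any model or machine mentioned in the thread.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             active_params_billion: float,
                             bytes_per_param: float) -> float:
    """Rough ceiling on decode speed for a bandwidth-bound model.

    Assumes every active weight is read from memory once per generated
    token and ignores KV-cache traffic, batching, and speculative decoding.
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return (bandwidth_gb_s * 1e9) / bytes_per_token


# Hypothetical machine and model, chosen only to make the arithmetic easy:
# 800 GB/s of memory bandwidth, 40B active parameters, 1 byte per parameter.
ceiling = decode_tokens_per_second(800, 40, 1.0)
print(f"{ceiling:.1f} tok/s")
```

Under this simplification, software tuning moves real throughput closer to the ceiling, while raising the ceiling itself takes more bandwidth, fewer active parameters, or fewer bytes per parameter.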

Max Weinbach@mweinbach·11h

The minimum to run the model is ~$20K in hardware and you get ~20 tok/s out. ~$20K gets you around 34.6B tokens at a 12:1 input to output ratio, assuming good token caching. If you ran the hardware 24/7, it would take roughly 5.5 years to break even.

Jordan Nanos@JordanNanos

GLM 5.2 costs $1.40/4.40 per Mtok at 40 tok/sec and people seriously consider buying GPU rigs for it
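
The break-even reasoning in this exchange can be sketched as simple arithmetic: work out what a year of the rig's output would cost at API prices, then divide the hardware cost by that. This is a minimal sketch under stated assumptions: it reads "$1.40/4.40" as input/output price per million tokens, charges all input at the uncached price, and ignores electricity and prefill throughput. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def break_even_years(hardware_cost_usd: float,
                     output_tok_per_s: float,
                     input_per_output: float,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float) -> float:
    """Years of 24/7 use before API spend would equal the hardware cost."""
    # Output tokens the rig can generate in a year, in millions.
    output_mtok_per_year = output_tok_per_s * SECONDS_PER_YEAR / 1e6
    # Input tokens processed alongside them, at the given input:output ratio.
    input_mtok_per_year = output_mtok_per_year * input_per_output
    api_value_per_year = (input_mtok_per_year * input_price_per_mtok
                          + output_mtok_per_year * output_price_per_mtok)
    return hardware_cost_usd / api_value_per_year


# Figures quoted in the thread; every input token is billed uncached here,
# which is an assumption of this sketch, not of the original posts.
years = break_even_years(20_000, 20, 12, 1.40, 4.40)
print(f"{years:.2f} years")
```

With uncached prices this comes out at about 1.5 years, well short of the roughly 5.5 years quoted above. That estimate assumes good token caching, and the cached-input discount is not given in the thread, so lowering `input_price_per_mtok` is the knob that moves this sketch toward it.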

grant pal@itsgrantpal·10h

@usr_bin_roygbiv @wolfie_ from cotton fields to token yields

Roy@usr_bin_roygbiv·10h

@wolfie_ rent some blackwells

wolfie@wolfie_·10h

glm 5.2 at glm 5.1 speeds would be so OP

San Francisco, CA 🇺🇸

grant pal@itsgrantpal·10h

@CXCarroll @mweinbach i trust @HotAisle to tell me you're right and i need to learn more

CXCarroll@CXCarroll·10h

@itsgrantpal @mweinbach The software primitives for mathematical operations are written in Assembly and are highly optimized for each hardware stack (Intel OneMKL, AMD BLIS, etc.). The engineers know what the theoretical limits are (because it's math) and they're basically at the limit.

grant pal@itsgrantpal·10h

@hotschmoe 2-3 years earlier than I was dabbling with deltas! and even then ('17) it was like spinning 200 plates to yield a successful print. couldn't imagine..

StrongEngineer_@hotschmoe·10h

even though we had a $10,000+ printer in the office, it was still genuinely faster to build models with foam board (circa 2015)

grant pal@itsgrantpal

@hotschmoe early 3d print days were ROUGH

grant pal@itsgrantpal·10h

@CXCarroll @mweinbach i still believe optimizing the stack yields benefits beyond calling hardware an immovable metric

CXCarroll@CXCarroll·10h

@itsgrantpal @mweinbach Inference is basically a ton of linear algebra. From a Comp Sci standpoint, math like that is "solved" in terms of how much can theoretically be done on a given piece of hardware in a given period of time. The software primitives for math are highly optimized. Max is correct.

grant pal@itsgrantpal·10h

@jenbegakis unreal tournament vibes
