Pavan Jayasinha

558 posts


@pavanjayasinha

ECE @UWaterloo // prev: quant research @citsecurities, cuda gpu perf @Modular, ASICs @extropic_ai + @UntetherAI, neutral atom QC @QuantumIQC

Joined February 2017
463 Following · 3K Followers
Pinned Tweet
Pavan Jayasinha@pavanjayasinha·
I implemented an LLM end-to-end in hardware, and ran it on an FPGA. Zero Python. Zero CUDA. Just pure SystemVerilog. All my progress + everything I learned from 200h of LLM chip design (demo at the end)👇
90 replies · 274 reposts · 2.7K likes · 347K views
Pavan Jayasinha reposted
AT@AliesTaha·
- 230 training runs
- 1,623 GPU hours (67 B200 days)
- 76 TB of training data
- a 2x faster model

Every paper said it can't be done. Quantization-Aware Distillation made it possible.
AT@AliesTaha

x.com/i/article/2029…

19 replies · 107 reposts · 1.2K likes · 146.1K views
adammaj@MajmudarAdam·
"intention density" is behind the visceral difference between AI outputs that feel beautiful, human, designed vs. uninspired/slop

it points at something much more specific than taste: how many distinct, willful decisions went into an output? how much of its structure can be attributed to intentionality vs. inevitability?

when I watch a Ghibli film, I know that every detail and expression in every frame has been crafted with specific intent (Miyazaki personally drew/edited 80,000 of 144,000 frames in Princess Mononoke). I can feel the creator through the creation.

in contrast, AI tools encourage work with far lower intention density by default. starting from a blank canvas, you're forced to confront thousands of micro decisions to create a final output. but now that you can write a one-sentence prompt and get a full app or video one-shot, all of these decisions get outsourced, often without you noticing they exist. there can still be high intention in the final work (ex: codex-generated apps still feeling pretty good), but the source of this intention is "the way things are usually done" (coming from the model) rather than a particular vision or design.

there's no reason AI output has to be like this though. we can think of the creative process in 2 parts:
1. intention - what do you want to create? why?
2. execution - how do you create it?

AI agents will clearly replace ~100% of the execution part of the creative process. they already have in software and soon will in film/animation. as they shift up the chain and replace intention as well, creative output starts to feel more trite and indistinguishable. but for those who are careful to preserve and expand rather than offload their intentionality, they have more time than ever to focus on the details and create far more/better software, art, etc.
27 replies · 35 reposts · 343 likes · 56.5K views
Pavan Jayasinha@pavanjayasinha·
@MajmudarAdam Those two words themselves unlock a lot for me, since I've been dancing around that concept without a good compressed pointer word for it.
0 replies · 0 reposts · 1 like · 19 views
AT@AliesTaha·
we quantized the best open-source diffusion model on the market to 4 bits

huge speedup, (almost) no quality loss

this is a full explanation of the trillion-dollar industry's oldest trick
3 replies · 6 reposts · 32 likes · 8.2K views
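The "oldest trick" alluded to above, in its most generic form, is uniform quantization. This is a hedged sketch of symmetric round-to-nearest 4-bit quantization of a weight tensor, not the thread's actual recipe; the function names and the random toy tensor are illustrative, not from the thread.

```python
import numpy as np

def quantize_4bit(w):
    """Symmetric round-to-nearest quantization onto the 16 integer levels [-8, 7].

    The scale is chosen so the largest-magnitude weight lands on level 7,
    so no value is actually clipped.
    """
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map 4-bit integer codes back to approximate float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)   # stand-in for one weight tensor
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)  # True: rounding error ≤ half a step
```

The speedup comes from storing and moving 4-bit codes instead of 16/32-bit floats; the "(almost) no quality loss" part is what techniques like the quantization-aware distillation mentioned earlier are for, since naive round-to-nearest alone usually costs some accuracy.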
Sagnik@saagnikkk·
🚨New Blog Alert: Is AdamW overkill for RLVR? We found that vanilla SGD is 1. as performant as AdamW, and 2. naturally 36x more parameter-efficient (much more than a rank-1 LoRA) 🤯 Looks like a "free lunch". Maybe it's time to rethink the optimizers for RLVR 🧵
16 replies · 58 reposts · 480 likes · 172.7K views
Pavan Jayasinha@pavanjayasinha·
Long live SGD! Surprising to me that LR=0.1 (!!) with zero momentum worked so well. We invite you to ponder: what would an RL-native PEFT algorithm that turns RLVR’s parameter efficiency into compute efficiency look like? 🤔
Sagnik@saagnikkk

🚨New Blog Alert: Is AdamW overkill for RLVR? We found that vanilla SGD is 1. as performant as AdamW, and 2. naturally 36x more parameter-efficient (much more than a rank-1 LoRA) 🤯 Looks like a "free lunch". Maybe it's time to rethink the optimizers for RLVR 🧵
1 reply · 1 repost · 5 likes · 914 views
Pavan Jayasinha@pavanjayasinha·
@josancamon19 @saagnikkk @lifan__yuan @dilekhakkanitur @haopeng_uiuc It is 36x more parameter-efficient in the sense that it achieves the same validation performance while modifying 36x fewer parameters. Re compute efficiency: we observed SGD GRPO to be faster than Adam GRPO, but not by a lot (expected). The highlight here is instead memory efficiency.
1 reply · 0 reposts · 3 likes · 146 views
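The 36x figure is defined above as "same validation performance while modifying 36x fewer parameters". A toy sketch of that bookkeeping, assuming a hypothetical layer whose RLVR gradient is mostly zero (the 3% gradient density here is made up for illustration, not the blog's number):

```python
import numpy as np

def sgd_step(params, grads, lr=0.1):
    """Vanilla SGD: no momentum, no adaptive state (lr=0.1, as in the thread)."""
    return {k: p - lr * grads[k] for k, p in params.items()}

def fraction_modified(before, after):
    """Share of parameters whose value actually changed in the step."""
    total = sum(p.size for p in before.values())
    changed = sum(int((after[k] != before[k]).sum()) for k in before)
    return changed / total

# Toy layer whose gradient is mostly zero, mimicking sparse RLVR updates.
rng = np.random.default_rng(0)
params = {"w": rng.normal(size=1000)}
mask = rng.random(1000) < 0.03                        # hypothetical 3% density
grads = {"w": np.where(mask, rng.normal(size=1000), 0.0)}

new_params = sgd_step(params, grads)
frac = fraction_modified(params, new_params)
print(f"{frac:.3f}")  # a few percent: only entries with non-zero gradient move
```

A zero gradient leaves a parameter exactly unchanged under plain SGD, which is also why the memory win is natural: there is no per-parameter momentum or second-moment state to keep, unlike AdamW.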
anandmaj@Almondgodd·
I spent the past year building AI for robots at Tesla Optimus and Dyna. Now I'm introducing ANDROID DREAMS: an essay of my predictions for the next 20 years of robotics, inspired by Situational Awareness and AI2027. I predict EGI by 2031 and more robots than humans by 2045👇🏼
113 replies · 248 reposts · 2K likes · 342.9K views
Clive Chan@itsclivetime·
Really happy to be announcing the chips we've been cooking the past 18 months!

OpenAI kicked off the reasoning wave with o1, but months before that we'd already started designing a chip tuned precisely for reasoning inference of OpenAI models. In January 2024, I joined OpenAI as a hybrid GPU programmer & custom chip designer, the first IC on an oddly positioned hardware team that hadn't yet committed to the idea of custom chips.

These past 21 months I'm so lucky to have gotten the chance to learn from this incredibly talented and tiny team, accelerated by tight codesign with our ML team, Broadcom, and a few really cool new AI tools ;)

Now we're 9 months away from what I think is the fastest and largest-volume ramp of any first-time chip. Looking forward to pushing the cost and latency of intelligence to zero.
OpenAI Newsroom@OpenAINewsroom

We're partnering with Broadcom to deploy 10GW of chips designed by OpenAI. Building our own hardware, in addition to our other partnerships, will help all of us meet the world’s growing demand for AI. openai.com/index/openai-a…

51 replies · 70 reposts · 1.1K likes · 338.6K views