Pavan Jayasinha

558 posts


@pavanjayasinha

ECE @UWaterloo // prev: quant research @citsecurities, cuda gpu perf @Modular, ASICs @extropic_ai + @UntetherAI, neutral atom QC @QuantumIQC

Joined February 2017
463 Following · 3K Followers
Pinned Tweet
Pavan Jayasinha@pavanjayasinha·
I implemented an LLM end-to-end in hardware, and ran it on an FPGA. Zero Python. Zero CUDA. Just pure SystemVerilog. All my progress + everything I learned from 200h of LLM chip design (demo at the end)👇
90 replies · 274 reposts · 2.7K likes · 347K views
Pavan Jayasinha reposted
AT@AliesTaha·
- 230 training runs
- 1,623 GPU hours (67 B200 days)
- 76 TB of training data
- a 2x faster model

Every paper said it can't be done. Quantization-Aware Distillation made it possible.
AT@AliesTaha

x.com/i/article/2029…

19 replies · 107 reposts · 1.2K likes · 146.1K views
adammaj@MajmudarAdam·
"intention density" is behind the visceral difference between AI outputs that feel beautiful, human, designed vs. uninspired/slop

it points at something much more specific than taste: how many distinct, willful decisions went into an output? how much of its structure can be attributed to intentionality vs. inevitability?

when I watch a Ghibli film, I know that every detail and expression in every frame has been crafted with specific intent (Miyazaki personally drew/edited 80,000 of 144,000 frames in Princess Mononoke). I can feel the creator through the creation.

in contrast, AI tools encourage work with far lower intention density by default. starting from a blank canvas, you're forced to confront thousands of micro decisions to create a final output. but now that you can write a one-sentence prompt and get a full app or video one-shot, all of these decisions get outsourced, often without you noticing they exist. there can still be high intention in the final work (ex: codex-generated apps still feeling pretty good), but the source of this intention is "the way things are usually done" (coming from the model) rather than a particular vision or design.

there's no reason AI output has to be like this though. we can think of the creative process in 2 parts:
1. intention - what do you want to create? why?
2. execution - how do you create it?

AI agents will clearly replace ~100% of the execution part of the creative process. they already have in software and soon will in film/animation. as they shift up the chain and replace intention as well, creative output starts to feel more trite and indistinguishable. but for those who are careful to preserve and expand rather than offload their intentionality, they have more time than ever to focus on the details and create far more/better software, art, etc.
27 replies · 35 reposts · 343 likes · 56.5K views
Pavan Jayasinha@pavanjayasinha·
@MajmudarAdam Those two words themselves unlock a lot for me, since I've been dancing around that concept without a good compressed pointer word for it.
0 replies · 0 reposts · 1 like · 19 views
AT@AliesTaha·
we quantized the best open-source diffusion model on the market to 4 bits

huge speedup, (almost) no quality loss

this is a full explanation of the trillion-dollar industry's oldest trick
3 replies · 6 reposts · 32 likes · 8.2K views
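The "oldest trick" alluded to above, in its most generic form, is uniform quantization. This is a hedged sketch of symmetric round-to-nearest 4-bit quantization of a weight tensor, not the thread's actual recipe; the function names and the random toy tensor are illustrative, not from the thread.

```python
import numpy as np

def quantize_4bit(w):
    """Symmetric round-to-nearest quantization onto the 16 integer levels [-8, 7].

    The scale is chosen so the largest-magnitude weight lands on level 7,
    so no value is actually clipped.
    """
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map 4-bit integer codes back to approximate float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)   # stand-in for one weight tensor
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)  # True: rounding error ≤ half a step
```

The speedup comes from storing and moving 4-bit codes instead of 16/32-bit floats; the "(almost) no quality loss" part is what techniques like the quantization-aware distillation mentioned earlier are for, since naive round-to-nearest alone usually costs some accuracy.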
Sagnik@saagnikkk·
🚨New Blog Alert: Is AdamW overkill for RLVR? We found that vanilla SGD is 1. as performant as AdamW, and 2. naturally 36x more parameter-efficient (much more than a rank-1 LoRA) 🤯 Looks like a "free lunch". Maybe it's time to rethink the optimizers for RLVR 🧵
16 replies · 58 reposts · 480 likes · 172.7K views
Pavan Jayasinha@pavanjayasinha·
Long live SGD! Surprising to me that LR=0.1 (!!) with zero momentum worked so well. We invite you to ponder: what would an RL-native PEFT algorithm that turns RLVR’s parameter efficiency into compute efficiency look like? 🤔
Sagnik@saagnikkk

🚨New Blog Alert: Is AdamW overkill for RLVR? We found that vanilla SGD is 1. as performant as AdamW, and 2. naturally 36x more parameter-efficient (much more than a rank-1 LoRA) 🤯 Looks like a "free lunch". Maybe it's time to rethink the optimizers for RLVR 🧵
1 reply · 1 repost · 5 likes · 914 views
Pavan Jayasinha@pavanjayasinha·
@josancamon19 @saagnikkk @lifan__yuan @dilekhakkanitur @haopeng_uiuc It is 36x more parameter-efficient in the sense that it achieves the same validation performance while modifying 36x fewer parameters. Re compute efficiency: we observed SGD GRPO to be faster than Adam GRPO, but not by a lot (expected). The highlight here is instead memory efficiency.
1 reply · 0 reposts · 3 likes · 146 views
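The 36x figure is defined above as "same validation performance while modifying 36x fewer parameters". A toy sketch of that bookkeeping, assuming a hypothetical layer whose RLVR gradient is mostly zero (the 3% gradient density here is made up for illustration, not the blog's number):

```python
import numpy as np

def sgd_step(params, grads, lr=0.1):
    """Vanilla SGD: no momentum, no adaptive state (lr=0.1, as in the thread)."""
    return {k: p - lr * grads[k] for k, p in params.items()}

def fraction_modified(before, after):
    """Share of parameters whose value actually changed in the step."""
    total = sum(p.size for p in before.values())
    changed = sum(int((after[k] != before[k]).sum()) for k in before)
    return changed / total

# Toy layer whose gradient is mostly zero, mimicking sparse RLVR updates.
rng = np.random.default_rng(0)
params = {"w": rng.normal(size=1000)}
mask = rng.random(1000) < 0.03                        # hypothetical 3% density
grads = {"w": np.where(mask, rng.normal(size=1000), 0.0)}

new_params = sgd_step(params, grads)
frac = fraction_modified(params, new_params)
print(f"{frac:.3f}")  # a few percent: only entries with non-zero gradient move
```

A zero gradient leaves a parameter exactly unchanged under plain SGD, which is also why the memory win is natural: there is no per-parameter momentum or second-moment state to keep, unlike AdamW.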
anandmaj@Almondgodd·
I spent the past year building AI for robots at Tesla Optimus and Dyna. Now I'm introducing ANDROID DREAMS: an essay of my predictions for the next 20 years of robotics, inspired by Situational Awareness and AI2027. I predict EGI by 2031 and more robots than humans by 2045👇🏼
113 replies · 248 reposts · 2K likes · 342.9K views
Clive Chan@itsclivetime·
Really happy to be announcing the chips we've been cooking the past 18 months!

OpenAI kicked off the reasoning wave with o1, but months before that we'd already started designing a chip tuned precisely for reasoning inference of OpenAI models. In January 2024, I joined OpenAI as a hybrid GPU programmer & custom chip designer, the first IC on an oddly positioned hardware team that hadn't yet committed to the idea of custom chips.

These past 21 months I'm so lucky to have gotten the chance to learn from this incredibly talented and tiny team, accelerated by tight codesign with our ML team, Broadcom, and a few really cool new AI tools ;)

Now we're 9 months away from what I think is the fastest and largest-volume ramp of any first-time chip. Looking forward to pushing the cost and latency of intelligence to zero.
OpenAI Newsroom@OpenAINewsroom

We're partnering with Broadcom to deploy 10GW of chips designed by OpenAI. Building our own hardware, in addition to our other partnerships, will help all of us meet the world’s growing demand for AI. openai.com/index/openai-a…

51 replies · 70 reposts · 1.1K likes · 338.6K views