
Gokul Santhanam
7K posts

@gokstudio
Senior ML Engineer @🍎, working at the intersection of multimodal LLMs, efficient image encoders, and on-device ML. Views my own, Retweet != Endorsement.

The Evolution Of Indian Currency:


Spent time with Q-learning and value-based RL today. Fun thing I noticed: the Bellman equation follows the same logic as dynamic programming.
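
A minimal sketch of that observation, under stated assumptions: the toy chain MDP, the `step` helper, and all constants below are hypothetical and chosen only for illustration. Value iteration (classic dynamic programming) and tabular Q-learning both use the same Bellman target, `r + gamma * max_a' Q(s', a')`. The DP version applies it to every state-action pair using the known model, while Q-learning applies it only to transitions it happens to sample.

```python
import random

# Hypothetical toy chain: states 0..3, state 3 is terminal.
# Actions: 0 = left, 1 = right. Entering state 3 gives reward 1, else 0.
N_STATES, N_ACTIONS, GAMMA, TERMINAL = 4, 2, 0.9, 3


def step(s, a):
    """Deterministic transition: return (next_state, reward)."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, TERMINAL)
    reward = 1.0 if s_next == TERMINAL else 0.0
    return s_next, reward


def bellman_target(q, s, a):
    """Shared one-step target: r + gamma * max_a' Q(s', a')."""
    s_next, r = step(s, a)
    future = 0.0 if s_next == TERMINAL else max(q[s_next])
    return r + GAMMA * future


def value_iteration(sweeps=50):
    """Dynamic programming: back up every (s, a) pair on each sweep."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        q = [
            [0.0 if s == TERMINAL else bellman_target(q, s, a)
             for a in range(N_ACTIONS)]
            for s in range(N_STATES)
        ]
    return q


def q_learning(episodes=2000, alpha=0.5, seed=0):
    """Q-learning: same target, applied only to sampled transitions.

    The behavior policy is uniformly random, which is fine because
    Q-learning is off-policy.
    """
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.randrange(N_ACTIONS)
            target = bellman_target(q, s, a)
            q[s][a] += alpha * (target - q[s][a])  # move toward the target
            s, _ = step(s, a)
    return q


if __name__ == "__main__":
    for row_dp, row_ql in zip(value_iteration(), q_learning()):
        print([round(x, 3) for x in row_dp], [round(x, 3) for x in row_ql])
```

On this deterministic toy problem the two tables should agree closely, since the only difference is which state-action pairs get updated and how large a step each update takes.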



in our most recent work we study data sparsity (ρ) - the dual axis to weight sparsity in standard token-choice MoEs. composing both weight and data sparsity improves training compute efficiency.


Introducing Cowork: Claude Code for the rest of your work. Cowork lets you complete non-technical tasks much like how developers use Claude Code.

It’s crazy how every study done on coffee shows significant benefits for basically every organ system




Google's 7- and 8-year-old TPUs are running at 100% utilization. Old, fully depreciated chips running hot.




