Chetan Verma
@chtnverma
342 posts

guy who forwards reddit jokes. also ml engineer. @iitmadras @ucsd @twitter @google

San Francisco, CA · Joined February 2010
1.2K Following · 405 Followers
Chetan Verma retweeted
Prateek Jain @jainprateek_
We are hiring Research Scientists for our Frontiers-of-AI team at Google DeepMind in Bangalore, Singapore, and Mountain View. If you're passionate about cutting-edge AI research and building thinking, efficient, elastic, customized, and safe LLMs, we'd love to hear from you. We are looking for candidates with a PhD and a strong demonstrated record of ideating and executing deep research projects. If interested, please apply here: job-boards.greenhouse.io/deepmind/jobs/…
30 replies · 90 retweets · 833 likes · 355.6K views
Chetan Verma @chtnverma
Please come talk to me at #KDD2025 if you're interested in learning more :)
1 reply · 0 retweets · 1 like · 130 views
Chetan Verma @chtnverma
📢 Excited to present our paper at the ACM KDD 2025 conference: Matryoshka Model Learning for Improved Elastic Student Models lnkd.in/gYgXrngq 🪆🙌↓
Aditya Timmaraju @tadityasrinivas

The Matryoshka 🪆 wave strikes again! 🚀 Excited to share our latest work, accepted to KDD 2025: Matryoshka Model Learning for Improved Elastic Student Models! arxiv.org/abs/2505.23337 We introduce MatTA, a novel nested distillation framework that enables the extraction of multiple high-quality student models from a single training run, enhancing adaptability in production ML systems. A thread. 🧵 (1/6) cc @ManishGuptaMG1 @jainprateek_

1 reply · 3 retweets · 16 likes · 2K views
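For readers curious what "nested distillation" looks like in practice, here is a minimal, hypothetical PyTorch sketch of the general Matryoshka idea: one student trained so that prefixes of its hidden width also function as standalone smaller students. This is an illustration only, not the paper's actual MatTA recipe; the `student(x, width=w)` interface and the width schedule here are assumptions made for the sketch.

```python
# Hypothetical sketch of Matryoshka-style nested distillation -- an
# illustration of the general idea, NOT the MatTA method from the paper.
import torch
import torch.nn.functional as F

def nested_distillation_loss(student, teacher, x, widths=(128, 256, 512)):
    """Distill a teacher into nested sub-widths of a single student.

    Assumed interface: `student(x, width=w)` runs the student using only
    its first `w` hidden units, so every width prefix is itself a model.
    """
    with torch.no_grad():                      # teacher is frozen
        teacher_probs = F.softmax(teacher(x), dim=-1)
    loss = 0.0
    for w in widths:                           # one loss term per sub-model
        student_logp = F.log_softmax(student(x, width=w), dim=-1)
        loss = loss + F.kl_div(student_logp, teacher_probs,
                               reduction="batchmean")
    return loss / len(widths)
```

The appeal is that after one training run, each width in `widths` can be deployed on its own, trading accuracy for latency without retraining, which matches the "multiple students from a single training run" claim in the thread.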
Chetan Verma retweeted
After Dinner @AfterDinnerCo
@friedberg Emergency pod, but no Jason.
25 replies · 5 retweets · 881 likes · 92.6K views
Chetan Verma retweeted
Andrej Karpathy @karpathy
Over the last ~2 hours I curated a new podcast of 10 episodes called "Histories of Mysteries". Find it up on Spotify here: open.spotify.com/show/3K4LRyMCP…

The 10 episodes of this season are:
Ep 1: The Lost City of Atlantis
Ep 2: The Baghdad Battery
Ep 3: The Roanoke Colony
Ep 4: The Antikythera Mechanism
Ep 5: The Voynich Manuscript
Ep 6: The Late Bronze Age Collapse
Ep 7: The Wow! Signal
Ep 8: The Mary Celeste
Ep 9: Göbekli Tepe
Ep 10: LUCA: The Last Universal Common Ancestor

Process:
- I researched cool topics using ChatGPT, Claude, and Google.
- I linked NotebookLM to the Wikipedia entry of each topic and generated the podcast audio.
- I used NotebookLM to also write the podcast/episode descriptions.
- Ideogram to create all the digital art for the episodes and the podcast itself.
- Spotify to upload and host the podcast.

I did this as an exploration of the space of possibility unlocked by generative AI, and of the leverage afforded by the use of AI. The fact that I can, as a single person in 2 hours, curate (not create, but curate) a podcast is, I think, kind of incredible. I also completely understand and acknowledge the potential and immediate critique here of AI-generated slop taking over the internet. I guess - have a listen to the podcast when you go for a walk or drive next time and see what you think.
[image]
383 replies · 786 retweets · 7.6K likes · 705.6K views
Chetan Verma retweeted
Awni Hannun @awnihannun
The Transformer architecture has changed surprisingly little from the original paper in 2017 (over 7 years ago!). The diff:
- The nonlinearity in the MLP has undergone some refinement. Almost every model uses some form of gated nonlinearity; a silu or gelu nonlinearity is common.
- The placement of normalization layers. This tends to vary a little from architecture to architecture: sometimes there are more normalization layers per Transformer block (e.g. Gemma 2), and sometimes keys and queries are normalized (e.g. Command R+).
- The type of normalization layer. RMS norm is commonly used instead of Layer Norm; Llama 3, Phi 3, and Gemma 2 all use RMS norm now. Vanilla Layer Norm seems to be becoming a little less common.
- Grouped-query attention is now a staple, as it really speeds up inference for larger KV caches (e.g. longer prompts/generations).
- And of course the positional encodings have changed from sinusoidal to rotary (aka RoPE).
Not too much variation otherwise.
24 replies · 141 retweets · 1K likes · 124.3K views
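To make the list above concrete, here is a minimal PyTorch sketch of a Transformer block with those diffs applied: RMS norm in place of Layer Norm, a gated (SwiGLU-style) MLP, and grouped-query attention. Dimensions and head counts are illustrative, and RoPE is noted but omitted for brevity; this is a generic sketch, not any particular model's implementation.

```python
# Sketch of a "modern" Transformer block per the diffs in the tweet:
# RMSNorm, SwiGLU-style gated MLP, grouped-query attention. Illustrative
# sizes only; RoPE is omitted to keep the sketch short.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by root-mean-square only: no mean subtraction, no bias.
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # Gated nonlinearity: silu(gate) multiplies the up branch elementwise.
        return self.down(F.silu(self.gate(x)) * self.up(x))

class GQA(nn.Module):
    def __init__(self, dim, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.nh, self.nkv, self.hd = n_heads, n_kv_heads, dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.hd, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.wo = nn.Linear(n_heads * self.hd, dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.nh, self.hd).transpose(1, 2)
        k = self.wk(x).view(B, T, self.nkv, self.hd).transpose(1, 2)
        v = self.wv(x).view(B, T, self.nkv, self.hd).transpose(1, 2)
        # Each KV head serves n_heads / n_kv_heads query heads, shrinking
        # the KV cache. (RoPE would rotate q and k right here; omitted.)
        rep = self.nh // self.nkv
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(B, T, -1))

class Block(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.n1, self.n2 = RMSNorm(dim), RMSNorm(dim)
        self.attn = GQA(dim)
        self.mlp = SwiGLU(dim, 4 * dim)

    def forward(self, x):
        x = x + self.attn(self.n1(x))   # pre-norm residual, RMS not LayerNorm
        return x + self.mlp(self.n2(x))
```

With n_heads=8 and n_kv_heads=2 as sketched, the KV cache is 4x smaller than in standard multi-head attention, which is the inference speedup the tweet points at.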
Chetan Verma retweeted
Gaby Goldberg @gaby_goldberg
Every tech groupchat rn
[image]
52 replies · 541 retweets · 8.7K likes · 663.9K views
Chetan Verma retweeted
Mckay Wrigley @mckaywrigley
You can give ChatGPT a picture of your team’s whiteboarding session and have it write the code for you. This is absolutely insane.
623 replies · 4.6K retweets · 29.6K likes · 11.5M views
Chetan Verma retweeted
Brian Feroldi @BrianFeroldi
15 visuals every investor should memorize: 1: In the long run, stocks win:
[image]
592 replies · 3.6K retweets · 18.5K likes · 8.7M views
Chetan Verma @chtnverma
friendships were forged
[image]
0 replies · 0 retweets · 9 likes · 538 views
Chetan Verma @chtnverma
@sdachen Yeah if your LinkedIn doesn’t have “He …” then you still haven’t made it, Scott :)
0 replies · 0 retweets · 2 likes · 112 views
Scott Deeann Chen @sdachen
@chtnverma I wrote it in a compacted (news-headline) style where "I" is the implied/omitted subject of all sentences. I suppose this doesn't count. 😂
1 reply · 0 retweets · 2 likes · 70 views
Chetan Verma @chtnverma
have you really made it if your linkedin isn't written in 3rd person?
1 reply · 0 retweets · 4 likes · 558 views
Chetan Verma retweeted
Adam Grant @AdamMGrant
We pay too much attention to the most confident voices—and too little attention to the most thoughtful ones. Certainty is not a sign of credibility. Speaking assertively is not a substitute for thinking deeply. It's better to learn from complex thinkers than smooth talkers.
302 replies · 6.7K retweets · 25K likes
Chetan Verma retweeted
Jane Manchun Wong @wongmjane
Twitter’s source code is full of libs. It’s time to get rid of them
[image]
172 replies · 680 retweets · 10.1K likes