Chetan Verma
@chtnverma
342 posts

guy who forwards reddit jokes. also ml engineer. @iitmadras @ucsd @twitter @google

San Francisco, CA · Joined February 2010
1.2K Following · 405 Followers
Chetan Verma retweeted
Prateek Jain @jainprateek_
We are hiring Research Scientists for our Frontiers-of-AI team at Google DeepMind in Bangalore, Singapore, and Mountain View. If you're passionate about cutting-edge AI research and building thinking, efficient, elastic, customized, and safe LLMs, we'd love to hear from you. We are looking for candidates with a PhD and a strong demonstrated record of ideating and executing deep research projects. If interested, please apply here: job-boards.greenhouse.io/deepmind/jobs/…
30 replies · 90 retweets · 833 likes · 355.6K views
Chetan Verma @chtnverma
Please come talk to me at #KDD2025 if you're interested in learning more :)
1 reply · 0 retweets · 1 like · 130 views
Chetan Verma @chtnverma
📢 Excited to present our paper at the ACM KDD 2025 conference: Matryoshka Model Learning for Improved Elastic Student Models lnkd.in/gYgXrngq 🪆🙌↓
Aditya Timmaraju @tadityasrinivas

The Matryoshka 🪆 wave strikes again! 🚀 Excited to share our latest work, accepted to KDD 2025: Matryoshka Model Learning for Improved Elastic Student Models! arxiv.org/abs/2505.23337 We introduce MatTA, a novel nested distillation framework that enables the extraction of multiple high-quality student models from a single training run, enhancing adaptability in production ML systems. A thread. 🧵 (1/6) cc @ManishGuptaMG1 @jainprateek_

1 reply · 3 retweets · 16 likes · 2K views
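For readers curious what "nested distillation" looks like in practice, here is a minimal, hypothetical PyTorch sketch of the general Matryoshka idea: one student trained so that prefixes of its hidden width also function as standalone smaller students. This is an illustration only, not the paper's actual MatTA recipe; the `student(x, width=w)` interface and the width schedule here are assumptions made for the sketch.

```python
# Hypothetical sketch of Matryoshka-style nested distillation -- an
# illustration of the general idea, NOT the MatTA method from the paper.
import torch
import torch.nn.functional as F

def nested_distillation_loss(student, teacher, x, widths=(128, 256, 512)):
    """Distill a teacher into nested sub-widths of a single student.

    Assumed interface: `student(x, width=w)` runs the student using only
    its first `w` hidden units, so every width prefix is itself a model.
    """
    with torch.no_grad():                      # teacher is frozen
        teacher_probs = F.softmax(teacher(x), dim=-1)
    loss = 0.0
    for w in widths:                           # one loss term per sub-model
        student_logp = F.log_softmax(student(x, width=w), dim=-1)
        loss = loss + F.kl_div(student_logp, teacher_probs,
                               reduction="batchmean")
    return loss / len(widths)
```

The appeal is that after one training run, each width in `widths` can be deployed on its own, trading accuracy for latency without retraining, which matches the "multiple students from a single training run" claim in the thread.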
Chetan Verma retweeted
After Dinner @AfterDinnerCo
@friedberg Emergency pod, but no Jason.
25 replies · 5 retweets · 881 likes · 92.6K views
Chetan Verma retweeted
Andrej Karpathy @karpathy
Over the last ~2 hours I curated a new podcast of 10 episodes called "Histories of Mysteries". Find it up on Spotify here: open.spotify.com/show/3K4LRyMCP…

The 10 episodes of this season are:
Ep 1: The Lost City of Atlantis
Ep 2: The Baghdad Battery
Ep 3: The Roanoke Colony
Ep 4: The Antikythera Mechanism
Ep 5: The Voynich Manuscript
Ep 6: The Late Bronze Age Collapse
Ep 7: The Wow! Signal
Ep 8: The Mary Celeste
Ep 9: Göbekli Tepe
Ep 10: LUCA: The Last Universal Common Ancestor

Process:
- I researched cool topics using ChatGPT, Claude, and Google.
- I linked NotebookLM to the Wikipedia entry of each topic and generated the podcast audio.
- I used NotebookLM to also write the podcast/episode descriptions.
- Ideogram to create all the digital art for the episodes and the podcast itself.
- Spotify to upload and host the podcast.

I did this as an exploration of the space of possibility unlocked by generative AI, and of the leverage afforded by the use of AI. The fact that I can, as a single person in 2 hours, curate (not create, but curate) a podcast is, I think, kind of incredible. I also completely understand and acknowledge the potential and immediate critique here of AI-generated slop taking over the internet. I guess - have a listen to the podcast when you go for a walk or drive next time and see what you think.
[image]
383 replies · 786 retweets · 7.6K likes · 705.6K views
Chetan Verma retweeted
Awni Hannun @awnihannun
The Transformer architecture has changed surprisingly little from the original paper in 2017 (over 7 years ago!). The diff:
- The nonlinearity in the MLP has undergone some refinement. Almost every model uses some form of gated nonlinearity; a silu or gelu nonlinearity is common.
- The placement of normalization layers. This tends to vary a little from architecture to architecture: sometimes there are more normalization layers per Transformer block (e.g. Gemma 2), and sometimes keys and queries are normalized (e.g. Command R+).
- The type of normalization layer. RMS norm is commonly used instead of Layer Norm; Llama 3, Phi 3, and Gemma 2 all use RMS norm now. Vanilla Layer Norm seems to be becoming a little less common.
- Grouped-query attention is now a staple, as it really speeds up inference for larger KV caches (e.g. longer prompts/generations).
- And of course the positional encodings have changed from sinusoidal to rotary (aka RoPE).
Not too much variation otherwise.
24 replies · 141 retweets · 1K likes · 124.3K views
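To make the list above concrete, here is a minimal PyTorch sketch of a Transformer block with those diffs applied: RMS norm in place of Layer Norm, a gated (SwiGLU-style) MLP, and grouped-query attention. Dimensions and head counts are illustrative, and RoPE is noted but omitted for brevity; this is a generic sketch, not any particular model's implementation.

```python
# Sketch of a "modern" Transformer block per the diffs in the tweet:
# RMSNorm, SwiGLU-style gated MLP, grouped-query attention. Illustrative
# sizes only; RoPE is omitted to keep the sketch short.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by root-mean-square only: no mean subtraction, no bias.
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # Gated nonlinearity: silu(gate) multiplies the up branch elementwise.
        return self.down(F.silu(self.gate(x)) * self.up(x))

class GQA(nn.Module):
    def __init__(self, dim, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.nh, self.nkv, self.hd = n_heads, n_kv_heads, dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.hd, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.wo = nn.Linear(n_heads * self.hd, dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.nh, self.hd).transpose(1, 2)
        k = self.wk(x).view(B, T, self.nkv, self.hd).transpose(1, 2)
        v = self.wv(x).view(B, T, self.nkv, self.hd).transpose(1, 2)
        # Each KV head serves n_heads / n_kv_heads query heads, shrinking
        # the KV cache. (RoPE would rotate q and k right here; omitted.)
        rep = self.nh // self.nkv
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(B, T, -1))

class Block(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.n1, self.n2 = RMSNorm(dim), RMSNorm(dim)
        self.attn = GQA(dim)
        self.mlp = SwiGLU(dim, 4 * dim)

    def forward(self, x):
        x = x + self.attn(self.n1(x))   # pre-norm residual, RMS not LayerNorm
        return x + self.mlp(self.n2(x))
```

With n_heads=8 and n_kv_heads=2 as sketched, the KV cache is 4x smaller than in standard multi-head attention, which is the inference speedup the tweet points at.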
Chetan Verma retweeted
Gaby Goldberg @gaby_goldberg
Every tech groupchat rn
[image]
52 replies · 541 retweets · 8.7K likes · 663.9K views
Chetan Verma retweeted
Mckay Wrigley @mckaywrigley
You can give ChatGPT a picture of your team’s whiteboarding session and have it write the code for you. This is absolutely insane.
623 replies · 4.6K retweets · 29.6K likes · 11.5M views
Chetan Verma retweeted
Brian Feroldi @BrianFeroldi
15 visuals every investor should memorize: 1: In the long run, stocks win:
[image]
592 replies · 3.6K retweets · 18.5K likes · 8.7M views
Chetan Verma @chtnverma
friendships were forged
[image]
0 replies · 0 retweets · 9 likes · 538 views
Chetan Verma @chtnverma
@sdachen Yeah if your LinkedIn doesn’t have “He …” then you still haven’t made it, Scott :)
0 replies · 0 retweets · 2 likes · 112 views
Scott Deeann Chen @sdachen
@chtnverma I wrote it in a compacted (news-headline) style where "I" is the implied/omitted subject of all sentences. I suppose this doesn't count. 😂
1 reply · 0 retweets · 2 likes · 70 views
Chetan Verma @chtnverma
have you really made it if your linkedin isn't written in 3rd person?
1 reply · 0 retweets · 4 likes · 558 views
Chetan Verma retweeted
Adam Grant @AdamMGrant
We pay too much attention to the most confident voices—and too little attention to the most thoughtful ones. Certainty is not a sign of credibility. Speaking assertively is not a substitute for thinking deeply. It's better to learn from complex thinkers than smooth talkers.
302 replies · 6.7K retweets · 25K likes
Chetan Verma retweeted
Jane Manchun Wong @wongmjane
Twitter’s source code is full of libs. It’s time to get rid of them
[image]
172 replies · 680 retweets · 10.1K likes