Sumanth Dathathri

914 posts

@sdathath

Researcher at Google DeepMind. Past: Caltech, IIT Madras.

Pasadena, CA · Joined November 2014
641 Following · 366 Followers
Reasoning Models
Reasoning Models@reasoningmodels·
@SawyerMerritt Pretty incredible, generating $2B/mo, so $24B/year. So 35x annual revenue multiple. This is like a company doing $10M/year in ARR raising at a $350M valuation. Which I think does happen quite a bit, so maybe nothing too unusual here right??
1
0
0
921
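The valuation arithmetic in the reply above is easy to sanity-check; a few lines (all figures are taken from the quoted announcement, nothing here is new data):

```python
# Sanity check of the revenue-multiple arithmetic from the tweet above.
monthly_revenue = 2e9                  # $2B/month, per the announcement
annual_revenue = monthly_revenue * 12  # annualized run rate
valuation = 852e9                      # post-money valuation

multiple = valuation / annual_revenue
print(f"Annual revenue: ${annual_revenue / 1e9:.0f}B")   # $24B
print(f"Revenue multiple: {multiple:.1f}x")              # ~35.5x

# The analogy from the tweet: a $10M-ARR startup at the same multiple.
startup_arr = 10e6
print(f"Equivalent valuation: ${startup_arr * multiple / 1e6:.0f}M")  # ~$355M
```

So the "35x" and "$350M" in the reply are rounded but consistent with the announced numbers.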
Sawyer Merritt
Sawyer Merritt@SawyerMerritt·
NEWS: OpenAI just announced that it has officially closed their latest funding round with $122 billion in committed capital at a post money valuation of $852 billion. "We are now generating $2B in revenue per month. At this stage, we are growing revenue four times faster than the companies who defined the Internet and mobile eras, including Alphabet and Meta. ChatGPT has more than 900 million weekly active users, and over 50 million subscribers. Search usage has nearly tripled in a year, and our ads pilot reached more than $100 million in ARR in under six weeks. Momentum is just as strong on the enterprise side, which now makes up more than 40% of our revenue, and is on track to reach parity with consumer by the end of 2026. GPT‑5.4 is driving record engagement across agentic workflows. Our APIs now process more than 15 billion tokens per minute. Codex now serves over 2 million weekly users, up 5x in the past three months, with usage growing more than 70% month over month."
Sawyer Merritt tweet media
269
262
2.6K
1.9M
David Pfau
David Pfau@pfau·
Oh god are we really doing this? Jeff Dean trained an n-gram model on the entire internet in 2007. Jelinek coined the term "language model" in the '70s. It's called "Claude" because Claude Shannon was estimating the entropy rate of the English language in 1951!
Aran Komatsuzaki@arankomatsuzaki

While Alec is one of the best ML researchers of all time, LLMs started way before. Here's one from 2013 with a non-neural architecture, and one from 2016, which is afaik the first neural LLM if we define an LLM as an LM with >1B params.

35
84
1.4K
467.9K
Deedy
Deedy@deedydas·
Google Senior Staff Engineer to me: “Yeah, I have no clue what Claude Code / Codex is but I hear it’s all the rage. No, I don’t really care, I just need GOOG to hit $400 and keep this job for 2-3 more years so I can retire!”
262
102
6K
845.6K
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@sedielem @volokuleshov I understand Elon's tweet as implying it's the modeling paradigm that all the hype should be about, not the architecture. He knows it all.
0
0
1
489
Sander Dieleman
Sander Dieleman@sedielem·
@volokuleshov What does he know? Not the difference between a model architecture and a modelling paradigm, apparently🙃
8
6
291
9.5K
Simone Rodan-Benzaquen
Simone Rodan-Benzaquen@srodan·
This is what so-called « pro-Palestinians » did last night to the man who saved the free world.
Simone Rodan-Benzaquen tweet media
1.8K
2K
11.2K
1.8M
Sumanth Dathathri retweeted
Ian Goodfellow
Ian Goodfellow@goodfellow_ian·
I'd like to thank @daniel_rossett for his help in my recovery from the POTS version of Long COVID. Daniel was key in bringing me back from highly disabled and suffering to being able to do what I want to again. This X account is mostly focused on ML / AI. From that point of view, many of you know that in December 2024, I wasn't able to do the test of time award talk at NeurIPS, even by video call. Daniel started working with me in March 2025. By April, I started to have days of no POTS symptoms, by June I was off all heart rate lowering medications, by September I was back to work. I'm back to full exercise, running, lifting weights, mountain biking, and have even done things I hadn't done before I got sick, like riding Whistler Mountain Bike Park. I'm now getting the word out to help Daniel build a company that will bring this approach to more people.
170
83
2.6K
202.1K
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@sunny2dutta @original_ngv "You cannot choose any major you want. You have to take appropriate courses, say Intro to Programming and math courses; if you do well there you can choose a CS major" -- this isn't true. You need to make it through a set of core courses, and that set of courses is the same for all majors.
0
0
1
72
Debarya Dutta
Debarya Dutta@sunny2dutta·
1. You cannot choose any major you want. You have to take appropriate courses, say Intro to Programming and math courses; if you do well there, you can choose a CS major. Many advanced electives have limits. The IIT equivalent would be a branch change: if you do very well in year 1 (GPA 9+), you can choose any major you want. 2. Also, different universities operate differently: at Oxford and Cambridge you are supposed to choose your field right at the beginning, and some fields are more competitive than others.
4
0
15
3.2K
Adriana Porter Felt
Adriana Porter Felt@__apf__·
@devahaz @rohindhar any house in the lower hills (meaning it has views but isn't in the high-risk fire zone) is going for crazy amounts. Check out this one: $1.2M in 2019, $2.2M a few months later, $2.6M in 2024, and $3.4M in 2025. Only 2,000 sqft. zillow.com/homedetails/19…
2
0
0
168
Rohin Dhar
Rohin Dhar@rohindhar·
San Francisco home sale in Pacific Heights (not necessarily the fancy part) Listed for $5.995MM Just sold for $8MM
Rohin Dhar tweet media
57
13
496
97.7K
Nirant
Nirant@NirantK·
PSA: If you're doing USD-to-INR conversion mentally, you can just use 100 now.
24
37
3.1K
105.2K
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@iliaishacked That's already what's happening with self-driving. The perception systems are just far from robust.
0
0
1
5
Sumanth Dathathri retweeted
Gabriel
Gabriel@gbrl_dick·
west coast: we are weeks from the singularity
east coast: what will this mean for saas multiples? also, somehow, tacos are ableist
melbourne, australia: 22c/72f and sunny, nice breeze, might take my family to the zoo
32
50
1.9K
96.6K
Sumanth Dathathri retweeted
Ilia Shumailov🦔
Ilia Shumailov🦔@iliaishacked·
Folks, we are hiring a few systems researchers/engineers (full-time, part-time or internships) with the following requirements:
* Experience in systems research
* Familiarity with inference stacks such as vLLM, SGL, or TensorRT
* Python, CUDA, and Rust experience is a strong bonus
16
25
349
24.2K
Tanishq Mathew Abraham, Ph.D.
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
Honestly, I expect AI researchers to get replaced by AI agents before other researchers because most other research disciplines require physical experimentation.
35
11
339
27.2K
Sawyer Merritt
Sawyer Merritt@SawyerMerritt·
Ashok Elluswamy, VP of AI at @Tesla on self-driving: "It's so obvious you can solve this with cameras. Why wouldn't you solve with cameras? It's 2026. The self-driving problem is not a sensor problem, it's an AI problem. The cameras have enough information already. It's a problem of extracting the information, which is an AI problem." (via @aelluswamy's presentation at the 2026 ScaledML Conference on January 29th)
Ian Teetzel@ianteetzel

Ashok Elluswamy, VP of AI at Tesla, discusses building end-to-end foundational models for self driving at the 2026 ScaledML Conference presented by Matroid. youtu.be/LFh9GAzHg1c?si…

296
837
9.9K
1.8M
Machine Learning Street Talk
Machine Learning Street Talk@MLStreetTalk·
writing is the adversarial process with yourself to become more coherent
4
9
63
3.9K
Geoffrey Irving
Geoffrey Irving@geoffreyirving·
One of the more useless things I did while at Google Brain was write down random access into xorshift128+, the hardware random number generator on TPUs. Purely a stunt: it could theoretically have meant TPU-native Jax-style random numbers faster than Threefry, but in practice random numbers are cheap and the complexity is not worth it. I still have the code in a branch, but certainly no one has ever used it. Fun way to learn about finite field isomorphisms, though. github.com/tensorflow/ten…
4
5
133
14.2K
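The random-access trick Irving describes can be sketched generically: the xorshift128+ *state* update uses only shifts and XORs, so it is linear over GF(2), and n steps collapse into one multiply by the n-th power of a 128×128 bit matrix. This is a toy Python sketch of that idea, not the TensorFlow branch he links; the shift constants (23, 17, 26) are from Vigna's reference xorshift128+, and all helper names here are made up:

```python
MASK = (1 << 64) - 1  # work in 64-bit words

def step(s0, s1):
    """One xorshift128+ state transition (shift constants 23, 17, 26)."""
    x, y = s0, s1
    x ^= (x << 23) & MASK
    x ^= x >> 17
    x ^= y ^ (y >> 26)
    return y, x  # new (s0, s1)

def transition_matrix():
    """Columns of the 128x128 GF(2) transition matrix, each as a 128-bit int.

    Column i is the step() image of the i-th unit basis state."""
    cols = []
    for i in range(128):
        v = 1 << i
        t0, t1 = step(v & MASK, v >> 64)
        cols.append(t0 | (t1 << 64))
    return cols

def mat_vec(cols, v):
    """Matrix-vector product over GF(2): XOR the columns selected by v's bits."""
    r, i = 0, 0
    while v:
        if v & 1:
            r ^= cols[i]
        v >>= 1
        i += 1
    return r

def mat_mul(a, b):
    """Matrix-matrix product over GF(2), column by column."""
    return [mat_vec(a, col) for col in b]

def jump(s0, s1, n):
    """Advance the state n steps using O(log n) matrix squarings."""
    cols = transition_matrix()
    v = s0 | (s1 << 64)
    while n:
        if n & 1:
            v = mat_vec(cols, v)
        cols = mat_mul(cols, cols)
        n >>= 1
    return v & MASK, v >> 64

# Jumping 1000 steps at once agrees with stepping 1000 times.
s = (0xDEADBEEFCAFEF00D, 0x0123456789ABCDEF)
slow = s
for _ in range(1000):
    slow = step(*slow)
print(jump(*s, 1000) == slow)  # True
```

Note the linearity only holds for the state transition; the "+" in xorshift128+ (the `s1 + y` in the output function) is nonlinear, which is fine because random access only needs to jump the state.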
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@kaytraser Not sure I follow. E.g., the KL regularization does push the model towards the pretraining distribution, so it should increase the influence of pretraining?
1
0
0
68
Sarthak
Sarthak@kaytraser·
@sdathath this seems to be more aligned with the task of pure next-token generation, hence the suspicion that it's more influenced by changes in pretraining
1
0
0
75
Sarthak
Sarthak@kaytraser·
did synth data generation for the same task in Sept 2024 and again today. Fighting mode collapse was so hard back then, and it's completely absent now. We've come a long way; wondering if it's only because models got larger, or did the labs actually get an improved data distribution
1
0
8
11.1K
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@kaytraser I think RL inherently isn't meant to take away all entropy the way it has been doing for a while. I think it's a sign people are getting better at RL with LLMs.
0
0
1
18
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@kaytraser You mean worse diversity? It depends on your KL regularization wrt original LLM dist. If you have very strong reg, you'll stay close to the original distribution. I've seen things trending towards a bit more care around stopping the model from collapsing these days.
2
0
1
121
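The KL regularization discussed in this thread is commonly implemented as a per-token penalty folded into the reward, so stronger regularization pulls the policy back towards the reference (pretrained/SFT) distribution and guards against collapse. A minimal sketch with made-up numbers (the function name and `beta` value are illustrative, not from any particular lab's setup):

```python
def kl_regularized_reward(task_reward, logp_policy, logp_ref, beta):
    """RLHF-style shaped reward: r - beta * (log pi(y|x) - log pi_ref(y|x)).

    logp_policy / logp_ref are per-token log-probs of the sampled tokens
    under the policy and the reference model; summing their difference is
    the usual single-sample KL estimate. Larger beta keeps the policy
    closer to the reference distribution, preserving entropy/diversity.
    """
    kl_est = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    return task_reward - beta * kl_est

# Toy numbers: the policy has sharpened relative to the broader reference.
logp_policy = [-0.1, -0.2, -0.1]   # confident (low-entropy) policy
logp_ref    = [-0.7, -0.9, -0.8]   # broader reference model
r = kl_regularized_reward(1.0, logp_policy, logp_ref, beta=0.1)
print(round(r, 3))  # 1.0 - 0.1 * 2.0 = 0.8
```

With `beta` near zero the task reward dominates and nothing stops entropy from draining away; crank `beta` up and the optimum stays near the reference distribution, which is the trade-off the replies above are pointing at.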