Sumanth Dathathri

914 posts

@sdathath

Researcher at Google DeepMind. Past: Caltech, IIT Madras.

Pasadena, CA · Joined November 2014
641 Following · 366 Followers
Reasoning Models
Reasoning Models@reasoningmodels·
@SawyerMerritt Pretty incredible, generating $2B/mo, so $24B/year. So 35x annual revenue multiple. This is like a company doing $10M/year in ARR raising at a $350M valuation. Which I think does happen quite a bit, so maybe nothing too unusual here right??
1
0
0
921
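The valuation arithmetic in the reply above is easy to sanity-check; a few lines (all figures are taken from the quoted announcement, nothing here is new data):

```python
# Sanity check of the revenue-multiple arithmetic from the tweet above.
monthly_revenue = 2e9                  # $2B/month, per the announcement
annual_revenue = monthly_revenue * 12  # annualized run rate
valuation = 852e9                      # post-money valuation

multiple = valuation / annual_revenue
print(f"Annual revenue: ${annual_revenue / 1e9:.0f}B")   # $24B
print(f"Revenue multiple: {multiple:.1f}x")              # ~35.5x

# The analogy from the tweet: a $10M-ARR startup at the same multiple.
startup_arr = 10e6
print(f"Equivalent valuation: ${startup_arr * multiple / 1e6:.0f}M")  # ~$355M
```

So the "35x" and "$350M" in the reply are rounded but consistent with the announced numbers.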
Sawyer Merritt
Sawyer Merritt@SawyerMerritt·
NEWS: OpenAI just announced that it has officially closed their latest funding round with $122 billion in committed capital at a post money valuation of $852 billion. "We are now generating $2B in revenue per month. At this stage, we are growing revenue four times faster than the companies who defined the Internet and mobile eras, including Alphabet and Meta. ChatGPT has more than 900 million weekly active users, and over 50 million subscribers. Search usage has nearly tripled in a year, and our ads pilot reached more than $100 million in ARR in under six weeks. Momentum is just as strong on the enterprise side, which now makes up more than 40% of our revenue, and is on track to reach parity with consumer by the end of 2026. GPT‑5.4 is driving record engagement across agentic workflows. Our APIs now process more than 15 billion tokens per minute. Codex now serves over 2 million weekly users, up 5x in the past three months, with usage growing more than 70% month over month."
Sawyer Merritt tweet media
269
262
2.6K
1.9M
David Pfau
David Pfau@pfau·
Oh god are we really doing this? Jeff Dean trained an n-gram model on the entire internet in 2007. Jelinek coined the term "language model" in the '70s. It's called "Claude" because Claude Shannon was estimating the entropy rate of the English language in 1951!
Aran Komatsuzaki@arankomatsuzaki

While Alec is one of the best ML researchers of all time, LLMs started way before. Here's one from 2013 with a non-neural architecture, and one from 2016, which is afaik the first neural LLM if we define an LLM as an LM with >1B params.

35
84
1.4K
467.9K
Deedy
Deedy@deedydas·
Google Senior Staff Engineer to me: “Yeah, I have no clue what Claude Code / Codex is but I hear it’s all the rage. No, I don’t really care, I just need GOOG to hit $400 and keep this job for 2-3 more years so I can retire!”
262
102
6K
845.6K
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@sedielem @volokuleshov I understand Elon's tweet as implying it's the modeling paradigm that all the hype should be about, not the architecture. He knows it all.
0
0
1
489
Sander Dieleman
Sander Dieleman@sedielem·
@volokuleshov What does he know? Not the difference between a model architecture and a modelling paradigm, apparently🙃
8
6
291
9.5K
Simone Rodan-Benzaquen
Simone Rodan-Benzaquen@srodan·
This is what so-called « pro-Palestinians » did last night to the man who saved the free world.
Simone Rodan-Benzaquen tweet media
1.8K
2K
11.2K
1.8M
Sumanth Dathathri retweeted
Ian Goodfellow
Ian Goodfellow@goodfellow_ian·
I'd like to thank @daniel_rossett for his help in my recovery from the POTS version of Long COVID. Daniel was key in bringing me back from highly disabled and suffering to being able to do what I want to again. This X account is mostly focused on ML / AI. From that point of view, many of you know that in December 2024, I wasn't able to do the test of time award talk at NeurIPS, even by video call. Daniel started working with me in March 2025. By April, I started to have days of no POTS symptoms, by June I was off all heart rate lowering medications, by September I was back to work. I'm back to full exercise, running, lifting weights, mountain biking, and have even done things I hadn't done before I got sick, like riding Whistler Mountain Bike Park. I'm now getting the word out to help Daniel build a company that will bring this approach to more people.
170
83
2.6K
202.1K
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@sunny2dutta @original_ngv "You cannot choose any major you want. You have to take appropriate courses, say Intro to Programming and math courses; if you do well there you can choose a CS major" -- this isn't true. You need to make it through a set of core courses, and that set of courses is the same for all majors.
0
0
1
72
Debarya Dutta
Debarya Dutta@sunny2dutta·
1. You cannot choose any major you want. You have to take appropriate courses, say Intro to Programming and math courses; if you do well there, you can choose a CS major. Many advanced electives have limits. The IIT equivalent would be a branch change: if you do very well in year 1 (GPA 9+), you can choose any major you want. 2. Also, different universities operate differently: at Oxford and Cambridge you are supposed to choose your field right at the beginning, and some fields are more competitive than others.
4
0
15
3.2K
Adriana Porter Felt
Adriana Porter Felt@__apf__·
@devahaz @rohindhar any house in the lower hills (meaning it has views but isn't in the high-risk fire zone) is going for crazy amounts. Check out this one: $1.2M in 2019, $2.2M a few months later, $2.6M in 2024, and $3.4M in 2025. Only 2,000 sqft. zillow.com/homedetails/19…
2
0
0
168
Rohin Dhar
Rohin Dhar@rohindhar·
San Francisco home sale in Pacific Heights (not necessarily the fancy part) Listed for $5.995MM Just sold for $8MM
Rohin Dhar tweet media
57
13
496
97.7K
Nirant
Nirant@NirantK·
PSA: If you're doing USD-to-INR conversion mentally, you can just use 100 now.
24
37
3.1K
105.2K
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@iliaishacked That's already what's happening with self-driving. The perception systems are just far from robust.
0
0
1
5
Sumanth Dathathri retweeted
Gabriel
Gabriel@gbrl_dick·
west coast: we are weeks from the singularity
east coast: what will this mean for saas multiples? also, somehow, tacos are ableist
melbourne, australia: 22c/72f and sunny, nice breeze, might take my family to the zoo
32
50
1.9K
96.6K
Sumanth Dathathri retweeted
Ilia Shumailov🦔
Ilia Shumailov🦔@iliaishacked·
Folks, we are hiring a few systems researchers/engineers (full-time, part-time or internships) with the following requirements:
* Experience in systems research
* Familiarity with inference stacks such as vLLM, SGL, or TensorRT
* Python, CUDA, and Rust experience is a strong bonus
16
25
349
24.2K
Tanishq Mathew Abraham, Ph.D.
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
Honestly, I expect AI researchers to get replaced by AI agents before other researchers because most other research disciplines require physical experimentation.
35
11
339
27.2K
Sawyer Merritt
Sawyer Merritt@SawyerMerritt·
Ashok Elluswamy, VP of AI at @Tesla on self-driving: "It's so obvious you can solve this with cameras. Why wouldn't you solve with cameras? It's 2026. The self-driving problem is not a sensor problem, it's an AI problem. The cameras have enough information already. It's a problem of extracting the information, which is an AI problem." (via @aelluswamy's presentation at the 2026 ScaledML Conference on January 29th)
Ian Teetzel@ianteetzel

Ashok Elluswamy, VP of AI at Tesla, discusses building end-to-end foundational models for self driving at the 2026 ScaledML Conference presented by Matroid. youtu.be/LFh9GAzHg1c?si…

296
837
9.9K
1.8M
Machine Learning Street Talk
Machine Learning Street Talk@MLStreetTalk·
writing is the adversarial process with yourself to become more coherent
4
9
63
3.9K
Geoffrey Irving
Geoffrey Irving@geoffreyirving·
One of the more useless things I did while at Google Brain was write down random access into xorshift128+, the hardware random number generator on TPUs. Purely a stunt: it could theoretically have meant TPU-native Jax-style random numbers faster than Threefry, but in practice random numbers are cheap and the complexity is not worth it. I still have the code in a branch, but certainly no one has ever used it. Fun way to learn about finite field isomorphisms, though. github.com/tensorflow/ten…
4
5
133
14.2K
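The random-access trick Irving describes can be sketched generically: the xorshift128+ *state* update uses only shifts and XORs, so it is linear over GF(2), and n steps collapse into one multiply by the n-th power of a 128×128 bit matrix. This is a toy Python sketch of that idea, not the TensorFlow branch he links; the shift constants (23, 17, 26) are from Vigna's reference xorshift128+, and all helper names here are made up:

```python
MASK = (1 << 64) - 1  # work in 64-bit words

def step(s0, s1):
    """One xorshift128+ state transition (shift constants 23, 17, 26)."""
    x, y = s0, s1
    x ^= (x << 23) & MASK
    x ^= x >> 17
    x ^= y ^ (y >> 26)
    return y, x  # new (s0, s1)

def transition_matrix():
    """Columns of the 128x128 GF(2) transition matrix, each as a 128-bit int.

    Column i is the step() image of the i-th unit basis state."""
    cols = []
    for i in range(128):
        v = 1 << i
        t0, t1 = step(v & MASK, v >> 64)
        cols.append(t0 | (t1 << 64))
    return cols

def mat_vec(cols, v):
    """Matrix-vector product over GF(2): XOR the columns selected by v's bits."""
    r, i = 0, 0
    while v:
        if v & 1:
            r ^= cols[i]
        v >>= 1
        i += 1
    return r

def mat_mul(a, b):
    """Matrix-matrix product over GF(2), column by column."""
    return [mat_vec(a, col) for col in b]

def jump(s0, s1, n):
    """Advance the state n steps using O(log n) matrix squarings."""
    cols = transition_matrix()
    v = s0 | (s1 << 64)
    while n:
        if n & 1:
            v = mat_vec(cols, v)
        cols = mat_mul(cols, cols)
        n >>= 1
    return v & MASK, v >> 64

# Jumping 1000 steps at once agrees with stepping 1000 times.
s = (0xDEADBEEFCAFEF00D, 0x0123456789ABCDEF)
slow = s
for _ in range(1000):
    slow = step(*slow)
print(jump(*s, 1000) == slow)  # True
```

Note the linearity only holds for the state transition; the "+" in xorshift128+ (the `s1 + y` in the output function) is nonlinear, which is fine because random access only needs to jump the state.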
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@kaytraser Not sure I follow. E.g., the KL regularization does push the model towards the pretraining distribution, so it should increase the influence of pretraining?
1
0
0
68
Sarthak
Sarthak@kaytraser·
@sdathath this seems to be more aligned with the task of pure next-token generation, hence the suspicion that it's more influenced by changes in pretraining
1
0
0
75
Sarthak
Sarthak@kaytraser·
did synth data generation for the same task in Sept 2024 and again today. Fighting mode collapse was so hard back then, and it's completely absent now. We've come a long way; wondering if it's only because models got larger, or did the labs actually get an improved data distribution
1
0
8
11.1K
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@kaytraser I think RL inherently isn't meant to take away all entropy the way it has been doing for a while. I think it's a sign people are getting better at RL with LLMs.
0
0
1
18
Sumanth Dathathri
Sumanth Dathathri@sdathath·
@kaytraser You mean worse diversity? It depends on your KL regularization wrt original LLM dist. If you have very strong reg, you'll stay close to the original distribution. I've seen things trending towards a bit more care around stopping the model from collapsing these days.
2
0
1
121
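The KL regularization discussed in this thread is commonly implemented as a per-token penalty folded into the reward, so stronger regularization pulls the policy back towards the reference (pretrained/SFT) distribution and guards against collapse. A minimal sketch with made-up numbers (the function name and `beta` value are illustrative, not from any particular lab's setup):

```python
def kl_regularized_reward(task_reward, logp_policy, logp_ref, beta):
    """RLHF-style shaped reward: r - beta * (log pi(y|x) - log pi_ref(y|x)).

    logp_policy / logp_ref are per-token log-probs of the sampled tokens
    under the policy and the reference model; summing their difference is
    the usual single-sample KL estimate. Larger beta keeps the policy
    closer to the reference distribution, preserving entropy/diversity.
    """
    kl_est = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    return task_reward - beta * kl_est

# Toy numbers: the policy has sharpened relative to the broader reference.
logp_policy = [-0.1, -0.2, -0.1]   # confident (low-entropy) policy
logp_ref    = [-0.7, -0.9, -0.8]   # broader reference model
r = kl_regularized_reward(1.0, logp_policy, logp_ref, beta=0.1)
print(round(r, 3))  # 1.0 - 0.1 * 2.0 = 0.8
```

With `beta` near zero the task reward dominates and nothing stops entropy from draining away; crank `beta` up and the optimum stays near the reference distribution, which is the trade-off the replies above are pointing at.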