Hossein Mobahi

1.4K posts

@TheGradient

Rεsεαrch Sciεητisτ @GoogleDeepMind. I ∈ Optimization ∩ Machine Learning. Here to discuss research 🤓. Like heavy music🤘.Origin=🇮🇷 Citizen=🇺🇸.

Mountain View, CA · Joined December 2010
781 Following · 6.4K Followers
Hossein Mobahi@TheGradient·
@Google PhD Fellowship: Applications are now open! Fellowships directly support graduate students doing exceptional and innovative research in computer science and related fields as they pursue their PhD. Learn more and apply by April 30 at goo.gle/phdfellowship
Hossein Mobahi retweeted
Vaishnavh Nagarajan@_vaishnavh·
1/ We found that deep sequence models memorize atomic facts "geometrically" -- not as an associative lookup table as often imagined. This opens up practical questions on reasoning/memory/discovery, and also poses a theoretical "memorization puzzle."
Hossein Mobahi@TheGradient·
@mmbronstein @docmilanfar Thanks Michael! Just a small correction: that's Arabic! In Farsi you say به امید خدا "be omide khoda" if you believe in God, or امیدوارم "omidvaram" otherwise.
Peyman Milanfar@docmilanfar·
Here’s to a free and democratic Iran. Her good people have suffered long enough.
Hossein Mobahi retweeted
Dimitris Papailiopoulos@DimitrisPapail·
1/ New paper! "Wait, Wait, Wait… Why Do Reasoning Models Loop?" Under greedy/low-temp decoding, reasoning LLMs get stuck in loops repeating themselves, wasting test-time compute and sometimes never terminating! We study why this🔁 happens and why increasing temp is a band-aid
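[Editor's note: a minimal sketch of the decoding behavior described in the post above, not code from the paper. A toy next-token table strongly favors a two-token cycle, so greedy decoding (temperature 0) repeats forever, while sampling at higher temperature only makes escape probabilistic — which is why raising temperature is called a band-aid. The transition table and the has_repeating_suffix detector are invented for illustration.]

```python
import numpy as np

def sample_next(logits, temperature, rng):
    """Greedy pick if temperature == 0, otherwise softmax sampling."""
    if temperature == 0.0:
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature)
    probs = probs / probs.sum()
    return int(rng.choice(len(logits), p=probs))

def has_repeating_suffix(tokens, max_period=4, min_repeats=3):
    """Crude loop check: does the tail of the sequence repeat with a short period?"""
    for period in range(1, max_period + 1):
        tail = tokens[-period * min_repeats:]
        if len(tail) == period * min_repeats and all(
            tail[i] == tail[i % period] for i in range(len(tail))
        ):
            return True
    return False

# Toy "model": from token 1 the logits strongly prefer token 2 and vice versa,
# so the argmax path is the cycle 1 -> 2 -> 1 -> 2 -> ...
LOGITS = {1: np.array([0.0, 0.1, 5.0, 0.2]),
          2: np.array([0.0, 5.0, 0.1, 0.2])}

def generate(temperature, steps=40, seed=0):
    rng = np.random.default_rng(seed)
    tokens = [1]
    for _ in range(steps):
        logits = LOGITS.get(tokens[-1], np.zeros(4))
        tokens.append(sample_next(logits, temperature, rng))
        if has_repeating_suffix(tokens):
            return tokens, True            # loop detected
    return tokens, False

print("greedy loops:", generate(temperature=0.0)[1])    # True: deterministic cycle
print("temp=1.5 loops:", generate(temperature=1.5)[1])  # may still loop, just less reliably
```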
Hossein Mobahi retweeted
Andrew Gordon Wilson@andrewgwils·
We introduce epiplexity, a new measure of information that provides a foundation for how to select, generate, or transform data for learning systems. We have been working on this for almost 2 years, and I cannot contain my excitement! 1/7
Marc Finzi@m_finzi

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

Hossein Mobahi retweeted
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·
Continuing Tutorial II for Physics of Language Models. We often trust large-scale results simply because they are large; but once noise is removed, the synthetic pretrain playground starts to push back — hard! The second video (Part 4.1b, 90 minutes) makes this pushback concrete. From it, I derive 20+ architectural principles, organized into 12 result blocks.

Two highlights that consistently surprise even experienced readers:

Result 2.1 (new): "Why Canon layers actually work." Not because of multi-token attention — that explanation only applies to the first layer. The real mechanism is how Canon reshapes hierarchical learning across depth.

Result 11: "Why linear models reason 4× shallower than Transformers." This has nothing to do with memory size — it is a structural failure shared by nearly all linear architectures.

In Result 12, I show which of these principles already emerge at academic-scale pretraining (1.3B / 100B) — with orders-of-magnitude lower cost and far cleaner signals than many real-life large-scale runs. The remaining principles do not disappear; they only emerge when scaling to 8B / 1T, which I will show in the third video (Part 4.2).

⏮️ Previous: Part 4.1a — methodology & playground design
▶️ This: Part 4.1b — architectural principles from the playground
🔜 Next: Part 4.2 — when the playground reshapes real-life pretraining
Hossein Mobahi@TheGradient·
@roydanroy Congrats Dan! Can’t wait to chit chat with you at Google DeepMind!
Dan Roy@roydanroy·
Big announcement time... Today is my last day as Research Director at the Vector Institute. It has been my incredible privilege over the past 2.5 years to serve the Vector community and help build an institution that supports world-class ML research and real-world impact.
Hossein Mobahi retweeted
Spencer Frei@sfrei_·
I'm hiring a Student Researcher to work on scaling laws at Google DeepMind! Project is for 16 weeks, starting spring/summer '26, in-person in SF (pic from the amazing office). If you're interested, fill out this form: forms.gle/MsgPfJumTLLobN…
Azalia Mirhoseini@Azaliamirh·
Thrilled to share that @annadgoldie and I are launching @RicursiveAI, a frontier lab enabling recursive self-improvement through AIs that design their own chips. Our vision for transforming chip design began with AlphaChip, an AI for layout optimization used to design four generations of TPUs, data center CPUs, and smartphones. AlphaChip offered a glimpse into a future where AI designs the silicon that fuels it. Ricursive extends this vision to the entire chip stack, building AI that architects, verifies, and implements silicon, enabling models and chips to co-evolve in a tight loop. We sat down with WSJ’s @berber_jin1 to discuss Ricursive: wsj.com/tech/this-ai-s…
Ricursive Intelligence@RicursiveAI

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com

Dorsa Sadigh@DorsaSadigh·
Just realized I haven't shared life status or been on twitter for a while, so here is a status dump 🧵 1/4
Hossein Mobahi retweeted
Andrew Gordon Wilson@andrewgwils·
Don't let people underestimate you. I remember interviewing for a postdoc at an industry lab, where I introduced spectral mixture kernels. I was told my work was "NIPS-y". It wasn't a compliment and I didn't get the position. 10 years later I was asked to autograph that paper.
Hossein Mobahi@TheGradient·
@Yuchenj_UW To be clear: A degree is not a magic wand. Classes alone don't create capability. But a PhD is a forcing function for the analytical rigor and depth required for foundational work. Can you acquire those tools without the program? Yes, but it’s a much steeper climb.
Hossein Mobahi@TheGradient·
@Yuchenj_UW And those who created the foundations of all this (LeCun, Hinton, Bengio, and Schmidhuber) each hold a PhD. The question is where you want to contribute: expand the breadth of what’s possible with current foundations, or go deep to build future foundations.
Yuchen Jin@Yuchenj_UW·
The creator of GPT doesn’t have a PhD. The creator of PyTorch doesn’t have a PhD. The research lead at Cursor dropped out of NEU. You don’t need a PhD or a top school to become a great researcher or engineer. You can just do things!
Hossein Mobahi@TheGradient·
@modular_ell @GoogleDeepMind Deep understanding of the theory of finite-dimensional vector spaces is a "must-have" as we will need to rigorously analyze and construct proofs using concepts like vector subspaces, orthogonality, and spectral theory. Familiarity with numerical linear algebra is a nice plus.
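[Editor's note: a small illustration of the prerequisites named above — subspaces, orthogonality, spectral theory — not part of the original post. It checks the spectral theorem for a random real symmetric matrix in NumPy; the matrix size and the choice of top-2 subspace are arbitrary for the example.]

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # symmetrize to get a real symmetric matrix

w, Q = np.linalg.eigh(A)               # spectral decomposition: A = Q diag(w) Q^T

# Orthogonality: eigenvectors of a symmetric matrix form an orthonormal basis.
assert np.allclose(Q.T @ Q, np.eye(5))

# Reconstruction: A is recovered from its eigenvalues and eigenvectors.
assert np.allclose(Q @ np.diag(w) @ Q.T, A)

# The span of the top-2 eigenvectors is a vector subspace; P is the
# orthogonal projector onto it, so it is idempotent (P @ P == P).
P = Q[:, -2:] @ Q[:, -2:].T
assert np.allclose(P @ P, P)
```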
Hossein Mobahi@TheGradient·
🚨Intern Hiring🚨 Join Peter Bartlett and me at @GoogleDeepMind in Mountain View to study hierarchical learning in deep networks. Ideal for PhD students with a strong background in ML, optimization, linear algebra, and Python (JAX preferred). Apply here docs.google.com/forms/d/1dXZiv…
Hossein Mobahi@TheGradient·
@HazanPrinceton Sorry to hear there are no slides to share and the board was erased, but at least it presents a creative proof (by construction) for maximizing regret🤪
Elad Hazan@HazanPrinceton·
Paris workshop on regret and optimization, I have no slides to share, the board was erased, but will post new papers soon!
Hossein Mobahi@TheGradient·
An exciting moment for AI in math! A new paper by a team of mathematicians, including Terence Tao, tackled 67 pure math problems and in several cases found solutions that improved on the best-known human results. arxiv.org/abs/2511.02864
Yisong Yue@yisongyue·
Since she's way too shy to post this herself, please join me in congratulating my amazing colleague and friend @klbouman for receiving tenure at @caltech! 🥳🎉
Hossein Mobahi retweeted
Daniel Machlab@Daniel_Machlab·
I was impacted by the Meta Superintelligence Labs layoffs yesterday.

I’m an AI researcher passionate about training large language models to be both more capable and safer. I’ve worked on R&D of Llama Guard, Meta’s multimodal LLM safety guardrail, and on continued pre-training, post-training, and preference alignment of LLMs for custom use cases.

Seeking new opportunities in LLM training and safety research. If you know of teams working on these challenges, please reach out.
Hossein Mobahi retweeted
Luis Pineda@luisenp·
After 7 years at FAIR, I've been affected by the recent AI layoffs. If you are interested in robotics learning, let's chat :)