Hossein Mobahi

1.4K posts

@TheGradient

Rεsεαrch Sciεητisτ @GoogleDeepMind. I ∈ Optimization ∩ Machine Learning. Here to discuss research 🤓. Like heavy music🤘.Origin=🇮🇷 Citizen=🇺🇸.

Mountain View, CA · Joined December 2010
781 Following · 6.4K Followers
Hossein Mobahi@TheGradient·
@Google PhD Fellowship: Applications are now open! Fellowships directly support graduate students doing exceptional and innovative research in computer science and related fields as they pursue their PhD. Learn more and apply by April 30 at goo.gle/phdfellowship
Hossein Mobahi retweeted
Vaishnavh Nagarajan@_vaishnavh·
1/ We found that deep sequence models memorize atomic facts "geometrically" -- not as an associative lookup table as often imagined. This opens up practical questions on reasoning/memory/discovery, and also poses a theoretical "memorization puzzle."
Hossein Mobahi@TheGradient·
@mmbronstein @docmilanfar Thanks Michael! Just a small correction: that's Arabic! In Farsi you say به امید خدا "be omide khoda" if you believe in God, or امیدوارم "omidvaram" otherwise.
Peyman Milanfar@docmilanfar·
Here’s to a free and democratic Iran. Her good people have suffered long enough.
Hossein Mobahi retweeted
Dimitris Papailiopoulos@DimitrisPapail·
1/ New paper! "Wait, Wait, Wait… Why Do Reasoning Models Loop?" Under greedy/low-temp decoding, reasoning LLMs get stuck in loops repeating themselves, wasting test-time compute and sometimes never terminating! We study why this🔁 happens and why increasing temp is a band-aid
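[Editor's note: a minimal sketch of the decoding behavior described in the post above, not code from the paper. A toy next-token table strongly favors a two-token cycle, so greedy decoding (temperature 0) repeats forever, while sampling at higher temperature only makes escape probabilistic — which is why raising temperature is called a band-aid. The transition table and the has_repeating_suffix detector are invented for illustration.]

```python
import numpy as np

def sample_next(logits, temperature, rng):
    """Greedy pick if temperature == 0, otherwise softmax sampling."""
    if temperature == 0.0:
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature)
    probs = probs / probs.sum()
    return int(rng.choice(len(logits), p=probs))

def has_repeating_suffix(tokens, max_period=4, min_repeats=3):
    """Crude loop check: does the tail of the sequence repeat with a short period?"""
    for period in range(1, max_period + 1):
        tail = tokens[-period * min_repeats:]
        if len(tail) == period * min_repeats and all(
            tail[i] == tail[i % period] for i in range(len(tail))
        ):
            return True
    return False

# Toy "model": from token 1 the logits strongly prefer token 2 and vice versa,
# so the argmax path is the cycle 1 -> 2 -> 1 -> 2 -> ...
LOGITS = {1: np.array([0.0, 0.1, 5.0, 0.2]),
          2: np.array([0.0, 5.0, 0.1, 0.2])}

def generate(temperature, steps=40, seed=0):
    rng = np.random.default_rng(seed)
    tokens = [1]
    for _ in range(steps):
        logits = LOGITS.get(tokens[-1], np.zeros(4))
        tokens.append(sample_next(logits, temperature, rng))
        if has_repeating_suffix(tokens):
            return tokens, True            # loop detected
    return tokens, False

print("greedy loops:", generate(temperature=0.0)[1])    # True: deterministic cycle
print("temp=1.5 loops:", generate(temperature=1.5)[1])  # may still loop, just less reliably
```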
Hossein Mobahi retweeted
Andrew Gordon Wilson@andrewgwils·
We introduce epiplexity, a new measure of information that provides a foundation for how to select, generate, or transform data for learning systems. We have been working on this for almost 2 years, and I cannot contain my excitement! 1/7
Marc Finzi@m_finzi

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

Hossein Mobahi retweeted
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·
Continuing Tutorial II for Physics of Language Models. We often trust large-scale results simply because they are large; but once noise is removed, the synthetic pretrain playground starts to push back — hard! The second video (Part 4.1b, 90 minutes) makes this pushback concrete. From it, I derive 20+ architectural principles, organized into 12 result blocks.

Two highlights that consistently surprise even experienced readers:

Result 2.1 (new): "Why Canon layers actually work." Not because of multi-token attention — that explanation only applies to the first layer. The real mechanism is how Canon reshapes hierarchical learning across depth.

Result 11: "Why linear models reason 4× shallower than Transformers." This has nothing to do with memory size — it is a structural failure shared by nearly all linear architectures.

In Result 12, I show which of these principles already emerge at academic-scale pretraining (1.3B / 100B) — with orders-of-magnitude lower cost and far cleaner signals than many real-life large-scale runs. The remaining principles do not disappear; they only emerge when scaling to 8B / 1T, which I will show in the third video (Part 4.2).

⏮️ Previous: Part 4.1a — methodology & playground design
▶️ This: Part 4.1b — architectural principles from the playground
🔜 Next: Part 4.2 — when the playground reshapes real-life pretraining
Hossein Mobahi@TheGradient·
@roydanroy Congrats Dan! Can’t wait to chit chat with you at Google DeepMind!
Dan Roy@roydanroy·
Big announcement time... Today is my last day as Research Director at the Vector Institute. It has been my incredible privilege over the past 2.5 years to serve the Vector community and help build an institution that supports world-class ML research and real-world impact.
Hossein Mobahi retweeted
Spencer Frei@sfrei_·
I'm hiring a Student Researcher to work on scaling laws at Google DeepMind! Project is for 16 weeks, starting spring/summer '26, in-person in SF (pic from the amazing office). If you're interested, fill out this form: forms.gle/MsgPfJumTLLobN…
Azalia Mirhoseini@Azaliamirh·
Thrilled to share that @annadgoldie and I are launching @RicursiveAI, a frontier lab enabling recursive self-improvement through AIs that design their own chips. Our vision for transforming chip design began with AlphaChip, an AI for layout optimization used to design four generations of TPUs, data center CPUs, and smartphones. AlphaChip offered a glimpse into a future where AI designs the silicon that fuels it. Ricursive extends this vision to the entire chip stack, building AI that architects, verifies, and implements silicon, enabling models and chips to co-evolve in a tight loop. We sat down with WSJ’s @berber_jin1 to discuss Ricursive: wsj.com/tech/this-ai-s…
Ricursive Intelligence@RicursiveAI

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com

Dorsa Sadigh@DorsaSadigh·
Just realized I haven't shared life status or been on twitter for a while, so here is a status dump 🧵 1/4
Hossein Mobahi retweeted
Andrew Gordon Wilson@andrewgwils·
Don't let people underestimate you. I remember interviewing for a postdoc at an industry lab, where I introduced spectral mixture kernels. I was told my work was "NIPS-y". It wasn't a compliment and I didn't get the position. 10 years later I was asked to autograph that paper.
Hossein Mobahi@TheGradient·
@Yuchenj_UW To be clear: A degree is not a magic wand. Classes alone don't create capability. But a PhD is a forcing function for the analytical rigor and depth required for foundational work. Can you acquire those tools without the program? Yes, but it’s a much steeper climb.
Hossein Mobahi@TheGradient·
@Yuchenj_UW And those who created the foundations of all this (LeCun, Hinton, Bengio, and Schmidhuber) each hold a PhD. The question is where you want to contribute: expand the breadth of what’s possible with current foundations, or go deep to build future foundations.
Yuchen Jin@Yuchenj_UW·
The creator of GPT doesn’t have a PhD. The creator of PyTorch doesn’t have a PhD. The research lead at Cursor dropped out of NEU. You don’t need a PhD or a top school to become a great researcher or engineer. You can just do things!
Hossein Mobahi@TheGradient·
@modular_ell @GoogleDeepMind Deep understanding of the theory of finite-dimensional vector spaces is a "must-have" as we will need to rigorously analyze and construct proofs using concepts like vector subspaces, orthogonality, and spectral theory. Familiarity with numerical linear algebra is a nice plus.
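[Editor's note: a small illustration of the prerequisites named above — subspaces, orthogonality, spectral theory — not part of the original post. It checks the spectral theorem for a random real symmetric matrix in NumPy; the matrix size and the choice of top-2 subspace are arbitrary for the example.]

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # symmetrize to get a real symmetric matrix

w, Q = np.linalg.eigh(A)               # spectral decomposition: A = Q diag(w) Q^T

# Orthogonality: eigenvectors of a symmetric matrix form an orthonormal basis.
assert np.allclose(Q.T @ Q, np.eye(5))

# Reconstruction: A is recovered from its eigenvalues and eigenvectors.
assert np.allclose(Q @ np.diag(w) @ Q.T, A)

# The span of the top-2 eigenvectors is a vector subspace; P is the
# orthogonal projector onto it, so it is idempotent (P @ P == P).
P = Q[:, -2:] @ Q[:, -2:].T
assert np.allclose(P @ P, P)
```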
Hossein Mobahi@TheGradient·
🚨Intern Hiring🚨 Join Peter Bartlett and me at @GoogleDeepMind in Mountain View to study hierarchical learning in deep networks. Ideal for PhD students with a strong background in ML, optimization, linear algebra, and Python (JAX preferred). Apply here docs.google.com/forms/d/1dXZiv…
Hossein Mobahi@TheGradient·
@HazanPrinceton Sorry to hear there are no slides to share and the board was erased, but at least it presents a creative proof (by construction) for maximizing regret🤪
Elad Hazan@HazanPrinceton·
Paris workshop on regret and optimization, I have no slides to share, the board was erased, but will post new papers soon!
Hossein Mobahi@TheGradient·
An exciting moment for AI in math! A new paper by a team of mathematicians, including Terence Tao, tackled 67 pure math problems and in several cases found solutions that improved on the best-known human results. arxiv.org/abs/2511.02864
Yisong Yue@yisongyue·
Since she's way too shy to post this herself, please join me in congratulating my amazing colleague and friend @klbouman for receiving tenure at @caltech! 🥳🎉
Hossein Mobahi retweeted
Daniel Machlab@Daniel_Machlab·
I was impacted by the Meta Superintelligence Labs layoffs yesterday.

I’m an AI researcher passionate about training large language models to be both more capable and safer. I’ve worked on R&D of Llama Guard, Meta’s multimodal LLM safety guardrail, and on continued pre-training, post-training, and preference alignment of LLMs for custom use cases.

Seeking new opportunities in LLM training and safety research. If you know of teams working on these challenges, please reach out.
Hossein Mobahi retweeted
Luis Pineda@luisenp·
After 7 years at FAIR, I've been affected by the recent AI layoffs. If you are interested in robotics learning, let's chat :)