
Afroz Mohiuddin
@afrozenator
@OpenAI, ex @Google, @AIAtMeta. Interested in Science, Psychology, Investing and generally everything. Good Thoughts, Good Words, Good Deeds.




Today, we are launching NextToken - a single place to build production-grade agents, apps, and analytics. Code-forward. Cloud-hosted. Zero setup. Extremely affordable. @nexttoken_co



hmm, I sort of disagree and I am bullish on TML. I think they really have the top talent I admire in the field, e.g. Jeremy and Sam for optimization, Songlin for attention, Lia for MoE, Andrew for FSDPv2, and a bunch more folks. It's just natural that it takes a while to publish good models:
- dpsk started publishing papers in 2023, even published dpskv2 (which I think was already amazing) in mid 2024, and nobody cared until dpskv3 and r1
- msh took 10+ months to deliver its first decent long-ctx model in 2023, was silent for the whole of 2024, and started catching up gradually in 2025
- qwen didn't become a much better model than llama until qwen2.5, in mid or late 2024, even though the lab had been there forever
It takes time to get infra and data done, but as long as you have good folks and principled ways of doing science and experiments, sooner or later scaling laws will pay back.


Charlie Munger: "Politicians are never so bad that you don't live to want them back." "You laugh, you young people, but you're going to live to wish that Nancy Pelosi and Donald Trump were immortal."








While at Meta, I worked on this optimizer wrapper (outer-step lookahead momentum) we're calling Snoo (arxiv.org/abs/2510.15830). You can use it with AdamW or Muon and see really strong scaling. Here's a plot where we ran it against (tuned) AdamW up to 1e23 training-FLOP scales. The "x"s in the plot are compute factors, i.e. the baseline needs "x" times more FLOPs to reach the same loss (instead of simply measuring in steps). We further established a medium-track WR on modded-nanogpt (github.com/KellerJordan/m…). With amazing co-authors (Dominik, Vishal, Michael).
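To make the idea concrete, here is a minimal PyTorch sketch of a generic outer-step lookahead-momentum wrapper around an inner optimizer such as AdamW or Muon. The class name, hyperparameters (k, outer_lr, outer_momentum), and the exact update rule are illustrative assumptions, not the paper's algorithm; see the arXiv link above for the actual method.

```python
import torch


class OuterLookaheadMomentum:
    """Illustrative sketch: wrap any inner optimizer with an outer
    lookahead-momentum step. Every k inner steps, the displacement of
    the fast weights from the slow weights feeds an outer momentum
    buffer, the slow weights step along that buffer, and the fast
    weights are reset to the slow weights. Hyperparameter names and
    the update rule here are assumptions for illustration only."""

    def __init__(self, inner_opt, k=10, outer_lr=0.7, outer_momentum=0.9):
        self.inner = inner_opt
        self.k = k
        self.outer_lr = outer_lr
        self.outer_momentum = outer_momentum
        self.step_count = 0
        # Slow weights and outer momentum buffers, one per parameter.
        self.slow = {}
        self.buf = {}
        for group in self.inner.param_groups:
            for p in group["params"]:
                self.slow[p] = p.detach().clone()
                self.buf[p] = torch.zeros_like(p)

    def zero_grad(self, set_to_none=True):
        self.inner.zero_grad(set_to_none=set_to_none)

    @torch.no_grad()
    def step(self):
        self.inner.step()  # regular inner (fast) update, e.g. AdamW or Muon
        self.step_count += 1
        if self.step_count % self.k != 0:
            return
        # Outer step: momentum over the fast-minus-slow displacement.
        for group in self.inner.param_groups:
            for p in group["params"]:
                delta = p.detach() - self.slow[p]
                self.buf[p].mul_(self.outer_momentum).add_(delta)
                self.slow[p].add_(self.buf[p], alpha=self.outer_lr)
                p.copy_(self.slow[p])  # reset fast weights to slow weights


# Usage sketch: wrap a tuned inner optimizer and call step() as usual.
model = torch.nn.Linear(512, 512)
inner = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
opt = OuterLookaheadMomentum(inner, k=10, outer_lr=0.7, outer_momentum=0.9)
```

Because the outer step only touches parameters every k inner steps, a wrapper like this adds little per-step overhead and can sit on top of whatever tuned baseline (AdamW, Muon, etc.) is already in use.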

Google has reached a remarkable milestone not seen since the heyday of Bell Labs: 5 of its current/former employees are science Nobel laureates. This concentration of talent signals a major shift: fundamental discoveries are no longer confined to the halls of academia.

I've spent years pushing the boundaries of pretraining—first as lead author on PaLM, then as a lead contributor on Gemini pre-training. Now I'm at Reflection, building open-weight agentic models at the frontier from the ground up. Today we're announcing our Series B to accelerate this mission. What excites me most is the team of world-class researchers who are deeply bought into this mission and the opportunity to build a frontier lab from scratch. Pre-training at scale. RL at scale. Agentic reasoning. The full stack. It's rare to get the resources, the talent, and the mission aligned like this. If you're passionate about this mission and pre-training/RL at scale to advance the open frontier, join us on our ambitious journey! DM me. We’re hiring in SF, New York and London: reflection.ai/careers




