Lechao Xiao

190 posts

Lechao Xiao

@Locchiu

Research Scientist @GoogleDeepMind / Google Brain. Tackle scaling, along the path to AGI.

New York, NY Entrou em Eylül 2009

619 Seguindo1.3K Seguidores

Lechao Xiao@Locchiu·4 Mar

@JustinLin610 Best wishes for your next adventure!

English

Junyang Lin@JustinLin610·3 Mar

me stepping down. bye my beloved qwen.

English

1.7K

738

13.6K

6.5M

Lechao Xiao@Locchiu·26 Şub

@hyhieu226 @OpenAI @xai Take care Hieu!

English

Hieu Pham@hyhieu226·26 Şub

I have made the difficult decision to leave @OpenAI. Working here and at @xai before was a once-in-a-lifetime experience. I have met the best people. Not the best people in AI. Not the best people in tech. Simply the best people. At these companies, I have helped creating extremely intelligent entities that will meaningfully improve our lives. The work makes me proud. But the intensive work came with a price. I cannot believe I would say this one day, but I am burnt out. All the mental health deteriorating that I used to scoff at is real, miserable, scary, and dangerous. I am going to take a break from frontier AI labs, and will take my family to my home country Vietnam. There, I will try something new, and also search for a cure for my conditions. I hope I will heal. Until then.

English

1.1K

418

14K

1.2M

Lechao Xiao@Locchiu·23 Şub

@JCHaswell @TheGregYang That’s a really tough shift, especially you were in great shape before.

English

JC Haswell@JCHaswell·23 Şub

@TheGregYang @Locchiu I was in excellent shape my whole life leading up to lyme, don't have a pre-lyme numbers but last couple years HRV has averaged 15-20 and HR 70+.

English

118

Greg Yang@TheGregYang·22 Şub

my HRV and RHR have steadily worsened since october kinda strange since my life was much more stressful before october

English

256

34.5K

Lechao Xiao@Locchiu·22 Şub

@TheGregYang In my case, it correlates with exercise / sleep quite well ; exercise helps to improve both. Magnesium also boost hrv for 3-5 pts for me. No alcohol also helps 😂

English

518

Greg Yang@TheGregYang·22 Şub

@Locchiu I didn't do much at all till September and then stopped after an injury in october no change in magnesium I can recall

English

2.2K

Lechao Xiao@Locchiu·20 Şub

The originality and the depth of science are really impressive. High thinking, signal to flop ratio. Congrats Damien, Elliot, Courtney et al.

Damien Ferbach@damien_ferbach

1/10 We built ADANA, an optimizer that gets better as you scale. It extends AdamW with log-time schedules for momentum and weight decay — same hyperparameter count, no extra engineering. Scaled from 45M to 2.6B, it saves ~40% compute vs tuned AdamW, and the gap keeps growing.🧵

English

4.1K

Lechao Xiao@Locchiu·16 Şub

@Jianlin_S Thanks, Jianlin! Hope you have a wonderful Chinese New Year!

English

281

jianlin.su@Jianlin_S·15 Şub

Beyond MuP: 2. Linear Layers and Steepest Descent kexue.fm/archives/11605 The last blog post before the 2026 Spring Festival. Happy Chinese New Year!

English

252

37.7K

Lechao Xiao retweetou

trieu@thtrieu_·2 Şub

Mathematicians 🤝AI researchers arxiv.org/abs/2601.22401. Our take on AI solving Erdos problems: * Many "Open" problems are actually just obscure: many cases the AI didn't find something new, only rediscovered solutions buried in the literature. We present our systematic approach to reporting AI results on Erdos. * The real bottleneck is still human labor, e.g. we spent lots of time filtering out technically correct but meaningless solutions (AI missed Erdos’s original intent). * Acceleration in solving low-hanging fruits is real, but we also need to highlight the many more misses that require human auditing. Clear research directions ahead though, and we feel optimistic about drastically increasing the signal-to-noise ratio. More to come!

Thang Luong@lmthang

Here's the paper link to our scaled effort for tackling Erdős problems. We started with 700 problems marked ‘Open’ in the database. Our agent #Aletheia identified potential solutions to 200 problems. Initial human grading revealed 63 correct answers, followed by deep expert evaluation and discussion to eventually arrive at meaningful proofs to 13 Erdős problems. arxiv.org/abs/2601.22401

English

207

29.5K

Lechao Xiao@Locchiu·12 Oca

@peterjliu It feels like the beginning of vibe proof. With human in the loop as verifiers.

English

Lechao Xiao@Locchiu·12 Oca

@peterjliu ”My preference would still be for the final writeup for this result to be primarily human-generated in the most essential portions of the paper, though I can see a case for delegating routine proofs to some combination of AI-generated text and Lean code. But to me, the ... ”

English

Peter J. Liu@peterjliu·12 Oca

Wow, Terrence Tao is basically saying AI can produce acceptable math research papers: "This resulted in a new writeup of the proof drive.google.com/file/d/1MRQfcH… that had less of the feel of a generic AI-produced document, and which I judge to be at a level of writing within ballpark of an acceptable standard for a research paper, although there is still room for further improvement." @tao/115855840223258103" target="_blank" rel="nofollow noopener">mathstodon.xyz/@tao/115855840…

English

865

Lechao Xiao@Locchiu·1 Oca

@roydanroy Welcome!

English

Dan Roy@roydanroy·1 Oca

In mid-January, I’ll join Google DeepMind’s Science unit as a Visiting Research Scientist, on leave from the University of Toronto. I'm excited to be joining Google DeepMind's efforts to accelerate mathematical research with AI.

English

302

13.6K

Dan Roy@roydanroy·1 Oca

Big announcement time... Today is my last day as Research Director at the Vector Institute. It has been my incredible privilege over the past 2.5 years to serve the Vector community and help build an institution that supports world-class ML research and real-world impact.

English

606

55K

Lechao Xiao@Locchiu·26 Ara

@_arohan_ All-in-one shampoo is all you need

English

263

rohan anil@_arohan_·25 Ara

I did a bad job naming Shampoo and its variants since 2018. We obviously went deep into various aspects of preconditioning but my colleagues were insistent on the shampoo brand. Now its clear to me that we should have named it: shampoo-pro-ultra-max-high shampoo-lite-medium-high Merry Christmas!

English

202

20.6K

Lechao Xiao retweetou

Amr Khalifa@AmrMAlameen·16 Ara

I am hiring a student researcher to work with our team in Montreal on LLMs architecture and pre-training in spring-summer 2026, if you're excited to push the frontier of research forward, join us to help keeping the TPUs warm. fill out this form: forms.gle/1AfdyCbzjdKi2y…

English

471

35.9K

Lechao Xiao@Locchiu·26 Kas

@QuanquanGu Scaling research + research scaling

English

449

Quanquan Gu@QuanquanGu·26 Kas

Scaling isn’t research?🤣 Scaling is actually some of the most exciting research nowadays.

Yuchen Jin@Yuchenj_UW

“From 2012 to 2020, it was the age of research. From 2020 to 2025, it was the age of scaling. Now, it's back to the age of research again.” I agree.

English

142

33.5K

Lechao Xiao@Locchiu·20 Kas

@eliebakouch posttrained in us <- designed in california

English

745

elie@eliebakouch·19 Kas

> best open-weight LLM by a US company this is cool but i’m not sure about emphasizing the “US” part since the base model is deepseek V3

Drishan Arora@drishanarora

Today, we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals, the model performs competitively with frontier closed and open models, while being ahead of any US open model (such as the best versions of OpenAI’s GPT-OSS, Nvidia’s Nemotron and Meta’s Llama). We also built an interface where you can try the model (it’s free and we don’t store any chats): chat.deepcogito.com Additionally, you can download the model on @huggingface, or try it out on @openrouter, @togethercompute, @FireworksAI_HQ , @ollama cloud, @runpod, @baseten, or run it locally using @ollama or @UnslothAI. This model uses significantly fewer tokens amongst any similar capability models, because it has better reasoning capabilities. You will also notice improvements across instruction following, coding, longer queries, multi-turn and creativity. 📌 Model Weights: huggingface.co/collections/de… 📌Openrouter: openrouter.ai/deepcogito/cog… 📌 HF Blog: huggingface.co/blog/deepcogit… Some notes on our approach + design choices below 👇

English

255

25.1K

Lechao Xiao@Locchiu·23 Eki

First scaling law: performance follows a power law of compute, with its exponent governed by science and engineering. Second scaling law: the total improvement of this law follows a power law of resource, with its exponent governed by vision and conviction.

English

757

Lechao Xiao@Locchiu·22 Eki

@vinaysrao Congrats Vinay, this is super cool! I tried to find the tokenizer and vocab size, but couldn't find it in the paper. do you mind sharing them (also good to update this info in the paper) ?

English

447

Vinay S Rao@vinaysrao·21 Eki

While at Meta, I worked on this optimizer-wrapper (outer step lookahead momentum) we're calling Snoo (arxiv.org/abs/2510.15830). You can use it with AdamW or Muon and see really strong scaling. Here's a plot where we ran it against (tuned) AdamW up to 1e23 training flop scales. The "x"s in the plot are compute-factors i.e the baseline needs "x" more flops to reach the same loss (instead of simply measuring in steps). - We further established a medium-track WR on modded-nanogpt (github.com/KellerJordan/m…) With amazing co-authors (Dominik,Vishal,Michael).

English

234

19.6K

Lechao Xiao@Locchiu·10 Eki

@hyhieu226 lol, analysts are the last defenders of human intelligence

English

343

Hieu Pham@hyhieu226·9 Eki

A friend of mine won an IMO gold, went on to obtain a PhD in algebraic topology, and now works at a frontier AI lab. This guy, however, doesn't know how to do integration by parts. He knows the principle, but treats those tricks as below him. AI models today give the same vibe.

English

123

3.4K

285.5K

Lechao Xiao@Locchiu·5 Eki

@Andrea__M Do you mind summarizing your insight about “1980 nonparametric statistics” vs “ scaling laws” theory paper? We can then begin with this, brain storm new research directions/ basic questions. This can turn into a fruitful discussion on theory of scaling.

English

534

Andrea Montanari@Andrea__M·4 Eki

Honest question. What "scaling laws" theory papers that are not a variation on 1980s nonparametric statistics?

English

134

16K

Lechao Xiao@Locchiu·1 Eki

@LiamFedus @ekindogus @periodiclabs Congrats, Liam and Dogus!

English

William Fedus@LiamFedus·30 Eyl

Today, @ekindogus and I are excited to introduce @periodiclabs. Our goal is to create an AI scientist. Science works by conjecturing how the world might be, running experiments, and learning from the results. Intelligence is necessary, but not sufficient. New knowledge is created when ideas are found to be consistent with reality. And so, at Periodic, we are building AI scientists and the autonomous laboratories for them to operate. Until now, scientific AI advances have come from models trained on the internet. But despite its vastness — it’s still finite (estimates are ~10T text tokens where one English word may be 1-2 tokens). And in recent years the best frontier AI models have fully exhausted it. Researchers seek better use of this data, but as any scientist knows: though re-reading a textbook may give new insights, they eventually need to try their idea to see if it holds. Autonomous labs are central to our strategy. They provide huge amounts of high-quality data (each experiment can produce GBs of data!) that exists nowhere else. They generate valuable negative results which are seldom published. But most importantly, they give our AI scientists the tools to act. We’re starting in the physical sciences. Technological progress is limited by our ability to design the physical world. We’re starting here because experiments have high signal-to-noise and are (relatively) fast, physical simulations effectively model many systems, but more broadly, physics is a verifiable environment. AI has progressed fastest in domains with data and verifiable results - for example, in math and code. Here, nature is the RL environment. One of our goals is to discover superconductors that work at higher temperatures than today's materials. Significant advances could help us create next-generation transportation and build power grids with minimal losses. But this is just one example — if we can automate materials design, we have the potential to accelerate Moore’s Law, space travel, and nuclear fusion. We’re also working to deploy our solutions with industry. As an example, we're helping a semiconductor manufacturer that is facing issues with heat dissipation on their chips. We’re training custom agents for their engineers and researchers to make sense of their experimental data in order to iterate faster. Our founding team co-created ChatGPT, DeepMind’s GNoME, OpenAI’s Operator (now Agent), the neural attention mechanism, MatterGen; have scaled autonomous physics labs; and have contributed to some of the most important materials discoveries of the last decade. We’ve come together to scale up and reimagine how science is done. We’re fortunate to be backed by investors who share our vision, including @a16z who led our $300M round, as well as @Felicis, DST Global, NVentures (NVIDIA’s venture capital arm), @Accel and individuals including @JeffBezos , @eladgil , @ericschmidt, and @JeffDean. Their support will help us grow our team, scale our labs, and develop the first generation of AI scientists.

English

429

443

4.2K

3.5M

Descobrir

@JustinLin610 @hyhieu226 @OpenAI @xai @JCHaswell @TheGregYang @Jianlin_S @peterjliu