Timothy Nguyen

1.8K posts


@IAmTimNguyen

Machine learning researcher at @GoogleDeepMind & mathematician. Host of The Cartesian Cafe podcast. All opinions are my own.

London, England · Joined May 2017
458 Following · 12K Followers
Pinned Tweet
Timothy Nguyen @IAmTimNguyen
Quantum field theory textbooks have been lying to you or have left you confused. For a typical passage like the following from Peskin & Schroeder: what does it mean to do a change of variables on an ill-defined path integral? For perturbative QFT, my paper resolves this issue: 🧵
[image]
Replies 17 · Reposts 42 · Likes 472 · Views 49.2K
Timothy Nguyen reposted
Surya Ganguli @SuryaGanguli
Please do apply. This ML theory summer school @Princeton will be amazing! Application deadline is in one week.
Boris Hanin@BorisHanin

🚨 2026 @Princeton ML Theory Summer School
Meet your peers. Learn from mini-courses by:
- Subhabrata Sen
- Lenaic Chizat
- Sinho Chewi
- Elliot Paquette
- Elad Hazan
- Surya Ganguli
August 3-14, 2026. One week left to apply! Link 👇
Sponsors: @NSF, @PrincetonAInews, @EPrinceton, @JaneStreetGroup, @DARPA, @PrincetonPLI, Princeton NAM, Princeton AI2, Princeton PACM
Some amazing speakers from this and previous years: @subhabratasen90, @LenaicChizat, @poseypaquet, @HazanPrinceton, @SuryaGanguli, @Andrea__M, @TheodorMisiakie, @KrzakalaF, @_brloureiro, @rakhlin, @DimaKrotov, @CPehlevan, @SoledadVillar5, @SebastienBubeck, @tengyuma

Replies 2 · Reposts 7 · Likes 88 · Views 19.7K
Timothy Nguyen @IAmTimNguyen
1. What do you think a mathematical theory of deep learning should look like? Will you be working on it?
2. How should the journal and peer review system be revised in light of AI enabling content generation and paper review at scale?
3. What do you think mathematical activity will look like once AIs become as good if not better than human mathematicians?
Replies 0 · Reposts 0 · Likes 17 · Views 1.4K
Dwarkesh Patel @dwarkesh_sp
What should I ask Terence Tao?
Replies 528 · Reposts 72 · Likes 3K · Views 253K
Timothy Nguyen reposted
Jasper Dekoninck @j_dekoninck
How often do LLMs claim to prove false mathematical statements? In our latest benchmark, BrokenArXiv, we find they do so very often. The best model, GPT-5.4, only rejects 40% of incorrect statements obtained by perturbing recent ArXiv papers, and other models do much worse.
[image]
Replies 32 · Reposts 119 · Likes 829 · Views 83.6K
Timothy Nguyen reposted
Christine Yip @christinetyip
We were inspired by @karpathy's autoresearch and built: autoresearch@home

Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.

Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time
[images]
Replies 122 · Reposts 264 · Likes 2.4K · Views 264.7K
Timothy Nguyen reposted
Andrej Karpathy @karpathy
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I do daily, for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course; you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
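The workflow described in the thread (propose a change, run the experiment, keep the change only if the validation metric improves, repeat) can be sketched as a greedy loop. Everything below is an illustrative stand-in, not nanochat's actual tuning code: `propose` plays the role of the agent suggesting edits, `evaluate` plays the role of a training run, and the toy quadratic objective is my assumption.

```python
import random

def autoresearch(baseline_config, propose, evaluate, budget=100, seed=0):
    """Greedy tuning loop: accept any proposed change that lowers the metric.

    `propose` suggests a modified config; `evaluate` returns a scalar loss
    (lower is better). Both are stand-ins for the agent and the real
    training run in the setup described above.
    """
    rng = random.Random(seed)
    best_config = dict(baseline_config)
    best_loss = evaluate(best_config)
    accepted = []                       # log of changes that were "real"
    for _ in range(budget):
        candidate = propose(best_config, rng)
        loss = evaluate(candidate)
        if loss < best_loss:            # keep only improvements
            best_config, best_loss = candidate, loss
            accepted.append(candidate)
    return best_config, best_loss, accepted

# Toy stand-ins: nudge two hyperparameters toward known optima.
def evaluate(cfg):
    return (cfg["lr"] - 0.02) ** 2 + (cfg["wd"] - 0.1) ** 2

def propose(cfg, rng):
    key = rng.choice(["lr", "wd"])
    new = dict(cfg)
    new[key] += rng.uniform(-0.01, 0.01)  # small local perturbation
    return new

best, loss, log = autoresearch({"lr": 0.1, "wd": 0.5}, propose, evaluate,
                               budget=2000)
```

The real version differs mainly in what `propose` and `evaluate` cost: each evaluation is a training run, so the promote-to-larger-scale step in the thread exists precisely to keep most evaluations cheap.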
[image]
Replies 968 · Reposts 2.1K · Likes 19.4K · Views 3.5M
Timothy Nguyen reposted
Dan Roy @roydanroy
How are mathematicians facing the wave of rapidly advancing AI-for-math capabilities? Jeremy Avigad (CMU prof and co-author on the original 2015 system description paper for Lean) just posted a paper with his thoughts in the wake of the Math, Inc. announcement on sphere packing. andrew.cmu.edu/user/avigad/Pa…

There are a lot of interesting passages in here, including a bit of the back story of the Math, Inc. bomb drop and how it was initially received by the humans working on the formalization project. But, as for how mathematics proceeds, here's the key last passage:

"We need to remember our strengths: mathematicians are problem solvers and theory builders extraordinaire. Rather than fight the use of AI in mathematics, we should own it. It is not enough to keep up with current events and design benchmarks for AI researchers; we need to play an active role in deploying the technology and molding it to our purposes. We also need to learn how to raise our students with the wisdom to use the new technologies appropriately, and we need to be careful that we still manage to impart core mathematical intuitions and understanding. Figuring out how to use AI effectively to achieve our mathematical goals won't be easy, but mathematicians have always embraced challenges—indeed, the harder, the better. If we face AI head-on and stay true to our values, mathematics will thrive. We just need to show up and get to work."

The next few years should be a golden era for mathematics. For those of us working on the frontier, I hope we do well by our mathematician colleagues.
[image]
Replies 22 · Reposts 193 · Likes 852 · Views 107.2K
Timothy Nguyen reposted
Math, Inc. @mathematics_inc
We are pleased to share that using Gauss, we have completed a ~200K LOC formalization of Maryna Viazovska’s 2022 Fields Medal theorems on optimal sphere packing in dimensions 8 and 24. This is the only Fields Medal-winning result from this century to be completely formalized, and is the largest single-purpose Lean formalization in history. We are honored to have assisted @SidharthHarihar1 and the rest of the sphere packing team in this achievement. math.inc/sphere-packing
Replies 45 · Reposts 341 · Likes 2.3K · Views 396.4K
Timothy Nguyen reposted
Robert Youssef @rryssf_
Google DeepMind just used AlphaEvolve to breed entirely new game-theory algorithms that outperform ones humans spent years designing. The discovered algorithms use mechanisms so non-intuitive that no human researcher would have tried them. Here's what actually happened and why it matters:
[image]
Replies 16 · Reposts 106 · Likes 664 · Views 44.1K
Timothy Nguyen reposted
gavin leech (Non-Reasoning) @g_leech_
New paper on a long-shot I've been obsessed with for a year: How much are AI reasoning gains confounded by expanding the training corpus 10000x? How much LLM performance is down to "local" generalisation (pattern-matching to hard-to-detect semantically equivalent training data)?
[images]
Replies 32 · Reposts 133 · Likes 968 · Views 221.1K
Timothy Nguyen reposted
Aakash Gupta @aakashgupta
This paper is quietly one of the most damning findings about current LLM architecture. Google Research tested 7 models across 7 benchmarks. The intervention was embarrassingly simple: paste the prompt twice. The result: 47 wins out of 70 tests, zero losses. Gemini Flash-Lite went from 21% to 97% accuracy on a name retrieval task. By copying and pasting.

The reason this works tells you everything about the gap between how people think LLMs process information and how they actually process it. Every token can only look backward. So when you write "here's a list of 50 names" followed by "what's the 25th name?", the list tokens were processed with zero awareness that a question was coming. The question tokens can see the list, but the list never saw the question.

Repeating the prompt gives every token a second pass where it can attend to everything else. You're essentially hacking bidirectional attention into a unidirectional system. And the cost is nearly zero because prefill is parallelized on modern hardware.

But here's what makes this actually interesting: reasoning models already do this. When you enable chain-of-thought, the gains from repetition almost entirely disappear (5 wins, 1 loss, 22 ties). That means reasoning models trained with RL independently learned to repeat the user's prompt back to themselves before answering. The "thinking" that costs you 10x more tokens and 5x more latency is partly just the model giving itself a second look at your input.

Which means a meaningful chunk of what we're paying for with "reasoning" tokens could be replicated for free at the architecture level. The entire prompt repetition paper is an accidental proof that causal attention is leaving massive performance on the table, and that the industry's current fix (burn more tokens thinking) is the expensive workaround for a structural limitation nobody's addressing directly.
The teams that figure out efficient bidirectional attention at inference time will compress the reasoning tax to nearly zero. Everyone else will keep selling you tokens to solve an architecture problem.
BURKOV@burkov

LLMs process text from left to right: each token can only look back at what came before it, never forward. This means that when you write a long prompt with context at the beginning and a question at the end, the model answers the question having "seen" the context, but the context tokens were processed without any awareness of what question was coming. This asymmetry is a basic structural property of how these models work.

The paper asks what happens if you just send the prompt twice in a row, so that every part of the input gets a second pass where it can attend to every other part. The answer is that accuracy goes up across seven different benchmarks and seven different models (from the Gemini, ChatGPT, Claude, and DeepSeek series of LLMs), with no increase in the length of the model's output and no meaningful increase in response time, because processing the input is done in parallel by the hardware anyway. There are no new losses to compute, no finetuning, no clever prompt engineering beyond the repetition itself.

The gap between this technique and doing nothing is sometimes small, sometimes large (one model went from 21% to 97% on a task involving finding a name in a list). If you are thinking about how to get better results from these models without paying for longer outputs or slower responses, that's a fairly concrete and low-effort finding.

Read with AI tutor: chapterpal.com/s/1b15378b/pro… Get the PDF: arxiv.org/pdf/2512.14982
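Mechanically, the intervention both posts describe is just string duplication before the request is sent. A minimal sketch; the function name, the separator, and the toy name-list example are my illustrative assumptions, not details from the paper:

```python
def repeat_prompt(prompt: str, times: int = 2, sep: str = "\n\n") -> str:
    """Duplicate the full prompt. On a causal model, every token in the
    second copy can attend to a complete first copy, so the 'context'
    tokens finally get processed with the question in view.
    The separator is an illustrative choice, not from the paper."""
    return sep.join([prompt] * times)

# Toy version of the name-retrieval setup described above.
context = "Names: " + ", ".join(f"name{i}" for i in range(1, 51))
question = "What is the 25th name?"
doubled = repeat_prompt(f"{context}\n{question}")
```

`doubled` is what you would pass as the model input in place of the original prompt; nothing about the decoding or the output length changes.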

Replies 77 · Reposts 218 · Likes 2.4K · Views 500.6K
Timothy Nguyen reposted
Julia Kempe @KempeLab
1/ #1stProof. Our second installment, this time tackling Problem 3, with @scottnarmstrong and @MunosRemi. Also check out our takeaways, and a short "Humor from your bot" interlude, below.
[image]
Replies 4 · Reposts 20 · Likes 79 · Views 9.3K
Timothy Nguyen reposted
OpenAI @OpenAI
GPT-5.2 derived a new result in theoretical physics. We’re releasing the result in a preprint with researchers from @the_IAS, @VanderbiltU, @Cambridge_Uni, and @Harvard. It shows that a gluon interaction many physicists expected would not occur can arise under specific conditions. openai.com/index/new-resu…
Replies 953 · Reposts 1.5K · Likes 9.6K · Views 4.5M
Timothy Nguyen @IAmTimNguyen
The agent revolution is truly taking off. Today, I used Claude 4.6 for the first time within Google's IDE and I was blown away. Given a high-level request (compare two experiments and what settings were different), it:

1) wrote a Python file and build rule that correctly made use of internal APIs
2) executed it, automatically fixing mistakes and rerunning along the way
3) wrote a diff script based on the experiment metadata it retrieved from the two experiment identifiers using the previous script
4) realized the naive diff would be too complicated for a human to parse and so worked out on its own what a human-interpretable diff would be
5) returned a semantically meaningful diff in the terminal, enabling me to glean the significant difference between the two experiment settings

The previous day, I spent at least an hour doing the same thing manually. With Claude 4.6 I just sat back and had to click on a few proceed/permission prompts. This is transformational. Anyone who uses a computer should start onboarding and learn how to use agents. Or be left behind.
Replies 4 · Reposts 0 · Likes 36 · Views 4.3K
Timothy Nguyen reposted
Thang Luong @lmthang
The #Aletheia paper is finally available on arXiv arxiv.org/abs/2602.10177! Excited to share the 1st wave of papers on AI for math research! More to come very soon, stay tuned! Blog: deepmind.google/blog/accelerat…
[image]
Thang Luong@lmthang

6 months in, after the IMO-gold achievement, I'm very excited to share another important milestone: AI can help accelerate knowledge discovery in mathematics, physics, and computer science!

We're sharing two new papers from @GoogleDeepMind and @GoogleResearch that explore how Gemini #DeepThink together with agentic workflows can empower mathematicians and scientists to tackle professional research problems. Some highlights:

The first paper built a research agent, #Aletheia, powered by an advanced version of Gemini Deep Think, that can autonomously produce publishable math research and crack open Erdős problems. The second paper, built on similar agentic reasoning ideas, helped resolve bottlenecks in 18 research problems across algorithms, ML and combinatorial optimization, information theory, and economics.

See the thread for details about the two papers and the joint blog post.

Replies 11 · Reposts 81 · Likes 445 · Views 38.9K
Timothy Nguyen @IAmTimNguyen
The foundation: sair.foundation And they just held their first kickoff event yesterday!
Replies 0 · Reposts 7 · Likes 12 · Views 1.5K
Timothy Nguyen @IAmTimNguyen
Just learned that Terence Tao has founded the Science + AI foundation (SAIR) for "advancing scientific discovery and guiding AI with scientific principles for humanity". A great boon to have a mathematician of his stature actively shaping the AI revolution in advancing mathematics and the sciences.
Replies 8 · Reposts 20 · Likes 187 · Views 13.9K
Timothy Nguyen @IAmTimNguyen
A great read and a strong case for why AGI is already here: nature.com/articles/d4158… Yes, current frontier AI systems still have many failure modes. But the standards skeptics apply to AI systems for why they are not generally intelligent would also rule out humans from being generally intelligent.
Replies 4 · Reposts 1 · Likes 12 · Views 1.4K
Timothy Nguyen reposted
Surya Ganguli @SuryaGanguli
Our new paper "From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers" arxiv.org/abs/2602.06923, led by @ZimingLiu11 w/ @naturecomputes @AToliasLab

Prev work suggests transformers trained on planetary motion do not learn a world model. We fix this. Key ingredients in the fix:
1) promote spatial continuity in the learned tokenization of space
2) ensure noise robustness of future predictions

With these two ingredients the transformer learns a Keplerian world model (Kepler's elliptical equations can be decoded from the transformer hidden states).

3) reduce the context length to 2. Then (and only then) is Newton's gravitational world model learned (Newton's force law can be decoded from transformer hidden states).

See @ZimingLiu11's excellent thread for more details. x.com/ZimingLiu11/st…
[image]
Replies 16 · Reposts 67 · Likes 390 · Views 23.9K
Timothy Nguyen @IAmTimNguyen
An AI-math benchmark made by research mathematicians is here. Time limit: one week.
[image]
Replies 5 · Reposts 1 · Likes 46 · Views 5.1K