Zachary Charles
@MatharyCharles

distributed machine learning @ google | sometimes mathematician

Seattle · Joined September 2012
427 Following · 1.4K Followers
892 posts
Zachary Charles@MatharyCharles·
A useful meditation on LLMs, but mainly I'm struck by how confident all the replies to it are. I don't think you should be as certain as you are, people. E.g. see Nicolas Carlini's experiment in getting people to put 90% confidence intervals over future AI capabilities.
[image]
Daniel Litt@littmath

Given what current-gen LLMs (say, in math, but whatever) can do, I think their apparent limitations are kind of mysterious. What is the blocker preventing, at present, high quality fully autonomous work?

1 reply · 0 reposts · 2 likes · 266 views
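Carlini's exercise is essentially a calibration test: if your 90% intervals are honest, realized outcomes should land inside them about 90% of the time. A minimal sketch of that scoring, with all numbers invented purely for illustration:

```python
# Score a forecaster's 90% confidence intervals against realized outcomes.
# Every number below is made up for illustration.
intervals = [(10, 40), (5, 25), (60, 95), (0, 15), (30, 80)]  # (low, high)
outcomes = [35, 30, 70, 9, 85]  # what actually happened

hits = sum(lo <= x <= hi for (lo, hi), x in zip(intervals, outcomes))
print(f"coverage: {hits / len(outcomes):.0%} (target: 90%)")
# Coverage far below 90% means overconfidence; far above it, underconfidence.
```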
Zachary Charles@MatharyCharles·
I tried @ChaseBrowe32432's radical strategy of "copy and paste the prompt" for H03 (in brainfuck) into Gemini 3.1 Pro. It got most of the test cases right but not all. Did it again with a simple coding harness and it got the right answer pretty quickly. I think as long as they can write and execute an interpreter, good models can solve these.
Chase Brower@ChaseBrowe32432

I painstakingly ran all 20 EsoLang-Bench hard problems through Claude webui. It solved 20/20 (100%). No specialized scaffolding, no expert prompting, no few-shot examples; it just solves them natively. This benchmark just suffocated the models with constrictive scaffolding.

0 replies · 0 reposts · 4 likes · 492 views
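For context, the "write and execute an interpreter" route is cheap: a complete brainfuck interpreter fits in a few dozen lines. A minimal sketch of the kind of tool a model could write for itself inside a coding harness (this is not EsoLang-Bench's actual harness):

```python
def run_bf(code: str, stdin: str = "") -> str:
    """Minimal brainfuck interpreter: byte cells with wrap-around, EOF reads 0."""
    # Precompute matching brackets for O(1) jumps.
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape, out = [0] * 30000, []
    ptr = pc = inp = 0
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == ",":
            tape[ptr] = ord(stdin[inp]) if inp < len(stdin) else 0
            inp += 1
        elif c == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip loop body when cell is zero
        elif c == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back while cell is nonzero
        pc += 1
    return "".join(out)

print(run_bf("++++++++++[>++++++++++<-]>++++.+."))  # prints "hi"
```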
Kimon Fountoulakis@kfountou·
One thing I verified by doing this is how much Mathlib is lacking in linear algebra and optimization. This causes both @HarmonicMath and @claudeai to give up. I had to dive in and provide a few results myself, even some basic facts on the eigenvalues of normalized Laplacian matrices or the convergence of optimization methods. The bottleneck right now isn't the systems but the libraries.
Kimon Fountoulakis@kfountou

Success! It took me a little more than two days to formalize our paper, which was also proved by GPT-5.2 Pro. There are three axioms (assumptions), which are very basic optimization facts. In total, the formalization consists of 2,685 lines of code. I used a combination of @HarmonicMath Aristotle agent and @claudeai.

5 replies · 3 reposts · 74 likes · 5K views
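The "three axioms" workflow is worth seeing concretely: when Mathlib is missing a result, you can state it as an axiom (or carry it as a hypothesis) and formalize everything downstream of it. A hedged Lean 4 sketch; `gdIter` and `gd_tendsto_min` are hypothetical names, not Mathlib declarations:

```lean
import Mathlib

/-- Hypothetical k-th gradient-descent iterate of `f` started from `x₀`;
    a real formalization would define this, not postulate it. -/
opaque gdIter : (ℝ → ℝ) → ℝ → ℕ → ℝ

/-- A "very basic optimization fact" taken as an axiom because Mathlib
    lacks it: on a convex objective, the iterates' values converge to
    the value at a global minimizer. (Step-size and smoothness
    conditions a real proof needs are elided here.) -/
axiom gd_tendsto_min (f : ℝ → ℝ) (hf : ConvexOn ℝ Set.univ f)
    (x₀ xmin : ℝ) (hmin : IsMinOn f Set.univ xmin) :
    Filter.Tendsto (fun k => f (gdIter f x₀ k)) Filter.atTop (nhds (f xmin))
```

The risk is exactly what the elision above shows: an axiom stated too loosely makes the formal guarantee only as strong as the axiom itself.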
Zachary Charles reposted
Chase Brower@ChaseBrowe32432·
Opus 4.6 in webui can solve even the "extremely hard" problems btw; not sure what their precise methodology was, but they must have heavily hamstrung the models.
Lossfunk@lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

11 replies · 14 reposts · 158 likes · 30.1K views
Chase Brower@ChaseBrowe32432·
@MatharyCharles I and others in the thread tested several problems, including extra-hard problems in e.g. brainfuck; I've also tested some in whitespace. Genuinely, I'm begging you to just open up their hf repo and paste problems into chatgpt or claude webui x.com/ChaseBrowe3243…
Chase Brower@ChaseBrowe32432

Opus 4.6's solution: >>>>>>>>>>>[-]+<<<<<<<<<<<[-]>[-]>[-]>[-]<,>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<[<------------------------------------------------<<[->++++++++++<]>[-<+>]>[-<<+>>],>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<]<<[-]>[-]>[-]>[-]>[-]<<<<<>>>[-]>>[-]<<<<<[->>>+>>+<<<<<]>>>>>[-<<<<<+>>>>>]<<-<<<[->>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<]>>>>>>>>>>>>>>>>>>>>>>[-<<<<<<<<<<[-]>[-]>[-]>[-]<,>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<[<------------------------------------------------<<[->++++++++++<]>[-<+>]>[-<<+>>],>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<]<<[-]>[-]>[-]>[-]>[-]<<<<<>+>>>>>>>>>[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>]<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<[<<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]<<[->>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<]>>>>>>>>>>>>>>>>>>>>>>>>>[->>>>>>>++<<<<<<<<<<<>>[-]>>>>>>>>[-]<<<<<<<<<<[->>+>>>>>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<<<[-]>>>>>>>[-]<<<<<<<<<[->>+>>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<<<[-<<<<<<<<<<<<+>>>>>>>>>>>>]>[-<<<<<<<<<<<<+>>>>>>>>>>>>]<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<[<[-]>>>>>>[-]<<<<<<<<<<[->>>>+>>>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<[->>[-]<<<<[->>>>+<<<<]>>>>>[-]+>[-]<<[>[-]>+<<[-<<<<+>>>>]]>[<<[-]+>>-]>[<<<<<<->>>>>>-]<<<<]<>>[[-]<<<[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<+>>[-]>>>>[-]<<<<<<<[->>>+>>>>+<<<<<<<]>>>>>>>[-<<<<<<<+>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<<<[->>>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<<]>>>[[-]<<<<[-]>><[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<+><[-<+>]>>>]<<>>]<<<<<[-<<<<<<<<<<<<+>>>>>>>>>>>>]>[-<<<<<<<<<<<<+>>>>>>>>>>>>]>><<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<-->>[-]+<<[>>[-]<<[-]]>>[<<<[-]>>>-]<<<[<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<-->>[-]+<<[>>[-]<<[-]]>>[<<<[-]>>>-]<<<]<<<<[-]>>[-<<+>>]>>>>>>>>[-]<<<<<<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>]>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<[<<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]>>>>>>>>>>>><<<[-]>>>>[-]>[-]<<<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<<[-]>[<[-]+>[-]]<[<[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<<<[->>>>[-]<<[->>+
<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<<]>>>[[-]<<<<<[-]>>><<<[-]>>>>>>>>[-]<<<<<<<<<[->+>>>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<>>]<<<<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>>>[-]>[-]<<<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<<[-]>[<[-]+>[-]]<]<<<[-<<+>>]>>><<<<[-]>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>[[-]<<<<<<<<---------->+>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>]<<<<<<<<[->>>>+<<<<]>>>>><<<<[-<+>]>>>><<<<[-]>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>[[-]<<<<<<<<---------->+>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>]<<<<<<<<[->>>+<<<]>>>>><<<<[->+<]>>>>>>>>[-]<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>[>>>[-]+<<<<<<<++++++++++++++++++++++++++++++++++++++++++++++++.------------------------------------------------>>>>[-]]<>[-]>>>>[-]<<<<<<<[->>>+>>>>+<<<<<<<]>>>>>>>[-<<<<<<<+>>>>>>>]<<<<<>>[-]>>>[-]<[-<<+>>>+<]>[-<+>]<<<<<>>[-<+>]<[>>>[-]+<<<<<<++++++++++++++++++++++++++++++++++++++++++++++++.------------------------------------------------>>>[-]]<<++++++++++++++++++++++++++++++++++++++++++++++++.>>[-]<>++++++++++.<

2 replies · 0 reposts · 16 likes · 1.1K views
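If you want to replicate the copy-paste test, the benchmark lives on Hugging Face; a sketch, with a placeholder dataset id and field name, since the thread doesn't give the exact repo:

```python
from datasets import load_dataset

# Placeholder id and column: substitute the actual EsoLang-Bench repo name
# and prompt field from the hf repo mentioned above.
ds = load_dataset("lossfunk/esolang-bench", split="test")
print(ds[0]["prompt"])  # paste this straight into the ChatGPT/Claude web UI
```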
Chase Brower@ChaseBrowe32432·
The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui.
[image]
François Chollet@fchollet

The fact that you need to provide a specialized harness clearly shows the model *does not* encode the kind of metalearning knowledge and problem-solving strategies that humans use. Humans solve novel problems without being told how to proceed step by step. AGI would *not* need a custom harness here. As an aside, the models still performed poorly at that point; they did not "crush" the task.

8 replies · 3 reposts · 182 likes · 15.6K views
Zachary Charles@MatharyCharles·
This was true even before LLMs. I remember taking a course on world music with a ton of (fascinating!) reading paired with album listening and regularly I was just...the only one who did any of it. It meant that I basically got to talk to the professor 1:1 which was great!
Patrick McKenzie@patio11

Doing the reading is a superpower, and it's even better in a world where "no one" is doing the reading. (Inspired by a conversation I had with some college students.)

0 replies · 0 reposts · 2 likes · 256 views
Zachary Charles@MatharyCharles·
@Carboniferoys He should meet the guy who proposed injecting people with small amounts of a virus to build immunity, instead of whatever is in a vaccine
0 replies · 0 reposts · 31 likes · 546 views
Anti-Jungian Aktion@Carboniferoys·
Raw milk guy who reinvents pasteurization from first principles is a great ironic bit, but unfortunately we live in a world where that's a real type of guy.
[image]
7 replies · 101 reposts · 1.3K likes · 12.6K views
Zachary Charles@MatharyCharles·
@qberthet PhD students as ACs is new to me. Seems bad! I tend to believe that the large conferences will wane someday (how useful are publications at them professionally these days?), but that culture has momentum, and this probably needs fixing.
0 replies · 0 reposts · 10 likes · 589 views
Quentin Berthet@qberthet·
@MatharyCharles That ship has long sailed. I see plenty of PhD students as ACs and undergrads as reviewers. Just a byproduct of having to handle 30k+ submissions for the big confs.
2 replies · 0 reposts · 3 likes · 1K views
Zachary Charles@MatharyCharles·
I'm not commenting on this specific case as I do not know enough, but I think this points at a difficult but necessary conversation in AI: grad students (and undergrads even more) are still learning, a lot, and we should stop treating them as interchangeable with people who have finished grad school. I see a lot of reviewing from grad students, and while some do a good job, there are clearly cases where the student hasn't learned enough subject matter, history, and intuition to do a good job. But because there are incentives to publish early and often, our review pool has to reflect the authorship pool, and I think it isn't always a healthy dynamic. Again, this is not about any specifics - I know graduate students who are wildly capable of all this. But graduate school is just that, school, and I think we've lost sight of that a bit.
Freda Shi@fredahshi

Our workshop was rejected by #ICML2026. Despite our having 3 professors (2 full profs) and 2 senior research scientists, the only reason given for rejection was "you got an undergrad on the organizing committee," who is actually a highly competent incoming PhD student. (1/)

7 replies · 2 reposts · 59 likes · 13.6K views
Zachary Charles reposted
Courtney Paquette@cypaquette·
ICML workshop acceptance rate was 18% this year (due to space constraints), with submissions up 60% from last year. That meant many very strong, high-quality workshop proposals could not be accepted. (1/) @neu_rips @fredahshi
7 replies · 7 reposts · 107 likes · 26.5K views
Zachary Charles reposted
LocNil@locnilGD·
@auroriafantasia this is like the evil version of that "i am a baby kitty where is mama" thing
[image]
4 replies · 8 reposts · 512 likes · 11.7K views
Thang Luong@lmthang·
@MatharyCharles You can take a look at the HAI card and the transcript. The author specified only the original problem; no hint was given to Aletheia.
1 reply · 1 repost · 4 likes · 324 views