Zachary Charles
@MatharyCharles
distributed machine learning @ google | sometimes mathematician


Given what current-gen LLMs (say, in math, but whatever) can do, I think their apparent limitations are kind of mysterious. What is the blocker preventing, at present, high quality fully autonomous work?

I painstakingly ran all 20 EsoLang-Bench hard problems through the Claude web UI. It solved 20/20 (100%). No specialized scaffolding, no expert prompting, no few-shot examples — it solves them natively. It seems the benchmark's constrictive scaffolding was what suffocated the models.



Success! It took me a little more than two days to formalize our paper, whose result had also been proved by GPT-5.2 Pro. There are three axioms (assumptions), all very basic optimization facts. In total, the formalization runs to 2,685 lines of code. I used a combination of the @HarmonicMath Aristotle agent and @claudeai.
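For context on what "axioms" means here: in a Lean-style formalization, an assumed fact enters as an explicit `axiom` declaration that downstream proofs may use without proof. A minimal hypothetical sketch of the mechanism (the name and statement are invented, not the paper's actual assumptions):

```lean
-- Hypothetical stand-in for an assumed fact: declared, not proved,
-- and then available to every later theorem in the development.
axiom basic_fact (n : Nat) : n ≤ n + 1
```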

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵


Opus 4.6's solution: >>>>>>>>>>>[-]+<<<<<<<<<<<[-]>[-]>[-]>[-]<,>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<[<------------------------------------------------<<[->++++++++++<]>[-<+>]>[-<<+>>],>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<]<<[-]>[-]>[-]>[-]>[-]<<<<<>>>[-]>>[-]<<<<<[->>>+>>+<<<<<]>>>>>[-<<<<<+>>>>>]<<-<<<[->>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<]>>>>>>>>>>>>>>>>>>>>>>[-<<<<<<<<<<[-]>[-]>[-]>[-]<,>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<[<------------------------------------------------<<[->++++++++++<]>[-<+>]>[-<<+>>],>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<]<<[-]>[-]>[-]>[-]>[-]<<<<<>+>>>>>>>>>[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>]<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<[<<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]<<[->>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<]>>>>>>>>>>>>>>>>>>>>>>>>>[->>>>>>>++<<<<<<<<<<<>>[-]>>>>>>>>[-]<<<<<<<<<<[->>+>>>>>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<<<[-]>>>>>>>[-]<<<<<<<<<[->>+>>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<<<[-<<<<<<<<<<<<+>>>>>>>>>>>>]>[-<<<<<<<<<<<<+>>>>>>>>>>>>]<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-
]]<[<[-]>>>>>>[-]<<<<<<<<<<[->>>>+>>>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<[->>[-]<<<<[->>>>+<<<<]>>>>>[-]+>[-]<<[>[-]>+<<[-<<<<+>>>>]]>[<<[-]+>>-]>[<<<<<<->>>>>>-]<<<<]<>>[[-]<<<[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<+>>[-]>>>>[-]<<<<<<<[->>>+>>>>+<<<<<<<]>>>>>>>[-<<<<<<<+>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<<<[->>>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<<]>>>[[-]<<<<[-]>><[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<+><[-<+>]>>>]<<>>]<<<<<[-<<<<<<<<<<<<+>>>>>>>>>>>>]>[-<<<<<<<<<<<<+>>>>>>>>>>>>]>><<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<-->>[-]+<<[>>[-]<<[-]]>>[<<<[-]>>>-]<<<[<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<-->>[-]+<<[>>[-]<<[-]]>>[<<<[-]>>>-]<<<]<<<<[-]>>[-<<+>>]>>>>>>>>[-]<<<<<<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>]>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<[<<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]>>>>>>>>>>>><<<[-]>>>>[-]>[-]<<<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<<[-]>[<[-]+>[-]]<[<[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<<<[->>>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<<]>>>[[-]<<<<<[-]>>><<<[-]>>>>>>>>[-]<<<<<<<<<[->+>>>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<>>]<<<<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>>>[-]>[-]<<<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<<[-]>[<[-]+>[-]]<]<<<[-<<+>>]>>><<<<[-]>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<
[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>[[-]<<<<<<<<---------->+>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>]<<<<<<<<[->>>>+<<<<]>>>>><<<<[-<+>]>>>><<<<[-]>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>[[-]<<<<<<<<---------->+>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>]<<<<<<<<[->>>+<<<]>>>>><<<<[->+<]>>>>>>>>[-]<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>[>>>[-]+<<<<<<<++++++++++++++++++++++++++++++++++++++++++++++++.------------------------------------------------>>>>[-]]<>[-]>>>>[-]<<<<<<<[->>>+>>>>+<<<<<<<]>>>>>>>[-<<<<<<<+>>>>>>>]<<<<<>>[-]>>>[-]<[-<<+>>>+<]>[-<+>]<<<<<>>[-<+>]<[>>>[-]+<<<<<<++++++++++++++++++++++++++++++++++++++++++++++++.------------------------------------------------>>>[-]]<<++++++++++++++++++++++++++++++++++++++++++++++++.>>[-]<>++++++++++.<
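For readers who want to check a solution like the one quoted above, Brainfuck programs are easy to verify with a tiny interpreter. A minimal sketch, assuming the usual conventions (growable tape, 8-bit wrapping cells, EOF reads as 0 — these vary between implementations, so a benchmark harness should pin them down):

```python
def run_bf(code, stdin=""):
    """Minimal Brainfuck interpreter: 8 commands, byte cells, growable tape."""
    # Precompute matching-bracket positions so [ and ] jump in O(1).
    jump, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i
    tape, ptr, pc, out, inp = [0], 0, 0, [], iter(stdin)
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1
            if ptr == len(tape):
                tape.append(0)  # grow the tape on demand
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256  # 8-bit wrapping cells
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == ",":
            tape[ptr] = ord(next(inp, "\0"))  # EOF convention: read 0
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]  # repeat loop body
        pc += 1
    return "".join(out)
```

For example, `run_bf(",+.", "A")` reads a byte, increments it, and prints it, returning `"B"`.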


The fact that you need to provide a specialized harness clearly shows the model *does not* encode the kind of metalearning knowledge and problem-solving strategies that humans use. Humans solve novel problems without being told how to proceed step by step. AGI would *not* need a custom harness here. As an aside, the models still performed poorly at that point; they did not "crush" the task.

Doing the reading is a superpower, and it's even better in a world where "no one" is doing the reading. (Inspired by a conversation I had with some college students.)


I'm so mad. Once again, I got a paper accepted, but Ref2 wants me to add 4 references that all share a single author, while Ref1 suggests 6 sharing another common author! The editors should put a stop to this unethical behavior. Should I name those authors? Thoughts?



Our workshop was rejected by #ICML2026. Despite having 3 professors (2 full profs) and 2 senior research scientists on the organizing committee, the only reason given for rejection was "you got an undergrad on the organizing committee" — an undergrad who is actually a highly competent incoming PhD student. (1/)


A few thoughts from talking to some of my favorite mathematicians (both for their work and, like, as people and friends) at the expMath meeting:



this is my favorite thing to do



A few months ago I bumped into Anand Patel, who had been my algebraic geometry TA in college, while he was visiting Google DeepMind. He agreed to try out an agent I was building called Aletheia. Fast forward: Anand prompted Aletheia to solve a problem about simplicity of the Hodge bundle on M_g that had been floating around (a part of) the algebraic geometry community for at least ten years. Check out his paper at arxiv.org/pdf/2603.19052

