Zachary Charles
@MatharyCharles

distributed machine learning @ google | sometimes mathematician

Seattle · Joined September 2012
427 Following · 1.4K Followers
892 posts
Zachary Charles@MatharyCharles·
A useful meditation on LLMs, but mainly I'm struck by how confident all the replies to it are. I don't think you should be as certain as you are, people. E.g. see Nicolas Carlini's experiment in getting people to put 90% confidence intervals over future AI capabilities.
[image]
Daniel Litt@littmath

Given what current-gen LLMs (say, in math, but whatever) can do, I think their apparent limitations are kind of mysterious. What is the blocker preventing, at present, high quality fully autonomous work?

1 reply · 0 reposts · 2 likes · 266 views
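Carlini's exercise is essentially a calibration test: if your 90% intervals are honest, realized outcomes should land inside them about 90% of the time. A minimal sketch of that scoring, with all numbers invented purely for illustration:

```python
# Score a forecaster's 90% confidence intervals against realized outcomes.
# Every number below is made up for illustration.
intervals = [(10, 40), (5, 25), (60, 95), (0, 15), (30, 80)]  # (low, high)
outcomes = [35, 30, 70, 9, 85]  # what actually happened

hits = sum(lo <= x <= hi for (lo, hi), x in zip(intervals, outcomes))
print(f"coverage: {hits / len(outcomes):.0%} (target: 90%)")
# Coverage far below 90% means overconfidence; far above it, underconfidence.
```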
Zachary Charles@MatharyCharles·
I tried @ChaseBrowe32432's radical strategy of "copy and paste the prompt" for H03 (in brainfuck) into Gemini 3.1 Pro. It got most of the test cases right but not all. Did it again with a simple coding harness and it got the right answer pretty quickly. I think as long as they can write and execute an interpreter, good models can solve these.
Chase Brower@ChaseBrowe32432

I painstakingly ran all 20 EsoLang-Bench hard problems through Claude webui. It solved 20/20 (100%). No specialized scaffolding, no expert prompting, no few-shot examples; it just solves them natively. This benchmark just suffocated the models with constrictive scaffolding.

0 replies · 0 reposts · 4 likes · 492 views
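For context, the "write and execute an interpreter" route is cheap: a complete brainfuck interpreter fits in a few dozen lines. A minimal sketch of the kind of tool a model could write for itself inside a coding harness (this is not EsoLang-Bench's actual harness):

```python
def run_bf(code: str, stdin: str = "") -> str:
    """Minimal brainfuck interpreter: byte cells with wrap-around, EOF reads 0."""
    # Precompute matching brackets for O(1) jumps.
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape, out = [0] * 30000, []
    ptr = pc = inp = 0
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == ",":
            tape[ptr] = ord(stdin[inp]) if inp < len(stdin) else 0
            inp += 1
        elif c == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip loop body when cell is zero
        elif c == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back while cell is nonzero
        pc += 1
    return "".join(out)

print(run_bf("++++++++++[>++++++++++<-]>++++.+."))  # prints "hi"
```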
Kimon Fountoulakis@kfountou·
One thing I verified by doing this is how much Mathlib is lacking in linear algebra and optimization. This causes both @HarmonicMath and @claudeai to give up. I had to dive in and provide a few results myself, even some basic facts on the eigenvalues of normalized Laplacian matrices or the convergence of optimization methods. The bottleneck right now isn't the systems but the libraries.
Kimon Fountoulakis@kfountou

Success! It took me a little more than two days to formalize our paper, which was also proved by GPT-5.2 Pro. There are three axioms (assumptions), which are very basic optimization facts. In total, the formalization consists of 2,685 lines of code. I used a combination of @HarmonicMath Aristotle agent and @claudeai.

5 replies · 3 reposts · 74 likes · 5K views
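The "three axioms" workflow is worth seeing concretely: when Mathlib is missing a result, you can state it as an axiom (or carry it as a hypothesis) and formalize everything downstream of it. A hedged Lean 4 sketch; `gdIter` and `gd_tendsto_min` are hypothetical names, not Mathlib declarations:

```lean
import Mathlib

/-- Hypothetical k-th gradient-descent iterate of `f` started from `x₀`;
    a real formalization would define this, not postulate it. -/
opaque gdIter : (ℝ → ℝ) → ℝ → ℕ → ℝ

/-- A "very basic optimization fact" taken as an axiom because Mathlib
    lacks it: on a convex objective, the iterates' values converge to
    the value at a global minimizer. (Step-size and smoothness
    conditions a real proof needs are elided here.) -/
axiom gd_tendsto_min (f : ℝ → ℝ) (hf : ConvexOn ℝ Set.univ f)
    (x₀ xmin : ℝ) (hmin : IsMinOn f Set.univ xmin) :
    Filter.Tendsto (fun k => f (gdIter f x₀ k)) Filter.atTop (nhds (f xmin))
```

The risk is exactly what the elision above shows: an axiom stated too loosely makes the formal guarantee only as strong as the axiom itself.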
Zachary Charles reposted
Chase Brower@ChaseBrowe32432·
Opus 4.6 in webui can solve even the "extremely hard" problems btw; not sure what their precise methodology was, but they must have heavily hamstrung the models.
Lossfunk@lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

11 replies · 14 reposts · 158 likes · 30.1K views
Chase Brower@ChaseBrowe32432·
@MatharyCharles I and others in the thread tested several problems, including extra-hard problems in e.g. brainfuck; I've also tested some in whitespace. Genuinely, I'm begging you to just open up their hf repo and paste problems into chatgpt or claude webui x.com/ChaseBrowe3243…
Chase Brower@ChaseBrowe32432

Opus 4.6's solution: >>>>>>>>>>>[-]+<<<<<<<<<<<[-]>[-]>[-]>[-]<,>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<[<------------------------------------------------<<[->++++++++++<]>[-<+>]>[-<<+>>],>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<]<<[-]>[-]>[-]>[-]>[-]<<<<<>>>[-]>>[-]<<<<<[->>>+>>+<<<<<]>>>>>[-<<<<<+>>>>>]<<-<<<[->>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<]>>>>>>>>>>>>>>>>>>>>>>[-<<<<<<<<<<[-]>[-]>[-]>[-]<,>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<[<------------------------------------------------<<[->++++++++++<]>[-<+>]>[-<<+>>],>[-]+>[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>][-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<---------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<[-]>[-]<<<[->>+>+<<<]>>>[-<<<+>>>]<-------------------------------->[-]+<[>[-]<[-]]>[<<[-]>>[-]]<<]<<[-]>[-]>[-]>[-]>[-]<<<<<>+>>>>>>>>>[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>]<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<[<<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]<<[->>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<]>>>>>>>>>>>>>>>>>>>>>>>>>[->>>>>>>++<<<<<<<<<<<>>[-]>>>>>>>>[-]<<<<<<<<<<[->>+>>>>>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<<<[-]>>>>>>>[-]<<<<<<<<<[->>+>>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<<<[-<<<<<<<<<<<<+>>>>>>>>>>>>]>[-<<<<<<<<<<<<+>>>>>>>>>>>>]<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<[<[-]>>>>>>[-]<<<<<<<<<<[->>>>+>>>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<[->>[-]<<<<[->>>>+<<<<]>>>>>[-]+>[-]<<[>[-]>+<<[-<<<<+>>>>]]>[<<[-]+>>-]>[<<<<<<->>>>>>-]<<<<]<>>[[-]<<<[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<+>>[-]>>>>[-]<<<<<<<[->>>+>>>>+<<<<<<<]>>>>>>>[-<<<<<<<+>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<<<[->>>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<<]>>>[[-]<<<<[-]>><[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<<+><[-<+>]>>>]<<>>]<<<<<[-<<<<<<<<<<<<+>>>>>>>>>>>>]>[-<<<<<<<<<<<<+>>>>>>>>>>>>]>><<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<-->>[-]+<<[>>[-]<<[-]]>>[<<<[-]>>>-]<<<[<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<-->>[-]+<<[>>[-]<<[-]]>>[<<<[-]>>>-]<<<]<<<<[-]>>[-<<+>>]>>>>>>>>[-]<<<<<<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>]>[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<[<<<<<<<<<<<<[-]+>[-]>[-]>>>>[-<<<<<+>+>>>>]<<<<[->>>>+<<<<]<[<[-]>[-]]<]>>>>>>>>>>>><<<[-]>>>>[-]>[-]<<<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<<[-]>[<[-]+>[-]]<[<[-]>>>>>>[-]<<<<<<<<<[->>>+>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>>[-]>[-]>[-]>[-]<<<<<<[->>>>[-]<<[->>+
<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<<]>>>[[-]<<<<<[-]>>><<<[-]>>>>>>>>[-]<<<<<<<<<[->+>>>>>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<<<<>>]<<<<<[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>>>>>>>>[-]>[-]<<<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<<[-]>[<[-]+>[-]]<]<<<[-<<+>>]>>><<<<[-]>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>[[-]<<<<<<<<---------->+>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>]<<<<<<<<[->>>>+<<<<]>>>>><<<<[-<+>]>>>><<<<[-]>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>[[-]<<<<<<<<---------->+>>>>>[-]>>>>[-]<<<<<<<<<<[->>>>>>+>>>>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<<<<<>>[-]<<>>+++++++++<<>>>[-]>[-]>[-]>[-]<<<<<[->>>[-]<<[->>+<<]>>>[-]+>[-]<<[>[-]>+<<[-<<+>>]]>[<<[-]+>>-]>[<<<<->>>>-]<<<<<]<>>>]<<<<<<<<[->>>+<<<]>>>>><<<<[->+<]>>>>>>>>[-]<<<<>[-]>>>>[-]<<<<<<<<[->>>>+>>>>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<<>[>>>[-]+<<<<<<<++++++++++++++++++++++++++++++++++++++++++++++++.------------------------------------------------>>>>[-]]<>[-]>>>>[-]<<<<<<<[->>>+>>>>+<<<<<<<]>>>>>>>[-<<<<<<<+>>>>>>>]<<<<<>>[-]>>>[-]<[-<<+>>>+<]>[-<+>]<<<<<>>[-<+>]<[>>>[-]+<<<<<<++++++++++++++++++++++++++++++++++++++++++++++++.------------------------------------------------>>>[-]]<<++++++++++++++++++++++++++++++++++++++++++++++++.>>[-]<>++++++++++.<

2 replies · 0 reposts · 16 likes · 1.1K views
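If you want to replicate the copy-paste test, the benchmark lives on Hugging Face; a sketch, with a placeholder dataset id and field name, since the thread doesn't give the exact repo:

```python
from datasets import load_dataset

# Placeholder id and column: substitute the actual EsoLang-Bench repo name
# and prompt field from the hf repo mentioned above.
ds = load_dataset("lossfunk/esolang-bench", split="test")
print(ds[0]["prompt"])  # paste this straight into the ChatGPT/Claude web UI
```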
Chase Brower@ChaseBrowe32432·
The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui.
[image]
François Chollet@fchollet

The fact that you need to provide a specialized harness clearly shows the model *does not* encode the kind of metalearning knowledge and problem-solving strategies that humans use. Humans solve novel problems without being told how to proceed step by step. AGI would *not* need a custom harness here. As an aside, the models still performed poorly at that point; they did not "crush" the task.

8 replies · 3 reposts · 182 likes · 15.6K views
Zachary Charles@MatharyCharles·
This was true even before LLMs. I remember taking a course on world music with a ton of (fascinating!) reading paired with album listening and regularly I was just...the only one who did any of it. It meant that I basically got to talk to the professor 1:1 which was great!
Patrick McKenzie@patio11

Doing the reading is a superpower, and it's even better in a world where "no one" is doing the reading. (Inspired by a conversation I had with some college students.)

0 replies · 0 reposts · 2 likes · 256 views
Zachary Charles@MatharyCharles·
@Carboniferoys He should meet the guy who proposed injecting people with small amounts of a virus to build immunity, instead of whatever is in a vaccine
0 replies · 0 reposts · 31 likes · 546 views
Anti-Jungian Aktion@Carboniferoys·
Raw milk guy who reinvents pasteurization from first principles is a great ironic bit, but unfortunately we live in a world where that's a real type of guy.
[image]
7 replies · 101 reposts · 1.3K likes · 12.6K views
Zachary Charles@MatharyCharles·
@qberthet PhD students as ACs is new to me. Seems bad! I tend to believe that the large conferences will wane someday (how useful are publications at them professionally these days?), but that culture has momentum, and this probably needs fixing.
0 replies · 0 reposts · 10 likes · 589 views
Quentin Berthet@qberthet·
@MatharyCharles That ship has long sailed. I see plenty of PhD students as ACs and undergrads as reviewers. Just a byproduct of having to handle 30k+ submissions for the big confs.
2 replies · 0 reposts · 3 likes · 1K views
Zachary Charles@MatharyCharles·
I'm not commenting on this specific case as I do not know enough, but I think this points at a difficult but necessary conversation in AI: grad students (and undergrads even more) are still learning, a lot, and we should stop treating them as interchangeable with people who have finished grad school. I see a lot of reviewing from grad students, and while some do a good job, there are clearly cases where the student hasn't learned enough subject matter, history, and intuition to do a good job. But because there are incentives to publish early and often, our review pool has to reflect the authorship pool, and I think it isn't always a healthy dynamic. Again, this is not about any specifics - I know graduate students who are wildly capable of all this. But graduate school is just that, school, and I think we've lost sight of that a bit.
Freda Shi@fredahshi

Our workshop was rejected by #ICML2026. Despite our having 3 professors (2 full profs) and 2 senior research scientists, the only reason given for rejection was "you got an undergrad on the organizing committee," who is actually a highly competent incoming PhD student. (1/)

7 replies · 2 reposts · 59 likes · 13.6K views
Zachary Charles reposted
Courtney Paquette@cypaquette·
ICML workshop acceptance rate was 18% this year (due to space constraints), with submissions up 60% from last year. That meant many very strong, high-quality workshop proposals could not be accepted. (1/) @neu_rips @fredahshi
7 replies · 7 reposts · 107 likes · 26.5K views
Zachary Charles reposted
LocNil@locnilGD·
@auroriafantasia this is like the evil version of that "i am a baby kitty where is mama" thing
[image]
4 replies · 8 reposts · 512 likes · 11.7K views
Thang Luong@lmthang·
@MatharyCharles You can take a look at the HAI card and the transcript. The author specified only the original problem; no hint was given to Aletheia.
1 reply · 1 repost · 4 likes · 324 views