michael
@_michaelginn
1.5K posts

PhD student at @BoulderNLP @lecslab. LLMs for rare languages, automata, synthetic data

Boulder, CO · Joined November 2018
314 Following · 227 Followers
michael@_michaelginn·
@weaponofkill @ChaseBrowe32432 Not just brainfuck specifically. The point is you can’t make a general causal claim (more abstraction causes better LLM performance) unless you, minimally, have some quantitative evidence
Nate@weaponofkill·
@_michaelginn @ChaseBrowe32432
> I would need to see proof that 1) brainfuck has “less abstractions” than other languages
You have to be trolling
Chase Brower@ChaseBrowe32432·
Opus 4.6 in webui can solve even the "extremely hard" problems btw; not sure what their precise methodology was, but they must have heavily hamstrung the models.
Lossfunk@lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

JC wolf@JCwolf123321·
@_michaelginn @Michael_Druggan Why are you talking about the smartest? How many tries do you think it would take an average programmer? How long do you think it would take for them to make hello world from scratch in Brainfuck?
Michael Druggan@Michael_Druggan·
I never respected this guy but now I respect him even less. This is an absolutely braindead take when you look at the details. Not only is the task something almost no human programmer can do either (one-shotting programs in ridiculous languages like brainfuck), but the models can solve it just fine when allowed to use their full capabilities in an agentic harness.
ib@Indian_Bronson

More proof LLMs aren't conscious and aren't generalizing any information, and therefore aren't going to become generally intelligent, but are in fact (still extremely useful) trained statistical responders.

michael@_michaelginn·
@MancerAI_ @Michael_Druggan Because I think it’s useful to understand his framing of it. And if his definition includes this task, then the paper is obviously a useful benchmark.
MancerAI@MancerAI_·
@_michaelginn @Michael_Druggan "I don’t have one because I don’t think it’s a useful concept" - so why do you introduce a concept to the discussion if you don't have a definition nor find it useful? Seems counterproductive both to the discussion and at large if you don't like the concept
Eugene@HugeLeters·
@_michaelginn @Michael_Druggan this is just a trivially obvious fact to me which does not speak to anyone's intelligence at a reasonable level - it's a language with a deliberately obscure syntax, made to be incomprehensible to people; there's nothing surprising, even its creator wouldn't write it well
michael@_michaelginn·
@JCwolf123321 @Michael_Druggan I bet they could do it in five tries or less. I happen to think there’s plenty of very smart people out there.
JC wolf@JCwolf123321·
@_michaelginn @Michael_Druggan I mean this with full sincerity: Do you know what the language brainfuck is? It might be the *most* unintuitive programming language. I'm not saying the creator of Brainfuck probably couldn't do the tasks, I *am* saying they probably don't complete them first try.
michael@_michaelginn·
@Michael_Druggan you don’t think there’s *any* human who could do the task? Even like, the creator of the language?
Michael Druggan@Michael_Druggan·
@_michaelginn ASI should be able to. AGI is only supposed to be human level and humans struggle with writing brainfuck programs a lot. Like seriously ask the best programmer you know if they can write a nontrivial program in brainfuck without any scratch work or testing.
michael@_michaelginn·
@Michael_Druggan @fchollet That would be the case if these people actually had hard evidence for specific claims of human failures, but it’s always just vibes-based
Michael Druggan@Michael_Druggan·
@fchollet It is genuinely true that many of the failure modes observed in LLMs are also observed in humans and I think pointing this out when it comes up is important.
François Chollet@fchollet·
When the latest AI systems can't do something, there's a category of people who will immediately say, "well humans can't do it either!" - Then they stop saying it when AI improves a bit. Been hearing it for 4+ years, "humans can't reason either", "humans can't adapt to a task they haven't been prepared for", "humans can't follow instructions", "humans also suffer from hallucinations", etc. Until 2025 I was frequently told "humans can't do ARC 1 tasks either" (in reality any normally smart human would do >95% on ARC 1 if properly incentivized). Now that AI saturates ARC 1 they've completely stopped saying this.
François Chollet@fchollet

In general I've been sensing a new current among deep learning maximalists recently, going from "our models can definitely reason" to "well our models can't reason, but neither can humans!"

michael@_michaelginn·
@a1exwd @pmddomingos LLMs are certainly trained on the languages in the test, since there is code for them online.
AlexWD@a1exwd·
If I gave you a test in a language you've never been introduced to how well would you perform? Probably not well. Would that performance indicate that you lack general reasoning capabilities? Now that AGI is here it seems that we're resorting to giving AIs tests far beyond what a human would be capable of and then calling them not AGI once they "fail". Hilarious.
michael@_michaelginn·
@ChaseBrowe32432 I get your intuition just fine on a specific example. I would definitely need actual evidence, not just your feelings, to believe the general claim.
Chase Brower@ChaseBrowe32432·
this does not require empirical measurement; this should be extremely simple for you to identify a priori. think for like 2 seconds: how many logical steps are required to read in a dynamic-length int array in C? now how many logical steps are required to read in a dynamic-length int array in brainfuck? the code objectively requires greatly more serial depth. if you can't understand this you just have a skill issue, i don't know what to tell you. maybe you'd understand if you actually attempt to implement these in C and brainfuck respectively.
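For concreteness, the C side of Chase's comparison might look like the sketch below. This is a hypothetical illustration, not code from the thread; it shows how much of the "read a dynamic-length int array" task is absorbed by abstractions that C and libc supply (typed integers, scanf parsing, a growable heap buffer via realloc).

/* Hypothetical sketch: read a dynamic-length int array in C.
 * Every abstraction used here (typed ints, scanf parsing, realloc)
 * is something Brainfuck would have to rebuild by hand. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *arr = NULL;
    size_t len = 0, cap = 0;
    int x;
    while (scanf("%d", &x) == 1) {      /* libc does the digit parsing */
        if (len == cap) {               /* grow the buffer on demand */
            cap = cap ? cap * 2 : 8;
            int *tmp = realloc(arr, cap * sizeof *arr);
            if (!tmp) { free(arr); return 1; }
            arr = tmp;
        }
        arr[len++] = x;                 /* O(1) indexed store */
    }
    printf("read %zu ints\n", len);
    free(arr);
    return 0;
}

Brainfuck offers none of these primitives: only byte cells, a movable pointer, and single-character I/O, so digit parsing, multi-byte integers, and indexed storage all have to be built from scratch, which is the "serial depth" Chase is pointing at.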
michael@_michaelginn·
@ChaseBrowe32432 Again, this is a claim that you’re making with no evidence. I would need to see proof that 1) brainfuck has “less abstractions” than other languages, according to some quantitative metric, and 2) that metric correlates with agent performance.
Chase Brower@ChaseBrowe32432·
@_michaelginn you don't understand the point here; this is true for any agent that can exist. the abstractions solve a lot of problems. not having the abstractions introduces many more problems. this is an objective constraint, and has nothing to do with the proclivities of the agent.
michael@_michaelginn·
@ChaseBrowe32432 This sounds like a great hypothesis (languages with more abstraction are easier for LLMs) that I would love to see empirical validation for!
Chase Brower@ChaseBrowe32432·
@_michaelginn C provides you all of these abstractions, too. the thing the models struggled on most was reading the dynamic length input into an "array". this is like 2 lines of code in C, but exceptionally difficult in brainfuck. try it yourself (with no reference) and see how long it takes
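Chase's "like 2 lines of code in C" presumably refers to an idiom such as the one sketched below, a hedged guess at his meaning that assumes a fixed capacity bound rather than a heap-grown buffer:

#include <stdio.h>

int main(void) {
    int arr[1000], n = 0;   /* assumed capacity bound, for illustration only */
    /* the entire "read a dynamic-length input into an array" step: */
    while (n < 1000 && scanf("%d", &arr[n]) == 1) n++;
    printf("read %d ints\n", n);
    return 0;
}

Whether one counts the declaration or just the loop, the C version collapses parsing and storage into a couple of statements; the Brainfuck equivalent has no scanf, no int type, and no indexing, so none of that collapse is available.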