
michael
@_michaelginn
PhD student at @BoulderNLP @lecslab. LLMs for rare languages, automata, synthetic data



🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵



More proof LLMs aren't conscious and aren't generalizing information, and therefore aren't going to become generally intelligent, but are in fact (still extremely useful) trained statistical responders.



In general I've been sensing a new current among deep learning maximalists recently, going from "our models can definitely reason" to "well our models can't reason, but neither can humans!"

Real people drip-feed info. "Hi, help with a return." Then they wait. LLMs dump everything in one shot: "My name is Daiki Johnson, ZIP 80273, order #W9245618, refund to Mastercard ending 4892..." ~2x more identifiers per turn than humans. Your agent never has to handle incomplete information.
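
For concreteness, here's a minimal sketch of the difference being described: a simulated user that drip-feeds identifiers one per turn instead of dumping them all in the opening message. The class name, fields, and interface are hypothetical illustrations, not taken from any particular benchmark.

```python
from dataclasses import dataclass, field

@dataclass
class DripFeedUser:
    """Simulated customer who only reveals a detail when the agent asks for it."""
    facts: dict = field(default_factory=lambda: {
        "name": "Daiki Johnson",
        "zip": "80273",
        "order": "#W9245618",
        "card": "Mastercard ending 4892",
    })

    def opening_message(self) -> str:
        # Humans open vague, with no identifiers at all.
        return "Hi, help with a return."

    def reply(self, agent_question: str) -> str:
        # Reveal at most one identifier per turn, and only the one asked for.
        q = agent_question.lower()
        for key, value in self.facts.items():
            if key in q:
                return value
        return "Not sure, what else do you need?"

user = DripFeedUser()
print(user.opening_message())                    # -> "Hi, help with a return."
print(user.reply("What's your order number?"))   # -> "#W9245618"
```

An agent evaluated against a user like this has to ask follow-up questions and cope with missing fields, rather than receiving every identifier in turn one.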



Ok so let me get this straight. SHOCKING: frontier LLMs suck at writing in esoteric languages. Things like... brainfuck and whitespace? STOP THE PRESSES, STOP THE VCS, IT'S A BUBBLE. Brainfuckbench is cute, but this is hardly an indictment of the frontier models' capabilities.
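
For readers who haven't met these languages, here's a minimal sketch of the gap being argued about: the same trivial task, printing the letter A, as one line of Python and as a Brainfuck program. The interpreter below is a throwaway illustration, not anything from EsoLang-Bench.

```python
def run_brainfuck(program: str, tape_size: int = 30_000) -> str:
    """Interpret a Brainfuck program and return its output as a string."""
    tape = [0] * tape_size
    out = []
    ptr = pc = 0
    # Pre-compute matching bracket positions for loop jumps.
    stack, jumps = [], {}
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]   # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]   # jump back to the loop start
        pc += 1
    return "".join(out)

print("A")                                        # the Python version
print(run_brainfuck("++++++++[>++++++++<-]>+."))  # 8*8+1 = 65 = 'A'
```

Writing the Brainfuck version means tracking tape cells and pointer moves by hand, which is exactly the kind of surface where models can't lean on memorized idioms.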



