Lossfunk
@lossfunk
We ask foundational questions to explore what's next in AI

@inceptmyth @paraschopra @karpathy @fchollet @GaryMarcus @ylecun @AndrewYNg @demishassabis @drfeifei @goodfellow_ian
9/ We're releasing everything:
🌐 Website: esolang-bench.vercel.app
📄 Paper: arxiv.org/abs/2603.09678
🤗 Dataset: huggingface.co/datasets/Lossf…
💻 Code: github.com/Lossfunk/Esola…


🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵



5/ We threw everything at it to try to close the gap: few-shot examples, self-reflection, ReAct pipelines, coder-critic pairs.
Average improvement from few-shot: +0.8 percentage points. Statistically insignificant.
In-context learning (ICL) works by activating knowledge that already exists from pretraining. When that knowledge isn't there to begin with, a few examples in the context window can't substitute for it.








