spion
29.2K posts

@spion
Fullstack SWE. ex-Apple. Prefer insightful discussion to debate. Rust, TypeScript, Effect, SolidJS, localfirst, devops, keto, stats/science, audio/DSP

now a deleted tweet, probably nothing



🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵


With all the AI talk, I actually tried it, but after 400 thousand tokens the result is pretty bad, so I am writing this by hand. It will take days instead of 2 hours, but at least it will work properly...



i can write 50k lines of code a day and it will absolutely not generate any tangible lasting value, nor will these 16k, from Garry or anyone else (sorry, but it's the truth)





