
Mitesh B Ashar
29.9K posts




🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵


if your skill depends on dynamic content, you can embed !`command` in your SKILL.md to inject shell output directly into the prompt Claude Code runs it when the skill is invoked and swaps the placeholder inline, the model only sees the result!




@iMBA @shantanugoel I was specifically talking from the perspective of context usage. Skills use only when they're invoked vs plugins are more actively using up context since they're auto loaded






if your skill depends on dynamic content, you can embed !`command` in your SKILL.md to inject shell output directly into the prompt Claude Code runs it when the skill is invoked and swaps the placeholder inline, the model only sees the result!



I am usually happy when my review agents give me DRY recommendations. My mind tells me each time: Yay! One less code drift scenario for the future!


Vibe coding is creating overconfident engineers. (a rant) We used to debate architecture. Tradeoffs. Patterns. We had opinions about systems, if not, we used to study them. Now we read the AI output, it looks reasonable, we ship it. Without even thinking of other options. We are losing the habit of even asking the question. System thinking is a muscle. And muscles atrophy. There is a difference between an engineer who uses AI and an engineer who has outsourced their thinking to it. Most of us cannot tell which one we have become!




