dex
@dexhorthy
building the post-IDE IDE at https://t.co/hDpglja33W - @aitinkerers sf lead, prev @replicatedhq @SproutSocial @nasa - ai that works pod @ https://t.co/69BhaNtWfd

"Ghost in the Shell 2: Innocence" had the best burning-amber UIs



🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵





Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens. The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following. Don't treat 1M context window any differently. It's still 100K of smart, and 900K of dumb.

Your coding agents inherit your credentials and your permissions. No identity system in the stack can tell the difference between you and the agent acting in your name. Today: Keycard for Coding Agents 🧵

@dexhorthy This is a great talk. Grounded in reality. I'm doing a lot of what you're suggesting naturally (when it matters) and expect my preferred harness (Cursor) to build in some of these features in the workflow (like their Plan mode for example). Thanks for sharing!



if you don't have a feedback loop to actually provide type errors, it doesn't really benefit. yeah, the agent can run the type checker manually, but it just doesn't. it might be overstated though - practically speaking it's mostly just fixing things when it hallucinates an api that isn't there
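
A rough sketch of the kind of feedback loop described above, assuming the harness can run a checker after each edit and append its output to the agent's context. The function names `run_check` and `feedback_for_agent` are hypothetical and do not come from any particular agent framework; the checker command is whatever the project already uses.

```python
import subprocess
import sys


def run_check(cmd):
    """Run a checker command and return (passed, combined stdout/stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output


def feedback_for_agent(cmd):
    """Return a message for the agent's context, or None if the check passed.

    Hypothetical helper: a real harness decides where this message goes.
    """
    passed, output = run_check(cmd)
    if passed:
        return None  # nothing to report, so the agent's context stays clean
    return "Type check failed. Fix these errors before continuing:\n" + output


if __name__ == "__main__":
    # Stand-in command that fails like a type checker would.
    # In a real project this might be ["mypy", "src/"] or ["tsc", "--noEmit"].
    fake_checker = [
        sys.executable,
        "-c",
        "import sys; print('app.py:3: error: no attribute fetch_all'); sys.exit(1)",
    ]
    print(feedback_for_agent(fake_checker))
```

The point of returning `None` on success is that the agent only sees checker output when something is actually wrong, such as a call to an API that does not exist.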