Tony Davis
839 posts

@TonyD993
Drop-in arXiv markdown formatter: https://t.co/J57PiQVOeC


🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵



I'll call my shot more specifically: in Q1 2027, an open-source model running on 32 GB of memory will do 100 tok/s and match Opus 4.6 on most benchmarks




wise words from the best systems engineer I've worked with:
"two things that make code actually maintainable:
1. reduce the layers a reader has to trace
2. reduce the state a reader has to hold in their head"
applies to every codebase. always.




We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralized cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n











