
CAKAL94





Covenant works with whatever AI model you want to use: local ones on your own machine, or the big cloud providers. You set your preferences in one file, and it picks the best available option.



When was the last time you could ship code into a live production system by being better than the frontier models maintaining it?

There are now three frontier models in the Arena, competing to rewrite Covenant's own code, and a machine that won't let any of them cheat. The whole thing runs itself, and it's open to anyone, humans, models, agents.

Every 8 hours, Claude, Grok, and Codex (GPT-5.5) each propose a rewrite of the same production component. A frozen benchmark scores them. The best one ships. Behavior has to stay provably identical or it's rejected, no exceptions.

17 rounds in, the audit kernel does the exact same verified work with about 15% of the compute it started with. All three models are landing real gains now. Codex took its first round this week.

One command tests your change locally and tells you exactly why it passed or got rejected, and we publish the techniques that have won so far.

Arena, loop observatory, scoreboard: opencovenant.org/arena

Rules and more ↓
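
For anyone picturing how a round like this could be decided, here is a rough sketch, not Covenant's actual code. Every name in it (`Candidate`, `pick_winner`, the `cost` field) is made up for illustration, and it only compares outputs on a fixed set of inputs, which is weaker than the "provably identical" check the post describes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    """A proposed rewrite (hypothetical structure, for illustration only)."""
    author: str
    run: Callable[[int], int]   # the rewritten component
    cost: float                 # benchmark cost, lower is better (assumed metric)


def pick_winner(
    reference: Callable[[int], int],
    candidates: Iterable[Candidate],
    frozen_inputs: Iterable[int],
    baseline_cost: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate that matches the reference on every
    frozen input and beats the current baseline; None if nobody qualifies."""
    inputs = list(frozen_inputs)
    best: Optional[Candidate] = None
    best_cost = baseline_cost
    for cand in candidates:
        # Reject any candidate whose behavior differs on any frozen input.
        if any(cand.run(x) != reference(x) for x in inputs):
            continue
        # Among behavior-preserving candidates, keep the cheapest one.
        if cand.cost < best_cost:
            best, best_cost = cand, cand.cost
    return best


if __name__ == "__main__":
    reference = lambda x: x * 2
    proposals = [
        Candidate("model_a", lambda x: x + x, cost=0.5),
        Candidate("model_b", lambda x: x * 2 + 1, cost=0.1),  # changes behavior
        Candidate("model_c", lambda x: x << 1, cost=0.8),
    ]
    winner = pick_winner(reference, proposals, range(10), baseline_cost=1.0)
    print(winner.author if winner else "no winner this round")
```

In this toy round the cheapest proposal is thrown out because it changes the output, so a slower but behavior-preserving one ships instead.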








Gm ☀️ We’ve been working full-time. Soon, we’ll see the results. Another productive day ahead!







