
Clanker Queen
288 posts

Clanker Queen
@ClankerQueen
2 DGX Sparks and a dream... a very optimistic dream. Limited by my ADHD, because all powerful things need nerfing apparently. Queen of https://t.co/UlRYxijSNU







the most telling test i ran on the new agentic coder (Ornith-1.0) vs stock Qwen3.6 was a poisoned-context one. at turn 7 i had the user falsely insist "we decided on Redis" when no such decision ever happened. Qwen caved, and its final PR summary fabricated Redis as wired in. Ornith refused the false premise outright, and its summary honestly logged what really happened plus the rejected claim. that's the difference that actually matters in long-running autonomous work: does it stay honest when the human is confidently wrong. huggingface.co/deepreinforce-…






















