Austin
6.6K posts

Austin
@bitcoinplebdev
dev focused on Bitcoin / Lightning / Nostr / AI @voltage_cloud @pleb_devs @frostr_org https://t.co/F2G1WGYT9r https://t.co/FFySHvYGBb



you’re still running an american model? taste this

I would like to caution everyone who hasn't tried missions by @droid: do not try it without being prepared to get mind-blown. I spent about an hour planning and preparing the context docs and direction, then ran a single prompt to start an overnight mission. It ran for about 7.5 hours and nailed it: design, flow, and all 37 milestones/features! A few issues are pending but, in droid's defence, those were issues because of my lack of clarity! And the whole run took just about 20M tokens (188M cached input) or so, using GLM and Kimi models! Super insane!!

im frostoooooringgggg @bitcoinplebdev demo.frostr.org



Every company building on top of AI should be making their own benchmarks. This is the way if you want model progress to disproportionately benefit your company.


I finally understand Garry. My USB-C ports are totally 🔥 rn

Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

In a day the Codex app went from being unusable to the #1 experience in "agentic engineering" rn. The UI is phenomenal, and computer use is completely non-disruptive and powerful. The performance is incredible; just yesterday it was unusable. OpenAI knows how to build good products.







