

Build Micro Apps
67 posts

@buildmicroapps
I experiment with SaaS, micro-apps, and new frameworks so you can look smart without doing any work.






We’re announcing VibeBench, a new benchmark for what actually matters: how models feel when used on real work by experienced software engineers. But we need your help.

Here’s how it works:
1. An initial cohort of 1000 qualified software engineers (join: vibebench.standardagents.ai)
2. Groups of 250 evaluate new models for 2 days on real work.
3. Participants subjectively rank the model relative to other models they have experience with.
4. On day 4 a report is released with objective results derived from the subjective tests.

How can you help:
1. We all need this benchmark to exist, but for it to become reality, we need an initial cohort of 1000 qualified software engineers. If that’s you, please join! vibebench.standardagents.ai
2. Repost this! We need to reach as many qualified engineers as we can find.
3. Share this initiative with everyone on your engineering teams.

Together we can make this benchmark a reality for all of us.




been building custom harnesses on top of Project Think for the past 2 days - an agent that monitors production failures and rolls back Flagship features in real time. @CloudflareDev agents team nailed this one! blog.cloudflare.com/project-think/


An AI agent can write your code in minutes. But someone still has to review, merge, deploy, and monitor it. What if the agent could do that too? Feature flags are the missing piece. They let an agent ship code behind a flag, test it on real traffic, ramp the rollout, and kill it instantly if things break. No human in the loop until you choose to be. Today we're shipping Flagship to make this possible - feature flags native to @Cloudflare's network, OpenFeature standard. Move fast, break nothing. blog.cloudflare.com/flagship
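
The ship, ramp, and kill loop described above can be sketched in a few lines. This is a minimal in-memory illustration, not the Flagship or OpenFeature API: the `Flag` shape, the `bucket`, `isOn`, and `step` helpers, and the 5% error threshold are all assumptions made up for the example.

```typescript
// Hypothetical in-memory flag; NOT the Flagship or OpenFeature API.
interface Flag {
  enabled: boolean;       // kill switch: false turns the feature off for everyone
  rolloutPercent: number; // share of users (0..100) who see the feature
}

// Map a user id to a stable bucket in 0..99 using an FNV-1a string hash,
// so the same user always lands on the same side of the rollout.
function bucket(userId: string): number {
  let h = 2166136261;
  for (let i = 0; i < userId.length; i++) {
    h ^= userId.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0) % 100;
}

// A user sees the feature only if the flag is on and their bucket is inside the rollout.
function isOn(flag: Flag, userId: string): boolean {
  return flag.enabled && bucket(userId) < flag.rolloutPercent;
}

// One control step an agent might run after observing real traffic:
// kill the flag if errors exceed the (assumed) threshold, otherwise double the rollout.
function step(flag: Flag, errorRate: number, maxErrorRate = 0.05): Flag {
  if (!flag.enabled) return flag;
  if (errorRate > maxErrorRate) return { enabled: false, rolloutPercent: 0 };
  const next = Math.min(100, Math.max(1, flag.rolloutPercent * 2));
  return { enabled: true, rolloutPercent: next };
}
```

Starting from `{ enabled: true, rolloutPercent: 1 }`, repeated calls to `step` with a healthy error rate ramp the rollout toward 100, and a single unhealthy reading switches the flag off without a redeploy.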

Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)

What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.

K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code

🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

Kumo is now my go-to for building dashboards! The team absolutely cooked with this one 👨‍🍳 github.com/cloudflare/kumo

Use Workers bindings so that you don't need ENV variables
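
A rough sketch of what that tip looks like in code, assuming a KV-style binding: the Worker receives the resource as a property on `env`, so the code never holds a connection string or API key. The `KVLike` interface, the `SETTINGS` binding name, and the `readGreeting` helper are hypothetical names for illustration. Real projects would take their types from `@cloudflare/workers-types` and declare the binding in the Worker's config.

```typescript
// Hypothetical minimal shape of a KV-style binding, written out here so the
// sketch is self-contained. It mirrors the string-returning get(key) call.
interface KVLike {
  get(key: string): Promise<string | null>;
}

// The Worker's env object. SETTINGS is an assumed binding name that would be
// declared in the Worker's configuration, not read from a .env file.
interface Env {
  SETTINGS: KVLike;
}

// The code talks to the bound resource directly; there is no URL or token
// to load, log, or leak.
async function readGreeting(env: Env): Promise<string> {
  const greeting = await env.SETTINGS.get("greeting");
  return greeting ?? "hello";
}

// Local stand-in for the binding so the sketch runs outside the Workers runtime.
const fakeEnv: Env = {
  SETTINGS: {
    get: async (key: string) => (key === "greeting" ? "hi from a binding" : null),
  },
};
```

In a deployed Worker, the runtime passes the real `env` as the second argument of the `fetch` handler, and `readGreeting(env)` would work unchanged.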

Building @usehullo at the @opencode buildathon. Sign up on the waitlist (hullo.email) and help me impress @thdxr (i think he's pissed with me)




DON’T LET CLAUDE READ YOUR ENV FILE DON’T LET CLAUDE READ YOUR ENV FILE DON’T LET CLAUDE READ YOUR ENV FILE DON’T LET CLAUDE READ YOUR ENV FILE DON’T LET CLAUDE READ YOUR ENV FILE






Just found @usehullo and it’s actually insane. It crawls the web to find a reason to email you (like a recent Series B or a new hire) and cites its sources. Cold email that doesn't feel like a bot. Get on the list: hullo.email



