Weco AI


We're excited to announce that @BingchenZhao, who built the predecessor of AutoResearch, has joined @WecoAI full-time! Bingchen is the first author of LLMSpeedrunner at Meta FAIR, which ran the automated research loop on @karpathy's NanoGPT, the project that later evolved into NanoChat and into the speedrun community where AutoResearch operates today. Weco has been committed to ML research automation for 2.5 years, starting with AIDE. We're super pumped by how large an impact AIDE has had: topping @OpenAI's MLE-Bench and @METR_Evals' RE-Bench, and becoming a foundation for AI Scientist v2, AIRA-Dojo, and LLMSpeedrunner itself. And now AutoResearch, by bringing AIDE's simple greedy keep/discard loop to a mass audience, is building consensus that the empirical research loop can and should be automated. We're excited to keep pushing this frontier, not just as a concept but by seriously bringing it to the real world and materially accelerating humanity's generation of knowledge.



In case you want to run AutoResearch this weekend: it costs ~$300 for 85 experiments using Claude Code (Opus). A quick guide to run ~60 AutoResearch experiments for free:

1. Use the Mac/local-GPU fork: github.com/miolini/autore…
2. Use weco to get some free credits: `pipx install weco` → `weco setup claude-code`. Or simply give this doc to your Claude Code agent: docs.weco.ai/quickstart. You'll get $20 in free credits.
3. Tell your coding agent to run weco optimization for val_bpb on train.py.
4. Tell your coding agent to use gemini-3-flash-preview; you should get about 60 free experiments. For better performance, use gemini-3.1-pro-preview (~15 free experiments).
5. You can watch the progress on this nice dashboard: dashboard.weco.ai/share/v5X8WV5H…
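The optimization these tools run is, at its core, the simple greedy keep/discard loop that AIDE popularized: propose a change, run the experiment, keep the change only if the metric improves. The sketch below is a hypothetical toy illustration, not Weco's implementation — the "solution" here is a plain vector and the "metric" a toy loss, standing in for `train.py` and `val_bpb`.

```python
import random

# Hypothetical sketch of a greedy keep/discard loop in the spirit of
# AIDE. In AutoResearch the solution would be train.py and the metric
# val_bpb; here we use a toy vector and a quadratic loss.

def evaluate(solution):
    """Toy stand-in for running one experiment (lower is better)."""
    return sum(x * x for x in solution)

def propose(solution, rng):
    """Toy stand-in for an agent proposing an edit: perturb one entry."""
    i = rng.randrange(len(solution))
    candidate = list(solution)
    candidate[i] += rng.gauss(0.0, 0.5)
    return candidate

def greedy_loop(initial, n_experiments=60, seed=0):
    rng = random.Random(seed)
    best, best_score = list(initial), evaluate(initial)
    for _ in range(n_experiments):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score < best_score:  # keep only strict improvements
            best, best_score = candidate, score
        # otherwise: discard, and propose again from the current best
    return best, best_score

best, score = greedy_loop([1.0, -2.0, 3.0])
print(f"loss after 60 experiments: {score:.4f}")
```

The appeal of this scheme is its simplicity: there is no search tree or population to manage, just a single current-best solution, which makes each experiment cheap to book-keep and easy to inspect on a dashboard.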


ah yes, this is what post-agi feels like :) i didn't touch anything. brb sauna


Since we started working on AIDE, we've built and iterated on hundreds of evals, with agents hill-climbing on them daily. Many lessons learned; I wrote a blog post about it. TL;DR:

1. Eval is mostly about fighting noise, generalization gaps, and cost.
2. Vibes are an N=1 benchmark: not a terrible way to start, but staying there leads to over-engineering.
3. Evals should co-evolve with the solution.

weco.ai/blog/eval-is-n…
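The first point can be made concrete with a small simulation. This is an illustrative sketch (not Weco's code, and the numbers are invented): two hypothetical models whose true metrics differ by 0.01, evaluated with per-run noise of standard deviation 0.05. A single eval run frequently ranks them in the wrong order; averaging several runs per model pushes the ranking toward the truth, at a real cost in compute.

```python
import random
import statistics

TRUE_A, TRUE_B = 0.70, 0.71  # model B is genuinely better by 0.01
NOISE = 0.05                 # per-run evaluation noise (std dev)

def eval_once(true_score: float, rng: random.Random) -> float:
    """One noisy evaluation run of a model."""
    return true_score + rng.gauss(0.0, NOISE)

def rank_correctly(n_runs: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which averaging n_runs evals per model
    correctly ranks B above A."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        mean_a = statistics.fmean(eval_once(TRUE_A, rng) for _ in range(n_runs))
        mean_b = statistics.fmean(eval_once(TRUE_B, rng) for _ in range(n_runs))
        wins += mean_b > mean_a
    return wins / trials

for n in (1, 5, 25):
    print(f"{n:>2} runs per model -> correct ranking {rank_correctly(n):.0%}")
```

With one run per model the ranking is barely better than a coin flip; only by spending many runs does the true 0.01 gap become visible. That trade-off between noise and cost is what most eval design ends up being about.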


Thrilled to announce Weco has raised an $8M seed led by @GoldenVentures to build self-evolving software! Our technology has already been used by frontier labs like OpenAI, Meta, Google and Sakana AI. We’re making every codebase a living experiment that learns to beat itself:

