OpenBlock
460 posts

@openblocklabs
OB-1 is a frontier, self-improving coding agent. Now available for general access!
San Francisco, CA · Joined November 2022
0 Following · 6.9K Followers
OpenBlock reposted

Benchmarks, Accountability, and What Matters
Our Terminal Bench submission did not meet the standard we set for ourselves. We made real improvements to our agent harness for the benchmark, but we also resorted to methods that compromised our results. The methodology was wrong, and we take full accountability.
Benchmarks have become a huge focus for our industry, driving launch posts and informing which agents get adopted. At the same time, benchmaxxing is rampant. Several of the highest-ranked submissions on Terminal Bench today actively inject task-specific guidance, cherry-pick trials, and refuse to publish trajectories. We anticipate many more will be removed soon, but the deeper issue is systemic. We got caught up in the race, and that was a mistake.
This is a turning point for us, and maybe for others, to focus on what matters outside of benchmarks: building a product people love. We’ve built an incredible, high-caliber team that has spent the last six months heads-down building a frontier agent, and the work speaks for itself:
- Cloud sandboxes that run your code in isolation
- Auto-generated skills and hooks based on your past sessions
- Fine-tuned subagent models purpose-built for subtasks
- Session sharing so your team can pick up where you left off
- Hands-off mode with built-in safety controls
- Support for 300+ models
- PM Mode for planning specs
- and much more
We’re committed to doing better going forward, which means focusing on transparency and verifiability. It’s been an important week to reflect, but it’s time to get back to building.
— Daljeet & Tejpal

Today’s coding agent teams still employ hundreds of human engineers, which we find telling.
We’ve kept our team small, consisting entirely of IOI/IMO medalists, to make one bet: OB-1 will build OB-1 faster than any human team.
We’re just getting started. Be sure to follow @openblocklabs for future updates.

Here’s where OB-1 is going:
– Auto-generates evals from past PRs, then climbs them with custom models
– Builds its own skills, hooks, and rules from a codebase and session history
– Background agents in safe sandboxes that keep working while you context-switch
– Session sharing and forking: redefining version control around prompts, instead of source code
– Lives where you already work: Slack, Linear, GitHub, Graphite
– PM mode so it never runs out of ideas

2/ OB-1 is a self-improving coding agent currently in beta. It placed #1 on Terminal Bench in September.
We’re letting people off the waitlist each day. Join here: openblocklabs.com/waitlist
OpenBlock reposted

Coding agents 💚 Modal Sandboxes
OpenBlock @openblocklabs
Your coding agent just got its own computer.
ob1 --sandbox
Powered by Modal.

3/ OB-1 is a self-improving coding agent currently in beta. It placed #1 on Terminal Bench in September.
We’re letting people off the waitlist each day. Join here: openblocklabs.com/waitlist

2/ Most coding agents run directly on your machine: eating memory, slowing your computer down, and even crashing your terminal.
--sandbox moves all of that off your laptop and into an isolated cloud environment on @modal.
Your agent gets its own machine with your repo and local environment cloned instantly.
OpenBlock reposted

I’ll be at NeurIPS in San Diego this year!
Reach out if you want to talk about coding agents (+ our upcoming CLI launch @openblocklabs), domain-specific RL, or open source.
OpenBlock reposted

So much fun hosting the CMU builder night tonight; packed with demos, energy, and great people.
Gave a sneak peek of @openblocklabs' upcoming CLI agent, OB-1!
s/o @waynesutton @convex for the space :)