Carson Kraycik retweeted

Today, we’re sharing a new state of the art for computer use.
Our system holds the two highest verified scores on OSWorld, the standard benchmark for AI agents that operate a computer like a person: 83.6% using Claude Opus 4.7 and 81.5% using Claude Sonnet 4.6. The human baseline is 72.4%.
🧵 1/7
