Pinned Tweet
Matan Grinberg
1.8K posts


Mind-bending results on TheirAgentBench! 🤯🚀
Matan Grinberg@matanSF
excited to announce the latest scores of OurAgent on OurAgentBench: 1. OurAgent 2. YourAgent arxiv paper in bio

@leerob i think most interesting would be:
1. model performance, with harness held constant
2. harness performance, with model held constant

@matanSF The numbers come from the leaderboard, I'm not sure it would make sense to include other harnesses outside Codex/CC. Open to ideas if there's a better way to represent things fairly!


@leerob i know i know, but leading with OurAgentBench as the hero shot is a bit meh...
also gpt5.4 and opus4.6 do way better than 75 and 58 respectively

@matanSF Okay fine I'll bite...
We have Terminal-Bench and SWE-bench Multilingual in the blog post.


a company unironically named "Droid Factory" is out in sf trying to raise money
on a completely unrelated, not-petty note, excited to share my most recent domain purchase: droidfactory.com
Matan Grinberg retweeted

Factory was built from day one to deploy wherever your code already lives. On laptops, in CI pipelines, on VMs, inside Kubernetes clusters, and in networks with zero outbound internet connectivity.
We support three deployment patterns, and you can mix them across teams:
1/ Cloud-managed
2/ Hybrid
3/ Fully air-gapped
@nvidia, the US Government, and the world's largest financial institutions run Factory self-hosted today.


@matanSF I can’t disclose the startup but I did this once to a girl in my accelerator once she told me the “new name” they were going to use but forgot to lock down the domain and social handles
Matan Grinberg retweeted

Droid is really powerful 👀
Where did I live so far?
@FactoryAI what's the magic behind this beast?
Matan Grinberg retweeted

"There is nothing new to be discovered in physics now. All that remains is more and more precise measurement."
- Lord Kelvin, 1900, a few years before general relativity and quantum mechanics
Yuchen Jin@Yuchenj_UW
Some people at frontier AI labs told me they believe startups are over. OpenAI, Anthropic, Google, xAI will absorb every industry as AGI nears. Coding today, science, medicine, and finance next. Then everything else. If they’re right, that’s a pretty boring end of the world.

@typesfast Idk I think Alexander and Aristotle probably spent three years just going with the flow not thinking much
Matan Grinberg retweeted