
A single formula, a local edit. That's what existing spreadsheet benchmarks test. But that's not what agents actually need to do in the real world. Today, in partnership with the original authors, we are releasing SpreadsheetBench 2 to change that.
Daniel




UK tech salaries are disrespectful




Heading 4 is finally here 😤 The years of “just bold the text and pretend” are over. Rolling out now.

We've reached an agreement to acquire Astral. After we close, OpenAI plans for @astral_sh to join our Codex team, with a continued focus on building great tools and advancing the shared mission of making developers more productive. openai.com/index/openai-t…







I listed 3 requirements. The last 20 submissions I got matched 0 of these requirements. I'm so tired.


I would like to purchase a handful of code problems that modern LLMs can’t solve.

Requirements:
- programmatically verifiable (can be tested without human interaction)
- “before” state (repo before the commit that implements the solution)
- example code that actually solves the problem

I am willing to pay up to $500 per problem that I can easily test locally and confirm current models (gpt-5.3-codex, opus 4.6) are unable to solve.

If you can’t tell, I’m running out of “too hard for LLM” code tasks 🙃🙃🙃



Nearly every board meeting: "Hiring strong infra folks is incredibly hard right now"

like if you’re even kinda good at kubernetes or C++ you’re unbelievably employable rn