

Dr. Daniel Bender
8.9K posts

@drdanielbender
Yes, I'm obsessed with 🦞 @OpenClaw. No, I won't blindly trust it ◆ I teach you the way of responsible AI: your data, your rules ◆ PhD in computer science ◆ Dad



I'm speaking at AIDev 6 in Cologne on 2 June about WolfBench.ai and why one score is not enough for evaluating AI agents. Agent performance depends on more than the model: harnesses, tools, task design, reliability, and real-world failure modes matter. A leaderboard number alone won't tell you whether an agent will actually survive contact with production. Excited to discuss practical agent evals – and to hear @jphme on secure online agent deployment. Registration is free but limited. Link in comments.


We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video. It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.




After playing with AI for a few months via OpenClaw, I shifted my focus to local AI, principally to save costs. Other benefits include better privacy (protecting IP) and a fallback for when cloud models are simply broken. I then decided to write a book to share my findings, and it has just gone live on Amazon! I've kept it as cheap as possible - there are too many people trying to make money with AI rather than share genuine experiences to benefit others. The only thing left is to fix Amazon accidentally merging another author's bio into mine!












