Dr. Daniel Bender

8.9K posts

@drdanielbender

Yes, I'm obsessed with 🦞 @OpenClaw. No, I won't blindly trust it ◆ I teach you the way of responsible AI: your data, your rules ◆ PhD in computer science ◆ Dad

Germany 🇩🇪 Joined January 2022

1.3K Following · 5K Followers

Pinned Tweet

Dr. Daniel Bender@drdanielbender·11 Jun

Ever wondered how model quantization (FP16, Q8, Q4) *really* affects performance? There's an analogy that makes the trade-offs crystal clear... and it involves something you might drink. 😉🍺 Kudos to @jtdavies for this brilliant comparison. 🙏 See the image for the full explanation! 👇

Dr. Daniel Bender@drdanielbender·12h

I have been using Minimax M2.5 in OpenClaw for quite some time, and I really like what the model can do in this setup. It's super cool to see that @MiniMax_AI is putting out a benchmark that shows how they compare to the state-of-the-art closed models when used in OpenClaw. The gap to the top models is there, but M2.7 is catching up. I am loving the honesty. ♥️

Skyler Miao@SkylerMiao7

Yes, we performed extensive optimizations and even established a dedicated benchmark for it.

Dr. Daniel Bender@drdanielbender·4d

Totally unexpected: right after the recent release of M2.5, we now have M2.7, improved across all benchmarks. 🤯 Especially interesting: The new Minimax Claw Bench shows M2.7 catching up fast to frontier models when used in OpenClaw. More info about M2.7 in the official article. 👇

MiniMax (official)@MiniMax_AI

x.com/i/article/2034…

Dr. Daniel Bender@drdanielbender·10 Mar

Wolfram Ravenwolf’s Four-Metric Framework · based on Terminal-Bench 2.0 One score is not enough. Because performance is a distribution, not a point. wolfbench.ai

Dr. Daniel Bender@drdanielbender·10 Mar

x.com/i/spaces/1dGYl…

Dr. Daniel Bender@drdanielbender·10 Mar

Interesting project that tests how well the defense against prompt injection in OpenClaw works by openly exposing a mail account. Currently with a bounty of $1000: hackmyclaw.com

Dr. Daniel Bender@drdanielbender·10 Mar

Looking forward to talking to my AI buddies and everyone else who is interested in these topics. 👇 x.com/i/spaces/1dGYl…

Dr. Daniel Bender@drdanielbender·10 Mar

Cool! If anyone is struggling, like me, to make sense of the "Graph" API:
> The Microsoft Graph API is a unified, RESTful web developer platform that provides access to data and intelligence across Microsoft's cloud services. Originally released in 2015, it acts as a single gateway for applications to interact with resources from Microsoft 365, Windows, and Enterprise Mobility + Security.
In summary, so much more to do for you @KarthiDreamr 😅

KarthiDreamr@KarthiDreamr·10 Mar

Created with @MiniMax_AI 2.5 on OpenCode

KarthiDreamr@KarthiDreamr

Just created a skill for @openclaw to access Microsoft To Do app using its Graph API. Well documented, so you can paste this to openclaw & just follow its instructions to setup easily. clawhub.ai/KarthiDreamr/m…

Dr. Daniel Bender@drdanielbender·10 Mar

@WolfBenchAI @WolframRvnwlf @wandb Crazy to see that the comparatively small, open-weight Minimax M2.5 outperforms GPT-5.4 and is close to Opus 4.6 in @OpenClaw. 😯

WolfBench@WolfBenchAI·10 Mar

Introducing WolfBench: @WolframRvnwlf's new evaluation framework for models and agents, brought to you by @wandb Single score metrics don't adequately describe model performance and capabilities. Here's how the new WolfBench framework solves that problem: wandb.ai/wandb_fc/wolfb…

Dr. Daniel Bender@drdanielbender·10 Mar

Looking for the best / most cost-efficient model for your agentic tasks, e.g. in @openclaw? Look no further, the new @WolfBenchAI has got you covered! 👇

WolfBench@WolfBenchAI

Dr. Daniel Bender@drdanielbender·9 Mar

status.openai.com

Dr. Daniel Bender@drdanielbender·9 Mar

If your OpenClaw is not that chatty today, it might be due to downtime of the Codex CLI (if you configured OpenClaw to use your OpenAI subscription). To check the current status, take a look at the URL below. 👇 Side note: You should always have a backup model provider to fall back on in these situations; for me, that is @MiniMax_AI with their powerful M2.5 model. 💪
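The fallback idea in this post can be sketched in a few lines of Python. This is only an illustration of the general pattern, not how OpenClaw configures or switches providers; the function names and the two stand-in providers are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, callable) pair; the callable takes a prompt and
# returns a reply, or raises if the service is unavailable.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order and return (name, reply) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real setup would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins: the primary simulates an outage, the backup answers.
def primary_down(prompt: str) -> str:
    raise ConnectionError("service unavailable")


def backup_ok(prompt: str) -> str:
    return f"backup reply to: {prompt}"


used, reply = complete_with_fallback(
    "hello", [("primary", primary_down), ("backup", backup_ok)]
)
print(used, "->", reply)
```

The order of the list is the priority order, so the backup is only called when the primary raises.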

Dr. Daniel Bender@drdanielbender·8 Mar

Crazy that the hyped GPT 5.4 does not get ahead of the Chinese open-weight models in Terminal-Bench 2.0. 👇 Thanks for running and sharing the benchmark results, @WolframRvnwlf. 🙏

Wolfram Ravenwolf@WolframRvnwlf

I've been evaluating @openclaw with the popular Terminal-Bench 2.0 framework - interesting initial insights: Sonnet outperforms Opus, aligning with other benchmarks that identify it as superior for agentic tasks. And, surprisingly, GPT 5.4 isn't any better than Chinese models. 👀 I'm continuing the evaluations and will share further insights shortly. Keep an eye on wolfbench.ai. 🐺

Dr. Daniel Bender@drdanielbender·8 Mar

@WolframRvnwlf Wait what? Why does this decision increase the demand for Claude Code by a factor of more than 100x? 🤔

Wolfram Ravenwolf@WolframRvnwlf·7 Mar

The joint Anthropic-Pentagon ad campaign is a runaway success: Just two weeks ago, no version of Claude Code had ever surpassed 100,000 downloads - now we're seeing nearly 10 million downloads weekly!

Dr. Daniel Bender@drdanielbender·8 Mar

@moritzkremb @openclaw @kilocode That's cool, but I cannot trust the scores. As much as I love Minimax, their M2.1 is way behind Kimi K2.5 when I use these models in OpenClaw. The results state that M2.1 is ahead. 🤔

Moritz Kremb@moritzkremb·7 Mar

There's finally a proper benchmark for @openclaw model performance. I just found that @kilocode built an open source benchmark that tests models across 23 real world openclaw tasks like scheduling meetings, writing code, triaging email, etc. gpt-5.3-codex is sitting at number one. tbh that matches my experience. gemini 3 flash in second place. didn't expect that. curious to see where gpt-5.4 will land on this.

Dr. Daniel Bender@drdanielbender·3 Mar

@benbeingbin Good take. I had not seen this aspect of monorepos before. AI needs the context that a monorepo provides. 💪

Benjamin Gregory@benbeingbin·30 Dec

Building in a monorepo isn't about abstract philosophies on design patterns for 'how we should work.' It's about velocity in an era where products change fast and context matters. AI is all about context. This monorepo is our company. kasava.dev/blog/everythin…

Dr. Daniel Bender@drdanielbender·27 Feb

Everything you imagine is only a few prompts away these days! Have a look at what @ednico_ built: a subscription tracker to monitor your spending and show you valid alternatives. 👇

Ed Nico@ednico_

Been a while since I wanted to track the amount I spend on subscriptions. With the help of AI, it was a lot more straightforward than I envisaged so wanted to share the app I created. Check it out at substrak.io

Dr. Daniel Bender@drdanielbender·27 Feb

@ednico_ Good idea and well executed, Ed!

Ed Nico@ednico_·26 Feb

Dr. Daniel Bender@drdanielbender·27 Feb

"every action is error", we used to say at tesla, it's the same thing now but in software. - @karpathy Interesting thought. 🤔 Work on removing yourself from the equation and giving the AI agent what it needs to solve a problem on its own.

Andrej Karpathy@karpathy

"Last year" very possible you're holding it wrong. UI: should be a lot more tractable with /chrome etc. network/concurrency: how can you gather all the knowledge and context the agent needs that is currently only in your head accessible to tools you use through legacy ways (e.g. web UIs)? how can you make the things you care about testable? observable? legible? the goal is to arrange the thing so that you can put agents into longer loops and remove yourself as the bottleneck. "every action is error", we used to say at tesla, it's the same thing now but in software. Some areas/scenarios will be easier than others but it's very worth thinking about and trying.

Dr. Daniel Bender retweeted

Wolfram Ravenwolf@WolframRvnwlf·26 Feb

Today's 🦞 @openclaw tip is about a very useful but little known slash command - and a common problem that's easy to overlook: /context list This command shows you what's taking up your context and how large your injected workspace files (AGENTS, SOUL, etc.) are. Here's what many don't realize: by default, each file can have up to 20,000 characters injected, but older versions capped the total for all files combined at 24,000 - newer versions have now increased that limit to 150,000. Although truncation is less of a problem with this higher limit, it's still smart to check your maximum values and confirm that all workspace files are fully injected (injected chars == raw chars). If not, or if you're approaching any limit, adjust agents.defaults.bootstrapMaxChars (per file) or agents.defaults.bootstrapTotalMaxChars (combined). Remember, this counts characters, not tokens - so 150,000 characters equal roughly 50K tokens.
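The arithmetic behind this tip can be sketched in Python. The caps and the characters-to-tokens ratio are taken from the tip itself; the function names, the example file sizes, and especially the rule that the combined budget is consumed file by file in order are assumptions made for illustration, not OpenClaw's documented behavior.

```python
PER_FILE_MAX = 20_000   # default per-file cap quoted in the tip
TOTAL_MAX = 150_000     # newer combined cap quoted in the tip
CHARS_PER_TOKEN = 3     # implied by "150,000 characters equal roughly 50K tokens"


def check_injection(raw_sizes, per_file_max=PER_FILE_MAX, total_max=TOTAL_MAX):
    """Estimate injected characters per file and flag likely truncation.

    Assumption: the combined budget is used up in file order.
    """
    report = {}
    remaining = total_max
    for name, raw in raw_sizes.items():
        injected = min(raw, per_file_max, remaining)
        remaining -= injected
        report[name] = {"raw": raw, "injected": injected, "truncated": injected < raw}
    return report


def approx_tokens(chars):
    """Rough token estimate using the ratio implied by the tip."""
    return chars // CHARS_PER_TOKEN


# Hypothetical raw sizes in characters for two workspace files.
sizes = {"AGENTS.md": 12_000, "SOUL.md": 25_000}
for name, row in check_injection(sizes).items():
    print(name, row, "~tokens:", approx_tokens(row["injected"]))
```

Any file reported as truncated here is the case the tip warns about, where injected chars are lower than raw chars and one of the two limits would need raising.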

Dr. Daniel Bender@drdanielbender·26 Feb

Created a vulnerability scanner for the open-source software I use with @MiniMax_AI M2.5 in a single shot. List your software (name + version) in a JSON file, and it checks against OSV.dev for known CVEs. Finally, I get an alert when one of my programs has a known vulnerability!
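A minimal sketch of such a scanner in Python, using only the standard library, might look like the following. It is not the author's generated tool: the inventory layout, the `ecosystem` field (the post only mentions name and version), and all function names are assumptions. The only external piece relied on is the public OSV query endpoint, which accepts a package name, ecosystem, and version as a JSON POST body.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name, version, ecosystem):
    """Build the JSON body for one OSV query."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def load_inventory(text):
    """Turn an inventory JSON string into a list of OSV query bodies.

    Assumed layout: a list of objects with name, version, and ecosystem keys.
    """
    return [build_query(e["name"], e["version"], e["ecosystem"]) for e in json.loads(text)]


def query_osv(payload, timeout=10.0):
    """Send one query to OSV and return the IDs of matching advisories."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OSV returns an empty object when nothing matches.
    return [vuln["id"] for vuln in body.get("vulns", [])]


# Hypothetical inventory; no network call is made at this point.
inventory = '[{"name": "jinja2", "version": "2.4.1", "ecosystem": "PyPI"}]'
queries = load_inventory(inventory)
print(json.dumps(queries[0]))
```

Calling `query_osv(q)` for each entry in `queries` and alerting on any non-empty result would complete the loop; running that on a schedule is left out here.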
