Dr. Daniel Bender

8.9K posts

@drdanielbender

Yes, I'm obsessed with 🦞 @OpenClaw. No, I won't blindly trust it ◆ I teach you the way of responsible AI: your data, your rules ◆ PhD in computer science ◆ Dad

Germany 🇩🇪 Joined January 2022
1.3K Following · 5K Followers
Pinned Tweet
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
Ever wondered how model quantization (FP16, Q8, Q4) *really* affects performance? There's an analogy that makes the trade-offs crystal clear... and it involves something you might drink. 😉🍺 Kudos to @jtdavies for this brilliant comparison. 🙏 See the image for the full explanation! 👇
Dr. Daniel Bender tweet media
English
1
2
18
2.8K
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
I have been using Minimax M2.5 in OpenClaw for quite some time, and I really like what the model can do in this setup. It's super cool to see that @MiniMax_AI is putting out a benchmark that shows how they compare to the state-of-the-art closed models when used in OpenClaw. The gap to the top models is there, but M2.7 is catching up. I am loving the honesty. ♥️
Skyler Miao@SkylerMiao7

Yes, we performed extensive optimizations and even established a dedicated benchmark for it.

English
0
0
6
265
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
Totally unexpected after the recent release of M2.5, now we have M2.7, improved across all benchmarks. 🤯 Especially interesting: The new Minimax Claw Bench shows M2.7 catching up fast to frontier models when used in OpenClaw. More info about M2.7 in the official article. 👇
Dr. Daniel Bender tweet media
MiniMax (official)@MiniMax_AI

x.com/i/article/2034…

English
0
0
3
333
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
Wolfram Ravenwolf’s Four-Metric Framework · based on Terminal-Bench 2.0 One score is not enough. Because performance is a distribution, not a point. wolfbench.ai
English
0
0
2
108
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
An interesting project tests how well the defense against prompt injection in OpenClaw works by openly exposing a mail account. It currently offers a bounty of $1000: hackmyclaw.com
English
0
0
2
91
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
Cool! If anyone is struggling, like me, to make sense of the "Graph" API: > The Microsoft Graph API is a unified, RESTful web developer platform that provides access to data and intelligence across Microsoft's cloud services. Originally released in 2015, it acts as a single gateway for applications to interact with resources from Microsoft 365, Windows, and Enterprise Mobility + Security. In summary, so much more to do for you @KarthiDreamr 😅
English
1
0
1
30
WolfBench
WolfBench@WolfBenchAI·
Introducing WolfBench: @WolframRvnwlf's new evaluation framework for models and agents, brought to you by @wandb Single score metrics don't adequately describe model performance and capabilities. Here's how the new WolfBench framework solves that problem: wandb.ai/wandb_fc/wolfb…
English
3
4
18
4.4K
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
Looking for the best / most cost-efficient model for your agentic tasks, e.g. in @openclaw? Look no further, the new @WolfBenchAI has got you covered! 👇
WolfBench@WolfBenchAI

Introducing WolfBench: @WolframRvnwlf's new evaluation framework for models and agents, brought to you by @wandb Single score metrics don't adequately describe model performance and capabilities. Here's how the new WolfBench framework solves that problem: wandb.ai/wandb_fc/wolfb…

English
0
0
0
185
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
If your OpenClaw is not that chatty today, it might be due to downtime of the Codex CLI (if you configured OpenClaw to use your OpenAI subscription). To check the current status, take a look at the URL below. 👇 Sidenote: You should always have a back-up model provider to fall back to in these situations. For me, that is @MiniMax_AI with their powerful M2.5 model. 💪
Dr. Daniel Bender tweet media
English
1
0
0
249
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
Crazy that the hyped GPT 5.4 does not get ahead of the Chinese open-weight models in Terminal-Bench 2.0. 👇 Thanks for running and sharing the benchmark results, @WolframRvnwlf. 🙏
Wolfram Ravenwolf@WolframRvnwlf

I've been evaluating @openclaw with the popular Terminal-Bench 2.0 framework - interesting initial insights: Sonnet outperforms Opus, aligning with other benchmarks that identify it as superior for agentic tasks. And, surprisingly, GPT 5.4 isn't any better than Chinese models. 👀 I'm continuing the evaluations and will share further insights shortly. Keep an eye on wolfbench.ai. 🐺

English
0
0
3
211
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
@WolframRvnwlf Wait, what? Why does this decision increase the demand for Claude Code by a factor of more than 100? 🤔
English
0
0
1
33
Wolfram Ravenwolf
Wolfram Ravenwolf@WolframRvnwlf·
The joint Anthropic-Pentagon ad campaign is a runaway success: Just two weeks ago, no version of Claude Code had ever surpassed 100,000 downloads - now we're seeing nearly 10 million downloads weekly!
Wolfram Ravenwolf tweet media
English
3
0
6
1.1K
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
@moritzkremb @openclaw @kilocode That's cool, but I cannot trust the scores. As much as I love Minimax, their M2.1 is way behind Kimi K2.5 when I use these models in OpenClaw. The results state that M2.1 is ahead. 🤔
English
0
0
2
161
Moritz Kremb
Moritz Kremb@moritzkremb·
There's finally a proper benchmark for @openclaw model performance. I just found that @kilocode built an open source benchmark that tests models across 23 real world openclaw tasks like scheduling meetings, writing code, triaging email, etc. gpt-5.3-codex is sitting at number one. tbh that matches my experience. gemini 3 flash in second place. didn't expect that. curious to see where gpt-5.4 will land on this.
Moritz Kremb tweet media
English
102
46
591
76K
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
@benbeingbin Good take. I had not seen this aspect of monorepos before. AI needs the context that a monorepo provides. 💪
English
0
0
0
3
Benjamin Gregory
Benjamin Gregory@benbeingbin·
Building in a monorepo isn't about abstract philosophies on design patterns for 'how we should work.' It's about velocity in an era where products change fast and context matters. AI is all about context. This monorepo is our company. kasava.dev/blog/everythin…
English
1
0
2
39
Ed Nico
Ed Nico@ednico_·
Been a while since I wanted to track the amount I spend on subscriptions. With the help of AI, it was a lot more straightforward than I envisaged so wanted to share the app I created. Check it out at substrak.io
Ed Nico tweet media
English
6
0
15
1.1K
Dr. Daniel Bender retweeted
Wolfram Ravenwolf
Wolfram Ravenwolf@WolframRvnwlf·
Today's 🦞 @openclaw tip is about a very useful but little known slash command - and a common problem that's easy to overlook: /context list This command shows you what's taking up your context and how large your injected workspace files (AGENTS, SOUL, etc.) are. Here's what many don't realize: by default, each file can have up to 20,000 characters injected, but older versions capped the total for all files combined at 24,000 - newer versions have now increased that limit to 150,000. Although truncation is less of a problem with this higher limit, it's still smart to check your maximum values and confirm that all workspace files are fully injected (injected chars == raw chars). If not, or if you're approaching any limit, adjust agents.defaults.bootstrapMaxChars (per file) or agents.defaults.bootstrapTotalMaxChars (combined). Remember, this counts characters, not tokens - so 150,000 characters equal roughly 50K tokens.
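The arithmetic in the tip above can be sketched in a few lines of Python. This is only an illustration, not OpenClaw's actual logic: the limits (20,000 per file, 150,000 combined) and the roughly 3 characters per token ratio come from the tweet, while the function names, the `FileReport` type, and the assumption that the combined budget is consumed file by file in order are hypothetical.

```python
from dataclasses import dataclass

# Values quoted in the tip; check your own configuration before relying on them.
PER_FILE_MAX_CHARS = 20_000   # agents.defaults.bootstrapMaxChars (per file)
TOTAL_MAX_CHARS = 150_000     # agents.defaults.bootstrapTotalMaxChars (combined)
CHARS_PER_TOKEN = 3           # implied by "150,000 characters equal roughly 50K tokens"


@dataclass
class FileReport:
    name: str
    raw_chars: int
    injected_chars: int

    @property
    def truncated(self) -> bool:
        # A file is fully injected only when injected chars == raw chars.
        return self.injected_chars < self.raw_chars


def estimate_injection(raw_sizes, per_file_max=PER_FILE_MAX_CHARS,
                       total_max=TOTAL_MAX_CHARS):
    """Estimate injected characters per file.

    Assumption (hypothetical): the combined budget is spent in file order.
    """
    reports = []
    remaining = total_max
    for name, raw in raw_sizes.items():
        injected = min(raw, per_file_max, remaining)
        remaining -= injected
        reports.append(FileReport(name, raw, injected))
    return reports


def estimate_tokens(chars, chars_per_token=CHARS_PER_TOKEN):
    """Rough token estimate from a character count."""
    return chars // chars_per_token
```

Calling `estimate_injection({"AGENTS.md": 12_000, "SOUL.md": 25_000})` would flag `SOUL.md` as truncated under the per-file cap. Passing `total_max=24_000` mimics the older combined limit, under which the second file loses even more.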
Wolfram Ravenwolf tweet media
English
5
12
176
11.4K
Dr. Daniel Bender
Dr. Daniel Bender@drdanielbender·
Created a vulnerability scanner for the open-source software I use with @MiniMax_AI M2.5 in a single shot. List your software (name + version) in a JSON file, and it checks against OSV.dev for known CVEs. Finally, I get an alarm when one of my programs has a known vulnerability!
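A scanner along these lines could look roughly like the sketch below. It is not the author's code: it assumes the public OSV query endpoint (`POST https://api.osv.dev/v1/query`), a hypothetical inventory format of `[{"name": ..., "version": ..., "ecosystem": ...}]`, and hypothetical function names. The `ecosystem` field is an addition beyond the tweet's "name + version", because OSV identifies packages by name plus ecosystem.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(entry):
    """Turn one inventory entry into an OSV query payload."""
    return {
        "version": entry["version"],
        "package": {"name": entry["name"], "ecosystem": entry["ecosystem"]},
    }


def query_osv(entry, timeout=10.0):
    """Return the IDs of known vulnerabilities for one package version."""
    body = json.dumps(build_query(entry)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OSV returns an empty object when nothing is known for this version.
    return [vuln["id"] for vuln in payload.get("vulns", [])]


def scan(inventory_path):
    """Check every entry of a JSON inventory file (hypothetical format)."""
    with open(inventory_path, encoding="utf-8") as handle:
        inventory = json.load(handle)
    return {f'{e["name"]}@{e["version"]}': query_osv(e) for e in inventory}
```

Note that OSV IDs are not always CVE numbers (for example GHSA or PYSEC identifiers), so an alerting step may want to read each record's `aliases` field as well.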
Dr. Daniel Bender tweet media
English
1
0
3
206