Daniel Marbach🇨🇭

181 posts

@danielmarbach

He / him

Switzerland · Joined January 2011

784 Following · 2.6K Followers

Daniel Marbach🇨🇭 retweeted

Peter Steinberger 🦞@steipete·5d

Folks: when you write skills, ask your agent to be token efficient, relax grammar. I see too many skills that write books in the skill description, and all that crap is loaded into every context. I wrote a skill that finds the worst offenders. github.com/steipete/agent…

Daniel Marbach🇨🇭 retweeted

Aaron Stannard@Aaronontheweb·4d

How LLMs are destroying OSS trust signals - what should maintainers do about it? (I have one idea)

Daniel Marbach🇨🇭@danielmarbach·18 May

Openclaw Patch now has #dotnet support thanks to a collaboration between me github.com/openclaw/clawp… and Simon github.com/openclaw/clawp…

Daniel Marbach🇨🇭 retweeted

Aaron Stannard@Aaronontheweb·18 May

The thing I really appreciate about local AI and home-rolled agent harnesses is that this is where the people who are in software for the love of the game are creating a garage-hacker movement. OSSing everything. Building and sharing tools. Experimenting. 100000% contrast to the AI doomer / grifters / slop-maxxing / course sellers. If you are feeling down or worried about how AI is going to impact your career, this is where you're going to find your love of building + learning things again and probably drastically increase your market value as a developer right now too. You can probably jump in with an old gaming rig or a Mac and start right away - you won't be replacing frontier models with that gear, but you'll be surprised what you can automate and accomplish even with small models.

Daniel Marbach🇨🇭@danielmarbach·14 May

@Aaronontheweb That was also my experience. 3.6 is better but still not on par with GLM 5.1

Aaron Stannard@Aaronontheweb·13 May

Gave Qwen3-Coder-vNext a shot on one of my DGX-Sparks yesterday and boy, it does NOT do well on anything larger than a well-scoped bug fix on legacy code bases. Real "dog off a leash" vibe.

Daniel Marbach🇨🇭@danielmarbach·30 Apr

@KooKiz Yes I always activate my subagent skills for investigations and big fixes. It improves the accuracy and time to real solution a lot

Kevin Gosse@KooKiz·29 Apr

A failure mode I often see is:
- It doesn't work, the model makes a theory about why
- The fix doesn't work, the model doubles down with a different fix for the same theory
- At this point, the model has reframed the problem it tries to solve, and gets stuck in a dead-end

A recent example on my side: the AI was building a custom theme for an electron-based app. I test it, the pictures don't show. The model theorizes: "it must be the z-index". It sets z-index, still no pictures. The model sets a bigger z-index: still no pictures. The model adds !important: still no pictures. At this point, the model is completely polluted by its own context and is trying to solve the problem "why is z-index not applied correctly" instead of "why are the pictures not showing", and starts doing crazy stuff like overriding the z-index of the *other* elements.

I stopped it and said: "you tried z-index 3 times and failed. Either come up with an experiment to unambiguously demonstrate that z-index is indeed the problem, or start considering other theories". It took a step back, returned to the original problem, and quickly realized that the page didn't have permission to load pictures from that folder.

Clearing/compacting the context is a good way to fix this. Lowering the context window can help by forcing more compactions, at the expense of the model forgetting some instructions during compaction (so not a silver bullet). I have a hunch that forcing the model to use subagents to verify its theories when debugging would provide a significant improvement, but I don't have enough isolated test-cases to experiment with.

WebDevCody@webdevcody

@ChadMoran it wasn't context rot, opus just sucks at solving some bugs, but I'm 100% going to create that skill now and use it

Daniel Marbach🇨🇭@danielmarbach·29 Apr

@KooKiz @mkristensen GLM5.1 was and is amazing for me. In many regards more successful and consistent compared to Opus

Kevin Gosse@KooKiz·27 Apr

@mkristensen To me the threshold was Opus 4.6. Before that I felt like agentic coding was wasting more time than it saved. Since then I've barely written any code. 4.7 is different, sensibly better on analytical tasks, way worse on initiative.

Mads Kristensen@mkristensen·27 Apr

From my real-world use cases, I haven’t seen any significant improvements in coding models since Opus 4.5 and GPT-5.3 Codex. The newer releases feel like incremental updates that don’t deliver meaningful gains for my workflows.

Daniel Marbach🇨🇭 retweeted

Het Mehta@hetmehtaa·16 Apr

Be Anthropic
> Give people Opus 4.6
> People love it.
> For 2 months you degrade Opus 4.6
> You give back normal Opus 4.6 and call it Opus 4.7.
> People love it.
That's the business model.

Daniel Marbach🇨🇭@danielmarbach·24 Mar

@Aaronontheweb And if this ever has a leak... Then we have a problem

Aaron Stannard@Aaronontheweb·24 Mar

@danielmarbach Nice, it supports Claude Code and OpenCode hooks. I'll have to give that a shot

Aaron Stannard@Aaronontheweb·23 Mar

Not sure I'm going to make it (this is with 1m token context lol)

Daniel Marbach🇨🇭@danielmarbach·24 Mar

@Aaronontheweb Yes you just have to init it properly. The only downside is if it fails your sessions are fucked 😔

Daniel Marbach🇨🇭@danielmarbach·24 Mar

@Aaronontheweb Yes that's it. I contributed the support

Daniel Marbach🇨🇭@danielmarbach·24 Mar

@Aaronontheweb You could help me fix the dotnet support when something pops up 😎

Daniel Marbach🇨🇭@danielmarbach·24 Mar

@Aaronontheweb RTK can still save a ton of context per session

Aaron Stannard@Aaronontheweb·24 Mar

@danielmarbach Nah this was kind of an unusual session for me x.com/Aaronontheweb/…

Aaron Stannard@Aaronontheweb

@xoofx This was a long meandering planning + live debugging session (testing TUI + Slack integration + doing LLM perf evals on Netclaw) so it wasn't quite the same as having Claude doing focused work on a feature

Daniel Marbach🇨🇭@danielmarbach·24 Mar

@Aaronontheweb Also are you using superpower or any other similar workflow?

Daniel Marbach🇨🇭@danielmarbach·24 Mar

@Aaronontheweb Using RTK?

Daniel Marbach🇨🇭 retweeted

James Montemagno@JamesMontemagno·23 Mar

I got sick of being forced to see tons of ads and to log in just to visualize and export a Mermaid diagram... Mermaid Studio is LIVE! mermaidstudio.app And @jongalloway approved

Daniel Marbach🇨🇭@danielmarbach·21 Mar

@Aaronontheweb @ICooper I'm not in front of my machine, so I cannot help out. That's why I figured I'd FYI you first

Daniel Marbach🇨🇭@danielmarbach·21 Mar

@Aaronontheweb @ICooper I only discovered it today. I saw that they have a skill eval mechanism that looks handy and having a package manager for skills is kinda nice

Ian Cooper@ICooper·14 Mar

Another example of Claude Code misbehaving around memory here @Aaronontheweb. I asked it to update the dotnet style guide in the repository and it ignored me and updated a memory file. Super-annoying and unhelpful behavior. A real danger that Claude is iterating too fast

Daniel Marbach🇨🇭@danielmarbach·21 Mar

@Aaronontheweb @ICooper Regarding skills. Did you see tessl has validation errors? tessl.io/registry/skill…

Aaron Stannard@Aaronontheweb·16 Mar

@danielmarbach @ICooper I wrote my own version of this last summer that also ports my Claude Code agents / skills to OpenCode - that'll probably be less necessary the more things get standardized
