Dinys

608 posts

@din__mon

Melbourne · Joined March 2018

128 Following · 85 Followers

Pinned Tweet

Dinys@din__mon·7 Oct

If you are looking to upskill, @BandaWorks is a great way to hone your skills and gain experience developing a real-world application. Join me in developing a household chores tracker with a reward system. 😋 #100DaysOfCode

Dinys@din__mon·21h

@CommandCodeAI When will BYOK be available? 🙏

Command Code@CommandCodeAI·22h

Big!! We just shipped 36K+ tool repairs in `command-code@0.28.0`. This would improve 29 different models, and 36K+ tool errors won't happen. cmd will also show the repair icon when it repairs a tool call, or how many times it does that. Not all repairs will show the icon btw.

Ahmad Awais@MrAhmadAwais

how did we make deepseek outperform opus 4.7?

i've been thinking about why "open model bad at tool calling" is almost always a harness problem, not a model problem.

context: spent two days looking at billions of tokens in @CommandCodeAI (tb open source ai cli) using deepseek. i ended up writing a tool-input repair layer. the trigger was watching deepseek-flash fail on the simplest /review run: every shellCommand and readFile call bouncing back with a raw zod issues blob, the model unable to recover because the error wasn't in a form it could read. by the end deepseek v4 pro was beating opus 4.7 6/10 times on our internal evals.

a few things i learned that feel general:

1/ the failure modes aren't random; they're a small finite compositional set. across deepseek-flash, deepseek v4 pro, glm, qwen, the same four mistakes repeat almost exactly:

- sending `null` for an optional field instead of omitting it
- emitting `["a","b"]` as a json *string* instead of an actual array
- wrapping a single arg in `{}` where the schema expected an array (an "empty placeholder")
- passing a bare string where an array was expected (`"foo"` instead of `["foo"]`)

four repairs, ~30-100 lines each, ordered carefully (json-array-parse must run before bare-string-wrap or `'["a","b"]'` becomes `['["a","b"]']`). that is the whole catalogue. when i hear "this open source model can't do tool calls" i now assume one of those four, and so far that's been right ~90% of the time.

2/ the funniest failure mode is also the most revealing. deepseek-flash, when asked to edit or write a file, sometimes emits the path as a *markdown auto-link*:

filePath: "/Users/x/proj/[notes.md](http://notes.md)"

our writeFile tool obediently tried creating files literally named `[notes.md](http://notes.md)` until we caught it. this is not a hallucination.
it's the post-training chat distribution leaking through the tool boundary: the model has been rewarded for auto-linking in conversational output, and is applying that prior in a context where it makes no sense. the fix is two regex lines that unwrap only the degenerate case where link text equals url-without-protocol; real markdown like `[click](https://x.com)` passes through untouched. it is also conditioning from the tools these models saw during RL, which were different from the tools we write and which we ofc can't predict.

"tool confusion" is a more useful frame than "capability gap." the model knows how to format a path. it just hasn't been told clearly enough that this path is going to fopen, not into a chat bubble. so we encode that hint at the schema level, `pathString()` instead of `z.string()`, and the leak is plugged for every path field at once.

3/ the design choice that mattered was inverting preprocess-then-validate to validate-then-repair. my first attempt was the obvious one: a preprocessing pass that normalized inputs (strip nulls, parse stringified arrays, etc.) before zod ever saw them. it broke immediately: writeFile content that *happened* to be json-shaped got rewritten before it hit disk. silent corruption, easy to miss in a smoke test.

then i made it less greedy:

- parse the input as-is. if it succeeds, ship it. valid inputs are never touched.
- on failure, walk the validator's own issue list. for each issue path, try the four repairs in order until one applies.
- parse again. on success, log `tool_input_repaired:${toolName}`. on failure, log `tool_input_invalid:${toolName}` and return a model-readable retry message.

the structural insight here is: when you preprocess, you encode a prior about what's broken. when you let the validator complain first, the schema is the prior, and you only spend repair budget at the exact paths the schema actually disagreed at. the validator is doing the work of localizing the bug for you.
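
A minimal sketch of the validate-then-repair loop and the four repairs described above. This is not Command Code's implementation: the thread uses zod, while this sketch swaps in a tiny hand-rolled validator so it runs without dependencies. Every name here (`FieldSpec`, `validate`, `repairs`, `parseToolInput`) is hypothetical, only top-level fields are handled, and the reading of the "empty placeholder" repair as `{}` becoming `[]` is an assumption.

```typescript
// Dependency-free stand-in for a zod-style schema; all names here are hypothetical.
type FieldSpec = { type: "string" | "number" | "string[]"; optional?: boolean };
type Schema = Record<string, FieldSpec>;
type Input = Record<string, unknown>;
type Issue = { path: string; message: string };
type Result =
  | { ok: true; input: Input; repaired: boolean }
  | { ok: false; retryMessage: string };

function matches(type: FieldSpec["type"], value: unknown): boolean {
  if (type === "string[]") {
    return Array.isArray(value) && value.every((v) => typeof v === "string");
  }
  return typeof value === type;
}

// Returns one issue per top-level field that disagrees with the schema.
function validate(schema: Schema, input: Input): Issue[] {
  const issues: Issue[] = [];
  for (const [path, spec] of Object.entries(schema)) {
    const value = input[path];
    if (value === undefined) {
      if (!spec.optional) issues.push({ path, message: "required" });
    } else if (!matches(spec.type, value)) {
      issues.push({ path, message: `expected ${spec.type}` });
    }
  }
  return issues;
}

// Each repair mutates `input` at `path` and reports whether it applied.
type Repair = (input: Input, path: string, spec: FieldSpec) => boolean;

const repairs: Repair[] = [
  // 1. `null` sent for an optional field: omit the field.
  (input, path, spec) => {
    if (input[path] !== null || !spec.optional) return false;
    delete input[path];
    return true;
  },
  // 2. Array emitted as a JSON string. Must run before the bare-string wrap.
  (input, path, spec) => {
    const value = input[path];
    if (spec.type !== "string[]" || typeof value !== "string") return false;
    try {
      const parsed: unknown = JSON.parse(value);
      if (!Array.isArray(parsed)) return false;
      input[path] = parsed;
      return true;
    } catch {
      return false;
    }
  },
  // 3. Empty `{}` placeholder where an array was expected (interpretation assumed).
  (input, path, spec) => {
    const value = input[path];
    if (spec.type !== "string[]" || typeof value !== "object" || value === null) return false;
    if (Array.isArray(value) || Object.keys(value).length > 0) return false;
    input[path] = [];
    return true;
  },
  // 4. Bare string where an array was expected: wrap it.
  (input, path, spec) => {
    const value = input[path];
    if (spec.type !== "string[]" || typeof value !== "string") return false;
    input[path] = [value];
    return true;
  },
];

function parseToolInput(toolName: string, schema: Schema, raw: Input, log: (event: string) => void = console.log): Result {
  const issues = validate(schema, raw);
  // Fast path: a valid input is returned untouched.
  if (issues.length === 0) return { ok: true, input: raw, repaired: false };

  // Repair a shallow copy, and only at the paths the validator flagged.
  const candidate: Input = { ...raw };
  for (const issue of issues) {
    for (const repair of repairs) {
      if (repair(candidate, issue.path, schema[issue.path])) break;
    }
  }

  const remaining = validate(schema, candidate);
  if (remaining.length === 0) {
    log(`tool_input_repaired:${toolName}`);
    return { ok: true, input: candidate, repaired: true };
  }
  log(`tool_input_invalid:${toolName}`);
  const details = remaining.map((i) => `${i.path}: ${i.message}`).join("; ");
  return { ok: false, retryMessage: `Invalid input for ${toolName}. Fix and retry: ${details}` };
}
```

Because repairs only run at paths the validator flagged, a `content` field declared as a string that happens to hold `'["a","b"]'` validates on the first pass and is never rewritten, which is the corruption the preprocessing approach caused.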
it's the same shape as cheap-then-careful everywhere else: try the fast path, fall back on evidence.

(this also gives you per-tool telemetry for free. you can watch repair rates per (model, tool) and notice when a model regresses on a specific contract before users do.)

4/ shape invariants and relational invariants need different fixes. the four repairs above all handle shape problems: wrong type, missing key, wrong container. but read_file had a *relational* invariant: "if you provide offset, you must also provide limit, and vice versa." deepseek kept calling `readFile({ absolutePath, limit: 30 })` and getting an `ERROR:` back. you can't fix this with input repair, because each field is independently valid; the bug is in the relationship between them.

so i taught the function the model's intent instead. `limit` alone → `offset = 0`. `offset` alone → `limit = 2000` (matches common read tool ops default). then surfaced the decision back to the model in the result:

"Note: limit was not provided; defaulted to 2000 lines. To read more or fewer lines, retry with both offset and limit."

no `Error:` prefix, so the tui doesn't paint it red. the model sees what we picked and can self-correct on the next turn if our guess was wrong. transparency over silent magic wins big.

repair where you can. extend semantics where you can't. surface the choice either way.

zoom out: a lot of what looks like model capability is actually contract design. a strict schema is a choice with a cost: it filters out noise, but it also filters out recoverable noise from any model that hasn't memorized the exact json contract you happened to pick. the largest commercial models eat that cost invisibly and are lenient on tool calling because they've seen enough of every contract during pretraining; open models pay it loudly and get dismissed for it. the harness is where you mediate between distributions.
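
The relational default from point 4 could look roughly like the sketch below. The field names (`absolutePath`, `offset`, `limit`), the defaults, and the limit note come from the thread. The rest is assumed for illustration: the helper names, the 0-based offset, the wording of the offset note, returning the whole file when neither field is given, and passing the file text in directly instead of reading from disk.

```typescript
// Field names come from the thread; everything else is a hypothetical sketch.
interface ReadFileInput {
  absolutePath: string;
  offset?: number; // assumed: 0-based index of the first line to return
  limit?: number;  // assumed: maximum number of lines to return
}

const DEFAULT_LIMIT = 2000;
const RETRY_HINT = "To read more or fewer lines, retry with both offset and limit.";

// Fill in whichever half of the (offset, limit) pair the model left out,
// and describe the choice so it can be surfaced in the tool result.
function resolveRange(input: ReadFileInput): { offset: number; limit?: number; note?: string } {
  const { offset, limit } = input;
  if (offset === undefined && limit === undefined) return { offset: 0 }; // assumed: whole file
  if (offset === undefined) {
    // Note wording assumed, modeled on the limit note quoted in the thread.
    return { offset: 0, limit, note: `Note: offset was not provided; defaulted to 0. ${RETRY_HINT}` };
  }
  if (limit === undefined) {
    return {
      offset,
      limit: DEFAULT_LIMIT,
      note: `Note: limit was not provided; defaulted to ${DEFAULT_LIMIT} lines. ${RETRY_HINT}`,
    };
  }
  return { offset, limit };
}

// Takes the file text directly instead of reading `absolutePath` from disk,
// so the sketch stays self-contained.
function readFileTool(fileText: string, input: ReadFileInput): string {
  const { offset, limit, note } = resolveRange(input);
  const lines = fileText.split("\n");
  const end = limit === undefined ? lines.length : offset + limit;
  const body = lines.slice(offset, end).join("\n");
  // The note is appended as plain text with no `Error:` prefix.
  return note ? `${body}\n\n${note}` : body;
}
```

The result stays a normal success payload, so the model gets the lines it asked for plus a plain statement of which default was applied.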
four small repairs (i'm sure more to follow as we have three more merging today), two regex lines for auto-links, one relational default, one prefix change. the model didn't change. the contract got more forgiving in exactly the places it needed to be. deepseek v4 pro now beats opus 4.7 6/10 times on our internal evals. imo "skill issue" applies to the harness more often than the model.
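
The auto-link unwrap mentioned in the thread might be sketched as follows. The regexes are illustrative guesses, not the ones shipped in Command Code, and `pathString` here is a hypothetical plain function standing in for the schema-level helper the thread names. The only behavior taken from the thread is the rule itself: unwrap a link only when its text equals the URL without the protocol, and leave real markdown links alone.

```typescript
// Matches a markdown link and captures the link text and the URL minus its protocol.
const MARKDOWN_LINK = /\[([^\]]+)\]\((?:https?:\/\/)([^)\s]+)\)/g;

// Unwraps only the degenerate auto-link case, e.g. `[notes.md](http://notes.md)`.
function unwrapAutoLinks(path: string): string {
  return path.replace(MARKDOWN_LINK, (match: string, text: string, rest: string) =>
    // A trailing slash on the URL is tolerated (an assumption, not from the thread).
    rest.replace(/\/$/, "") === text ? text : match,
  );
}

// Hypothetical stand-in for a schema-level path field: validate, then normalize.
function pathString(value: unknown): string {
  if (typeof value !== "string") throw new TypeError("expected a path string");
  return unwrapAutoLinks(value);
}

console.log(pathString("/Users/x/proj/[notes.md](http://notes.md)"));
console.log(pathString("docs/[click](https://x.com)"));
```

Attaching the normalization to the field type rather than to individual tools is what lets one fix cover every path field at once.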

5.5K

Dinys@din__mon·1d

@aijoey What harness do you use to achieve this level of concurrency?

Joey@aijoey·1d

ifykyk. concurrency on dgx spark before all the new breakthroughs. i need to run mtp and nvfp4 versions.

Joey@aijoey

Local AI landing page generation on a DGX Spark. One Gemma-4-26B Q4 GGUF served by llama.cpp with 7 concurrent decode slots.

The orchestrator breaks “landing page” into 6 section briefs:

- hero
- features
- steps
- testimonials
- pricing
- CTA

Then 6 Gemma instances generate the sections in parallel and stitch everything into one Tailwind page. ~3 minutes end to end.

The best part: everything you just watched happens offline, forever. No one can turn it off besides my light company lol @googlegemma @NVIDIAAIDev

1.6K

Dinys@din__mon·2d

@localm_tuts What's your orchestrator model? Planning model or top tier OSS?

Nilay@localm_tuts·5d

I have been without OPUS for 29 days. I have been without GPT 5.5 for 29 days. I have been without GPT 5.4 for a week. And I survived - so can you! API (NVIDIA++) + Local 👏

DeepSeek@deepseek_ai

We are making our discount permanent! 🎉 Enjoy building with DeepSeek-V4-Pro and bring your innovative ideas to life! 🚀

Dinys@din__mon·2d

@CommandCodeAI Do you have a HUD to display context or other information below the prompt bar?

Command Code@CommandCodeAI·20 May

A dollar for $40 of DeepSeek V4 Pro usage? Hard to say no to that.

1.3K

1.5M

Dinys@din__mon·3d

@BadBrainCode @mr_r0b0t In what way is it better?

BrainOS@BadBrainCode·3d

x.com/i/article/2058…

2.6K

Dinys@din__mon·3d

@mr_r0b0t @NousResearch What's the model you use for planning?

mr-r0b0t@mr_r0b0t·3d

@din__mon @NousResearch I spend hours planning

mr-r0b0t@mr_r0b0t·3d

These sorts of interactions keep me coming back to @NousResearch Hermes Agent

1.1K

Dinys@din__mon·4d

@aijoey @Teknium @NousResearch @0xSero @nvidia Where do you get the recipes? Just trying to understand better

136

Joey@aijoey·4d

got vllm studio running from my mac mini against the dgx spark today. added hermes as a selectable agent runtime, wired it through an openai compatible bridge, fixed lan access, and imported my local model zoo into launchable recipes.

- 21 models detected
- 12 vllm recipes
- 9 llama.cpp gguf recipes

this is the kind of workflow i want locally:

- mac mini as the control surface
- dgx spark as the inference box
- hermes as the operator layer
- vllm studio as the model dashboard

no cs degree, just building the stack piece by piece with ai as the teacher

7.4K

Dinys@din__mon·4d

@mr_r0b0t @TheKryptoWiz At this point I don't really look at HF, just waiting for the models you are releasing. Lol

mr-r0b0t@mr_r0b0t·5d

@TheKryptoWiz Guaranteed any 3.7 releases. Switching to docker containers as well for easier loading.

mr-r0b0t@mr_r0b0t·5d

Come get your QWOPUS3.6-27B-V2 😍 huggingface.co/Jackrong/Qwopu…

832

Dinys@din__mon·5d

@mr_r0b0t @NVIDIAAI @GIGABYTEUSA @Acer Let me know if you get DeepSeekV4 Flash, I am looking forward to it

mr-r0b0t@mr_r0b0t·5d

@din__mon @NVIDIAAI @GIGABYTEUSA @Acer Currently, a docker container that verifiably runs MM2.7 NVFP4 optimally (engages Blackwell tensor cores) Bit tedious of a process, but worth it IMHO! DeepSeekV4 Flash next 🤓

mr-r0b0t@mr_r0b0t·6d

The 2x @NVIDIAAI GB10 (GB20?) cluster! Pictured hard at work optimizing MM2.7 NVFP4 for consumer Blackwell (RTX/GB10) 🤓

Top - @GIGABYTEUSA AI TOP ATOM 4TB
Bottom - @Acer Veriton GN100 AI Mini Workstation 4TB

120mm fan for bottom plate is en route

4.3K

Dinys@din__mon·20 May

@mr_r0b0t @NVIDIAAI @Alibaba_Qwen Is it the text model?

189

mr-r0b0t@mr_r0b0t·20 May

Here is a very popular model that really benefits from the proper use of your @NVIDIAAI Blackwell GPU/GB10 using NVFP4 and the @Alibaba_Qwen 3.6-27B native MTP. This was run on a single GB10. Full benchmark results and methods included ⏬

3.5K

Dinys@din__mon·19 May

@mr_r0b0t @NVIDIAAI Thanks a lot!!!

mr-r0b0t@mr_r0b0t·19 May

@din__mon @NVIDIAAI The new algo would punish me for publishing them all too quickly 😅 Here's the other repos though! github.com/r0b0tlab/nemot… github.com/r0b0tlab/gemma… github.com/r0b0tlab/gemma… github.com/r0b0tlab/qwen3…

481

mr-r0b0t@mr_r0b0t·19 May

Here's the first of five NVFP4 optimized benchmarks. It's a crowd favorite that saw HUGE benefit from the CUTLASS back end! If you're using @NVIDIAAI Blackwell GPUs or a GB10 (DGX Spark or equivalent) this is for you! TLDR: 57.49 tok/s single stream on fully native architecture!

mr-r0b0t@mr_r0b0t

Big news for DGX Spark users! NVFP4 optimized for your GB10 is fast. Until now, many of us have been using a working but suboptimal Marlin backend instead of CUTLASS. We benchmarked five models, all using CUTLASS backends. All ran stably, all were faster 😁

6.5K

Dinys@din__mon·19 May

@mr_r0b0t @NVIDIAAI Can't wait to try

440

mr-r0b0t@mr_r0b0t·19 May

Your GB10 DGX Spark is about to get a little faster! Looks like I've got a reliable config that fully utilizes all the Blackwell kernels on the GB10 when running NVFP4 😁 @NVIDIAAI DGX forums will be updated after a little more testing!

116

5.4K

Dinys@din__mon·18 May

@losterror501 @0xSero Is it on vLLM or llama.cpp? Isn't llama.cpp slow for ds4 flash?

error501@losterror501·18 May

@0xSero vllm-studio is awesome. forked it and added ds4 support for my dgx spark + deepseek. feels good not depending on someone else’s hosting for tokens.

3.9K

0xSero@0xSero·17 May

Such an interesting little guy

374

14.6K

Dinys@din__mon·27 Mar

Enjoying playing the uke, would recommend it to anyone

Dinys@din__mon·3 Jun

@zealigan Missed it! 2am is hard for me

Eric 🇺🇦@wikkidoo·2 Jun

Today is a fine day for a #twitchstream! I will be building some fun stuff with TypeScript, WebSockets, and Next.JS! Might even get into building a game of pong! Starting around 12 EDT.

Eric 🇺🇦@wikkidoo

I'll be 🎈LIVE streaming on #twitch 📺 🔜 about 22 hours! Head on over to streams.ericadamski.dev to see what I plan on doing.

Dinys@din__mon·2 Jun

@zealigan Haha what time? There's the time zone issue

Eric 🇺🇦@wikkidoo·2 Jun

@din__mon I am streaming many hours tomorrow and the day after! Let's chat!

Eric 🇺🇦@wikkidoo·31 May

SO CLOSE! Reaching for #twitchaffiliate this week!

Dinys@din__mon·2 Jun

@zealigan Yep I would. Haven't chatted with you in a long time and I've been disconnected from Twitter.

Eric 🇺🇦@wikkidoo·1 Jun

@din__mon Super hard! You gotta pop in sometime! Would be good to chat again!
