Premnath D

474 posts

@MrVoicer

Designer and Digital Engineer

Joined February 2020
738 Following · 74 Followers
Ahmad Awais
Ahmad Awais@MrAhmadAwais·
@MrVoicer @CommandCodeAI I believe hermes is like open claw, not a coding agent. We are a purpose-built coding agent, and every model we have is what our harness is optimized for. Here’s one example. Finally the $1 Go plan.
Ahmad Awais@MrAhmadAwais

how did we make deepseek outperform opus 4.7? i've been thinking about why "open model bad at tool calling" is almost always a harness problem, not a model problem.

context: spent two days looking at billions of tokens in @CommandCodeAI (tb open source ai cli) using deepseek. I ended up writing a tool-input repair layer. the trigger was watching deepseek-flash fail on the simplest /review run, every shellCommand and readFile call bouncing back with a raw zod issues blob, the model unable to recover because the error wasn't in a form it could read. by the end deepseek v4 pro was beating opus 4.7 6/10 times on our internal evals.

a few things i learned that feel general:

1/ the failure modes aren't random; they're a small, finite, compositional set. across deepseek-flash, deepseek v4 pro, glm, qwen, the same four mistakes repeat almost exactly:
- sending `null` for an optional field instead of omitting it
- emitting `["a","b"]` as a json *string* instead of an actual array
- wrapping a single arg in `{}` where the schema expected an array (an "empty placeholder")
- passing a bare string where an array was expected (`"foo"` instead of `["foo"]`)

four repairs, ~30-100 lines each, ordered carefully (json-array-parse must run before bare-string-wrap or `'["a","b"]'` becomes `['["a","b"]']`). that is the whole catalogue. when i hear "this open source model can't do tool calls" i now assume one of those four, and so far that's been right ~90% of the time.

2/ the funniest failure mode is also the most revealing. deepseek-flash, when asked to edit or write a file, sometimes emits the path as a *markdown auto-link*:

filePath: "/Users/x/proj/[notes.md](http://notes.md)"

our writeFile tool obediently tried creating files literally named `[notes.md](http://notes.md)` until we caught it. this is not a hallucination.
it's the post-training chat distribution leaking through the tool boundary: the model has been rewarded for auto-linking in conversational output, and is applying that prior in a context where it makes no sense. the fix is two regex lines that unwrap only the degenerate case where link text equals url-without-protocol; real markdown like `[click](https://x.com)` passes through untouched. this is also conditioning on their own tools during RL, which were different from the tools we write and which we ofc can't predict.

"tool confusion" is a more useful frame than "capability gap." the model knows how to format a path. it just hasn't been told clearly enough that this path is going to fopen, not into a chat bubble. so we encode that hint at the schema level, `pathString()` instead of `z.string()`, and the leak is plugged for every path field at once.

3/ the design choice that mattered was inverting preprocess-then-validate to validate-then-repair. my first attempt was the obvious one: a preprocessing pass that normalized inputs (strip nulls, parse stringified arrays, etc.) before zod ever saw them. it broke immediately: writeFile content that *happened* to be json-shaped got rewritten before it hit disk. silent corruption, easy to miss in a smoke test. then i made it less greedy:
- parse the input as-is. if it succeeds, ship it. valid inputs are never touched.
- on failure, walk the validator's own issue list. for each issue path, try the four repairs in order until one applies.
- parse again. on success, log `tool_input_repaired:${toolName}`. on failure, log `tool_input_invalid:${toolName}` and return a model-readable retry message.

the structural insight here is: when you preprocess, you encode a prior about what's broken. when you let the validator complain first, the schema is the prior, and you only spend repair budget at the exact paths the schema actually disagreed at. the validator is doing the work of localizing the bug for you.
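A minimal sketch of the validate-then-repair loop from point 3, with the four repairs from point 1 in the stated order. This is not Command Code's actual code: the tiny `validate` function stands in for zod so the example has no dependencies, and every name here (`ToolSchema`, `repairs`, `parseToolInput`, the retry message wording) is hypothetical. Treating an empty `{}` as an empty array is also an assumption.

```typescript
// Hypothetical mini-schema standing in for zod, so the sketch has no dependencies.
type FieldSpec = { type: "string" | "number" | "string[]"; optional?: boolean };
type ToolSchema = Record<string, FieldSpec>;
type ToolInput = Record<string, unknown>;
type Issue = { path: string; message: string };

// Returns one issue per offending field, similar in spirit to a validator's issue list.
function validate(schema: ToolSchema, input: ToolInput): Issue[] {
  const issues: Issue[] = [];
  for (const key of Object.keys(schema)) {
    const spec = schema[key];
    const value = input[key];
    if (value === undefined) {
      if (!spec.optional) issues.push({ path: key, message: "required" });
      continue;
    }
    const ok =
      spec.type === "string[]"
        ? Array.isArray(value) && value.every((v) => typeof v === "string")
        : typeof value === spec.type;
    if (!ok) issues.push({ path: key, message: `expected ${spec.type}` });
  }
  return issues;
}

type RepairResult = { kind: "set"; value: unknown } | { kind: "drop" } | null;
type Repair = (value: unknown, spec: FieldSpec) => RepairResult;

// Order matters: the JSON-array parse must run before the bare-string wrap.
const repairs: Repair[] = [
  // 1. null sent for an optional field: omit the field
  (v, s) => (v === null && s.optional ? { kind: "drop" } : null),
  // 2. array emitted as a JSON string: parse it into a real array
  (v, s) => {
    if (s.type !== "string[]" || typeof v !== "string") return null;
    try {
      const parsed: unknown = JSON.parse(v);
      return Array.isArray(parsed) ? { kind: "set", value: parsed } : null;
    } catch {
      return null;
    }
  },
  // 3. empty `{}` placeholder where an array was expected (assumed to mean "no items")
  (v, s) =>
    s.type === "string[]" && typeof v === "object" && v !== null &&
    !Array.isArray(v) && Object.keys(v).length === 0
      ? { kind: "set", value: [] }
      : null,
  // 4. bare string where an array was expected: wrap it
  (v, s) => (s.type === "string[]" && typeof v === "string" ? { kind: "set", value: [v] } : null),
];

type Outcome =
  | { ok: true; input: ToolInput; log?: string }
  | { ok: false; log: string; retryMessage: string };

function parseToolInput(toolName: string, schema: ToolSchema, raw: ToolInput): Outcome {
  // Fast path: valid inputs are returned untouched.
  const firstIssues = validate(schema, raw);
  if (firstIssues.length === 0) return { ok: true, input: raw };

  // Repair only at the paths the validator complained about.
  const candidate: ToolInput = { ...raw };
  for (const issue of firstIssues) {
    const spec = schema[issue.path];
    if (spec === undefined) continue;
    for (const repair of repairs) {
      const result = repair(candidate[issue.path], spec);
      if (result === null) continue;
      if (result.kind === "drop") delete candidate[issue.path];
      else candidate[issue.path] = result.value;
      break; // first applicable repair wins
    }
  }

  // Validate again and log the outcome per tool.
  const remaining = validate(schema, candidate);
  if (remaining.length === 0) {
    return { ok: true, input: candidate, log: `tool_input_repaired:${toolName}` };
  }
  const details = remaining.map((i) => `${i.path}: ${i.message}`).join("; ");
  return {
    ok: false,
    log: `tool_input_invalid:${toolName}`,
    retryMessage: `Invalid input for ${toolName}. ${details}. Fix these fields and retry.`,
  };
}
```

Because the repairs only run at paths the validator flagged, a valid string field whose content happens to look like a JSON array is never rewritten. That is the silent-corruption case the preprocessing approach hit.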
it's the same shape as cheap-then-careful everywhere else: try the fast path, fall back on evidence. (this also gives you per-tool telemetry for free. you can watch repair rates per (model, tool) and notice when a model regresses on a specific contract before users do.)

4/ shape invariants and relational invariants need different fixes. the four repairs above all handle shape problems: wrong type, missing key, wrong container. but read_file had a *relational* invariant: "if you provide offset, you must also provide limit, and vice versa." deepseek kept calling `readFile({ absolutePath, limit: 30 })` and getting an `ERROR:` back. you can't fix this with input repair, because each field is independently valid; the bug is in the relationship between them. so i taught the function the model's intent instead. `limit` alone → `offset = 0`. `offset` alone → `limit = 2000` (matches common read tool ops default). then surfaced the decision back to the model in the result:

"Note: limit was not provided; defaulted to 2000 lines. To read more or fewer lines, retry with both offset and limit."

no `Error:` prefix, so the tui doesn't paint it red. the model sees what we picked and can self-correct on the next turn if our guess was wrong. transparency over silent magic wins big.

repair where you can. extend semantics where you can't. surface the choice either way.

zoom out: a lot of what looks like model capability is actually contract design. a strict schema is a choice with a cost: it filters out noise, but it also filters out recoverable noise from any model that hasn't memorized the exact json contract you happened to pick. the largest commercial models eat that cost invisibly and are lenient on tool calling because they've seen enough of every contract during pretraining; open models pay it loudly and get dismissed for it. the harness is where you mediate between distributions.
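The relational default from point 4 might look roughly like the sketch below. The `ReadArgs` and `ReadPlan` shapes and the `planRead` name are hypothetical, and so is the wording of the offset note. Only the 2000-line default and the limit note are quoted from the thread.

```typescript
// Hypothetical shapes; the real readFile tool is not shown in the thread.
type ReadArgs = { absolutePath: string; offset?: number; limit?: number };
type ReadPlan = {
  absolutePath: string;
  range?: { offset: number; limit: number }; // absent means no range was requested
  note?: string; // surfaced in the tool result, deliberately without an "Error:" prefix
};

const DEFAULT_LIMIT = 2000; // default quoted in the thread

function planRead(args: ReadArgs): ReadPlan {
  const { absolutePath, offset, limit } = args;
  if (offset === undefined) {
    // Neither field given: nothing to reconcile.
    if (limit === undefined) return { absolutePath };
    // limit alone: assume the model wants the first `limit` lines.
    return {
      absolutePath,
      range: { offset: 0, limit },
      note: "Note: offset was not provided; defaulted to 0. To read a different range, retry with both offset and limit.",
    };
  }
  if (limit === undefined) {
    // offset alone: fall back to the default window and say so.
    return {
      absolutePath,
      range: { offset, limit: DEFAULT_LIMIT },
      note: `Note: limit was not provided; defaulted to ${DEFAULT_LIMIT} lines. To read more or fewer lines, retry with both offset and limit.`,
    };
  }
  // Both given: the relational invariant already holds.
  return { absolutePath, range: { offset, limit } };
}
```

Returning the note alongside the resolved range keeps the guess visible, so the model can correct it on the next turn instead of silently reading the wrong window.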
four small repairs (i'm sure more to follow as we have three more merging today), two regex lines for auto-links, one relational default, one prefix change. the model didn't change. the contract got more forgiving in exactly the places it needed to be. deepseek v4 pro now beats opus 4.7 6/10 times on our internal evals. imo "skill issue" applies to the harness more often than the model.
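For the auto-link unwrap mentioned above, here is one possible shape, assuming a single regex with a backreference is enough. The thread does not show the actual two lines, so `AUTO_LINK`, `unwrapAutoLinks`, and the optional trailing slash are illustrative assumptions.

```typescript
// Matches [text](http://text) or [text](https://text), optionally with a trailing slash.
// The \1 backreference requires the URL (minus protocol) to equal the link text,
// so ordinary markdown links such as [click](https://x.com) do not match.
const AUTO_LINK = /\[([^\]]+)\]\(https?:\/\/\1\/?\)/g;

function unwrapAutoLinks(value: string): string {
  // Replace each degenerate auto-link with its bare link text.
  return value.replace(AUTO_LINK, "$1");
}
```

Called on `/Users/x/proj/[notes.md](http://notes.md)` this returns `/Users/x/proj/notes.md`, while `[click](https://x.com)` is returned unchanged. A schema-level helper like the thread's `pathString()` could apply such a function to every path field, though how that helper is built is not stated in the thread.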

English
1
0
1
166
Ahmad Awais
Ahmad Awais@MrAhmadAwais·
GUIs can never, and I mean never, compete with the raw power and agency of a CLI in your terminal. It’s the biggest reason why we’ve started with a CLI @CommandCodeAI. After building 70+ open source CLIs in over a decade, I don’t say this lightly. GUIs that exist today are restricted by the parameters of foggy notions of an old system of engineering that’s very much getting disrupted by AI. When given an option to build anything and use it any way, vs here’s what some random person on the internet thought all your engineering work’s UI could look like, I’ll always prefer the option to do whatever I want, with no restrictions whatsoever. Every single GUI app I have tried is missing so many features, so many things that I can’t do much about. Ah, so many clicks too. With the raw functionality of a CLI, I can do anything. Every time humans needed to reinvent something, they always went back to a basic input/output terminal. We don’t know what the future of coding and engineering looks like. It’s definitely not an electron app with a 60fps hard limit and 300MB min size of three-columned UI. So we, the engineers & builders, are back to the basics. Input/Output terminals. No over-engineering to get in the way of whatever the future will potentially look like. It’s not even a comparison. My opinion of course.
Theo - t3.gg@theo

Are you still using the CLI versions of your preferred agent instead of desktop apps like Codex App, Conductor, or T3 Code? Tell me why below. Genuinely curious.

English
6
0
26
4.6K
Premnath D
Premnath D@MrVoicer·
@theo This has been a pain point, so I switched to the Codex app.
English
0
0
0
85
Theo - t3.gg
Theo - t3.gg@theo·
Just learned it's literally impossible to paste images into Claude Code over SSH. How do you CLI people live like this??
English
403
16
2.3K
626.9K
Premnath D retweeted
Tony Joseph
Tony Joseph@tjoseph0010·
The economic illiteracy on display here is off the charts, as is the desire to please the powers that be. Gold is not taxed at those rates because it will lead to humongous smuggling by powerful mafias as used to be the case decades ago. It was a lesson learnt at high cost to lives and lawfulness.
Zakka Jacob@Zakka_Jacob

Declare gold a sin good. Just like tobacco and alcohol. Impose 200% duty on import of gold. People will naturally stop buying gold. It will mean short term pain but reduced dollar outflow and better rupee and CAD support.

English
15
128
502
34.8K
Cloudways
Cloudways@Cloudways·
Hi Premnath, we're sorry to hear about your experience and understand how frustrating delays can be. Our billing team has addressed the concern on their end — please check your support ticket for their response. If you need further assistance, feel free to DM us, and we'll take it from there.
English
1
0
1
25
Premnath D
Premnath D@MrVoicer·
@Cloudways 48 hours. One migration ticket (#884365). Zero resolution. Chat support can't help. Migration team doesn't do chat. Every path leads to "add a note to the ticket." Happy to switch back to Pagely if this is what Cloudways support looks like. @digitalocean
English
4
0
0
57
Premnath D retweeted
TheStandupPod
TheStandupPod@thestanduppod·
Divorcing Windows
English
71
197
2K
37.3K
Alexis Gallagher
Alexis Gallagher@alexisgallagher·
Yes. The thing I notice is this: If you ask GPT-5.5 to check Opus-4.7's work, it finds clear errors. And then if you point out the errors to Opus, Opus agrees. But if you ask Opus to check GPT-5.5's work, Opus basically says "this is correct", but nitpicks some detail.
Dimitris Papailiopoulos@DimitrisPapail

I've started being noticeably more trusting of Codex with GPT 5.5 than Claude Code, which still remains my main driver for most projects. The latter needs comparatively more hand-holding, inspection and audits. Feeling constantly suspicious of your CLI agent is exhausting.

English
29
21
467
55.4K
Premnath D retweeted
TheLiverDoc™
TheLiverDoc™@theliverdoc·
Name that Orthopedician. Supposed "doctors" like them should be dragged out into the light so that they don't do this sort of dangerously expensive nonsense ever again. If this is what the Orthopedics department at AIIMS Delhi is up to these days, then it is better to consult a more academically oriented doctor at a private hospital. Right at the bottom of this herbal and dietary supplement it is clearly written: not for medicinal use. Zero evidence that this combination works for anything in Orthopedic practice, but ample evidence that Curcuma longa in it, in the high dose present, can cause liver injury. michiganmedicine.org/health-lab/15-… Very triggering to see academic institutions spiral into INTEGRATED unscientific treatments. Is prescribing herbal supplements your doctors' new modus operandi @aiims_newdelhi? Did your Orthopedician discuss with the patient the possibility of liver injury with this product before prescribing?
TheLiverDoc™ tweet media
Rational_Indian@RationalIndia16

@theliverdoc An orthopaedic at @aiims_newdelhi prescribed this for pain in bones.

English
49
105
721
98K
Kappaemme
Kappaemme@Kappaemme1926·
you have $100, which one do you choose, Claude or Codex?
Kappaemme tweet media
Kappaemme tweet media
English
163
1
327
74.1K
Premnath D
Premnath D@MrVoicer·
@BacLeodiv No. Codex is so much better. Performance and usage. Claude says that Codex exhibits superior execution discipline when assigned to do the same tasks. Still Claude hits limits sooner than Codex.
English
0
0
0
249
Bac Leo
Bac Leo@BacLeodiv·
So Claude Code just doubled its limits after users started moving to Codex. Are you switching back?
English
300
7
416
45.1K
Brian Gardner
Brian Gardner@bgardner·
🚨 Wow. 🚨 “Today, Matt made the decision to remove real-time collaboration from WordPress 7.0 and shared that he is not confident the current approach is robust enough to include in Core at this time, citing concerns around surface area, race conditions, server load, memory efficiency, and recurring bugs found through fuzz testing.” make.wordpress.org/core/2026/05/0…
English
23
22
92
14.5K
Ahmad Awais
Ahmad Awais@MrAhmadAwais·
Giving away @CommandCodeAI Max subscription to someone at random who follows me and Command. RT. That’s more than 5 billion tokens of DeepSeek v4 pro. In 24hrs. LFG!! Read the eng deep dives below, good for all not just us.
Ahmad Awais@MrAhmadAwais

interesting milestone: @CommandCodeAI on pace for ~1,000 new subs/day today. the broader thought imo is that devs are figuring out that running open models inside Claude Code was leaving a lot on the table. seeing more and more posts of DeepSeek/Kimi beating Opus/GPT once you swap them into Command Code instead. the harness matters, often as much as the model itself (a point that i think is still pretty underrated). my eng notes on our harness engineering below.

English
137
173
250
18.9K
Premnath D
Premnath D@MrVoicer·
@Cloudways Updated the ticket 10 hours ago. Got a reply 30 minutes ago. The reply recommended .htaccess on a Lightning (Nginx) stack where .htaccess doesn't work. How is a time-sensitive migration supposed to happen at this pace? @digitalocean
Premnath D tweet media
Premnath D tweet media
English
1
0
0
53
Premnath D
Premnath D@MrVoicer·
@Cloudways Checked your support plans. $100/month, not prorated, 24–48hr activation. So to fix a 48hr migration delay, I'd pay $100 for 1 day of access. The math doesn't work. Just fix the ticket.
English
0
0
0
25
Nat Miletic
Nat Miletic@natmiletic·
If your WordPress site feels slow and painful to maintain, fix these before anything else: 1. Switch to a good caching plugin like WP Rocket 2. Compress every image 3. Convert images to WebP or AVIF 4. Remove unused tracking scripts (like Facebook Pixel or others) Most "slow site" problems are just unoptimized images and external scripts.
English
28
5
67
11.4K
Ben Word
Ben Word@retlehs·
@natmiletic I gotta plug a new caching plugin on the block that’s worth looking at (it does require your server having Redis though, and is less of an all-in-one solution like WP Rocket) roots.io/millicache-red…
Ben Word tweet media
English
1
1
4
270
Premnath D retweeted
Ravi
Ravi@tamilravi·
Hey @spotifyindia , could you please just give the lyrics for Tamil songs in Tamil script? It is a pain to read lyrics in Roman letters. Thanks.
English
3
8
85
5.8K