g023 (@g023dev)
4.4K posts
developer/programmer/ai nerd
Canada · Joined October 2023
2.1K Following · 466 Followers

Pinned Tweet
g023
g023@g023dev·
So I optimized the model, I optimized the harness, and now I'm optimizing the endpoint by building an OpenAI-API-to-DeepSeek-endpoint proxy with some context compression features automatically integrated, to try to save $$$ (works well with Copilot): gist.github.com/g023/c2bb7b540…
replies: 0 · reposts: 0 · likes: 1 · views: 85
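The gist link above is truncated, so as a hedged illustration only: context compression in a proxy like this often means keeping the system prompt plus the most recent turns under a token budget. A minimal Python sketch of that idea (hypothetical, not the code in the linked gist):

```python
# Hypothetical sketch of proxy-side context compression: not the linked
# gist, just the general idea of trimming old turns to save tokens/$$$.

def estimate_tokens(msg):
    # crude heuristic: roughly 4 characters per token, plus overhead
    return len(msg.get("content", "")) // 4 + 4

def compress_context(messages, budget=2000):
    """Keep system messages and as many of the most recent turns as fit."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m) for m in system)
    kept = []
    for m in reversed(rest):            # walk newest-first
        cost = estimate_tokens(m)
        if used + cost > budget:
            break                        # older turns get dropped
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

The proxy would run this over the incoming message list before forwarding the request to the DeepSeek endpoint; valid recent context is preserved, stale history is what pays the bill.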
g023
g023@g023dev·
@kv_iyer @Italianclownz Basically you set it as the OpenAI API endpoint to connect to, and it handles translating from the DeepSeek style to the OpenAI API style, which matters when it comes to reasoning and tool calling.
replies: 0 · reposts: 0 · likes: 1 · views: 22
Carlo
Carlo@Italianclownz·
I tried DeepSeek V4. And in less than 5 minutes I had already burned $0.05 using opencode and vibe coding.
Carlo tweet media
replies: 64 · reposts: 6 · likes: 752 · views: 79.7K
Michael Guo
Michael Guo@Michaelzsguo·
I needed to pursue /goal inside Codex, but I burned through my Plus membership tokens. Luckily, I have a capable and very cheap DeepSeek V4 Pro setup that I can connect to Codex.

Pointing Claude Code at DeepSeek’s Anthropic-compatible endpoint is easy. I have done it for a while, and even used it to fix a DeepSeek issue inside Codex:

export ANTHROPIC_BASE_URL=api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=$DEEPSEEK_API_KEY
export ANTHROPIC_MODEL=deepseek-v4-pro

Codex is trickier. DeepSeek exposes OpenAI-style Chat Completions, but starting earlier this year, Codex expects the newer Responses API shape. So the practical setup is:

Codex /v1/responses -> VibeAround proxy -> DeepSeek /v1/chat/completions

My /goal has now been running inside Codex with DeepSeek V4 Pro for nearly 30 minutes without any issue.
Michael Guo tweet media
replies: 2 · reposts: 1 · likes: 25 · views: 9.3K
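The proxy step in that pipeline has to reshape a Responses-style request into a Chat Completions one. A toy Python sketch of the core field mapping, using assumed minimal request shapes (this is not VibeAround's actual code):

```python
# Toy translation of an OpenAI Responses-style request body into a
# Chat Completions-style one. Assumed minimal shapes; real requests
# carry many more fields (tools, streaming, reasoning options, etc.).

def responses_to_chat_completions(req):
    body = {"model": req["model"], "messages": []}
    if "instructions" in req:            # Responses API system text
        body["messages"].append(
            {"role": "system", "content": req["instructions"]})
    inp = req.get("input", [])
    if isinstance(inp, str):             # Responses allows a bare string
        inp = [{"role": "user", "content": inp}]
    for item in inp:
        body["messages"].append(
            {"role": item["role"], "content": item["content"]})
    if "max_output_tokens" in req:       # renamed in Chat Completions
        body["max_tokens"] = req["max_output_tokens"]
    return body
```

A real proxy also has to translate the response back (Responses `output` items vs. Chat Completions `choices`), which is the harder half; the request side above is the easy direction.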
g023
g023@g023dev·
Literally the only ones cheering on the Copilot changes are Microsoft employees lol
replies: 0 · reposts: 0 · likes: 0 · views: 7
g023
g023@g023dev·
@bridgemindai It's almost like they're all working together...
replies: 0 · reposts: 0 · likes: 0 · views: 8
BridgeMind
BridgeMind@bridgemindai·
Google is finally cutting the bleed. Gemini CLI free tier users no longer get access to Gemini 3.1 Pro. You now need a paid plan. Google gave away frontier model access for months while Anthropic and OpenAI charged $200/month. That era is over. Every AI company learns the same lesson. Free access to your best model is not a growth strategy. It's a burn rate. Claude Code charges $200/month. Codex charges $200/month. Google was the last holdout giving away the farm. Now everyone pays.
BridgeMind tweet media
replies: 60 · reposts: 17 · likes: 357 · views: 27K
g023 reposted
Ahmad Awais
Ahmad Awais@MrAhmadAwais·
how did we make deepseek outperform opus 4.7? i've been thinking about why "open model bad at tool calling" is almost always a harness problem, not a model problem.

context: spent two days looking at billions of tokens in @CommandCodeAI (tb open source ai cli) using deepseek. I ended up writing a tool-input repair layer. the trigger was watching deepseek-flash fail on the simplest /review run, every shellCommand and readFile call bouncing back with a raw zod issues blob, the model unable to recover because the error wasn't in a form it could read. by the end, deepseek v4 pro was beating opus 4.7 6/10 times on our internal evals.

a few things i learned that feel general:

1/ the failure modes aren't random, they're a small, finite, compositional set. across deepseek-flash, deepseek v4 pro, glm, and qwen, the same four mistakes repeat almost exactly:
- sending `null` for an optional field instead of omitting it
- emitting `["a","b"]` as a json *string* instead of an actual array
- wrapping a single arg in `{}` where the schema expected an array (an "empty placeholder")
- passing a bare string where an array was expected (`"foo"` instead of `["foo"]`)

four repairs, ~30-100 lines each, ordered carefully (json-array-parse must run before bare-string-wrap, or `'["a","b"]'` becomes `['["a","b"]']`). that is the whole catalogue. when i hear "this open source model can't do tool calls" i now assume one of those four, and so far that's been right ~90% of the time.

2/ the funniest failure mode is also the most revealing. deepseek-flash, when asked to edit or write a file, sometimes emits the path as a *markdown auto-link*:

filePath: "/Users/x/proj/[notes.md](http://notes.md)"

our writeFile tool obediently tried creating files literally named `[notes.md](http://notes.md)` until we caught it. this is not a hallucination. it's the post-training chat distribution leaking through the tool boundary: the model has been rewarded for auto-linking in conversational output, and is applying that prior in a context where it makes no sense. the fix is two regex lines that unwrap only the degenerate case where the link text equals the url-without-protocol; real markdown like `[click](https://x.com)` passes through untouched. this is also conditioning on their own tools during RL, which were different from all the other tools we write and ofc can't predict. "tool confusion" is a more useful frame than "capability gap." the model knows how to format a path. it just hasn't been told clearly enough that this path is going into fopen, not into a chat bubble. so we encode that hint at the schema level, `pathString()` instead of `z.string()`, and the leak is plugged for every path field at once.

3/ the design choice that mattered was inverting preprocess-then-validate into validate-then-repair. my first attempt was the obvious one: a preprocessing pass that normalized inputs (strip nulls, parse stringified arrays, etc.) before zod ever saw them. it broke immediately: writeFile content that *happened* to be json-shaped got rewritten before it hit disk. silent corruption, easy to miss in a smoke test. then i made it less greedy:
- parse the input as-is. if it succeeds, ship it. valid inputs are never touched.
- on failure, walk the validator's own issue list. for each issue path, try the four repairs in order until one applies.
- parse again. on success, log `tool_input_repaired:${toolName}`. on failure, log `tool_input_invalid:${toolName}` and return a model-readable retry message.

the structural insight here is: when you preprocess, you encode a prior about what's broken. when you let the validator complain first, the schema is the prior, and you only spend repair budget at the exact paths the schema actually disagreed at. the validator is doing the work of localizing the bug for you. it's the same shape as cheap-then-careful everywhere else: try the fast path, fall back on evidence. (this also gives you per-tool telemetry for free. you can watch repair rates per (model, tool) and notice when a model regresses on a specific contract before users do.)

4/ shape invariants and relational invariants need different fixes. the four repairs above all handle shape problems: wrong type, missing key, wrong container. but read_file had a *relational* invariant: "if you provide offset, you must also provide limit, and vice versa." deepseek kept calling `readFile({ absolutePath, limit: 30 })` and getting an `ERROR:` back. you can't fix this with input repair, because each field is independently valid; the bug is in the relationship between them. so i taught the function the model's intent instead. `limit` alone → `offset = 0`. `offset` alone → `limit = 2000` (matches common read-tool defaults). then surfaced the decision back to the model in the result: "Note: limit was not provided; defaulted to 2000 lines. To read more or fewer lines, retry with both offset and limit." no `Error:` prefix, so the tui doesn't paint it red. the model sees what we picked and can self-correct on the next turn if our guess was wrong. transparency over silent magic wins big. repair where you can. extend semantics where you can't. surface the choice either way.

zoom out: a lot of what looks like model capability is actually contract design. a strict schema is a choice with a cost. it filters out noise, but it also filters out recoverable noise from any model that hasn't memorized the exact json contract you happened to pick. the largest commercial models eat that cost invisibly and are lenient on tool calling because they've seen enough of every contract during pretraining; open models pay it loudly and get dismissed for it. the harness is where you mediate between distributions.

four small repairs (i'm sure more will follow, as we have three more merging today), two regex lines for auto-links, one relational default, one prefix change. the model didn't change. the contract got more forgiving in exactly the places it needed to be. deepseek v4 pro now beats opus 4.7 6/10 times on our internal evals. imo "skill issue" applies to the harness more often than the model.
Ahmad Awais@MrAhmadAwais

Wow I just made DeepSeek V4 Pro beat Opus 4.7 6/10 times in our internal evals by auto repairing many of its quirks in tool calling. It’s performing super solid for such a cheap model.

replies: 32 · reposts: 85 · likes: 869 · views: 113.7K
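The real repair layer described in that thread is TypeScript/zod inside CommandCodeAI; as a language-neutral illustration, here is a minimal Python sketch of the four shape repairs, the validate-then-repair loop, and the relational default from point 4/. The function names and the `validate` callback are hypothetical stand-ins, not the actual API:

```python
import json

def drop_nulls(args, array_fields):
    # repair 1: `null` sent for an optional field -> omit the field
    return {k: v for k, v in args.items() if v is not None}

def parse_stringified_arrays(args, array_fields):
    # repair 2: '["a","b"]' emitted as a JSON *string* -> a real array
    out = dict(args)
    for f in array_fields:
        if isinstance(out.get(f), str):
            try:
                parsed = json.loads(out[f])
            except json.JSONDecodeError:
                continue
            if isinstance(parsed, list):
                out[f] = parsed
    return out

def unwrap_empty_placeholder(args, array_fields):
    # repair 3: `{}` where the schema expected an array
    out = dict(args)
    for f in array_fields:
        if out.get(f) == {}:
            out[f] = []
    return out

def wrap_bare_strings(args, array_fields):
    # repair 4: "foo" where ["foo"] was expected; must run AFTER repair 2,
    # or '["a","b"]' would become ['["a","b"]']
    out = dict(args)
    for f in array_fields:
        if isinstance(out.get(f), str):
            out[f] = [out[f]]
    return out

REPAIRS = [drop_nulls, parse_stringified_arrays,
           unwrap_empty_placeholder, wrap_bare_strings]

def repair_tool_input(args, validate, array_fields=()):
    """validate(args) -> None on success, or an error message.
    Valid inputs are never touched; repairs run only on failure."""
    if validate(args) is None:
        return args, "ok"
    for fix in REPAIRS:
        args = fix(args, array_fields)
        if validate(args) is None:
            return args, "tool_input_repaired"
    return args, "tool_input_invalid"

def default_read_range(args):
    # the relational invariant from point 4/: offset and limit travel
    # together, so infer intent instead of erroring (defaults as quoted)
    out, note = dict(args), ""
    if "limit" in out and "offset" not in out:
        out["offset"] = 0
        note = "Note: offset was not provided; defaulted to 0."
    elif "offset" in out and "limit" not in out:
        out["limit"] = 2000
        note = "Note: limit was not provided; defaulted to 2000 lines."
    return out, note
```

The loop here is deliberately simpler than the described original (it retries the whole repair list rather than walking the validator's per-path issue list), but it preserves the key property: valid inputs pass through untouched, and repairs only spend effort once validation has already failed.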
Carlo
Carlo@Italianclownz·
Codex beats Claude and DeepSeek v4 as a main coding model. Nothing has come close to what Codex can do. Everything else makes mistakes. I hope Google releases a strong coding model for Gemini.
replies: 1 · reposts: 0 · likes: 4 · views: 93
g023
g023@g023dev·
@Its_Nova1012 DeepSeek v4 seems to be carrying its weight easily
replies: 0 · reposts: 0 · likes: 1 · views: 175
NOVA
NOVA@Its_Nova1012·
Which open-source AI model are you using?
- Qwen 3.5
- DeepSeek v4
- Llama 4 Maverick
- Kimi K2.6
And what's the main reason behind it?
replies: 97 · reposts: 2 · likes: 147 · views: 13K
g023
g023@g023dev·
@mark_k @xai Grok Build is missing the Opus exodus party
replies: 0 · reposts: 0 · likes: 0 · views: 22
Mark Kretschmann
Mark Kretschmann@mark_k·
Upcoming releases from @xai 🔥🔥
- Grok Build (imminent): like Codex and Claude Code
- Grok Computer (soon): computer-use agent
- Grok Imagine Pro 1080p
- Grok Imagine 2.0
- Grok 4.4 with 1T parameters
- Grok 4.5 with 1.5T parameters
- Grok 5 with 10T
replies: 158 · reposts: 121 · likes: 1.7K · views: 56.5K
g023 reposted
mr-r0b0t
mr-r0b0t@mr_r0b0t·
We are now 100% complete with spending nearly all $100 in discounted deepseek v4 pro credits! This last batch of traces got scary close, but it doubled the original so that's no surprise. Full stats:
mr-r0b0t tweet media
replies: 6 · reposts: 1 · likes: 53 · views: 9.2K
g023 reposted
Tom Yeh
Tom Yeh@ProfTomYeh·
Softmax vs Sigmoid ✍️ Interact 👉 byhand.ai/Khlg9b

= Softmax =
Softmax is how deep networks turn raw scores into a probability distribution: the final layer of every classifier, and the core of every attention head in a transformer.

To see what it does, picture five boba tea shops on the same block, all competing for your dollar. Five candidates: a, b, c, d, e. Different chains, different brewing styles, different pearls. A boba reviewer hands you a *chewiness score* for each; higher means perfectly chewy "QQ" pearls with the right bite (ask a Taiwanese friend to find out what QQ means). Negative scores are real: mushy bobas, overcooked pearls, a batch left sitting too long.

How do you turn five chewiness scores into an allocation that adds to a whole dollar? You could spend everything at the chewiest shop, but that ignores how good the runners-up are. Softmax is the smooth alternative.

Read the diagram left to right. First, raise each score to e^{x}. This does two things: it turns negative chewiness into small positives, and it stretches the gaps between scores exponentially. Then sum all five into a single total Z. Finally, divide each e^{x} by Z to get a probability. The five probabilities add up to one, so you can read them as percentages of your dollar. The chewiest shop gets the biggest slice, but never the whole dollar. That's the point of softmax: it ranks confidently while still leaving room for the others.

= Sigmoid =
Sigmoid squashes any real number into a probability between 0 and 1: the classic activation for binary classification, and still the gating function inside LSTMs and GRUs.

Same boba block as the previous Softmax example, narrowed to just two contenders: a hot new shop `a` with chewiness score x, and your usual go-to `b` whose score is pinned at zero (the neutral baseline you've come to expect). Sigmoid is just softmax with two players, one of them pinned to zero.

Read the diagram left to right. First, raise each score to e^{x}; for the usual shop `b` whose score is zero, this is just e^0 = 1 (the constant baseline). Then sum the two into a total Z. Finally, divide each e^{x} by Z to get a probability. The two probabilities add up to one: the new shop wins more of your dollar when its pearls get chewier, and your usual keeps the rest. That's the point of sigmoid: it turns a single chewiness score into a clean 0-to-1 chance you'll try the new place over your usual.

---
AI Math, Algorithms, Architectures by hand ✍️ Subscribe to my 60K+ reader newsletter 👉 byhand.ai
replies: 9 · reposts: 169 · likes: 1.2K · views: 70.9K
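The left-to-right arithmetic in that thread fits in a few lines of Python (the chewiness scores are made-up numbers for illustration):

```python
import math

def softmax(scores):
    # e^x turns negatives into small positives and stretches gaps exponentially
    exps = [math.exp(x) for x in scores]
    z = sum(exps)                        # the single total Z
    return [e / z for e in exps]         # slices of the dollar, summing to 1

def sigmoid(x):
    # softmax with two players, the second pinned at score 0 (e^0 = 1)
    return math.exp(x) / (math.exp(x) + 1.0)

chewiness = [2.0, 1.0, 0.5, -1.0, -3.0]   # five shops; negatives are real
probs = softmax(chewiness)
assert abs(sum(probs) - 1.0) < 1e-9       # a whole dollar
assert max(probs) < 1.0                   # chewiest shop never takes it all

# sigmoid(x) is exactly the two-player softmax with the baseline pinned at 0
assert abs(sigmoid(2.0) - softmax([2.0, 0.0])[0]) < 1e-12
```

The last assertion is the "sigmoid is softmax with two players" claim made executable: dividing numerator and denominator of softmax([x, 0])[0] by e^x gives the familiar 1 / (1 + e^{-x}).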
g023
g023@g023dev·
@MarcusSpillane @bindureddy 3.x wasn't cost-efficient enough to compete with the monthly plans for DeepSeek, but with v4.x, from my own experiments, I'm finding that the API price is definitely competitive with even the pricing of the old monthly plans.
replies: 1 · reposts: 0 · likes: 0 · views: 23
Marcus
Marcus@MarcusSpillane·
@g023dev @bindureddy good to know -- copilot integration has been the sleeper story of the last few months.
replies: 1 · reposts: 0 · likes: 1 · views: 20
Bindu Reddy
Bindu Reddy@bindureddy·
DeepSeek V4 flash is not getting the attention it deserves.
It's a VERY GOOD fast open-source model.
Perfect for many simple use-cases at scale - much faster than GPT 5.5 thinking or Opus 4.7
replies: 43 · reposts: 36 · likes: 560 · views: 22.9K
Gail Weiner
Gail Weiner@gailcweiner·
Am I missing something? GPT 5.5 is good but it’s not great.
replies: 62 · reposts: 1 · likes: 88 · views: 5.8K
g023
g023@g023dev·
@krishnanrohit "this is a large".. "this is a massive"... hmm, I'm starting to see a pattern in the responses that could maybe be corrected with some clever prompting. Steering tokens?
replies: 0 · reposts: 0 · likes: 1 · views: 248
rohit
rohit@krishnanrohit·
Opus just now: "This is a large UI restructure. It's 11:53 PM and Part B alone is ~250 lines of HTML/JS changes across multiple tightly-coupled files. I want to do this right, not rush. Can I do Part A + Part C tonight, which are clean and self-contained, and tackle Part B fresh in the next session?" Unbelievable.
replies: 53 · reposts: 5 · likes: 513 · views: 89.2K
g023 reposted
JNS
JNS@_devJNS·
ZXX
replies: 40 · reposts: 223 · likes: 3.6K · views: 112.5K
g023
g023@g023dev·
@mazeincoding I mean you can cut the umbilical cord now and start working with the competition or tether yourself to slavery until the ship goes down.
replies: 0 · reposts: 0 · likes: 0 · views: 74
Maze
Maze@mazeincoding·
anthropic shouldn't be getting 1/100 of the hate they're getting
replies: 112 · reposts: 7 · likes: 231 · views: 33.8K
g023
g023@g023dev·
@relizarov forgot the whole toxic evil corporation wildcard that'll make sure everyone pays the piper
replies: 0 · reposts: 0 · likes: 0 · views: 53
Roman Elizarov
Roman Elizarov@relizarov·
AI won’t reduce the need for developers. It will reduce the cost of building software. When something gets cheaper, we make more of it. The world needs more software. The developer role is changing, not going away.
replies: 21 · reposts: 28 · likes: 226 · views: 17K
Bhavani.py
Bhavani.py@Bhavani_00007·
I still don't understand what the purple one is used for 😭 can someone explain?
Bhavani.py tweet media
replies: 403 · reposts: 98 · likes: 5.5K · views: 3.3M
g023
g023@g023dev·
@aditiitwt whichever is not crashed or has less tabs open
replies: 0 · reposts: 0 · likes: 0 · views: 5
aditii
aditii@aditiitwt·
guyys, What’s your default browser ?
replies: 185 · reposts: 1 · likes: 100 · views: 8.5K
g023
g023@g023dev·
@thdxr Go for the business space instead of just the coding space.
replies: 0 · reposts: 0 · likes: 0 · views: 233
dax
dax@thdxr·
pretty much every competitor in our space has been very easy to deal with except openai. they're the only company that understands building things for a lot of people. we basically have no shot at directly competing
replies: 108 · reposts: 39 · likes: 4.3K · views: 321.5K