kapicode

74 posts

@kapicode

Building in public. Currently working on a custom Ralph implementation—harness-agnostic and has a TUI for progress https://t.co/eYMJvNLIuI

Chicago, IL · Joined February 2026
42 Following · 25 Followers
kapicode
kapicode@kapicode·
@lucasmeijer I made my own Ralph implementation in part to address this problem
kapicode reposted
Lucas Meijer
Lucas Meijer@lucasmeijer·
I'm slowly migrating from "it's a good idea to have lots of .md files for plans/bugs/tasks in the repo" to "yeah let's not do that at all".
kapicode
kapicode@kapicode·
@cjzafir I've found that Qwen27 (3090) > MinimaxM2.7 (API) in my experiments. What things can I do with MiniMax to get better results?
CJ Zafir
CJ Zafir@cjzafir·
Chinese models are just too good for the price. If you haven't tried them yet, go check out:
> DeepSeek v4 Pro (pair it with Codex)
> DeepSeek v4 Flash (best Gemini alternative)
> Kimi 2.6 (best for frontend)
> GLM 5.1 (amazing reasoning)
> Qwen 3.6 27B (best dense model)
> Qwen 3.6 Plus (2nd-tier best model)
> MiniMax M2.7 (best for coding)
I offloaded 80% of my fine-tuning, research, and dataset creation work to these models. My costs dropped by 70% with the same quality I was getting from GPT 5.4 / Sonnet 4.6 / Gemini 3 Pro.
Best workflow: Use a smart SoTA model like Codex 5.5 as the orchestrator (brain) and use these Chinese models as executors (muscles). This way, you get the best reasoning, planning, debugging, and your output token costs drop by 60%.
Chinese open-source AI is innovating at light speed. Just check how many patents China has filed in the last 5 years (in the tech space). Take them seriously.
kapicode
kapicode@kapicode·
@Everlier Have you actually used it for coding or are you just trying to optimize inference right now? How is the 122b MoE vs 27b dense in your experience?
Everlier
Everlier@Everlier·
MTP for Qwen 3.6 is a big unlock for running 27B on Strix Halo at reasonable speeds. However, as many have posted, the boost depends heavily on how much of the drafter's work is accepted. For writing code that's usually quite a lot, but for nuanced tasks it's usually much less. So, for tasks where the majority of drafter output is accepted it's now ~22 TPS, still below a comfortable threshold for agentic tasks, but usable for chat. For tasks where drafter output is mostly rejected, it can be even slower than the original, but that's a rarity. In the majority of situations, it looks roughly like this, which is a ~50% bump over the original speed. I think 122B + drafter could be a killer combo on this hardware, but there's no such option yet.
Everlier tweet media
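The dependence on drafter acceptance described above can be sketched with the usual geometric-series estimate for speculative decoding. This is only an illustration, not a model of any particular MTP implementation: it assumes each drafted token is accepted independently with the same probability, and the names `accept_rate`, `draft_len`, and `draft_cost` are hypothetical parameters, not measurements from the post.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes (simplification) that each drafted token is accepted
    independently with probability `accept_rate`, and that the target
    model always contributes one token of its own per pass.
    """
    if accept_rate >= 1.0:
        return draft_len + 1.0
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int, draft_cost: float) -> float:
    """Rough speedup over plain decoding.

    `draft_cost` is the (assumed) cost of drafting one token, expressed
    as a fraction of one full forward pass of the target model.
    """
    tokens = expected_tokens_per_step(accept_rate, draft_len)
    cost_per_step = 1.0 + draft_cost * draft_len
    return tokens / cost_per_step


if __name__ == "__main__":
    # Hypothetical settings: 3 drafted tokens, each costing 10% of a full pass.
    for rate in (0.0, 0.3, 0.6, 0.9):
        print(f"accept_rate={rate:.1f} speedup={estimated_speedup(rate, 3, 0.1):.2f}")
```

Under these assumptions a drafter that is mostly rejected yields a speedup below 1.0, which is consistent with the observation that decoding can end up slower than the original in that case.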
kapicode
kapicode@kapicode·
The interesting local-AI lesson here is that "best model" is the wrong question.
For agent orchestration, I care about:
- usable context
- restart cost
- tool-call reliability
- planning quality
- turn latency under append-only history
The winner depends on which term is allowed to be expensive.
kapicode
kapicode@kapicode·
My current take: If the orchestrator mostly plans, routes, and delegates with short outputs, DS4-q2 is the better brain on this GB10. If throughput, latency, memory headroom, or multi-tenancy matter more, Nemotron-Q5 is the safer default.
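As a rough sketch, that take can be written down as a small selection rule. The function and priority names below are hypothetical and exist only for illustration; the rule encodes the post's stated preference, not a benchmark result.

```python
from enum import Enum


class Orchestrator(Enum):
    DS4_Q2 = "DeepSeek-V4-Flash-q2"
    NEMOTRON_Q5 = "Nemotron-3-Nano-30B-A3B-Q5"


# Constraints the post lists as reasons to prefer Nemotron-Q5.
# The string labels are made up for this sketch.
RUNNER_PRIORITIES = frozenset(
    {"throughput", "latency", "memory_headroom", "multi_tenancy"}
)


def pick_orchestrator(mostly_plans_and_delegates: bool, priorities: set[str]) -> Orchestrator:
    """Pick an orchestrator model following the post's rule of thumb."""
    # Any dominant runtime constraint points to the faster runner.
    if RUNNER_PRIORITIES & set(priorities):
        return Orchestrator.NEMOTRON_Q5
    # Short-output planning, routing, and delegation favors the stronger brain.
    if mostly_plans_and_delegates:
        return Orchestrator.DS4_Q2
    # Otherwise fall back to what the post calls the safer default.
    return Orchestrator.NEMOTRON_Q5


if __name__ == "__main__":
    print(pick_orchestrator(True, set()).value)
    print(pick_orchestrator(True, {"latency"}).value)
```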
kapicode
kapicode@kapicode·
I compared Nemotron-3-Nano-30B-A3B-Q5 and DeepSeek-V4-Flash-q2 (@antirez) on a GB10 as long-context coding-agent orchestrators.
The headline is not "which model is best." The useful result is the tradeoff:
- DS4 is the stronger brain.
- Nemotron is the faster runner.
more below
kapicode@kapicode

Running @antirez DS4 vs Nemotron Nano 30B as coding-agent orchestrators on a GB10. The surprising part so far: the model that's ~5–7x slower (ds4) might still be the better orchestrator brain. It comes down to recovery and cache reuse, not raw speed. Full numbers and runbook coming soon.

kapicode
kapicode@kapicode·
Running @antirez DS4 vs Nemotron Nano 30B as coding-agent orchestrators on a GB10. The surprising part so far: the model that's ~5–7x slower (ds4) might still be the better orchestrator brain. It comes down to recovery and cache reuse, not raw speed. Full numbers and runbook coming soon.
kapicode
kapicode@kapicode·
I'm trying to test the grok CLI but the auth servers aren't working @xai
kapicode
kapicode@kapicode·
@0xSero Any recommendation for a good first model to experiment with for agentic coding?
kapicode
kapicode@kapicode·
@thegenioo These are mostly power users and Anthropic will get more of that back from saved inference (probably).
Hamza
Hamza@thegenioo·
went through this Claude sub cancellation thread from Theo
500+ replies, ~70% actual cancellations = 350 people gone (can actually be higher than this)
rough math (assumptions):
- 210 Pro @ $20 = $4,200/mo
- 84 Max @ $100 = $8,400/mo
- 56 Max @ $200 = $11,200/mo
$23,800/month. $285K/year. From one tweet.
and this is just the people who replied 💀
Theo - t3.gg@theo

For every person who replies with a screenshot of their cancelled Claude Code plan, I will donate $10 to open source.
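The rough math in Hamza's post can be checked with a few lines. The counts and prices are copied from the post; the split across tiers is the poster's own assumption, and the variable names here are only illustrative.

```python
# Figures copied from the post: tier -> (cancellations, monthly price in USD).
# The tier split is the poster's assumption, not measured data.
tiers = {
    "Pro": (210, 20),
    "Max $100": (84, 100),
    "Max $200": (56, 200),
}

cancellations = sum(count for count, _ in tiers.values())
monthly_by_tier = {name: count * price for name, (count, price) in tiers.items()}
total_monthly = sum(monthly_by_tier.values())
annual = total_monthly * 12

print(cancellations)    # 350
print(monthly_by_tier)  # {'Pro': 4200, 'Max $100': 8400, 'Max $200': 11200}
print(total_monthly)    # 23800
print(annual)           # 285600
```

The totals match the post: $23,800 per month and $285,600 per year, which the post rounds to $285K.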
