kapicode

74 posts

@kapicode

Building in public. Currently working on a custom Ralph implementation: harness-agnostic, with a TUI for progress https://t.co/eYMJvNLIuI

Chicago, IL · Joined February 2026

42 Following · 25 Followers

kapicode@kapicode·4h

@lucasmeijer I made my own Ralph implementation in part to address this problem

kapicode reposted

Lucas Meijer@lucasmeijer·1d

I'm slowly migrating from "it's a good idea to have lots of .md files for plans/bugs/tasks in the repo" to "yeah let's not do that at all".

152

1.5K

198.7K

kapicode@kapicode·6h

@cjzafir In my experiments, Qwen 27B (on a 3090) beats MiniMax M2.7 (via API). What can I do to get better results from MiniMax?

229

CJ Zafir@cjzafir·8h

Chinese models are just too good for the price. If you haven't tried them yet, go check out:

> DeepSeek v4 Pro (pair it with Codex)
> DeepSeek v4 Flash (best Gemini alternative)
> Kimi 2.6 (best for frontend)
> GLM 5.1 (amazing reasoning)
> Qwen 3.6 27B (best dense model)
> Qwen 3.6 Plus (2nd-tier best model)
> MiniMax M2.7 (best for coding)

I offloaded 80% of my fine-tuning, research, and dataset creation work to these models. My costs dropped by 70% with the same quality I was getting from GPT 5.4 / Sonnet 4.6 / Gemini 3 Pro.

Best workflow: Use a smart SoTA model like Codex 5.5 as the orchestrator (brain) and use these Chinese models as executors (muscles). This way, you get the best reasoning, planning, debugging, and your output token costs drop by 60%.

Chinese open-source AI is innovating at light speed. Just check how many patents China has filed in the last 5 years (in the tech space). Take them seriously.

172

9.9K

kapicode@kapicode·6h

@Everlier Have you actually used it for coding or are you just trying to optimize inference right now? How is the 122b MoE vs 27b dense in your experience?

Everlier@Everlier·6h

@kapicode Yes, llama.cpp, I also posted about 122B later here: x.com/Everlier/statu…

Everlier@Everlier

Tested Qwen 3.5 122B + MTP on Strix Halo, made a sweep of kyuz0 pre-built optimised images. Overall, results are a mixed bag, it's a bit faster, but not enough to make it fluid in agentic workflows, but makes it very usable for chats.

llama.cpp args:

```
llama-server --no-mmap -dio -ngl 99 -np 1 --kv-unified --spec-type draft-mtp --spec-draft-n-max 3
```

Everlier@Everlier·1d

MTP for Qwen 3.6 is a big unlock for running 27B on Strix Halo at reasonable speeds. However, as many posted, the boost is heavily dependent on how much of the drafter's work is accepted. For writing code it's usually quite a lot, but for nuanced tasks it's usually much less.

So, for tasks where the majority of drafter output is accepted it's now ~22TPS, still below a comfortable threshold for agentic tasks, but usable for chat. For tasks where drafter output is mostly rejected, it can be even slower than the original, but that's a rarity. In the majority of situations, it looks roughly like this, which is a ~50% bump over the original speed.

I think 122B + drafter can be a killer combo on this hardware, but there's no such option yet.
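A minimal sketch of why the boost tracks the acceptance rate, using the usual simplified model of speculative decoding. It assumes each drafted token is accepted independently with the same probability, which real workloads only approximate. The function names, the `draft_cost` value, and the acceptance rates are illustrative assumptions, not measurements from this thread; the draft length of 3 mirrors the `--spec-draft-n-max 3` setting quoted in the 122B post.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification step.

    Assumes every drafted token is accepted independently with probability
    accept_rate (a simplification), and that the target model always
    contributes one token of its own per step.
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int, draft_cost: float) -> float:
    """Estimated speedup over plain decoding.

    draft_cost is the cost of drafting one token relative to one forward
    pass of the target model (hypothetical value, depends on the setup).
    """
    step_cost = 1.0 + draft_len * draft_cost
    return expected_tokens_per_step(accept_rate, draft_len) / step_cost


if __name__ == "__main__":
    # Hypothetical acceptance rates, from "mostly rejected" to "mostly accepted".
    for rate in (0.0, 0.3, 0.5, 0.8):
        print(rate, round(estimated_speedup(rate, draft_len=3, draft_cost=0.1), 2))
```

Under this model, an acceptance rate near zero yields a factor below 1, which is consistent with the observation that a mostly rejected drafter can be slower than no drafter at all.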

319

kapicode@kapicode·6h

@Everlier llama.cpp*

kapicode@kapicode·6h

@Everlier You use llama.cp, correct?

kapicode@kapicode·1d

The interesting local-AI lesson here is that "best model" is the wrong question. For agent orchestration, I care about:

- usable context
- restart cost
- tool-call reliability
- planning quality
- turn latency under append-only history

The winner depends on which term is allowed to be expensive.
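One way to make "which term is allowed to be expensive" concrete is a weighted score over the five criteria. The sketch below is purely illustrative: `model_a`, `model_b`, every score, and both weight profiles are hypothetical placeholders (0 to 1, higher is better), not benchmark results for any real model.

```python
CRITERIA = [
    "usable_context",
    "restart_cost",
    "tool_call_reliability",
    "planning_quality",
    "turn_latency",
]

# Hypothetical scores, normalized so that higher is always better
# (a cheap restart or a low latency maps to a high score).
candidates = {
    "model_a": {"usable_context": 0.7, "restart_cost": 0.8,
                "tool_call_reliability": 0.8, "planning_quality": 0.9,
                "turn_latency": 0.3},
    "model_b": {"usable_context": 0.7, "restart_cost": 0.6,
                "tool_call_reliability": 0.7, "planning_quality": 0.6,
                "turn_latency": 0.9},
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of the criteria scores."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


def pick(models: dict, weights: dict) -> str:
    """Return the name of the model with the highest weighted score."""
    return max(models, key=lambda name: weighted_score(models[name], weights))


# Two hypothetical priorities: planning matters most vs. latency matters most.
planning_heavy = {"usable_context": 1, "restart_cost": 1,
                  "tool_call_reliability": 1, "planning_quality": 3,
                  "turn_latency": 0.5}
latency_heavy = {"usable_context": 1, "restart_cost": 0.5,
                 "tool_call_reliability": 1, "planning_quality": 0.5,
                 "turn_latency": 3}

if __name__ == "__main__":
    print(pick(candidates, planning_heavy))
    print(pick(candidates, latency_heavy))
```

With these made-up numbers the same two candidates swap places when the weights change, which is the point: the ranking comes from the weights as much as from the models.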

kapicode@kapicode·1d

My current take: If the orchestrator mostly plans, routes, and delegates with short outputs, DS4-q2 is the better brain on this GB10. If throughput, latency, memory headroom, or multi-tenancy matter more, Nemotron-Q5 is the safer default.

kapicode@kapicode·1d

I compared Nemotron-3-Nano-30B-A3B-Q5 and DeepSeek-V4-Flash-q2 (@antirez) on a GB10 as long-context coding-agent orchestrators. The headline is not "which model is best." The useful result is the tradeoff:

- DS4 is the stronger brain.
- Nemotron is the faster runner.

more below

kapicode@kapicode

Running @antirez DS4 vs Nemotron Nano 30B as coding-agent orchestrators on a GB10. The surprising part so far: the model that's ~5–7x slower (ds4) might still be the better orchestrator brain. It comes down to recovery and cache reuse, not raw speed. Full numbers and runbook coming soon.
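A rough sketch of how cache reuse can outweigh raw speed in an append-only history. This is a toy latency model, not the measured setup from the thread: the function name, token counts, and throughput figures are all hypothetical, and it makes no claim about which of the two models actually gets cache hits.

```python
def turn_latency_s(history_tokens: int, new_tokens: int, output_tokens: int,
                   prefill_tps: float, decode_tps: float,
                   cache_hit: bool = True) -> float:
    """Estimate the latency of one agent turn in seconds.

    With an append-only history and a warm prefix cache, only the newly
    appended tokens need prefill. After a restart or cache miss, the whole
    history has to be prefilled again.
    """
    prefill_tokens = new_tokens if cache_hit else history_tokens + new_tokens
    return prefill_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical figures: the "fast" model is 7x quicker at both prefill and decode.
slow_with_cache = turn_latency_s(100_000, 500, 200,
                                 prefill_tps=200, decode_tps=5, cache_hit=True)
fast_without_cache = turn_latency_s(100_000, 500, 200,
                                    prefill_tps=1400, decode_tps=35, cache_hit=False)

if __name__ == "__main__":
    print(slow_with_cache, fast_without_cache)
```

In this made-up case the slower model finishes the turn first because it only prefills 500 tokens, while the faster one re-reads the full 100,000-token history.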

kapicode@kapicode·1d

145

kapicode@kapicode·1d

I'm trying to test the grok CLI but the auth servers aren't working @xai

kapicode@kapicode·3d

@0xSero (On a spark)

kapicode@kapicode·3d

@0xSero Any recommendation for a good first model to experiment with for agentic coding?

0xSero@0xSero·4d

Nvidia is aligned

NVIDIA AI@NVIDIAAI

@xeophon @arcee_ai Open > closed

372

26.3K

kapicode@kapicode·5d

@thegenioo These are mostly power users and Anthropic will get more of that back from saved inference (probably).

1.9K

Hamza@thegenioo·5d

went through this Claude Sub cancellation thread from Theo

500+ replies, ~70% actual cancellations = 350 people gone (can actually be higher than this)

rough math (assumptions):

- 210 Pro @ $20 = $4,200/mo
- 84 Max $100 = $8,400/mo
- 56 Max $200 = $11,200/mo

$23,800/month. $285K/year. From one tweet.

and this is just the people who replied 💀
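The rough math above checks out. A short sketch that reproduces it, using only the plan counts and prices stated in the post (the 70% share and the plan split are the poster's assumptions, not verified figures):

```python
# (plan name, assumed number of cancellations, monthly price in USD)
plans = [
    ("Pro", 210, 20),
    ("Max $100", 84, 100),
    ("Max $200", 56, 200),
]

cancellations = sum(count for _, count, _ in plans)
monthly_usd = sum(count * price for _, count, price in plans)
yearly_usd = monthly_usd * 12

if __name__ == "__main__":
    print(cancellations)  # 350, i.e. 70% of 500 replies
    print(monthly_usd)    # 23800
    print(yearly_usd)     # 285600, rounded to $285K in the post
```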

Theo - t3.gg@theo

For every person who replies with a screenshot of their cancelled Claude Code plan, I will donate $10 to open source.

1.2K

185.1K
