Da7em (@Da7_Tech) - Twitter profile

Pinned Tweet

Da7em@Da7_Tech·29 Mar

> been paying $12 for PlayStation Plus > console: PS5, "next-gen gaming" > tried 60fps, ray tracing disabled. tried ray tracing, locked at 30fps > Get a gaming laptop, same price as a PS5 Pro. > same games. same Monitor. > 144fps. ray tracing on. mods. free online. > quality: better than anything Sony ever promised > cancelled PS Plus > saving $12 forever > with a platform that never charged me to use my own internet > the PS5 didn't change. i did

English

249

345

10.3K

1.3M

Da7em@Da7_Tech·3h

@MaxForAI No, this problem has been going on for a month, even with 5.2.

English

0

2

62

Max For AI@MaxForAI·4h

@Da7_Tech Because GLM5.2 is a model that was launched in a hurry, these problems will be resolved later.

Chinese

1

0

2

778

Da7em@Da7_Tech·1d

GLM has a serious token leakage / caching-accounting issue on Z.ai. I tested this across Claude Code, Hermes Agent, Zcode, and OpenCode, so it does not look like one harness behaving badly. This has been consistent since the start of my subscription, during normal off-peak usage. My read: repeated context is being billed as fresh input instead of cached input. That’s not max reasoning. That’s a server-side caching/accounting problem. The screenshots show the issue clearly: Zcode used around 270K tokens, but I was billed for nearly 5M tokens. Cached tokens are clearly not working. I contacted support, escalated this, and tried everything, but got no real response. This is not how you treat paying customers. Please fix this. @ZixuanLi_ @Zai_org
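
One way to read the 270K versus 5M figures in the tweet above is to model an agent session in which every turn resends the whole accumulated context. The sketch below is illustrative only: the turn count (36) and per-turn size (7,500 tokens) are hypothetical values chosen so the totals land near the reported numbers, and `session_token_totals` is not part of any Z.ai or harness API. It shows how much input is sent in total if resent context is counted in full on every turn; it does not establish how Z.ai actually meters or bills cached tokens.

```python
def session_token_totals(new_tokens_per_turn):
    """Return (new_tokens, total_input_tokens) for one agent session.

    Assumption: each turn sends the entire accumulated context again,
    so the input for a turn is everything added so far.
    """
    context = 0       # tokens accumulated in the conversation so far
    new_total = 0     # tokens that were genuinely new across the session
    input_total = 0   # tokens sent as input, counting resent context in full
    for new in new_tokens_per_turn:
        context += new
        new_total += new
        input_total += context
    return new_total, input_total


# Hypothetical session: 36 turns, each adding 7,500 new tokens.
new_total, input_total = session_token_totals([7_500] * 36)
inflation = input_total / new_total
print(new_total, input_total, inflation)
```

With these made-up inputs the session adds 270,000 new tokens but sends 4,995,000 input tokens in total, a factor of 18.5. Whether the resent portion is charged at a cached rate or as fresh input is exactly the point in dispute in this thread.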

English

21

4

110

141.5K

Da7em@Da7_Tech·4h

Thank you for sharing this and for explaining the issue so clearly. I also want to add that in some cases, even outside peak hours, GLM usage was over 50x higher than other models on the same task. I have clear evidence of a task that normally uses under 2M tokens being charged as 88M tokens. It burned the entire 5-hour Pro window without completing, and the task had to continue across two separate windows. This is not normal usage. It strongly points to a serious server-side cache/accounting issue.

English

0

6

426

Da7em retweeted

思维怪怪@0xLogicrw·4h

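Z.ai, the overseas platform of Zhipu AI, has been reported to have a serious anomaly in its cache billing system. Developer Da7em reported to the company that when he uses the platform's GLM models, repeated context apparently fails to trigger cached-rate billing and is instead billed again as brand-new input. Da7em says the development tool Zcode recorded actual token usage of about 270K, but the platform's final bill showed nearly 5M tokens consumed, inflating the billed amount nearly 19 times. To rule out client-side interference, Da7em compared several agent development frameworks, including Claude Code, Hermes Agent, Zcode, and OpenCode, and confirmed that the anomaly behaves the same across tools, so he concluded that the root cause lies in the server-side caching and accounting system.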

Da7em@Da7_Tech

GLM has a serious token leakage / caching-accounting issue on Z.ai. I tested this across Claude Code, Hermes Agent, Zcode, and OpenCode, so it does not look like one harness behaving badly. This has been consistent since the start of my subscription, during normal off-peak usage. My read: repeated context is being billed as fresh input instead of cached input. That’s not max reasoning. That’s a server-side caching/accounting problem. The screenshots show the issue clearly: Zcode used around 270K tokens, but I was billed for nearly 5M tokens. Cached tokens are clearly not working. I contacted support, escalated this, and tried everything, but got no real response. This is not how you treat paying customers. Please fix this. @ZixuanLi_ @Zai_org

Chinese

1

3

21

5.9K

Da7em@Da7_Tech·8h

@JankDankins_ Nah, it's not. They told me that, and they'll fix it.

English

0

95

Sam@JankDankins_·15h

@Da7_Tech It's not a problem. You are correct, it is the billing structure; it is intentional.

English

1

0

3

251

Da7em@Da7_Tech·8h

DeepSeek was great for me in Hermes Agent. It was excellent at tracking and fixing bugs, really solid with code issues, and strong in understanding, coverage, and general knowledge. As a general-purpose agent, I highly recommend it. But for long-running tasks, it’s bad and not suitable at all. The model itself seems wired to stop as early as possible and ask whether you want to continue, no matter what. It even tries to break goal mode. Sadly, it has no long-task stamina. As for Kimi, my last test was with Kimi 2.5. It was good, especially for UI work, and very good at agent management. But it was extremely slow and usually needed multiple rounds of tweaking and improvement to reach the target. It couldn’t reliably complete the task from a single prompt the way stronger models can. I haven’t tested their latest model yet, so I’m waiting to try it before giving a full opinion.

Mushaf S.@mushaf_mughal

@Da7_Tech What do you think of Kimi or DeepSeek from your experience? From this post I got an idea of how GLM compares to Opus or GPT models.

English

2

0

11

2K

Da7em@Da7_Tech·8h

@oooo0ooooo10 I've tried everything – Claude Code, Zcode, Hermes agent, Opencode – nothing's working.

English

0

75

Oooooo@oooo0ooooo10·15h

@Da7_Tech What harness are you testing it in? Give Claude Code or Droid a try and see what happens

English

1

0

751

Da7em retweeted

Da7em@Da7_Tech·23h

Same task. Same prompt. Same setup.
Opus 4.8 completed the task. It used 2% of the weekly limit and 7% of the 7-hour limit.
GLM 5.2 did not complete the task. It consumed 20% of the weekly limit and 100% of the 5-hour limit.
Token usage is completely abnormal:
Opus 4.8: less than 1.5M tokens
GLM 5.2: reached 53M tokens
And this is despite the app showing only around 1.67M tokens.
This means one thing very clearly: cached tokens are not working properly on GLM 5.2. The repeated context is being counted as normal input instead of cached tokens.
I have been dealing with this issue for more than a month with no real solution. By the way, I am on the Pro plan, and I paid for a full yearly subscription.
These numbers are not normal. This is not usage behavior. This is a serious billing/cache accounting problem on GLM 5.2.
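
The gap between the figures quoted above can be checked with simple arithmetic. The snippet below is a minimal sketch using only the numbers reported in the tweet; the helper name `inflation_factor` is hypothetical, and Opus 4.8's usage is taken at its stated upper bound of 1.5M, so that ratio is a lower bound.

```python
def inflation_factor(counted_tokens, baseline_tokens):
    """How many times larger counted_tokens is than baseline_tokens."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return counted_tokens / baseline_tokens


# Figures as reported in the tweet (not independently verified).
GLM_COUNTED = 53_000_000       # tokens GLM 5.2 "reached"
GLM_APP_DISPLAY = 1_670_000    # tokens the app showed for the same task
OPUS_UPPER_BOUND = 1_500_000   # Opus 4.8 used "less than 1.5M"

vs_app_display = inflation_factor(GLM_COUNTED, GLM_APP_DISPLAY)
vs_opus = inflation_factor(GLM_COUNTED, OPUS_UPPER_BOUND)
print(f"{vs_app_display:.1f}x the app's displayed usage")
print(f"at least {vs_opus:.1f}x the Opus 4.8 usage")
```

That works out to roughly 31.7x the app's displayed figure and at least 35.3x the Opus 4.8 total.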

English

29

13

256

44.6K

Da7em@Da7_Tech·8h

@aether_oracle I have tried everything.

English

0

754

Aether Oracle@aether_oracle·12h

@Da7_Tech Are you using the OpenAI compatible endpoint in Cursor? I would try the Anthropic endpoint in Claude Code. The oai one might be bugged

English

1

0

1.2K

Da7em@Da7_Tech·8h

@0xSero GLM-5.2 or Kimi 2.7?

Turkish

0

262

0xSero@0xSero·9h

GLM-5.2 in Grok build lol.

English

5

0

56

5.3K

Da7em@Da7_Tech·9h

@weconomypistis Thanks, I really appreciate the help.

English

0

1.2K

Pistis@weconomypistis·10h

@Da7_Tech I have sent this feedback to their chief operator. I hope you get a positive response.

English

1

0

1

1.4K

Da7em@Da7_Tech·10h

@ajs6888 And it already is.

English

0

336

安叫兽|Bird🕊️ 🔶 BNB@ajs6888·11h

@Da7_Tech If this keeps happening, the bill is going to look really ugly.

Chinese

1

0

1

390

Da7em retweeted

Da7em@Da7_Tech·1d

Anthropic spent years fighting open-source AI, warning the public about its dangers, and locking its own work behind a greedy, unsustainable model. And today, with one move, it gave open-source AI the greatest gift imaginable. A huge wave of people will now start looking seriously at open-source models: testing them, learning them, using them, building on them, and contributing back. So thank you, Anthropic. Thank you for being greedy, short-sighted, and stupid enough to launch a truly Mythos-level ad campaign for open-source AI.

English

2

28

6.4K

Da7em@Da7_Tech·15h

@chen973812 Not true.

English

0

1

2.6K

chen@chen973812·20h

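@Da7_Tech I think this is just a display issue. The coding plan has always been opaque, but you can check the third-party tool cc-switch, which shows the actual monetary cost and token usage. Simply put, I think GLM shows the token usage and hides the monetary usage, and the monetary usage is fixed, so your cache is working. If every request were a fresh one, your total token usage would drop sharply.

Chinese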

1

0

3.3K

Da7em@Da7_Tech·15h

@vanyadog2 Codex $20 gives me more usage than GLM Pro, which is disappointing.

English

1

0

1

105

Vanydog@vanyadog2·15h

Yes, this is a strange problem. According to the data, I spent over 20M tokens, but that's impossible even with cache. My project wasn't that large, and I'd never received that much volume anywhere. I also instantly used up two five-hour limits, and the last one had a strange spike. I had only 45% of my limit used, so I told the team to push changes and change a couple of lines of code, and suddenly it went up to 100%, and my token consumption jumped from 2M to 12M at once. I used up two five-hour limits in literally 10 minutes... The Lite subscription is unusable now, as are almost all $20 subscriptions on AI these days, unfortunately...

English

1

0

2

105

Da7em@Da7_Tech·23h

I have contacted them. I emailed them. I messaged every single member of the Z.ai team I could find on X. I posted on Discord again and again. I tried multiple times, and nothing worked. All my tests and all the evidence point to the same thing: the problem is on their side, not mine. I am losing time and money while they simply do not seem to care. If anyone sees this and is able to help, please help me. I do not want to be redirected to another generic support email. Those recommendations have been useless. I want a real solution. I want someone responsible from Z.ai to contact me directly. I paid for a full-year pro subscription, and for more than a month now I have not been able to benefit from it at all. I need help. x.com/da7_tech/statu… x.com/da7_tech/statu…

Da7em@Da7_Tech

GLM has a serious token leakage / caching-accounting issue on Z.ai. I tested this across Claude Code, Hermes Agent, Zcode, and OpenCode, so it does not look like one harness behaving badly. This has been consistent since the start of my subscription, during normal off-peak usage. My read: repeated context is being billed as fresh input instead of cached input. That’s not max reasoning. That’s a server-side caching/accounting problem. The screenshots show the issue clearly: Zcode used around 270K tokens, but I was billed for nearly 5M tokens. Cached tokens are clearly not working. I contacted support, escalated this, and tried everything, but got no real response. This is not how you treat paying customers. Please fix this. @ZixuanLi_ @Zai_org

English

7

0

12

20.7K

Da7em@Da7_Tech·15h

@rainnyday1979 Sadly, they are losing our trust.

English

1

0

1

2.5K

UlsanOzzy@rainnyday1979·15h

@Da7_Tech I also wasted a lot of time on GLM-5.1 for a similar reason. So it wasn't just my imagination.

Korean

1

0

6

2.9K

Da7em@Da7_Tech·15h

@AI_dude_eu No, it's not. I have tried other models in the same harness, and GLM in every harness.

English

0

1.3K

AI Dude 🇪🇺@AI_dude_eu·16h

@Da7_Tech Maybe wrong harness??

English

1

0

1.5K

Da7em@Da7_Tech·15h

@LeonLRedfield Sadly, I really believed in GLM.

English

1

0

4

1.6K

台灣李家宝🇲🇳@LeonLRedfield·16h

@Da7_Tech The point is, these Chinese models can't really be used for specific production work.

Chinese

1

0

2

1.8K

Da7em@Da7_Tech·16h

@kostasbotonakis It's a disaster, I'm really upset.

English

0

5

3.4K

Konstantinos@kostasbotonakis·16h

It's not just you, and it's not a you problem. It's z.ai lying to you about the model's capabilities. When they advertised that GLM-5.2 is a “near” Opus 4.8 model and now Fable is temporarily unavailable, they thought people would just accept this and pay for a subscription like they did last year with GLM 4.x.

English

2

0

29

4.1K

Da7em@Da7_Tech·16h

@lucasteske @Zai_org Thanks man, I'm grateful.

English

0

1

59

Cybernetic Lover@lucasteske·16h

@Da7_Tech I don't think they will answer though, sadly. But tagging anyway @Zai_org

English

1

0

1

320

Da7em@Da7_Tech·16h

Hi, I submitted PR #44987 with comprehensive Arabic localization for Hermes: Desktop, Dashboard, Agent/CLI, full RTL layouts, Arabic plural rules, fonts, and 3,100+ translated strings. Today, PR #45619 appeared after mine, using substantial parts of my work without attribution. Its Arabic catalog contains 421 exactly matching lines, including 64 long, distinctive matches, plus the same implementation decisions and tests. The author explicitly knew about my earlier PR. I believe this is uncredited appropriation of my work. Please review both PRs, their timestamps, commits, and diffs before deciding which one to merge. @NousResearch @Teknium

English

0

8

755
