
Da7em
1.4K posts

Da7em
@Da7_Tech
Chasing the horizon of tomorrow's ✨ | Gaming, Weaving Business, AI, and Apps | Linux soul!






GLM has a serious token leakage / caching-accounting issue on Z.ai I tested this across Claude Code, Hermes Agent, Zcode, and OpenCode, so it does not look like one harness behaving badly. This has been consistent since the start of my subscription, during normal off-peak usage. My read: repeated context is being billed as fresh input instead of cached input. That’s not max reasoning. That’s a server-side caching/accounting problem. The screenshots show the issue clearly: Zcode used around 270K tokens, but I was billed for nearly 5M tokens. Cached tokens are clearly not working. I contacted support, escalated this, and tried everything, but got no real response. This is not how you treat paying customers. Please fix this. @ZixuanLi_ @Zai_org

@Da7_Tech What do you think of kimi or deepseek from your experience? From this post I got the idea of how GLM is compared to opus or gpt model









GLM has a serious token leakage / caching-accounting issue on Z.ai I tested this across Claude Code, Hermes Agent, Zcode, and OpenCode, so it does not look like one harness behaving badly. This has been consistent since the start of my subscription, during normal off-peak usage. My read: repeated context is being billed as fresh input instead of cached input. That’s not max reasoning. That’s a server-side caching/accounting problem. The screenshots show the issue clearly: Zcode used around 270K tokens, but I was billed for nearly 5M tokens. Cached tokens are clearly not working. I contacted support, escalated this, and tried everything, but got no real response. This is not how you treat paying customers. Please fix this. @ZixuanLi_ @Zai_org













