
Charlie Chen



Qwen 3.6 Plus from @Alibaba_Qwen is officially the first model on OpenRouter to break 1 trillion tokens processed in a single day! At ~1,400,000,000,000 tokens, it’s the strongest full-day performance of any new model dropped this year. Congrats to the Qwen team!


























myth #3: model intelligence is the same regardless of whether you use completions or responses

wrong again. responses was built for thinking models that call tools within their chain-of-thought (CoT). responses allows persisting the CoT between model invocations when calling tools agentically. the result is a more intelligent model and much higher cache utilization; we saw cache rates jump from 40% to 80% on some workloads.

this one is perhaps the most egregious. developers don't realize how much performance they are leaving on the table. i get it, it's hard because you use LiteLLM or some custom harness you built around chat completions or whatever, but prioritizing the switch is crucial if you want GPT-5 to be maximally performant in your agents.

here's our cookbook on function calling with responses: cookbook.openai.com/examples/o-ser…
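
a rough sketch of what "persisting the CoT between model invocations" can look like in a tool loop, not the cookbook's code. it assumes the Responses API shape in the OpenAI Python SDK (`client.responses.create` with `tools`, `previous_response_id`, and `function_call_output` input items) and that responses are stored server-side so the previous turn can be referenced. the `get_weather` tool, the `run_agent` helper, and the `FakeResponses` stub are hypothetical, added only so the example runs offline.

```python
import json
from types import SimpleNamespace


# Hypothetical tool for illustration; not from the original post.
def get_weather(city: str) -> str:
    return f"sunny in {city}"


# Tool schema in the flat format the Responses API expects (assumption: SDK v1+).
TOOLS = [{
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
HANDLERS = {"get_weather": get_weather}


def run_agent(client, model, user_input, max_turns=5):
    """Call the model, run any requested tools, and repeat until it stops asking.

    Follow-up calls pass previous_response_id, so the earlier turn (including
    its reasoning items) is referenced server-side rather than resent or dropped.
    """
    response = client.responses.create(model=model, input=user_input, tools=TOOLS)
    for _ in range(max_turns):
        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            return response
        outputs = []
        for call in calls:
            args = json.loads(call.arguments)  # arguments arrive as a JSON string
            outputs.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": HANDLERS[call.name](**args),
            })
        response = client.responses.create(
            model=model,
            previous_response_id=response.id,  # chain onto the prior turn
            input=outputs,
            tools=TOOLS,
        )
    return response


class FakeResponses:
    """Offline stand-in for client.responses so the sketch runs without network."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        if "previous_response_id" not in kwargs:
            call = SimpleNamespace(type="function_call", name="get_weather",
                                   arguments='{"city": "Paris"}', call_id="call_1")
            return SimpleNamespace(id="resp_1", output=[call])
        return SimpleNamespace(id="resp_2", output=[SimpleNamespace(type="message")])


fake_client = SimpleNamespace(responses=FakeResponses())
final = run_agent(fake_client, "gpt-5", "what's the weather in Paris?")
```

with a real key you would swap `fake_client` for `OpenAI()` from the `openai` package. the point of the sketch is the second call: it sends only the tool outputs plus `previous_response_id`, instead of rebuilding the whole message list the way a chat completions harness would.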
















