

琥珀青葉@KohakuLab

@KBlueleaf
Graduate student in Taiwan. Leader of KohakuLab. Researcher/Dev in ComfyOrg



🚨 BREAKING: Tencent has killed the “next-token” paradigm. Tencent and Tsinghua have released CALM (Continuous Autoregressive Language Models), and it completely disrupts the next-token paradigm.

LLMs currently waste massive amounts of compute predicting discrete, single tokens through a huge vocabulary softmax layer. It’s slow and scales poorly.

CALM bypasses the vocabulary entirely. It uses a high-fidelity autoencoder to compress chunks of text into a single continuous vector with 99.9% reconstruction accuracy. The model then predicts the “next vector” in a continuous space.

The numbers are actually insane:
- Each generative step now carries 4× the semantic bandwidth.
- Training compute is reduced by 44%.
- The softmax bottleneck is completely removed.

We’re literally watching language models evolve from typing discrete symbols to streaming continuous thoughts. This changes the entire trajectory of AI.

Implemented VQ and related algorithms in Triton, and got a 4~8x speedup with nearly zero extra VRAM usage (compared to a naive torch impl).
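For context, the core VQ step being accelerated is nearest-codebook assignment. Below is a hypothetical NumPy version of the naive baseline (the Triton kernel itself isn’t shown here); the naive approach materializes a full (N, C) distance matrix, which is exactly the memory cost a fused, tiled kernel avoids.

```python
# Naive reference for the vector-quantization (VQ) assignment step:
# snap each input vector to its nearest codebook entry.
# A hypothetical NumPy baseline for illustration, not the author's Triton code.
import numpy as np

def vq_assign(x, codebook):
    """x: (N, d) vectors, codebook: (C, d).

    Returns (indices of shape (N,), quantized vectors of shape (N, d)).
    Materializes the full (N, C) distance matrix -- the VRAM cost that a
    fused Triton kernel avoids by tiling over the codebook.
    """
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2.
    d2 = (
        (x ** 2).sum(axis=1, keepdims=True)   # (N, 1)
        - 2.0 * x @ codebook.T                # (N, C)
        + (codebook ** 2).sum(axis=1)         # (C,)
    )
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))
# Inputs are slightly perturbed copies of codes 3, 7, 7.
x = codebook[[3, 7, 7]] + 1e-3 * rng.normal(size=(3, 4))
idx, q = vq_assign(x, codebook)
print(idx.tolist())
```

The fused-kernel win comes from never writing `d2` to global memory: each program instance streams codebook tiles through shared memory/registers and keeps only a running argmin per input row.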
