loop
11 posts


My first PhD work: "Not All Prefills Are Equal"
Prefill-Decode disaggregation is the standard for LLM serving. But for multi-turn conversations, it re-transfers the entire KV cache every turn.
We found a better way!
Thanks to my amazing advisor @ce_zhang and collaborators!

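LLM inference frameworks on Ascend NPU are still too hard to use, and a nano framework suitable for research does not exist at all. My plan is to first implement a CPU inference framework for Qwen0.6B, then port it to the NPU.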
github.com/BangBOOM/nano-…

Finally, Logseq is beginning to change its UI
Logseq 🪵@logseq
Logseq 0.9.14 just dropped! We've overhauled the sidebars, making it easier than ever to manage information. But what we're most excited about? Smart Merge for Logseq Sync! Say goodbye to sync conflicts and hello to block-level syncing. See 👇 for more details of the changes.

