Chumeng Liang
@lowerbad

21 posts

First-year CS PhD student at @UofIllinois. Diffusion Language Models. Representation Learning.

Joined October 2025
35 Following · 99 Followers
Chumeng Liang @lowerbad
@sedielem Thank you! It is always good to find theoretical support for classical works!
Sander Dieleman @sedielem
This work provides theoretical grounding for some of the design decisions (cross-entropy loss, learnable embeddings, self-conditioning, entropy-based schedule) in CDCD (arxiv.org/abs/2211.15089), and brings it into the modern era. Continuous text diffusion is still competitive!
Chumeng Liang@lowerbad

Continuous diffusion dominates image & video generation, but people used to believe that it inherently lags behind its discrete counterparts in language modeling. Today, we challenge this belief with LangFlow: the first continuous diffusion language model that rivals—and even beats—discrete diffusion. (1/7)
Blog: caradryanl.github.io/blog/2026/lang…
GitHub: github.com/nealchen2003/L…
arXiv: arxiv.org/abs/2604.11748

Chumeng Liang @lowerbad
@punyajoysaha Thank you for your interest. The TESS models are great pretrained models, while our work focuses on methodology at smaller scales for now. If we get the chance to scale up our model, we would love to compare it to TESS.
Punyajoy Saha @punyajoysaha
@lowerbad Why have u not compared with papers like TESS/TESS 2
Chumeng Liang @lowerbad
Thank you for the note. To the best of our knowledge, LangFlow is the first to provide a comprehensive, size-controlled ppl/gen-ppl/entropy comparison across LM1B/OWT/zero-shot, and it demonstrates a clear win over the best DDLM on a significant portion of the tasks. We have included a discussion of several brilliant recent concurrent works on DLMs, such as FMLM; we believe these few-step distillation techniques can be combined synergistically with our embedding-space DLM to further improve efficiency.
Chumeng Liang @lowerbad
The potential of continuous DLMs extends far beyond just performance. They open the door for all continuous diffusion techniques to be introduced into language modeling:
- One-step generation, such as Consistency Models
- Guided generation, such as CFG
- Unified multimodal generation, such as protein structure–sequence co-design
LangFlow suggests: continuous diffusion is NOW a viable and promising paradigm for language modeling. (7/7)
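The guided-generation item above refers to classifier-free guidance (CFG), a standard trick in continuous diffusion. A minimal sketch of the idea, assuming a hypothetical denoiser `model(x_t, t, cond)` that returns a list of floats (this is illustrative, not the LangFlow implementation):

```python
def cfg_prediction(model, x_t, t, cond, guidance_scale=2.0):
    """Classifier-free guidance for a continuous (embedding-space) diffusion
    or flow model: query the denoiser twice and extrapolate the conditional
    prediction away from the unconditional one.

    `model(x_t, t, cond)` is a hypothetical denoiser; `cond=None` selects
    the unconditional branch.
    """
    v_cond = model(x_t, t, cond)
    v_uncond = model(x_t, t, None)
    # guided = uncond + s * (cond - uncond); s > 1 strengthens conditioning.
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]
```

The same two-pass blend works whether the model predicts noise, clean embeddings, or a flow velocity, which is why embedding-space DLMs inherit it for free.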
Chumeng Liang reposted
Jiaxuan You @youjiaxuan
🚨 RL for LLMs is finally accessible. Introducing OpenTinker: the first community-driven, open-source framework designed to democratize Reinforcement Learning for LLMs. github.com/open-tinker/Op…

Inspired by @thinkymachines's amazing Tinker, we realized the biggest bottleneck in agentic LLM research isn't the math—it's the setup. Current RL pipelines are messy: configuring VeRL for every single experiment is a productivity killer. OpenTinker fixes it.

🛠 How OpenTinker works: a decoupled design of server and client
- Set up once, run forever: configure the OpenTinker backend on your GPU cluster once.
- Develop locally: define your RL environments directly on your laptop.
- Train on the cloud: simply point your local client at the backend. The cluster handles the compute; you handle the science.

📉 10x development efficiency
Thanks to this architectural decomposition, OpenTinker reduces the time to develop a new RL training pipeline by at least an order of magnitude.

⚡ Turn idle GPU compute into gold
Small labs often have underutilized hardware. OpenTinker turns your idle GPUs into an internal/external API service for RL training, SFT, and inference.

🎯 Who needs OpenTinker?
- Researchers tired of infrastructure hell.
- Labs needing to standardize workflows.
- Teams wanting to maximize hardware ROI.

Thanks to my amazing PhD student @realagi25 for leading the project. We are building the future of open RL infra. Be the first to build with us. 👇
Start building with OpenTinker now:
🚀 Repo: github.com/open-tinker/Op…
🌐 Blog: open-tinker.github.io/opentinker-pag…

If you believe RL should be accessible to everyone, give us a star, repost this 🔄 post, and let us know what agents you plan to build!
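The decoupled server/client split described in the tweet can be sketched generically. All names below (`TrainingBackend`, `LocalClient`, `train_step`) are invented for illustration and are not the actual OpenTinker API:

```python
class TrainingBackend:
    """Stands in for the cluster-side service: owns the model and compute."""
    def __init__(self):
        self.steps = 0

    def train_step(self, rollouts):
        # In a real system this would run an RL update (e.g. PPO) on GPUs;
        # here we just count steps and echo how much data arrived.
        self.steps += 1
        return {"step": self.steps, "num_rollouts": len(rollouts)}


class LocalClient:
    """Stands in for the laptop-side client: defines environments and
    collects rollouts, delegating all heavy lifting to the backend."""
    def __init__(self, backend):
        self.backend = backend

    def collect_rollouts(self, env_fn, n=4):
        return [env_fn(i) for i in range(n)]

    def train(self, env_fn, iterations=2):
        logs = []
        for _ in range(iterations):
            rollouts = self.collect_rollouts(env_fn)
            logs.append(self.backend.train_step(rollouts))
        return logs
```

The point of the split: the backend object could live behind a network API on a cluster while the client code stays unchanged on a laptop.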
Chumeng Liang @lowerbad
Goooood job
Zhanhui Zhou@asapzzhou

(1/n) Tiny-A2D: An Open Recipe to Turn Any AR LM into a Diffusion LM
Code (dLLM): github.com/ZHZisZZ/dllm
Checkpoints: huggingface.co/collections/dl…

With dLLM, you can turn ANY autoregressive LM into a diffusion LM (parallel generation + infilling) with minimal compute. Using this recipe, we built a 🤗 collection of the smallest diffusion LMs that work well in practice.

Key takeaways:
1. Finetuned on Qwen3-0.6B, we obtain the strongest small (~0.5/0.6B) diffusion LMs to date.
2. The base AR LM matters: investing compute in improving the base AR model is potentially more efficient than scaling compute during adaptation.
3. Block diffusion (BD3LM) generally outperforms vanilla masked diffusion (MDLM), especially on math-reasoning and coding tasks.

Chumeng Liang reposted
Jiaxuan You @youjiaxuan
We believe future forecasting is the ultimate challenge for agentic LLMs.
🚀 Live Trade Bench is now fully open-sourced! It's the first live, real-world benchmark testing 20+ LLMs on financial forecasting.
📄 Read our 37-page paper detailing insights from a 2-month live trading experiment: 👉 arxiv.org/abs/2511.03628
📊 Track real-time performance across 20 LLMs here: 👉 trade-bench.live
💻 Developers interested in LLM benchmarking or trading? Try it out with: pip install live-trade-bench
🔗 Code: github.com/ulab-uiuc/live…
Chumeng Liang @lowerbad
Great job!
Zhanhui Zhou@asapzzhou

(1/n) 🚨 BERTs that chat: turn any BERT into a chatbot with diffusion

Hi @karpathy, we just trained a few BERTs to chat with diffusion — we are releasing all the model checkpoints, training curves, and recipes! Hopefully this spares you the side quest into training nanochat with diffusion for now 🙂. It's both a hands-on tutorial for beginners and an example showing how to use our complete toolkit (dLLM) for deeper projects.
Code: github.com/ZHZisZZ/dllm
Report: api.wandb.ai/links/asap-zzh…
Checkpoints: huggingface.co/collections/dl…

Motivation: I couldn't find a good "Hello World" example for training a minimally working yet useful diffusion language model — a class of bidirectional language models capable of parallel token generation in arbitrary order. So I tried finetuning BERTs to chat with discrete diffusion, and it turned out more fun than I expected.

TL;DR: With a small amount of open-source instruction-following data, a standard BERT can gain conversational ability with diffusion. Specifically, a finetuned ModernBERT-large performs close to Qwen1.5-0.5B, which has a similar number of parameters.

Zhanhui Zhou @asapzzhou
(1/n) 🚨 BERTs that chat: turn any BERT into a chatbot with diffusion

Hi @karpathy, we just trained a few BERTs to chat with diffusion — we are releasing all the model checkpoints, training curves, and recipes! Hopefully this spares you the side quest into training nanochat with diffusion for now 🙂. It's both a hands-on tutorial for beginners and an example showing how to use our complete toolkit (dLLM) for deeper projects.
Code: github.com/ZHZisZZ/dllm
Report: api.wandb.ai/links/asap-zzh…
Checkpoints: huggingface.co/collections/dl…

Motivation: I couldn't find a good "Hello World" example for training a minimally working yet useful diffusion language model — a class of bidirectional language models capable of parallel token generation in arbitrary order. So I tried finetuning BERTs to chat with discrete diffusion, and it turned out more fun than I expected.

TL;DR: With a small amount of open-source instruction-following data, a standard BERT can gain conversational ability with diffusion. Specifically, a finetuned ModernBERT-large performs close to Qwen1.5-0.5B, which has a similar number of parameters.
Andrej Karpathy@karpathy

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising; top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right; bottom) is the dominant paradigm in text. For audio I've seen a bit of both.

A lot of diffusion papers look a bit dense, but if you strip the mathematical formalism, you end up with simple baseline algorithms: e.g. something a lot closer to flow matching in the continuous case, or something like this in the discrete case. It's your vanilla transformer, but with bidirectional attention, where you iteratively re-sample and re-mask all tokens in your "token canvas" based on a noise schedule until you get the final sample at the last step. (Bidirectional attention is a lot more powerful, and you get a lot stronger autoregressive language models if you train with it; unfortunately it makes training a lot more expensive because now you can't parallelize across the sequence dim.)

So autoregression is doing an `.append(token)` to the token canvas while only attending backwards, while diffusion is refreshing the entire token canvas with a `.setitem(idx, token)` while attending bidirectionally. Human thought naively feels a bit more like autoregression, but it's hard to say that there aren't more diffusion-like components in some latent space of thought. It feels quite possible that you can further interpolate between them, or generalize them further. And it's a component of the LLM stack that still feels a bit fungible. Now I must resist the urge to side quest into training nanochat with diffusion.
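The re-sample/re-mask loop described above can be sketched as a toy sampler. The `predict` callable and the linear re-masking schedule below are illustrative stand-ins, not any particular model's API:

```python
import random

MASK = "<mask>"

def masked_diffusion_sample(predict, length=8, steps=4, seed=0):
    """Toy discrete-diffusion sampler: start from an all-<mask> canvas,
    then repeatedly (1) let the model fill every masked position and
    (2) re-mask a shrinking fraction of the canvas (linear schedule),
    until nothing is masked at the final step.

    `predict(canvas)` stands in for a bidirectional transformer: it
    returns a proposed token for every position in the canvas.
    """
    rng = random.Random(seed)
    canvas = [MASK] * length
    for step in range(steps):
        # (1) Re-sample: fill every masked slot with the model's proposal.
        proposals = predict(canvas)
        canvas = [proposals[i] if tok == MASK else tok
                  for i, tok in enumerate(canvas)]
        # (2) Re-mask: mask fewer positions as the schedule approaches 0.
        mask_frac = 1.0 - (step + 1) / steps
        n_mask = int(mask_frac * length)
        for i in rng.sample(range(length), n_mask):
            canvas[i] = MASK
    return canvas
```

Swapping the uniform random re-masking for confidence-based re-masking (keep the tokens the model is most sure of) recovers samplers closer to the ones used in practice.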

Chumeng Liang @lowerbad
We therefore build a benchmark that extracts paper diagrams from arXiv with one command and evaluates the quality of LLM-generated diagrams accordingly, along with an agentic template for generating diagrams. Show your tricks for producing high-quality paper diagrams on our new benchmark!
Chumeng Liang @lowerbad
Representing a diagram as a directed graph, our EMNLP paper shows that over 50% of nodes and 60% of edges (between correct nodes) are incorrect in LLM-generated paper diagrams (see the last figure). Diagram generation remains a mission incomplete.
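A node/edge metric of the kind described (edges counted only between correctly recovered nodes) can be sketched as follows; this is a hypothetical simplification for illustration, not the paper's actual matching procedure:

```python
def diagram_accuracy(ref_nodes, ref_edges, gen_nodes, gen_edges):
    """Score a generated diagram against a reference, treating both as
    directed graphs. Node accuracy is the fraction of reference nodes
    recovered; edge accuracy is computed only over reference edges whose
    endpoints were both recovered ('edges between correct nodes')."""
    correct_nodes = set(ref_nodes) & set(gen_nodes)
    node_acc = len(correct_nodes) / len(ref_nodes)
    # Only reference edges between recovered nodes are eligible.
    eligible = {(u, v) for u, v in ref_edges
                if u in correct_nodes and v in correct_nodes}
    hit = eligible & set(map(tuple, gen_edges))
    edge_acc = len(hit) / len(eligible) if eligible else 0.0
    return node_acc, edge_acc
```

In practice node matching would be fuzzy (label paraphrases, layout differences); exact set intersection is the simplest possible stand-in.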