Yaofang Liu

148 posts

@stephenajason

Ph.D. candidate, CityUHK; intern at Noah's Ark Lab; prev. Tencent AI Lab; visiting at CambridgeU. Working on diffusion models, video generation, and multimodal.

Cambridge, England · Joined August 2017
380 Following · 169 Followers
Pinned Tweet
Yaofang Liu @stephenajason
🚀 Pusa V1.0 Release

Can you believe training a SOTA-level image-to-video model with only $500 of training cost? No way? But yes, we made it! And we achieved much more beyond that. We're thrilled to release Pusa V1.0, a paradigm shift in video generation that redefines video diffusion efficiency, built on our novel Vectorized Timestep Adaptation (VTA) from our prior FVDM work.

🔥 Key Features:

✅ Unprecedented efficiency:
- Surpasses Wan-I2V-14B with ≤ 1/200 of the training cost ($500 vs. ≥ $100,000)
- Trained on a dataset ≤ 1/2500 of the size (4K vs. ≥ 10M samples)
- Achieves a VBench-I2V score of 87.32% with 10 inference steps (vs. 86.86% for Wan-I2V-14B with 50 steps)

✅ Comprehensive multi-task support: VTA fully preserves text-to-video from the base model Wan-T2V, and after finetuning, Pusa V1.0 extends to all of the following zero-shot (no task-specific training):
- Image-to-video
- Start-end frames
- Video completion/transitions
- Video extension
- And more...

✅ Complete open-source release:
- Full codebase and training/inference scripts
- Model weights and dataset for Pusa V1.0
- Paper/tech report with detailed and comprehensive methodology

💡 Scientific breakthrough: VTA enables granular temporal control via frame-level noise adaptation, with no task-specific training needed.

🌍 Fully open-sourced:
• Codebase: github.com/Yaofang-Liu/Pu…
• Project page: yaofang-liu.github.io/Pusa_Web/
• Technical report: github.com/Yaofang-Liu/Pu…
• Model weights: huggingface.co/RaphaelLiu/Pus…
• Dataset: huggingface.co/datasets/Rapha…

[1/n]
8 replies · 13 reposts · 52 likes · 13K views
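To make the "vectorized timestep" idea above concrete: a standard video diffusion model gives every frame the same scalar timestep, while VTA assigns each frame its own. Below is a minimal, hypothetical PyTorch sketch of a per-frame noising step; the shapes, toy schedule, and function name are illustrative assumptions, not the actual Pusa/FVDM code (see the linked repo for that).

```python
import torch

def add_noise_per_frame(latents, noise, t_frames, alphas_cumprod):
    """DDPM-style noising where each frame gets its own timestep.

    latents:        (B, F, C, H, W) video latents
    noise:          same shape, sampled from N(0, I)
    t_frames:       (B, F) integer timestep per frame -- the vectorized part
    alphas_cumprod: (T,) cumulative noise schedule
    """
    a = alphas_cumprod[t_frames]           # (B, F) per-frame alpha-bar
    a = a.view(*a.shape, 1, 1, 1)          # broadcast over C, H, W
    return a.sqrt() * latents + (1.0 - a).sqrt() * noise

# Toy usage: pin frame 0 at t = 0 (nearly clean) and fully noise the rest,
# which is one way image-to-video conditioning can emerge without a
# task-specific architecture.
B, F, C, H, W, T = 1, 16, 4, 32, 32, 1000
alphas_cumprod = torch.linspace(0.9999, 0.0001, T)   # toy schedule
latents = torch.randn(B, F, C, H, W)
noise = torch.randn_like(latents)
t = torch.full((B, F), T - 1)   # maximum noise everywhere...
t[:, 0] = 0                     # ...except the conditioning frame
noisy = add_noise_per_frame(latents, noise, t, alphas_cumprod)
print(noisy.shape)              # torch.Size([1, 16, 4, 32, 32])
```

With one scalar timestep this per-frame control is impossible, which is why the thread credits VTA for zero-shot start-end frames, extension, and transitions: they are all just different timestep patterns.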
Yaofang Liu @stephenajason
@ethanjohnweber I'd only used cc to write HTML to replace slides for presentations before; I never realized it would also work this well for a poster! Such a massive time-saver. Thanks again for the skill!
0 replies · 0 reposts · 0 likes · 20 views
Ethan Weber @ethanjohnweber
@stephenajason Woah super cool! Thanks for sharing the result! I’m really glad it worked out for you!
1 reply · 0 reposts · 1 like · 46 views
Yaofang Liu @stephenajason
It recently came to mind: the so-called "language" in an LLM is not language for humans; it's just language for computers/machines. Everything on a computer/machine can be learned and generated. And that's also the limit.
0 replies · 0 reposts · 2 likes · 49 views
Ethan Weber @ethanjohnweber
I made a Claude Code skill that generates conference posters 🛠️
Instead of a static PDF, it outputs a single HTML file: drag to resize columns, swap sections, adjust fonts, then give your layout back to Claude. 🔁
🔗 Skill 👉 github.com/ethanweber/pos…
29 replies · 332 reposts · 2.5K likes · 181.5K views
Yaofang Liu @stephenajason
I have to say, at least for this case, cc seems much better than Codex (GPT-5.4-high); the following is what Codex generated with basically the same prompt.
[image attached]
0 replies · 0 reposts · 0 likes · 55 views
Yaofang Liu @stephenajason
🎉🚀 Pusa Accepted to ICLR 2026! 🚀🎉 Our vectorized timestep-powered video diffusion model—trained for just $500, <4k samples, SOTA-level I2V performance & multi-task zero-shot support—makes it to ICLR 2026! Fully open-sourced: 🔗 github.com/Yaofang-Liu/Pu…
Yaofang Liu @stephenajason
[quoted tweet: the pinned Pusa V1.0 release announcement above]
1 reply · 0 reposts · 2 likes · 117 views
李继刚 @lijigang
In the AI era, content generation has become abundant to the point of overload. ____ has become this era's new scarcity.
221 replies · 40 reposts · 333 likes · 132.8K views
Yaofang Liu @stephenajason
Since early last year, our group has been thinking about the boundary between SFT and RL. Is SFT truly useless for developing reasoning ability? Our intuition back then was no. And now, we present new work that really reveals how SFT can benefit reasoning!!

Please check our new paper "GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning" arxiv.org/abs/2507.10628

Code is also open-sourced! github.com/hkgc-1/GHPO

Big salute to Ziru Liu @LiuZiru111, Cheng Gong @ChengGO15379635, Xinyu Fu, Rui Liu @RuiLiu888 et al. 👏👏
Ziru Liu @LiuZiru111

SFT + RLVR: Addressing Reward Sparsity through Hybrid Learning

🙀 Ever wonder why LLMs get stuck during RLVR training? It's often a capacity-difficulty mismatch: the training data is too tough for the model, leading to reward sparsity and zero progress. Especially painful for smaller LLMs!

Well, we've cracked it! 🎉 Say hello to GHPO, a groundbreaking RLVR framework that skillfully combines SFT and online RL. It's a difficulty-aware system designed to tackle this exact problem head-on!

🤖 GHPO is smart. It dynamically adjusts task difficulty using adaptive prompt refinement, offering precise guidance. It knows when to use direct imitation learning for challenging problems (where the model needs a boost!) and when to switch to exploration-based RL for tasks the model can handle. The result? A smooth, optimized learning journey that avoids reward sparsity entirely!

✨ Key Highlights:
- Automated & adaptive: automatically detects how hard a problem is and intelligently blends on-policy RL with guided imitation learning via prompt refinement.
- Big wins: our experiments show GHPO delivers an impressive 5% average performance gain over the GRPO method on six math benchmarks!

🧑‍💻 Built on Open-R1 #openr1, with its trainer based on TRL #TRL; really appreciate the great work from the Hugging Face teams. @QGallouedec We even rolled out a custom 'GHPOTrainer' to make things even smoother.

👇🔗 Check it out!
Paper: arxiv.org/abs/2507.10628
GitHub: github.com/hkgc-1/GHPO
Dataset: huggingface.co/datasets/hkgc/…

#artificialintelligence #TRL #huggingface #openr1 #RLVR #MachineLearning #DeepLearning #LLM #PhD

[1/N]
0 replies · 1 repost · 4 likes · 478 views
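As a concrete reading of the routing idea described above, here is a minimal, hypothetical Python sketch: sample a group of G answers per query, and switch to guided imitation only when the whole group earns zero reward. `sample_fn` and `verify_fn` are stand-in callables, not the actual GHPOTrainer API; the real implementation is in the linked repo.

```python
from typing import Callable, List, Tuple

def route_query(
    sample_fn: Callable[[str], str],    # prompt -> one sampled answer
    verify_fn: Callable[[str], bool],   # answer -> verifiable-reward check
    prompt: str,
    G: int = 8,
) -> Tuple[str, List[float]]:
    """Route one query to plain on-policy RL or to guided imitation."""
    rewards = [1.0 if verify_fn(sample_fn(prompt)) else 0.0 for _ in range(G)]
    if any(rewards):
        return "rl", rewards    # at least one success: gradients are informative
    return "guided", rewards    # all-zero group reward: capacity-difficulty
                                # mismatch, so refine the prompt instead

# Toy usage: a model that always answers wrong lands in the guided branch.
mode, rewards = route_query(lambda p: "41", lambda a: a == "42", "What is 6*7?")
print(mode)                     # -> "guided"
```

The design point is that difficulty is measured per query from observed rewards, so easy items keep pure on-policy exploration and never pay the guidance cost.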
Yaofang Liu reposted
Rohan Paul @rohanpaul_ai
Training a language model with raw rewards collapses when every hard question hands back zero. Guided Hybrid Policy Optimization (GHPO) fixes that by dropping small hints only on those stubborn items.

RL with verifiable rewards is a game of 1 for correct, 0 for wrong. If the whole batch reads 0, gradients vanish and learning stalls.

GHPO looks at each query's G sampled answers; all zeros mark it difficult. The algorithm pastes 25% of the official solution into the prompt, reruns, and climbs to 50% or 75% until at least one answer lands. Easy queries skip guidance, so exploration keeps discovering fresh paths.

This dynamic mix of imitation and exploration steadies gradients, cuts wasted compute, and raises sample efficiency. Experiments on 6 competitive math suites show about 5% higher accuracy over strong on-policy and curriculum baselines, even on a compact 7B model. Metrics in the study confirm smoother optimization and longer, clearer reasoning chains.

----
Paper: arxiv.org/abs/2507.10628
Paper title: "GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning"
[image attached]
2 replies · 10 reposts · 31 likes · 3.3K views
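The 25% → 50% → 75% hint escalation described above can be sketched as a short loop. Again a hedged illustration with assumed helper names (`sample_fn`, `verify_fn`) rather than the paper's code:

```python
from typing import Callable

HINT_RATIOS = (0.25, 0.50, 0.75)   # growing prefixes of the official solution

def escalate_hints(
    sample_fn: Callable[[str], str],   # guided prompt -> one sampled answer
    verify_fn: Callable[[str], bool],  # answer -> verifiable-reward check
    prompt: str,
    solution: str,
    G: int = 8,
) -> str:
    """Paste a growing solution prefix until at least one rollout lands."""
    guided = prompt
    for ratio in HINT_RATIOS:
        cut = int(len(solution) * ratio)
        guided = f"{prompt}\nPartial solution: {solution[:cut]}"
        if any(verify_fn(sample_fn(guided)) for _ in range(G)):
            return guided      # reward signal restored: train on this prompt
    return guided              # even the 75% hint failed; keep the longest one
```

Once any rollout scores 1, gradients stop vanishing and ordinary policy optimization resumes on the refined prompt.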
AK @_akhaliq
Pusa V1.0: Surpassing Wan-I2V-14B with $500 Training Cost by Vectorized Timestep Adaptation
11 replies · 64 reposts · 465 likes · 42.2K views