Ziming Liu (@lzm_mlsys) - Twitter Profile

Pinned Tweet

Ziming Liu@lzm_mlsys·19 Oct

🚀Serving MoE models made EASY and CHEAP!! We built EaaS: think of experts not as layers in a model, but as microservices you can spin up, replicate, or kill independently. No all-to-all. No static process groups. No system-wide crash when one GPU dies. Just:
⚙️ Clients (attention) ↔ Servers (experts)
🧠 Stateless → easy replication
📡 Asymmetric async P2P (no CPU involved!)
🧱 Fine-grained scaling without restarting, saving real 💰!
Monolithic inference is over. Serving is becoming cloud-native. Preprint here → arxiv.org/abs/2509.17863
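
The client/server split described in this tweet can be illustrated with a toy sketch. This is not the EaaS implementation: `ExpertServer`, `ExpertPool`, and the scalar "expert" computation are hypothetical names and simplifications chosen only to show why stateless experts are easy to replicate and why one failed replica need not take down the whole system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpertServer:
    """One stateless replica of an expert (hypothetical stand-in for a GPU worker)."""
    expert_id: int
    alive: bool = True

    def forward(self, x: float) -> float:
        if not self.alive:
            raise ConnectionError(f"replica of expert {self.expert_id} is down")
        # Toy expert computation: the output depends only on the input,
        # so any replica of the same expert returns the same result.
        return x * (self.expert_id + 1)


@dataclass
class ExpertPool:
    """Client-side view of which replicas currently serve each expert."""
    replicas: Dict[int, List[ExpertServer]] = field(default_factory=dict)

    def add_replica(self, expert_id: int) -> ExpertServer:
        server = ExpertServer(expert_id)
        self.replicas.setdefault(expert_id, []).append(server)
        return server

    def call(self, expert_id: int, x: float) -> float:
        # Try replicas in order and skip any that have failed.
        for server in self.replicas.get(expert_id, []):
            try:
                return server.forward(x)
            except ConnectionError:
                continue
        raise RuntimeError(f"no live replica for expert {expert_id}")


pool = ExpertPool()
primary = pool.add_replica(0)
backup = pool.add_replica(0)   # scale out expert 0 without restarting anything
pool.add_replica(1)

primary.alive = False          # simulate one GPU dying
result = pool.call(0, 3.0)     # the request is served by the backup replica
```

Because the toy expert keeps no per-request state, adding or removing a replica only changes the pool's bookkeeping; the real system's asynchronous P2P transport is not modeled here.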

Ziming Liu retweeted

Tencent HY@TencentHunyuan·5 Mar

One static model does not fit all😭 We just dropped our latest work: Functional Neural Memory. Instead of static models, we generate custom "parameters" for every single input. ✅Prompt your model anytime ✅Instant personalization ✅Better instruction following ✅Flexible & dynamic memory (w/o memory bank✌️) (🧵1/6)

Ziming Liu@lzm_mlsys·19 Oct

Communication library to be released soon!

Ziming Liu retweeted

Victor.Kai Wang@VictorKaiWang1·20 Jun

Customizing Your LLMs in seconds using prompts🥳! Excited to share our latest work with @HPCAILab, @VITAGroupUT, @k_schuerholt, @YangYou1991, @mmbronstein, @damianborth : Drag-and-Drop LLMs(DnD). 2 features: tuning-free, comparable or even better than full-shot tuning.(🧵1/8)

Ziming Liu@lzm_mlsys·19 Feb

@6___0 @MSFTResearch @YangYou1991 Hope we can collaborate with the HF team to implement RAS in the diffusers soon :)

kfant@6___0·19 Feb

@lzm_mlsys @MSFTResearch @YangYou1991 when in HF transformers library?

Ziming Liu@lzm_mlsys·17 Feb

🚀Towards efficient Diffusion Transformers! 😆We are happy to introduce RAS, the first diffusion sampling strategy that allows for regional variability in sampling ratios, achieving up to 2x+ speedup! 🔌Training-free, plug and play! 💪Nice work with @MSFTResearch @YangYou1991 @Yif_Yang et al. 📜Paper: huggingface.co/papers/2502.10… 📖Blog: aka.ms/ras-dit ⌨️Code: github.com/microsoft/RAS (1/5)
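
One way to picture a region-dependent sampling ratio is the sketch below. It is an illustration under stated assumptions, not the RAS algorithm from the paper or the code in the linked repository: the per-token `scores`, the top-k selection rule, and the helper names `select_active_tokens` and `sparse_step` are all hypothetical. The idea shown is only that a step can recompute the model output for a chosen fraction of tokens and reuse a cached output for the rest.

```python
from typing import Callable, List, Tuple


def select_active_tokens(scores: List[float], sampling_ratio: float) -> List[int]:
    """Return indices of the highest-scoring tokens (assumed selection rule)."""
    k = max(1, int(len(scores) * sampling_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_step(
    latents: List[float],
    cached_update: List[float],
    scores: List[float],
    sampling_ratio: float,
    model_fn: Callable[[float], float],
) -> Tuple[List[float], List[float]]:
    """Recompute the update for active tokens only; reuse the cache elsewhere."""
    active = set(select_active_tokens(scores, sampling_ratio))
    update = [
        model_fn(latents[i]) if i in active else cached_update[i]
        for i in range(len(latents))
    ]
    new_latents = [x - u for x, u in zip(latents, update)]
    return new_latents, update


scores = [0.1, 0.9, 0.5, 0.3]            # hypothetical per-token importance
active = select_active_tokens(scores, 0.5)
new_latents, update = sparse_step(
    [1.0, 2.0, 3.0, 4.0], [0.0] * 4, scores, 0.5, lambda x: 0.5 * x
)
```

With a sampling ratio of 0.5, only half of the tokens go through `model_fn` in this step, which is where a speedup would come from in a real model.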

Ziming Liu@lzm_mlsys·18 Feb

@aryanvs_ @MSFTResearch @YangYou1991 Thanks! We have been trying to make it simple to use as a plug-in of diffusers and it would be really nice if it could be supported in diffusers LOL.

Aryan V S@aryanvs_·18 Feb

@lzm_mlsys @MSFTResearch @YangYou1991 Hey, might be cool to support it in diffusers directly! Congrats on the release and this is super cool work 🎉

Ziming Liu retweeted

Yang You@YangYou1991·17 Feb

RAS yields over 2x the acceleration with almost no image quality loss! Nice work and congrats!

Ziming Liu@lzm_mlsys

🚀Towards efficient Diffusion Transformers! 😆We are happy to introduce RAS, the first diffusion sampling strategy that allows for regional variability in sampling ratios, achieving up to 2x+ speedup! 🔌Training-free, plug and play! 💪Nice work with @MSFTResearch @YangYou1991 @Yif_Yang et al. 📜Paper: huggingface.co/papers/2502.10… 📖Blog: aka.ms/ras-dit ⌨️Code: github.com/microsoft/RAS (1/5)

Ziming Liu@lzm_mlsys·17 Feb

Results: To further evaluate the qualitative performance of RAS, we conducted a human evaluation. We randomly selected 14 prompts from the official research papers and blogs of Stable Diffusion 3 and Lumina, generating two images for each prompt: one using dense inference and the other using RAS, both with the same random seed and default number of timesteps. 633 out of 1400 votes (45.21%) indicated that the two images were of similar quality. Additionally, 28.29% of votes favored the dense image over the RAS result, while 26.50% preferred RAS over the dense result. These results demonstrate that RAS achieves a significant improvement in throughput (1.625× for Stable Diffusion 3 and 1.561× for Lumina-Next-T2I) without noticeably affecting human preference. (5/5)
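
The vote shares quoted above can be checked with a few lines of arithmetic. The total of 1400 votes, the 633 "similar" votes, and the two percentages come from the tweet; the absolute vote counts for the other two categories are derived here by rounding and are an inference, not figures reported in the thread.

```python
total_votes = 1400
similar_votes = 633          # reported in the tweet
dense_share = 0.2829         # reported: votes favoring dense inference
ras_share = 0.2650           # reported: votes favoring RAS

# Share of votes that judged the two images to be of similar quality.
similar_share_pct = round(100 * similar_votes / total_votes, 2)

# Inferred absolute counts (rounded to whole votes).
dense_votes = round(dense_share * total_votes)
ras_votes = round(ras_share * total_votes)

# The three categories should account for every vote.
accounted = similar_votes + dense_votes + ras_votes
```

Under these assumptions the three categories sum to the full 1400 votes, so the reported percentages are internally consistent.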

Ziming Liu@lzm_mlsys·17 Feb

We also introduced techniques like starvation prevention, dynamic sampling ratio, accumulated error resetting, key & value recovery, and kernel fusing to further improve the performance of our method. Please refer to our paper for more details. (4/5)

Ziming Liu retweeted

Victor.Kai Wang@VictorKaiWang1·20 Jan

Generating ~200 million parameters in just minutes! 🥳 Excited to share our work with @MTDovent , @heisejiasuo96 , and @YangYou1991: 'Recurrent Diffusion for Large-Scale Parameter Generation' (RPG for short). Example: Obtain customized models using prompts (see below). (🧵1/8)

Ziming Liu retweeted

Shijie Wang@ShijieWang20·24 Dec

How can we better animate images solely following text descriptions? We present Motion Focal Loss (MotiF) (arxiv.org/abs/2412.16153) to better align motions with text descriptions in text-image-to-video (TI2V) task and release TI2V-Bench, a comprehensive TI2V benchmark. (1/n)

Ziming Liu retweeted

Yang Luo@YangL_7·20 Dec

Training-free Video Enhancement: Achieved 🎉 Nice work with @oahzxl @shaowenqi126301 @VictorKaiWang1 @VitaGroupUT @YangYou1991 et al. Non-trivial enhancement, training-free, and plug-and-play 🥳 Blog: oahzxl.github.io/Enhance_A_Vide… (🧵1/6)

Ziming Liu retweeted

Together AI@togethercompute·5 Mar

We are excited that @Leon75421958 is joining the stellar Together AI Research team as VP Frontier Technologies. Leon has driven LLM ecosystem innovations such as Deepspeed and Project Brainwave. Leon will help our mission to build the fastest cloud for generative AI.

Ziming Liu@lzm_mlsys·28 Feb

@oahzxl @Haofan_Wang @zzk_zhao @NUSingapore Thanks a lot 😆

Xuanlei Zhao@oahzxl·28 Feb

@Haofan_Wang @zzk_zhao @lzm_mlsys @NUSingapore Thanks for sharing our work!

Frank (Haofan) Wang@Haofan_Wang·28 Feb

OpenDiT is great work by @oahzxl @zzk_zhao @lzm_mlsys from @NUSingapore: an Easy, Fast and Memory-Efficient System for DiT Training and Inference. This year will belong to DiT; you can't miss it if you are on the generative boat. github.com/NUS-HPC-AI-Lab…

Ziming Liu@lzm_mlsys·27 Feb

@YangYou1991 Really enjoyed chipping in on this project. It's been awesome!😃

Yang You@YangYou1991·27 Feb

Want to train a model like #Sora? Check out our new project #OpenDiT! OpenDiT is an easy-to-use, fast, and memory-efficient system for training and deploying DiT models, which are the foundation of models like Sora. With OpenDiT, you can achieve:
* Up to 80% faster in training
* 50% reduction in memory usage
* Over 50% reduction in communication volume with novel sequence parallelism
GitHub: github.com/NUS-HPC-AI-Lab…

Ziming Liu@lzm_mlsys·27 Feb

Really enjoyed chipping in on this project. It's been awesome!😃

Yang You@YangYou1991

Want to train a model like #Sora? Check out our new project #OpenDiT! OpenDiT is an easy-to-use, fast, and memory-efficient system for training and deploying DiT models, which are the foundation of models like Sora. With OpenDiT, you can achieve:
* Up to 80% faster in training
* 50% reduction in memory usage
* Over 50% reduction in communication volume with novel sequence parallelism
GitHub: github.com/NUS-HPC-AI-Lab…
