Xuefei Ning

11 posts

@FiNingm

Joined February 2015
165 Following · 34 Followers
Xuefei Ning retweeted
Zinan Lin @lin_zinan ·
[Intern Hiring] We are hiring a [Spring 2025] [full-time] intern working on Private Evolution. If you are interested, please apply here jobs.careers.microsoft.com/global/en/job/… and send me an email: zinanlin at microsoft dot com
Zinan Lin @lin_zinan

#ICML2024 "Differentially Private Synthetic Data via Foundation Model APIs 2: Text" was accepted at ICML 2024 as a Spotlight! Unfortunately, we are not attending ICML in person, but feel free to reach out to us if you are interested! Paper: arxiv.org/abs/2403.01749

Xuefei Ning retweeted
AK @_akhaliq ·
DiTFastAttn: Attention Compression for Diffusion Transformer Models. Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to self-attention's quadratic complexity. We propose DiTFastAttn, a novel post-training compression…
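
The tweet is truncated, but the quadratic-attention problem it names is easy to illustrate. Below is a minimal sketch of one common post-training compression strategy, windowed attention, where each token attends only to nearby tokens; the function name and window size are hypothetical illustrations, not the DiTFastAttn API.

import torch
import torch.nn.functional as F

def window_attention(q, k, v, window=64):
    # Each query attends only to keys within +/- `window` positions,
    # shrinking the effective cost from O(n^2) toward O(n * window).
    n = q.shape[-2]
    idx = torch.arange(n)
    band = (idx[None, :] - idx[:, None]).abs() <= window  # (n, n) bool
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # The mask here only illustrates the sparsity pattern; a real
    # kernel would compute just the banded block to realize savings.
    scores = scores.masked_fill(~band, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: 2 heads, 256 tokens, 32-dim head
q = k = v = torch.randn(2, 256, 32)
out = window_attention(q, k, v, window=64)
print(out.shape)  # torch.Size([2, 256, 32])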
Xuefei Ning retweeted
Zinan Lin @lin_zinan ·
Thanks to @_akhaliq for featuring Skeleton-of-Thought! Check out the recorded demo: sites.google.com/view/sot-llm/h… It is just a start--more work to do towards a usable tool. But we genuinely believe in the potential of this data-driven direction to make LLMs more efficient and powerful!
AK @_akhaliq

Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding. Paper page: huggingface.co/papers/2307.15… This work aims at decreasing the end-to-end generation latency of large language models (LLMs). One of the major causes of the high generation latency is the sequential decoding approach adopted by almost all state-of-the-art LLMs. In this work, motivated by the thinking and writing process of humans, we propose "Skeleton-of-Thought" (SoT), which guides LLMs to first generate the skeleton of the answer, and then uses parallel API calls or batched decoding to complete the contents of each skeleton point in parallel. Not only does SoT provide considerable speed-up (up to 2.39x across 11 different LLMs), but it can also potentially improve the answer quality on several question categories in terms of diversity and relevance. SoT is an initial attempt at data-centric optimization for efficiency, and it reveals the potential of pushing LLMs to think more like a human for better answer quality.
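
To make the two-stage decoding concrete, here is a minimal Python sketch of the SoT idea. The llm() function is a hypothetical placeholder for any completion API (it returns canned text so the sketch runs end to end), and the prompts and thread-based parallelism are illustrative assumptions, not the paper's exact implementation.

from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    # Placeholder for a real LLM completion call (hypothetical).
    if "outline" in prompt:
        return "1. Background\n2. Key idea\n3. Takeaway"
    return "An expanded sentence for this point."

def skeleton_of_thought(question: str) -> str:
    # Stage 1: one sequential call produces a short outline (the skeleton).
    skeleton = llm(
        "Give a concise numbered outline (3-5 points, a few words each) "
        f"for answering: {question}"
    )
    points = [p.strip() for p in skeleton.splitlines() if p.strip()]
    # Stage 2: expand all points concurrently. With an API backend the
    # calls overlap in time, which is where the latency win comes from.
    with ThreadPoolExecutor(max_workers=len(points)) as pool:
        bodies = list(pool.map(
            lambda p: llm(
                f"Question: {question}\nPoint: {p}\n"
                "Expand this point in 1-2 sentences."
            ),
            points,
        ))
    return "\n".join(bodies)

print(skeleton_of_thought("Why is sequential decoding slow?"))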

Xuefei Ning retweeted
Zinan Lin @lin_zinan ·
#ICML Accelerate Stable Diffusion by 2x through a new perspective! Visit poster #102 on Thursday, July 27th, at 10:30 am. [Paper] OMS-DPM: Optimizing the Model Schedule for Diffusion Probabilistic Models arxiv.org/abs/2306.08860 (1/3)
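
The "model schedule" perspective is straightforward to sketch: instead of running one denoiser at every diffusion step, pick a model from a zoo per step. Below is a minimal illustration assuming two hypothetical denoisers and a toy update rule; the real method searches the schedule with a learned predictor rather than fixing it by hand.

import torch

def small(x, t):
    # Cheap, less accurate denoiser (illustrative stand-in).
    return 0.10 * x

def large(x, t):
    # Expensive, more accurate denoiser (illustrative stand-in).
    return 0.05 * x

def sample_with_schedule(schedule, x, step=0.1):
    # Apply the scheduled model at each denoising step. The update
    # rule here is a deliberately simple stand-in for a DPM solver step.
    for t, model in enumerate(schedule):
        x = x - step * model(x, t)
    return x

x = torch.randn(1, 3, 8, 8)
# Hand-written schedule: the large model on the early (high-noise)
# steps, the small one elsewhere; OMS-DPM optimizes this choice to
# hit a quality target at minimum total cost.
schedule = [large] * 4 + [small] * 16
sample = sample_with_schedule(schedule, x)
print(sample.shape)  # torch.Size([1, 3, 8, 8])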
Xuefei Ning retweeted
Song Han @songhan_mit ·
Welcome to my new course on TinyML and Efficient Deep Learning, starting tomorrow: efficientml.ai