Xu Tan
@xutan_tx
159 posts

Working on Large Language Models and Video/Audio Multimodality

Beijing · Joined January 2022
618 Following · 1.5K Followers
Xu Tan retweeted
AK
AK@_akhaliq·
Microsoft presents Chain-of-Model Learning for Language Model
[image attached]
Xu Tan retweeted
Kaitao Song
Kaitao Song@SongKaitao·
Today, I want to introduce my latest research, "Chain-of-Model Learning for Language Model" (arxiv.org/abs/2505.11820), which presents a learning paradigm named Chain-of-Model (CoM) and devises Chain-of-Language-Model (CoLM) to unlock new functionalities of foundation models.
[image attached]
Xu Tan
Xu Tan@xutan_tx·
Congrats to the leading authors @SongKaitao @WangXiaohua and all contributors! Thanks @_akhaliq for posting this work! Hope CoLM can become a new paradigm for efficient model scaling and elastic inference in LLMs, Diffusion Models, CNNs, and any other neural networks! 10/10
Xu Tan
Xu Tan@xutan_tx·
4) Elastic inference: use varying numbers of chains for varying capacity and latency at inference time. Due to resource limits, the authors could not scale the model up any further. A call for anyone interested to try CoLM at larger scale to unleash its potential in efficient scaling and elastic inference. 9/n
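The elastic-inference idea above can be sketched in a few lines. This is my toy reading of the thread, not the paper's implementation: hidden units are split into "chains", chain i may only read from chains 1..i, so the weight matrix is block lower-triangular and keeping just the first k chains yields a valid, smaller sub-model for free. All sizes and names here are illustrative assumptions.

```python
import numpy as np

def make_chain_weight(chain_widths, rng):
    """Random block lower-triangular weight for a chained linear layer.

    Output chain i only reads from input chains 0..i, so later chains
    never feed back into earlier ones.
    """
    bounds = np.cumsum([0] + list(chain_widths))
    d = bounds[-1]
    w = rng.standard_normal((d, d))
    mask = np.zeros((d, d))
    for i in range(len(chain_widths)):          # output chain i ...
        for j in range(i + 1):                  # ... reads input chains <= i
            mask[bounds[i]:bounds[i + 1], bounds[j]:bounds[j + 1]] = 1.0
    return w * mask

def forward(w, x, k, chain_widths):
    """Run with only the first k chains active (elastic inference)."""
    d_k = sum(chain_widths[:k])
    return x[:d_k] @ w[:d_k, :d_k].T

rng = np.random.default_rng(0)
widths = (4, 4, 8)                  # three chains of increasing total capacity
w = make_chain_weight(widths, rng)
x = rng.standard_normal(sum(widths))

full = forward(w, x, 3, widths)     # all chains active
small = forward(w, x, 1, widths)    # cheapest sub-model
# The first chain's outputs agree in both settings, because the mask
# guarantees chain 0 never depends on chains 1 or 2:
assert np.allclose(full[:4], small)
```

The block-triangular mask is what makes the sub-models "nested": dropping later chains changes capacity and latency without invalidating earlier chains' computation.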
Xu Tan retweeted
Ruibin Yuan
Ruibin Yuan@abc43992899·
1/n: 🚀 Announcing YuE (乐) – the most powerful open-source full-song music generation model! 🎵 Tackle the lyrics-to-song task (like Suno.ai) with support for diverse genres, stunning vocals, & multiple languages. Bonus? It’s Hugging Face & LLAMA-compatible for easy fine-tuning. 🛠️ Code: github.com/multimodal-art… Demo: map-yue.github.io
Xu Tan retweeted
Ge Zhang
Ge Zhang@GeZhang86038849·
🚀 Excited to announce that the tech report of MAP-Neo (map-neo.github.io): a fully open-source and transparent bilingual LLM suite with superior performance to bridge the gap with closed-source models, is now available: arxiv.org/pdf/2405.19327 🔧MAP-Neo's workflow encapsulates the entire process of building an LLM from data curation, model training, to evaluation and we provide the *FULL details* for every single stage, including access to its data, training code, models, and evaluation protocol 👇
Xu Tan
Xu Tan@xutan_tx·
@Nor072635987 If anyone is interested in studying mel-spectrograms or other representations and architectures, we can have a discussion.
Xu Tan
Xu Tan@xutan_tx·
1/n Scaling law is the key to LLMs. What about scaling laws for multimodality (e.g., audio, visual)? We plot some speech synthesis/recognition models against a speech scaling law. It seems most synthesis models are OVER-parameterized compared to the compute-optimal model/data allocation.
[image attached]
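The "over-parameterized vs. compute-optimal" comparison in this tweet can be sketched with Chinchilla-style power laws, where optimal parameters and tokens both scale roughly as the square root of compute (the a ≈ b ≈ 0.5 regime mentioned later in the thread). The constants `k_n` and `k_d` below are purely illustrative assumptions, not values from the plotted data.

```python
# Sketch of compute-optimal model/data allocation under a Chinchilla-style
# power law: N_opt ∝ C^a, D_opt ∝ C^b with a ≈ b ≈ 0.5.
# k_n and k_d are illustrative constants, not fitted values.

def optimal_allocation(compute_flops, a=0.5, b=0.5, k_n=0.1, k_d=10.0):
    """Return (optimal params, optimal tokens) for a compute budget in FLOPs."""
    n_opt = k_n * compute_flops ** a   # compute-optimal parameter count
    d_opt = k_d * compute_flops ** b   # compute-optimal training tokens
    return n_opt, d_opt

def is_over_parameterized(n_actual, compute_flops):
    """A model is over-parameterized if it exceeds the compute-optimal size."""
    n_opt, _ = optimal_allocation(compute_flops)
    return n_actual > n_opt

# Example: a 1e21 FLOPs training budget
n, d = optimal_allocation(1e21)
print(f"optimal params ≈ {n:.3g}, optimal tokens ≈ {d:.3g}")
```

Plotting each published model's (compute, parameter count) point against `optimal_allocation` is exactly the kind of comparison the tweet's figure makes: points well above the optimal-params curve are over-parameterized for their budget.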
Xu Tan
Xu Tan@xutan_tx·
@ch3njus Currently there is no paper available; maybe there will be a paper in the future, or just slides.
Xu Tan
Xu Tan@xutan_tx·
7/n We also collect scaling law coefficients for multiple modalities (text/speech/image/molecules/code) and derive the optimal model/data allocation. Although most modalities show similar scaling strength on model and data (a ≈ b ≈ 0.5), there are clear differences... TO BE CONTINUED...
[image attached]
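Collecting scaling-law coefficients like those in this tweet typically means fitting a power law L = k · N^(-α), which is linear in log-log space, so ordinary least squares on the logs recovers the exponent. A minimal sketch with synthetic data (the points and the 0.07 exponent below are made up for illustration, not taken from the thread's table):

```python
import numpy as np

# Fit a power law L = k * N^(-alpha) by linear regression in log-log space.
# The data points are synthetic, generated from a known exponent so the
# fit can be checked; real tables would use measured (size, loss) pairs.

model_sizes = np.array([1e7, 1e8, 1e9, 1e10])   # parameter counts N
losses = 5.0 * model_sizes ** -0.07             # synthetic L = k * N^-alpha

slope, intercept = np.polyfit(np.log(model_sizes), np.log(losses), 1)
alpha = -slope          # recovered scaling exponent
k = np.exp(intercept)   # recovered constant

print(f"alpha ≈ {alpha:.3f}, k ≈ {k:.2f}")  # recovers 0.070 and 5.00
```

Repeating this fit per modality (text, speech, image, ...) is what yields a table of comparable scaling exponents like the one referenced above.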