Xu Tan

159 posts

@xutan_tx

Working on Large Language Models and Video/Audio Multimodality

Beijing · Joined January 2022

618 Following · 1.5K Followers

Xu Tan retweeted

AK@_akhaliq·20 May

Microsoft presents Chain-of-Model Learning for Language Model

Xu Tan retweeted

Kaitao Song@SongKaitao·20 May

Today, I want to introduce my latest research, "Chain-of-Model Learning for Language Model" (arxiv.org/abs/2505.11820), which presents a learning paradigm named Chain-of-Model (CoM) and devises Chain-of-Language-Model (CoLM) to unlock new functionalities of foundation models.

Xu Tan@xutan_tx·20 May

Congrats to the leading authors @SongKaitao @WangXiaohua and all contributors! Thanks @_akhaliq for posting this work! Hope CoLM can become a new paradigm for efficient model scaling and elastic inference in LLMs, Diffusion Models, CNNs, and any other neural networks! 10/10

Xu Tan@xutan_tx·20 May

4) Elastic inference: use varying chains for varying capacity and latency at inference time. Due to resource limits, the authors could not scale the model up any further. We call on anyone interested to try CoLM at larger scale to unleash its potential in efficient scaling and elastic inference. 9/n

Xu Tan@xutan_tx·20 May

🔥 Chain-of-Model, a novel learning paradigm that applies the concept of a chain at the hidden-dimension level (different from Chain-of-Thought): 1) progressively add new chains for efficient model scaling; 2) offer varying model sizes (chains) for elastic inference. arxiv.org/pdf/2505.11820 1/n

AK@_akhaliq

Microsoft presents Chain-of-Model Learning for Language Model

Xu Tan@xutan_tx·28 Apr

🔥Kimi-Audio, a universal audio foundation model pre-trained on 13+ million hours of audio data and achieving SOTA performance on 10+ audio benchmarks. Tech Report: arxiv.org/abs/2504.18425 Model & Code & Evalkit: github.com/MoonshotAI/Kim… Congrats to the excellent team!

Kimi.ai@Kimi_Moonshot

Announcing 🎙️ Kimi-Audio! Our new open-source audio foundation model advances capabilities in audio understanding, generation, and conversation.
Key Features & Achievements:
✅ Universal audio foundation model handles diverse tasks like speech recognition, audio understanding, audio-to-text chat, speech-to-speech conversation.
✅ Large-scale pre-training on >13 Million hours of diverse audio data (speech, music, sounds).
✅ Unique 12.5Hz tokenizer & hybrid architecture for rich perception and efficient generation.
✅ SOTA on 10+ audio benchmarks: excels in Speech Recognition (LibriSpeech 1.28/2.42 WER), Audio Understanding (MMAU, VocalSound), and Conversation (VoiceBench).
We're also releasing our comprehensive evaluation toolkit to foster fair benchmarking! 🛠️
📄 Dive into the details in our Technical Report: github.com/MoonshotAI/Kim…
🌟 Explore the Code, Models & Eval Toolkit on GitHub: github.com/MoonshotAI/Kim…
HuggingFace: huggingface.co/moonshotai/Kim…
Excited to see the innovative audio applications the community will build!

Xu Tan retweeted

Ruibin Yuan@abc43992899·27 Jan

1/n: 🚀 Announcing YuE (乐) – the most powerful open-source full-song music generation model! 🎵 Tackle the lyrics-to-song task (like Suno.ai) with support for diverse genres, stunning vocals, & multiple languages. Bonus? It’s Hugging Face & LLAMA-compatible for easy fine-tuning. 🛠️ Code: github.com/multimodal-art… Demo: map-yue.github.io

Xu Tan@xutan_tx·9 Dec

@heiga_zen @KeiichiTokuda Congrats, Heiga!

Heiga Zen (全炳河)@heiga_zen·7 Dec

Honored to be elevated to #IEEE Fellow! Huge thanks to my colleagues & collaborators throughout my career, especially Prof. @KeiichiTokuda. Grateful to #Google & Google #DeepMind for providing the opportunity to work on impactful real-world problems. ieee.org/content/dam/ie…

Xu Tan@xutan_tx·11 Jun

Scaling law for domain-specific continual pre-training scenarios.

Quehry Que@QuehryS

Excited to share our latest research paper! 📄📷 In this study, we explore Scaling Law in Domain-specific Continual Pre-training scenarios. Our findings reveal the relationship between model performance and mixture ratios. Check it out here: arxiv.org/abs/2406.01375… #LLM #ScalingLaw

Xu Tan retweeted

Ge Zhang@GeZhang86038849·30 May

🚀 Excited to announce that the tech report of MAP-Neo (map-neo.github.io): a fully open-source and transparent bilingual LLM suite with superior performance to bridge the gap with closed-source models, is now available: arxiv.org/pdf/2405.19327 🔧MAP-Neo's workflow encapsulates the entire process of building an LLM from data curation, model training, to evaluation and we provide the *FULL details* for every single stage, including access to its data, training code, models, and evaluation protocol 👇 [Tweet 1]

Xu Tan@xutan_tx·26 Apr

@Nor072635987 If anyone is interested in studying mel-spectrograms or other representations and architectures, we can discuss.

Xu Tan@xutan_tx·24 Apr

1/n Scaling laws are the key to LLMs. How about scaling laws for multimodality (e.g., audio, visual)? We plot some speech synthesis/recognition models against a speech scaling law. It seems most synthesis models are OVER-parameterized compared to the compute-optimal model/data allocation.

Xu Tan@xutan_tx·26 Apr

@ch3njus There is no paper available at the moment; there may be a paper in the future, or just slides.

Justin Chen@ch3njus·24 Apr

@xutan_tx Can you share a link to the paper?

Xu Tan@xutan_tx·24 Apr

Thanks Zhi Tian, @nicolaus625, @abc43992899, @GeZhang86038849, Zili Wang for the discussions.

Xu Tan@xutan_tx·24 Apr

7/n We also collect scaling law coefficients across modalities (text/speech/image/molecules/code) and derive the optimal model/data allocation. Although most modalities show similar scaling strength on model and data (a ≈ b ≈ 0.5), there are clear differences... TO BE CONTINUED...
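
As a rough illustration of what a ≈ b ≈ 0.5 implies for model/data allocation, the sketch below assumes the Chinchilla-style convention in which the compute-optimal parameter count scales as N ∝ C^a and the token count as D ∝ C^b, with training compute approximated by C ≈ 6·N·D (so a + b = 1). These conventions, the function name `compute_optimal_allocation`, and the constant `k_n` are assumptions for illustration only; the thread does not state how its coefficients are defined.

```python
def compute_optimal_allocation(compute_flops, a=0.5, b=0.5, k_n=1.0,
                               flops_per_param_token=6.0):
    """Split a training FLOP budget C into parameters N and tokens D.

    Assumed (hypothetical) form: N_opt = k_n * C**a and D_opt proportional
    to C**b, tied together by C = flops_per_param_token * N * D, which
    requires a + b = 1. k_n is a placeholder constant, not a fitted value.
    """
    if abs(a + b - 1.0) > 1e-9:
        raise ValueError("C = 6*N*D requires a + b = 1")
    n_opt = k_n * compute_flops ** a
    # D follows from the compute constraint, so it scales as C**(1 - a) = C**b.
    d_opt = compute_flops / (flops_per_param_token * n_opt)
    return n_opt, d_opt


if __name__ == "__main__":
    # With a = b = 0.5, each 100x increase in compute grows N and D by 10x each.
    for budget in (6e20, 6e22):
        n, d = compute_optimal_allocation(budget)
        print(f"C={budget:.1e}  N={n:.3e}  D={d:.3e}")
```

Under this reading, a modality with a > b would put more of each extra unit of compute into model size than into data, and a < b the reverse.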
