Xu Tan
159 posts

@xutan_tx
Working on Large Language Models and Video/Audio Multimodality




Microsoft presents Chain-of-Model Learning for Language Model

Announcing 🎙️ Kimi-Audio! Our new open-source audio foundation model advances capabilities in audio understanding, generation, and conversation.

Key Features & Achievements:
✅ Universal audio foundation model handles diverse tasks like speech recognition, audio understanding, audio-to-text chat, speech-to-speech conversation.
✅ Large-scale pre-training on >13 Million hours of diverse audio data (speech, music, sounds).
✅ Unique 12.5Hz tokenizer & hybrid architecture for rich perception and efficient generation.
✅ SOTA on 10+ audio benchmarks: excels in Speech Recognition (LibriSpeech 1.28/2.42 WER), Audio Understanding (MMAU, VocalSound), and Conversation (VoiceBench).

We're also releasing our comprehensive evaluation toolkit to foster fair benchmarking! 🛠️

📄 Dive into the details in our Technical Report: github.com/MoonshotAI/Kim…
🌟 Explore the Code, Models & Eval Toolkit on GitHub: github.com/MoonshotAI/Kim…
HuggingFace: huggingface.co/moonshotai/Kim…

Excited to see the innovative audio applications the community will build!



Excited to share our latest research paper! 📄📷 In this study, we explore Scaling Law in Domain-specific Continual Pre-training scenarios. Our findings reveal the relationship between model performance and mixture ratios. Check it out here: arxiv.org/abs/2406.01375…
#LLM #ScalingLaw





