Xipeng Qiu

39 posts

@xpqiu

Natural Language Processing Machine Learning

Shanghai · Joined April 2013
152 Following · 454 Followers
Xipeng Qiu retweeted
DailyPapers (@HuggingPapers)
MOSS-Audio-Tokenizer: A 1.6B-parameter pure Transformer audio tokenizer trained end-to-end on 3M hours of audio. Scales gracefully across speech, sound, and music while enabling the first purely autoregressive TTS to surpass non-autoregressive systems.
2 replies, 16 reposts, 124 likes, 5.7K views
Xipeng Qiu retweeted
arXiv Sound (@ArxivSound)
Yitian Gong, Kuangwei Chen, Zhaoye Fei, Xiaogui Yang, Ke Chen, Yang Wang, Kexin Huang, Mingshu Chen, Ruixiao Li, Qingyuan Cheng, Shimin Li, Xipeng Qiu, "MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models," arxiv.org/abs/2602.10934
0 replies, 5 reposts, 28 likes, 1.5K views
Xipeng Qiu retweeted
Wildminder (@wildmindai)
WOW! New vid model - MOSS-Video-and-Audio:
- native bimodal gen, IT2VA, T2VA;
- 32B MoE for sync video & audio in one pass,
- SOTA multilingual lip-sync + Sound FX;
- 360p/720p, with code, weights & LoRA.
Beyond words. Seriously cool. mosi.cn/models/mova
6 replies, 25 reposts, 249 likes, 15.5K views
Xipeng Qiu retweeted
Code_x (@ifree_news)
MOVA (MOSS Video and Audio), a foundation model designed to synthesize video and audio simultaneously. github.com/OpenMOSS/MOVA
0 replies, 1 repost, 1 like, 64 views
Xipeng Qiu retweeted
Wildminder (@wildmindai)
Hot! We have a new strong voice model. MOSS-TTS:
- a production-ready flagship 8B TTS;
- high-fidelity zero-shot voice cloning, stable long-form gen;
- multilingual;
- lossless reconstruction; fine-grained pronunciation control;
- token-level duration control,
- voice creator, sound effects.
Outstanding quality. mosi.cn/models/moss-tts
1 reply, 34 reposts, 241 likes, 12.2K views
Xipeng Qiu retweeted
OpenMOSS (@Open_MOSS)
🚀 The MOSS-TTS Family is here. From zero-shot cloning to real-time VoiceAgents, we have released our most powerful suite of audio models yet.
The Lineup:
MOSS-TTS Flagship: The industry's best zero-shot voice cloning. Features precise control over duration & Pinyin, capable of generating 1 hour of speech.
MOSS-TTSD-v1.0: A new standard for dialogue generation. Comprehensive optimization for conversational scenes and small languages. Best-in-class performance in all evaluations.
MOSS-VoiceGenerator: One-shot timbre generation. Create voices with a single sentence and complex instruction handling.
MOSS-TTS-Realtime: Built for the next era of VoiceAgents. Synthesis starts in just 2 characters for instant response.
MOSS-SoundEffect: Text-to-Audio sound effects to expand your creative toolkit.
🔥 Try it now: studio.mosi.cn/voice-synthesis
💻 Deploy (GitHub): github.com/OpenMOSS/MOSS-…
🔌 API Docs: studio.mosi.cn/docs/moss-tts
Welcome to our demo. The era of 'childhood' for TTS is over.
#MOSS #AI #TextToSpeech #TTS #OpenClaw #Agent #OpenMOSS #Opensource #VoiceAgent
8 replies, 5 reposts, 22 likes, 1.7K views
Xipeng Qiu retweeted
Hugging Models (@HuggingModels)
Ever wanted to turn text into natural-sounding speech with just a few lines of code? Meet MOSS-TTSD-v1.0, a text-to-speech model that's making voice synthesis more accessible. It's a community favorite for its simplicity and quality.
1 reply, 1 repost, 14 likes, 884 views
Xipeng Qiu (@xpqiu)
We introduce MOVA, a foundation model designed to break the "silent era" of open-source video generation. Unlike cascaded pipelines that generate sound as an afterthought, MOVA synthesizes video and audio simultaneously for perfect alignment. github.com/OpenMOSS/MOVA
0 replies, 2 reposts, 9 likes, 550 views
Xipeng Qiu (@xpqiu)
NEX is a project incubated by the Shanghai Innovation Institute (nex.sii.edu.cn), jointly with many entrepreneurial partners. The project is building a sustainable closed-loop open ecosystem that powers industry upgrades and truly ushers in the AI agency era.
Quoting Tiezhen WANG (@Xianbao_QIAN):

Welcome Nex-N1, a new series of agentic foundational models, to @huggingface
- available in different sizes: 8B, 30B, 32B, and 671B
- strong in tool use, web search, and real-world agentic workflows
- some SFT datasets have been open-sourced
Technical report coming soon!

0 replies, 5 reposts, 21 likes, 3K views
Tiezhen WANG (@Xianbao_QIAN)
Welcome Nex-N1, a new series of agentic foundational models, to @huggingface
- available in different sizes: 8B, 30B, 32B, and 671B
- strong in tool use, web search, and real-world agentic workflows
- some SFT datasets have been open-sourced
Technical report coming soon!
19 replies, 72 reposts, 459 likes, 81.9K views