OpenMOSS

26 posts

@Open_MOSS

OpenMOSS is an open research community aimed at building artificial general intelligence. Discord 👇 https://t.co/FLvN5uX8wc

Joined January 2025
23 Following · 180 Followers
Pinned tweet
OpenMOSS @Open_MOSS
🚀 The MOSS-TTS Family is here. From zero-shot cloning to real-time VoiceAgents, we have released our most powerful suite of audio models yet.
The lineup:
MOSS-TTS Flagship: The industry's best zero-shot voice cloning, with precise control over duration and Pinyin. It can generate up to 1 hour of speech.
MOSS-TTSD-v1.0: A new standard for dialogue generation, comprehensively optimized for conversational scenarios and less widely spoken languages, with best-in-class performance in all evaluations.
MOSS-VoiceGenerator: One-shot timbre generation. Create voices from a single sentence, with support for complex instructions.
MOSS-TTS-Realtime: Built for the next era of VoiceAgents. Synthesis begins after just 2 characters of input for an instant response.
MOSS-SoundEffect: Text-to-audio sound effects to expand your creative toolkit.
🔥 Try it now: studio.mosi.cn/voice-synthesis
💻 Deploy (GitHub): github.com/OpenMOSS/MOSS-…
🔌 API Docs: studio.mosi.cn/docs/moss-tts
Come try our demo. The era of 'childhood' for TTS is over.
#MOSS #AI #TextToSpeech #TTS #OpenClaw #Agent #OpenMOSS #Opensource #VoiceAgent

OpenMOSS @Open_MOSS
Our bench can also test image-editing models! It's a truly unified multimodal generative reasoning benchmark covering video models, image-editing models, and VLMs. Results on the mini test set: (6/6)
[Image: results on the mini test set]

OpenMOSS @Open_MOSS
What about text-heavy reasoning? Given a prompt and an image, Sora-2 generates a video that "writes out" the step-by-step solution. It even speaks the answer aloud! 🔊 Staggering results: 🎯 MATH: 92% 🎯 MMMU: 69.2% (5/6)

OpenMOSS @Open_MOSS
Sora-2 solves complex visual puzzles (color filling, shape drawing) by understanding symmetry, gradients, and composition. On Visual-Shape tasks, Sora-2's inductive reasoning performance actually matches Claude 3.5 Sonnet's! 🎨🧩 (4/6)

OpenMOSS @Open_MOSS
We introduce VideoThinkBench to test this. On "Eyeballing Puzzles", Sora-2 reasons by simulating light reflection and manipulating geometry. Result? It outperforms SOTA VLMs and scores 10% higher than GPT-5! 📈🧩 All code and data are open-sourced: github.com/tongjingqi/Thi… (3/6)

OpenMOSS @Open_MOSS
Current LLM/VLM paradigms ("Thinking with Text" and "Thinking with Images") have limits: static images lack dynamics, and separating the modalities hinders understanding. Our fix: Thinking with Video, which uses video frames as a unified medium for drawing and writing out reasoning steps! ✍️🎥 Project: thinking-with-video.github.io (2/6)

OpenMOSS @Open_MOSS
@Misternab Yes, it supports French. We will launch a Hugging Face Space demo soon, and we will add email signup ASAP.

Nabil Garbi @Misternab
@Open_MOSS Does it support French? I tried to sign up to use the API, but I can't find a signup button.

Love TTS @mohamed17381489
@Open_MOSS Does it support Arabic? Any plans to support the language?

OpenMOSS @Open_MOSS
MOSS-TTSD-v1.0 is a brand-new conversational speech generation model, which also supports ultra-long sequences and multilingual synthesis.

OpenMOSS @Open_MOSS
MOSS-TTS is our flagship model, trained on millions of hours of high-quality multilingual speech data. It supports a wide range of languages, including Chinese, English, French, Spanish, German, Portuguese, Japanese, and Korean. The model features fine-grained duration and phoneme control, as well as the generation of ultra-long speech up to one hour.

OpenMOSS @Open_MOSS
Huge shoutout to the SGLang community @lmsysorg for their incredible support! 🚀 We are thrilled to announce that MOVA features Day-0 support for SGLang-Diffusion, ensuring high-performance inference right out of the gate.

OpenMOSS @Open_MOSS
Sora 2? Closed. Veo 3? Closed. Kling? Closed. 🚫 MOVA? Open. ✅
We're thrilled to release MOVA (MOSS-Video-and-Audio), a powerhouse foundation model designed for high-fidelity, synchronized video-audio synthesis.
✨ The magic: Traditional video models generate sound as an afterthought. MOVA synthesizes sight and sound simultaneously via bidirectional cross-attention. The result? Audio that doesn't just match the video; it belongs.
MoE architecture: 18B active parameters (32B in total)
LoRA support for fine-tuning
Production-ready generation pipelines
The era of "hollow" AI video is gone. Long live MOVA.
🚀 Star the repo: github.com/OpenMOSS/MOVA
#MOVA #SORA2 #Veo3 #OpenSourceAI #VideoGeneration #AI
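The announcement names bidirectional cross-attention as the mechanism that lets MOVA generate sight and sound together instead of adding audio afterwards. As a rough illustration only, here is a minimal single-head sketch of one bidirectional cross-attention step between a video token stream and an audio token stream. The function names, the toy token values, and the omission of learned query/key/value projections are assumptions made for brevity; this is not MOVA's actual code (see github.com/OpenMOSS/MOVA for that).

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: each query row attends over the
    rows of `context`, which serve as both keys and values (learned projections
    are omitted in this simplified sketch)."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Attention output for this query: weighted sum of the context rows.
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise residual addition.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_cross_attention(video: Matrix, audio: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical fusion step: both updates are computed from the same
    pre-update states, so each modality conditions on the other in one step."""
    video_update = cross_attention(video, audio)  # video queries, audio keys/values
    audio_update = cross_attention(audio, video)  # audio queries, video keys/values
    return add(video, video_update), add(audio, audio_update)


# Toy usage: 3 video tokens and 2 audio tokens, feature dimension 2.
video_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_tokens = [[0.5, 0.5], [1.0, -1.0]]
new_video, new_audio = bidirectional_cross_attention(video_tokens, audio_tokens)
print(len(new_video), len(new_audio))  # 3 2
```

The symmetry is the point of the sketch: video tokens read from audio and audio tokens read from video in the same step, rather than audio being generated after the video is finished. A real model would add learned projections, multiple heads, and many stacked layers.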

OpenMOSS @Open_MOSS
MOVA achieves state-of-the-art (SOTA) performance among open-source models in both human subjective arena evaluations and objective metrics such as lip-sync and audio-visual synchronization, rivaling the capabilities of proprietary closed-source models.