Yang Gao

104 posts

@YangGao07

Researcher @Open_MOSS | ex-Shanghai AI Lab | Dev @OpenMMLab #MMRazor | Member @intern_lm | Building open-source LLM & ML systems

Joined August 2016
199 Following · 34 Followers
高军
高军@GoJun315·
A locally-run open-source TTS that takes down ElevenLabs. Supertonic is a speech-synthesis model that runs entirely on your machine: no network, zero API fees.
- Only 99M parameters; 167× faster than realtime on an M4 Pro, and it even runs on a Raspberry Pi
- Supports 31 languages, covering English, Chinese, Japanese, Korean, and other major languages
- Amounts, phone numbers, and technical units (the things mainstream cloud services routinely misread) are all read correctly
- A browser extension turns any web page into speech in under a second
GitHub: github.com/supertone-inc/…
Supports 11 runtimes including Python, Node.js, Rust, and Go; installs with one command.
If you want to add voice to an AI product without being held hostage by cloud API fees, take a look.
Chinese
15
115
564
47.3K
Yang Gao retweeted
clem 🤗
clem 🤗@ClementDelangue·
As President Trump meets President Xi this week, a call to the American AI community: If your startup, lab, non-profit or company benefits from open international AI - especially Chinese (Deepseek, Qwen, Kimi, GLM,…), please share! Open source is the most important driver of competition, jobs and wealth creation in AI today. Let’s support and promote it at critical times like this week!
English
33
71
541
70K
Yang Gao
Yang Gao@YangGao07·
@Tu7uruu Thanks for the reply! On it ✍️
English
0
0
1
30
steven
steven@Tu7uruu·
Big announcement for speech AI. Benchmarks get gamed. So we added a repellent. The Open ASR Leaderboard now includes private evaluation data from Appen and DataoceanAI, making speech recognition benchmarks more robust against test-set contamination and "benchmaxxing." Better signal. Less overfitting. More real-world ASR.
English
7
17
114
11.8K
Yang Gao
Yang Gao@YangGao07·
@OrganicGPT @iScienceLuvr Otherwise, AGI will become a monopoly tool in the hands of capitalist giants. The Codex vs. Claude Code race over the past few days makes that pretty clear.
English
0
0
0
82
Yang Gao
Yang Gao@YangGao07·
@OrganicGPT @iScienceLuvr This reflects the company’s values, and it is a value shared by many frontier labs in China. I think most people would agree that an open-source ecosystem is the only viable path for humanity to reach AGI together.
English
1
0
0
50
Tanishq Mathew Abraham, Ph.D.
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
My quick paper summary:
- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated)
- Two new compressed attention mechanisms for long context
- Manifold hyper connections
- Muon training
- 32T tokens
- FP4 Quantization-Aware Training
- Post-training pipeline: SFT+RL for specialized experts, on-policy distillation for consolidated model
So many infra details!!!!
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr

DEEPSEEK-V4 IS RELEASED

English
9
42
357
32.4K
Yang Gao retweeted
MOSI
MOSI@MosiAI_Official·
Meet MOSS-Audio. A unified open-source model for real-world audio understanding, built to handle speech, emotion, speakers, sound events, music, temporal grounding, and reasoning in one system. In the reported evaluation, our 4B model outperforms many 7B–9B open models, and MOSS-Audio-8B-Thinking reaches 71.08 average accuracy. Strong results. Real-world audio. GitHub: github.com/OpenMOSS/MOSS-… HuggingFace: huggingface.co/collections/Op… MOSI.AI: mosi.cn OpenMOSS: open-moss.com
English
9
49
347
16.5K
Mr Panda
Mr Panda@PandaTalk8·
Looking for recommendations: which TTS has the best value for money? Open or closed source, paid or free, anything goes. By value for money I mean both genuinely realistic and usable, and priced at next to nothing.
Chinese
50
12
111
34.7K
Yang Gao retweeted
OpenMOSS
OpenMOSS@Open_MOSS·
ModelScope@ModelScope2022

Say hello to MOSS-TTS-Nano 🚀 0.1B multilingual TTS from MOSI.AI and OpenMOSS. Designed for realtime speech generation without a GPU. Runs directly on CPU, keeping the deployment stack simple enough for local demos, web serving, and lightweight product integration. Part of the MOSS-TTS family alongside the 1.7B and 8B flagship models. 🤖 modelscope.cn/models/openmos… 🌍 modelscope.ai/models/openmos… 💻 github.com/OpenMOSS/MOSS-…

English
0
4
7
511
AYi
AYi@AYi_AInotes·
Holy shit, I just came across an open-source speech-generation project whose Chinese and English output is genuinely stunning. It feels like it is flipping the table on the industry 🚀 ElevenLabs charges $5 to $99 a month for personal plans and up to $1,320 a month for business, while this fully open-source model, VoxCPM2, runs locally for free and posts voice-similarity benchmark numbers I can only describe as astonishing 🤩
According to the public Minimax-MLS voice-similarity test:
• English: VoxCPM2 scores 85.4%, ElevenLabs 61.3%
• Chinese: VoxCPM2 scores 82.5%, ElevenLabs 67.7%
• Arabic: VoxCPM2 scores 79.1%, ElevenLabs 70.6%
It supports 30 languages and 48kHz studio-grade output, can clone a voice from a short audio clip, can generate new voices from a plain-text description (Voice Design), and runs on a local GPU (roughly 8GB VRAM minimum). It was developed by OpenBMB together with Tsinghua-affiliated teams, is Apache 2.0 licensed and free for commercial use, and is already on GitHub Trending.
Give it a short recording and it reproduces that person's voice in full: accent, emotion, pacing, even the rhythm of their breathing. The output is 48kHz studio quality that ordinary listeners cannot tell is AI-generated. Even wilder, you can create voices directly from text. Write "a woman in her twenties, a soft voice with a slight rasp" and it generates a matching voice from nothing: no reference audio, no voice actor, not even a microphone.
How far its capabilities go:
• Voice from text: describe gender, age, timbre, and emotion, and the AI generates it
• Full cloning: upload one minute of audio to replicate every vocal detail
• Controllable cloning: adjust emotion after cloning, e.g. "speak faster, with a hint of excitement"
• Faithful reproduction: given audio with matching text, it recreates even subtle shifts in tone
• Automatic detection of 30 languages, no manual tagging
• Context awareness: reads news like an anchor, tells stories like a storyteller
• Generates faster than playback on an RTX 4090
• Runs on 8GB of VRAM, so an ordinary gaming laptop will do
• Fine-tune on 5 to 10 minutes of your own voice to get a personal voice model
A free open-source project beating the industry standard across its core metrics. Professional voiceover runs $250 to $1,000 per project, AI voice platforms $5 to $100 per month, a recording studio $200 per hour. VoxCPM2 runs entirely on your own computer: no API fees, no per-character billing, no subscription. Free forever, commercial use allowed.
Install with one command: pip install voxcpm
The Chinese test video below was provided by @emwstudio for reference. As a newly released open-source model, it may still have room to improve in emotion-control stability, long-text consistency, and enterprise-grade reliability; try it yourself before judging. GitHub link and online demo in the comments, as usual 👇
Chinese
24
123
636
48.4K
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
Example generations from HappyHorse-1.0 compared to Dreamina Seedance 2.0, Kling 3.0 Pro, grok-video-imagine and PixVerse V6 (Text to Video with Audio): Prompt [1/4]: A hula hoop spinning on a kid's waist, gradually climbing to their chest, then dropping to knees, then clattering to the floor. They pick it up to try again.
English
16
10
116
36.5K
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
We’ve added a new pseudonymous video model to our Text to Video and Image to Video Arenas. ‘HappyHorse-1.0’ is currently landing in the #1 spot for Text and Image to Video (No Audio) and the #2 spot for Text and Image to Video (With Audio). Further details coming soon. Example generations below from HappyHorse-1.0 in the Artificial Analysis Video Arena 🚀
English
21
65
477
349.4K
Yang Gao
Yang Gao@YangGao07·
@Prince_Canuma The MLX community is amazing👏. We’re getting MOSS-TTS and MOVA ready for contribution, and we’d love to get them into MLX soon🙌
English
0
0
2
34
Prince Canuma
Prince Canuma@Prince_Canuma·
MLX-Video updates coming soon 🚀 Took a bit longer to debug LTX-2 dev model and 2.3 but we are almost there! Note: PR is tbd, so bear with me :) github.com/Blaizzy/mlx-vi…
English
5
4
47
2.5K
Yang Gao
Yang Gao@YangGao07·
@alphacep Thanks for sharing the benchmark results. We will continue improving MOSS-TTS🙏
English
0
0
2
34
AlphaCephei
AlphaCephei@alphacep·
We systematically test modern TTS engines on a Russian dataset. Qwen feels like the most interesting one: good clarity and sound quality, reasonable intonation. Issues with pronunciation, as always; it is a common thing. VibeVoice hallucinates. Fish is reasonable but a bit plain.
English
5
4
34
3.2K
zhouyunhua-Seek
zhouyunhua-Seek@ZhouyunhuaB·
emmm, I think this is big news. We believe we may have found the best optimizer available today😜, and we welcome you to follow our work.
English
1
0
1
124
Yang Gao
Yang Gao@YangGao07·
@fractal_friend @DLKFZWilliam2 Good question. Thanks for pointing this out :). We will work on solutions in upcoming releases. For now, the temporary workaround is to use denoising models to standardize them.
English
1
0
1
18