Yang Gao

104 posts

@YangGao07

Researcher @Open_MOSS | ex-Shanghai AI Lab | Dev @OpenMMLab #MMRazor | Member @intern_lm | Building open-source LLM & ML systems

Joined August 2016
199 Following · 34 Followers
高军
高军@GoJun315·
A locally-run open-source TTS that takes down ElevenLabs. Supertonic is a speech-synthesis model that runs entirely on your machine: no network, zero API fees.
- Only 99M parameters; 167× faster than realtime on an M4 Pro, and it even runs on a Raspberry Pi
- Supports 31 languages, covering English, Chinese, Japanese, Korean, and other major languages
- Amounts, phone numbers, and technical units (the things mainstream cloud services routinely misread) are all read correctly
- A browser extension turns any web page into speech in under a second
GitHub: github.com/supertone-inc/…
Supports 11 runtimes including Python, Node.js, Rust, and Go; installs with one command.
If you want to add voice to an AI product without being held hostage by cloud API fees, take a look.
Chinese
15
115
564
47.3K
Yang Gao retweeted
clem 🤗
clem 🤗@ClementDelangue·
As President Trump meets President Xi this week, a call to the American AI community: If your startup, lab, non-profit or company benefits from open international AI - especially Chinese (Deepseek, Qwen, Kimi, GLM,…), please share! Open source is the most important driver of competition, jobs and wealth creation in AI today. Let’s support and promote it at critical times like this week!
English
33
71
541
70K
Yang Gao
Yang Gao@YangGao07·
@Tu7uruu Thanks for the reply! On it ✍️
English
0
0
1
30
steven
steven@Tu7uruu·
Big announcement for speech AI. Benchmarks get gamed. So we added a repellent. The Open ASR Leaderboard now includes private evaluation data from Appen and DataoceanAI, making speech recognition benchmarks more robust against test-set contamination and "benchmaxxing." Better signal. Less overfitting. More real-world ASR.
English
7
17
114
11.8K
Yang Gao
Yang Gao@YangGao07·
@OrganicGPT @iScienceLuvr Otherwise, AGI will become a monopoly tool in the hands of capitalist giants. The Codex vs. Claude Code race over the past few days makes that pretty clear.
English
0
0
0
82
Yang Gao
Yang Gao@YangGao07·
@OrganicGPT @iScienceLuvr This reflects the company’s values, and it is a value shared by many frontier labs in China. I think most people would agree that an open-source ecosystem is the only viable path for humanity to reach AGI together.
English
1
0
0
50
Tanishq Mathew Abraham, Ph.D.
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
My quick paper summary:
- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated)
- Two new compressed attention mechanisms for long context
- Manifold hyper connections
- Muon training
- 32T tokens
- FP4 Quantization-Aware Training
- Post-training pipeline: SFT+RL for specialized experts, on-policy distillation for consolidated model
So many infra details!!!!
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr

DEEPSEEK-V4 IS RELEASED

English
9
42
357
32.4K
Yang Gao retweeted
MOSI
MOSI@MosiAI_Official·
Meet MOSS-Audio. A unified open-source model for real-world audio understanding, built to handle speech, emotion, speakers, sound events, music, temporal grounding, and reasoning in one system. In the reported evaluation, our 4B model outperforms many 7B–9B open models, and MOSS-Audio-8B-Thinking reaches 71.08 average accuracy. Strong results. Real-world audio. GitHub: github.com/OpenMOSS/MOSS-… HuggingFace: huggingface.co/collections/Op… MOSI.AI: mosi.cn OpenMOSS: open-moss.com
English
9
49
347
16.5K
Mr Panda
Mr Panda@PandaTalk8·
Looking for recommendations: which TTS has the best value for money? Open or closed source, paid or free, anything goes. By value for money I mean both genuinely realistic and usable, and priced at next to nothing.
Chinese
50
12
111
34.7K
Yang Gao retweeted
OpenMOSS
OpenMOSS@Open_MOSS·
ModelScope@ModelScope2022

Say hello to MOSS-TTS-Nano 🚀 0.1B multilingual TTS from MOSI.AI and OpenMOSS. Designed for realtime speech generation without a GPU. Runs directly on CPU, keeping the deployment stack simple enough for local demos, web serving, and lightweight product integration. Part of the MOSS-TTS family alongside the 1.7B and 8B flagship models. 🤖 modelscope.cn/models/openmos… 🌍 modelscope.ai/models/openmos… 💻 github.com/OpenMOSS/MOSS-…

English
0
4
7
511
AYi
AYi@AYi_AInotes·
Holy shit, I just came across an open-source speech-generation project whose Chinese and English output is genuinely stunning. It feels like it is flipping the table on the industry 🚀 ElevenLabs charges $5 to $99 a month for personal plans and up to $1,320 a month for business, while this fully open-source model, VoxCPM2, runs locally for free and posts voice-similarity benchmark numbers I can only describe as astonishing 🤩
According to the public Minimax-MLS voice-similarity test:
• English: VoxCPM2 scores 85.4%, ElevenLabs 61.3%
• Chinese: VoxCPM2 scores 82.5%, ElevenLabs 67.7%
• Arabic: VoxCPM2 scores 79.1%, ElevenLabs 70.6%
It supports 30 languages and 48kHz studio-grade output, can clone a voice from a short audio clip, can generate new voices from a plain-text description (Voice Design), and runs on a local GPU (roughly 8GB VRAM minimum). It was developed by OpenBMB together with Tsinghua-affiliated teams, is Apache 2.0 licensed and free for commercial use, and is already on GitHub Trending.
Give it a short recording and it reproduces that person's voice in full: accent, emotion, pacing, even the rhythm of their breathing. The output is 48kHz studio quality that ordinary listeners cannot tell is AI-generated. Even wilder, you can create voices directly from text. Write "a woman in her twenties, a soft voice with a slight rasp" and it generates a matching voice from nothing: no reference audio, no voice actor, not even a microphone.
How far its capabilities go:
• Voice from text: describe gender, age, timbre, and emotion, and the AI generates it
• Full cloning: upload one minute of audio to replicate every vocal detail
• Controllable cloning: adjust emotion after cloning, e.g. "speak faster, with a hint of excitement"
• Faithful reproduction: given audio with matching text, it recreates even subtle shifts in tone
• Automatic detection of 30 languages, no manual tagging
• Context awareness: reads news like an anchor, tells stories like a storyteller
• Generates faster than playback on an RTX 4090
• Runs on 8GB of VRAM, so an ordinary gaming laptop will do
• Fine-tune on 5 to 10 minutes of your own voice to get a personal voice model
A free open-source project beating the industry standard across its core metrics. Professional voiceover runs $250 to $1,000 per project, AI voice platforms $5 to $100 per month, a recording studio $200 per hour. VoxCPM2 runs entirely on your own computer: no API fees, no per-character billing, no subscription. Free forever, commercial use allowed.
Install with one command: pip install voxcpm
The Chinese test video below was provided by @emwstudio for reference. As a newly released open-source model, it may still have room to improve in emotion-control stability, long-text consistency, and enterprise-grade reliability; try it yourself before judging. GitHub link and online demo in the comments, as usual 👇
Chinese
24
123
636
48.4K
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
Example generations from HappyHorse-1.0 compared to Dreamina Seedance 2.0, Kling 3.0 Pro, grok-video-imagine and PixVerse V6 (Text to Video with Audio): Prompt [1/4]: A hula hoop spinning on a kid's waist, gradually climbing to their chest, then dropping to knees, then clattering to the floor. They pick it up to try again.
English
16
10
116
36.5K
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
We’ve added a new pseudonymous video model to our Text to Video and Image to Video Arenas. ‘HappyHorse-1.0’ is currently landing in the #1 spot for Text and Image to Video (No Audio) and the #2 spot for Text and Image to Video (With Audio). Further details coming soon. Example generations below from HappyHorse-1.0 in the Artificial Analysis Video Arena 🚀
English
21
65
477
349.4K
Yang Gao
Yang Gao@YangGao07·
@Prince_Canuma The MLX community is amazing👏. We’re getting MOSS-TTS and MOVA ready for contribution, and we’d love to get them into MLX soon🙌
English
0
0
2
34
Prince Canuma
Prince Canuma@Prince_Canuma·
MLX-Video updates coming soon 🚀 Took a bit longer to debug LTX-2 dev model and 2.3 but we are almost there! Note: PR is tbd, so bear with me :) github.com/Blaizzy/mlx-vi…
English
5
4
47
2.5K
Yang Gao
Yang Gao@YangGao07·
@alphacep Thanks for sharing the benchmark results. We will continue improving MOSS-TTS🙏
English
0
0
2
34
AlphaCephei
AlphaCephei@alphacep·
We systematically test modern TTS engines on a Russian dataset. Qwen feels like the most interesting one: good clarity and sound quality, reasonable intonation. Issues with pronunciation, as always; it is a common thing. VibeVoice hallucinates. Fish is reasonable but a bit plain.
English
5
4
34
3.2K
zhouyunhua-Seek
zhouyunhua-Seek@ZhouyunhuaB·
emmm, I think this is big news. We believe we may have found the best optimizer available today😜, and we welcome you to follow our work.
English
1
0
1
124
Yang Gao
Yang Gao@YangGao07·
@fractal_friend @DLKFZWilliam2 Good question. Thanks for pointing this out :). We will work on solutions in upcoming releases. For now, the temporary workaround is to use denoising models to standardize them.
English
1
0
1
18