Diffio AI

26 posts

@diffioai

Audio restoration for podcasts. Visit https://t.co/KxDOlns8Wq for more.

Colorado Springs · Joined November 2025
5 Following · 6 Followers
Indiana Jones (@IndianaJ9000)
@diffioai Really interesting idea behind Diffio. I’m a SaaS writer and a few content ideas came to mind while checking it out — happy to share if useful.
1
0
0
19
Pierre Richemond 🇪🇺 (@TheOneKloud)

Excited and proud to introduce our latest: Cohere Transcribe, the best dedicated ASR model in the world. #1 EN HF leaderboard, SotA human evals, ahead of ElevenLabs, Qwen3, Mistral, Kyutai, and OpenAI. 14 supported languages. Apache 2.0, on HF for you to try. Our first audio model and a key step in powering North experiences. huggingface.co/CohereLabs/coh…

2
10
35
2K
Fahd Mirza (@fahdmirza)
💥 Cohere Transcribe is HERE and it's OPEN SOURCE 🎙️
A free 2B parameter ASR model you can run locally: Audio In, Text Out 🚀
🔹 14 languages supported: English, French, German, Arabic, Japanese, Korean & more
🔹 Conformer architecture, built from scratch for speech, not repurposed
🔹 Up to 3× faster real-time factor than other dedicated ASR models of the same size
🔹 Apache 2.0 license: fully free, no strings attached
🔹 Works on your own GPU, your own data, no API calls, no cost
🔥 Watch the full demo below 👇
2
9
56
4K
clem 🤗 (@ClementDelangue)
Got to meet @nickfrosst in Miami today to celebrate their awesome release of an open-source Apache 2.0 Transcribe model that could be a whisper killer and already trending on @huggingface! @cohere deserves much more visibility in the community as one of the leaders of North American open-source!
16
8
186
36.5K
Jaxson Khan (@jaxson)
If you regularly transcribe audio, @cohere Transcribe was just released. It's a free, open-source model that runs locally and is definitely worth checking out.

I ran some tests against OpenAI's Whisper (which powers ChatGPT and many other apps). I used Steve Jobs' 2005 Stanford Commencement Address (15 min) on YouTube as the test video. Both models ran locally on a MacBook M4.

Some highlights of what each model heard:
Cohere: "I learned about serif and sans serif typefaces"
Whisper: "I learned about Sarah and Sans Sarah of typefaces"
Cohere: "Bob Noyce"
Whisper: "Bob Nois"
Cohere: "tried to apologize for screwing up so badly"
Whisper: "tried to apologize for sparing up so badly"

I also tested Whisper's largest model (1.55B parameters) to get a closer comparison to Cohere's 2B parameters. It fixed some of the name errors but started repeating phrases and took much longer.

How they compared:
- Cohere (2B params): 119 seconds, ~98% accuracy
- Whisper base (74M params): 69 seconds, ~90% accuracy
- Whisper large (1.55B params): 915 seconds, ~93% accuracy

Full side-by-side transcript comparison: github.com/jaxson/tests-p…
(Note: I believe that some of the different word counts stem from hallucination loops that Whisper encountered.)

Cohere Transcribe Model on Hugging Face: huggingface.co/CohereLabs/coh…
Test video on YouTube: youtube.com/watch?v=UF8uR6…

* Results may vary based on hardware, audio quality, and content. This is a very non-scientific test!
** Audio clips used under fair use for commentary/analysis. All rights belong to their respective owners.
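The post does not say how its accuracy percentages were computed. One common way to score a comparison like this is word error rate (WER): the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The sketch below is only an illustration. It treats the Cohere line quoted above as if it were the correct reference, which is an assumption, and the function name `word_error_rate` is made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


# Assumption: the Cohere output is used as the reference text.
reference = "I learned about serif and sans serif typefaces"
whisper_output = "I learned about Sarah and Sans Sarah of typefaces"

wer = word_error_rate(reference, whisper_output)
print(f"WER: {wer:.3f}")  # 2 substitutions + 1 insertion over 8 reference words
```

A rough "accuracy" figure is sometimes reported as 1 minus WER, though hallucination loops like the ones mentioned in the post can push WER above 1.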
2
5
17
1.2K
Diffio AI (@diffioai)
@Tu7uruu github.com/Diffio-AI/Cohe… A WhisperX-style interface for Cohere Transcribe. It includes VAD (which Cohere recommends) and word alignment, which is always helpful. I also added automatic language detection.
0
0
0
2
steven (@Tu7uruu)
Just dropped on HF: Cohere’s cohere-transcribe-03-2026
> 🥇 #1 on the Open ASR leaderboard
> 🌍 #4 multilingual
> 📄 #6 long-form
> Supports 12+ languages: English, German, French, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Arabic, Vietnamese, Chinese, Japanese, Korean
> Conformer-based encoder + lightweight Transformer decoder for transcription
> And of course: Apache 2.0 license
6
20
166
12.3K
Pasha S (@psk90_ai)
Cohere just took #1 on the Hugging Face Open ASR Leaderboard. First speech model. 5.42% WER. Open source.
━━━━━━━━━━━━━━━━━━━
Cohere Transcribe. Their first speech-to-text model, and it immediately tops the English accuracy charts.
→ 5.42% word error rate: #1 on the Hugging Face Open ASR Leaderboard
→ Validated by human evaluation, not just automated benchmarks
→ One of the strongest accuracy-to-speed ratios at its size class
→ Minutes of audio → usable transcripts in seconds
→ Open source: download weights directly from Hugging Face
━━━━━━━━━━━━━━━━━━━
The bigger picture: This isn't a standalone model release. Cohere is building toward full enterprise speech intelligence inside North, their agentic AI orchestration platform.
Translation: your AI agent will soon listen, transcribe, reason, and act, all within one enterprise platform. Transcribe is the ears.
The ASR space just got very crowded very fast. In the last few months: Mistral Voxtral, IBM Granite Speech, ElevenLabs Scribe, and now Cohere Transcribe, all pushing open-source ASR past what Whisper could do.
🔗 Blog: cohere.com/blog/transcribe
🔗 Model: lnkd.in/ggfeZye5
Building enterprise speech pipelines (transcription, voice agents, real-time audio processing) on-premise? That's what we do at Zingaro AI and LiteCompute AI. DM me.
♻️ Repost if useful. Follow Pasha S for daily open-source AI drops.
1
0
0
39
Cohere (@cohere)
Introducing: Cohere Transcribe – a new state-of-the-art in open source speech recognition.
82
297
2.6K
598.5K
Diffio AI (@diffioai)
- OpenAI Whisper timing (github.com/openai/whisper): uses Whisper’s internal alignment heads and decoder cross-attention, then applies DTW over the token-to-frame attention matrix to derive word timestamps from the token sequence.
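To make the DTW step concrete, here is a minimal sketch of dynamic time warping over a toy token-to-frame attention matrix. It is not Whisper's code: the matrix values are invented, the cost is taken as `1 - attention` purely for illustration, and the helper names `dtw_path` and `token_spans` are hypothetical.

```python
import numpy as np


def dtw_path(cost: np.ndarray) -> list[tuple[int, int]]:
    """Minimum-cost monotonic path through a (tokens x frames) cost matrix."""
    n_tokens, n_frames = cost.shape
    acc = np.full((n_tokens + 1, n_frames + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n_tokens + 1):
        for j in range(1, n_frames + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev

    # Backtrack from the last token/frame to the first.
    i, j = n_tokens, n_frames
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        move = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if move == 0:
            i, j = i - 1, j - 1   # next token and next frame
        elif move == 1:
            i -= 1                # two tokens share one frame
        else:
            j -= 1                # same token continues over more frames
    return path[::-1]


def token_spans(path, n_tokens):
    """First and last frame index that the path assigns to each token."""
    spans = []
    for tok in range(n_tokens):
        frames = [f for t, f in path if t == tok]
        spans.append((frames[0], frames[-1]))
    return spans


# Toy attention: 3 tokens over 6 frames (values invented for illustration).
attn = np.full((3, 6), 0.05)
attn[0, 0:2] = 0.9
attn[1, 2:5] = 0.9
attn[2, 5:6] = 0.9

path = dtw_path(1.0 - attn)   # high attention means low cost
spans = token_spans(path, n_tokens=3)
print(spans)
```

Multiplying each frame index by the frame duration turns the spans into timestamps, and merging the spans of the tokens that make up a word gives word-level timing.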
0
0
3
80
Diffio AI (@diffioai)
Word alignment error relative to SNR. See 🧵 for details.
3
2
4
56
Diffio AI (@diffioai)
- whisper-char-alignment (github.com/30stomercury/w…): uses Whisper’s own decoder cross-attention maps, teacher-forces the reference text at the character level, and applies DTW plus attention-head aggregation to infer word boundaries.
0
0
2
72
Diffio AI (@diffioai)
- WhisperX (github.com/m-bain/whisperX): performs forced alignment with an external phoneme/CTC aligner, typically a wav2vec2-based model, to align a known transcript to the waveform and recover word timestamps.
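As a rough illustration of CTC-style forced alignment, the sketch below runs a Viterbi search over frame-level log-probabilities for a known token sequence. It is a simplified toy, not WhisperX's implementation: each token is emitted on exactly one frame and all other frames are blank, the emission matrix is invented, and `forced_align` is a hypothetical name.

```python
import numpy as np


def forced_align(log_probs: np.ndarray, tokens: list[int], blank: int = 0) -> list[int]:
    """Return, for each token, the frame on which the best path emits it.

    log_probs has shape (frames, vocab). At every frame the path either
    emits blank (stay) or emits the next transcript token (advance).
    """
    n_frames, n_tokens = log_probs.shape[0], len(tokens)
    if n_frames < n_tokens:
        raise ValueError("need at least one frame per token")

    # score[t, j]: best log-score after t frames with j tokens emitted.
    score = np.full((n_frames + 1, n_tokens + 1), -np.inf)
    advanced = np.zeros((n_frames + 1, n_tokens + 1), dtype=bool)
    score[0, 0] = 0.0
    for t in range(n_frames):
        for j in range(n_tokens + 1):
            if score[t, j] == -np.inf:
                continue
            stay = score[t, j] + log_probs[t, blank]
            if stay > score[t + 1, j]:
                score[t + 1, j] = stay
                advanced[t + 1, j] = False
            if j < n_tokens:
                adv = score[t, j] + log_probs[t, tokens[j]]
                if adv > score[t + 1, j + 1]:
                    score[t + 1, j + 1] = adv
                    advanced[t + 1, j + 1] = True

    # Backtrack to recover the emission frame of every token.
    frames = [0] * n_tokens
    j = n_tokens
    for t in range(n_frames, 0, -1):
        if advanced[t, j]:
            j -= 1
            frames[j] = t - 1
    return frames


# Toy emissions: 5 frames, vocab = [blank, "a", "b"] (values invented).
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
])
frames = forced_align(np.log(probs), tokens=[1, 2])
print(frames)
```

Because the transcript is fixed, the search only decides when each token occurs, which is why this kind of aligner can be paired with any ASR model that produces text.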
0
0
2
35
Diffio AI (@diffioai)
Codex Wrapped 2025
Total Tokens: 3,073,600,806
Total Messages: 1,782
Total Sessions: 512
Top Model: GPT 5.2 Codex
Total Estimated Cost: $814.14
Credit: @nummanali @moddi3io
0
0
0
81
Diffio AI (@diffioai)
@Meta Has anyone ever figured out how to get help from @Meta? I guess they're too busy spending all their money on GPUs to take any of my money.
3
0
0
82
Diffio AI (@diffioai)
No help so far, @Meta. I have tried email and requesting support through the website. 😢
1
0
0
60
Diffio AI (@diffioai)
@Meta Any chance I can get some help?
2
0
0
84