Jaxson Khan:

If you regularly transcribe audio, @cohere Transcribe was just released. It's a free, open-source model that runs locally, and it's definitely worth checking out.

I ran some tests against OpenAI's Whisper (which powers ChatGPT and many other apps). I used Steve Jobs' 2005 Stanford Commencement Address (15 min) on YouTube as the test video. Both models ran locally on a MacBook M4.

Some highlights of what each model heard:

Cohere: "I learned about serif and sans serif typefaces"
Whisper: "I learned about Sarah and Sans Sarah of typefaces"

Cohere: "Bob Noyce"
Whisper: "Bob Nois"

Cohere: "tried to apologize for screwing up so badly"
Whisper: "tried to apologize for sparing up so badly"

I also tested Whisper's largest model (1.55B parameters) to get a closer comparison to Cohere's 2B parameters. It fixed some of the name errors but started repeating phrases and took much longer.

How they compared:
- Cohere (2B params): 119 seconds, ~98% accuracy
- Whisper base (74M params): 69 seconds, ~90% accuracy
- Whisper large (1.55B params): 915 seconds, ~93% accuracy

Full side-by-side transcript comparison: github.com/jaxson/tests-p…
(Note: I believe some of the differences in word count stem from hallucination loops that Whisper ran into.)

Cohere Transcribe Model on Hugging Face: huggingface.co/CohereLabs/coh…
Test video on YouTube: youtube.com/watch?v=UF8uR6…

* Results may vary based on hardware, audio quality, and content. This is a very non-scientific test!
** Audio clips used under fair use for commentary/analysis. All rights belong to their respective owners.
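The post does not say how the accuracy percentages were calculated. One common way to score a transcript is word error rate (WER), the word-level edit distance divided by the length of the reference. Here is a minimal sketch, assuming simple lowercasing and punctuation stripping. The function names are made up for illustration, and the Cohere phrasing is used as the reference only because it matches the speech in this excerpt. The real-time factor helper assumes the roughly 15-minute (900-second) duration given in the post.

```python
import re


def words(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = words(reference), words(hypothesis)
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


reference = "I learned about serif and sans serif typefaces"
hypothesis = "I learned about Sarah and Sans Sarah of typefaces"
print(word_error_rate(reference, hypothesis))  # 0.375 (2 substitutions + 1 insertion over 8 words)

# Timings quoted in the post, against an assumed 900-second clip.
for name, seconds in [("Cohere", 119), ("Whisper base", 69), ("Whisper large", 915)]:
    print(name, round(real_time_factor(seconds, 900), 2))
```

Accuracy is often reported as 1 minus WER, though WER can exceed 1 when a transcript contains many inserted words, which is what a hallucination loop tends to produce.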


Diffio AI (@diffioai) · 29 Mar

@jaxson @cohere github.com/Diffio-AI/Cohe… A WhisperX-style interface for Cohere. It has VAD (which Cohere recommends) and word alignment, which is always helpful. I also added automatic language detection.
