Julian Mack
@Julianfmack

253 posts

ML researcher. Multimodal, foundations @cohere

Joined May 2014
581 Following · 2.3K Followers

Pinned Tweet
Julian Mack@Julianfmack·
Happy to share what I've been working on recently: today we release Cohere Transcribe, a state-of-the-art speech recognition model that beats both commercial and open-source models to land at #1 on the Open ASR Leaderboard!
[image]
3 replies · 12 reposts · 80 likes · 3.7K views
Julian Mack reposted
Victor M@victormustar·
Very hyped by the new Cohere Transcribe model 🌍 Works surprisingly well on bad quality audio when the mic doesn't cooperate. 2B params, 14 supported languages and it's Apache 2.0. try the official Hugging Face demo ⬇️
13 replies · 30 reposts · 306 likes · 20.2K views
Julian Mack@Julianfmack·
@kushtrimvisoka Our tokenizer does use byte fallback, though, so while a totally new character set will be challenging for the current vocab, it's not an absolute constraint
1 reply · 0 reposts · 0 likes · 20 views
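A minimal sketch of the byte-fallback idea described in the two replies above, using a toy two-word vocabulary (this is illustrative only, not Cohere's actual tokenizer; real implementations such as SentencePiece reserve 256 dedicated byte tokens):

```python
# Toy subword vocab; byte tokens are appended after it.
VOCAB = {"hello": 0, "world": 1}
BYTE_OFFSET = len(VOCAB)

def encode(word):
    """Return subword token ids, falling back to UTF-8 byte tokens."""
    if word in VOCAB:
        return [VOCAB[word]]
    # Fallback: one token per UTF-8 byte, so any character set can be
    # represented even if it never appeared in the tokenizer's training data.
    return [BYTE_OFFSET + b for b in word.encode("utf-8")]

print(encode("hello"))  # in-vocab word -> single subword token [0]
print(encode("ş"))      # out-of-vocab character -> two byte tokens
```

This is why an unseen script is "challenging but not an absolute constraint": it always remains encodable as bytes, just at a much worse tokens-per-character ratio.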
Julian Mack@Julianfmack·
@kushtrimvisoka We haven't tested this but we'd be very interested in your results if you try! The main practical constraint is the tokenizer, which covers Latin, Greek, Arabic, Chinese, Japanese kana and Korean Hangul. Adaptation to languages outside these would need tokenizer changes
1 reply · 0 reposts · 0 likes · 19 views
Julian Mack reposted
Cohere@cohere·
Introducing: Cohere Transcribe – a new state-of-the-art in open source speech recognition.
81 replies · 295 reposts · 2.6K likes · 591.1K views
Julian Mack reposted
Pierre Richemond 🇪🇺
Excited and proud to introduce our latest: Cohere Transcribe, the best dedicated ASR model in the world. #1 EN HF leaderboard, SotA human evals, ahead of ElevenLabs, Qwen3, Mistral, Kyutai, and OpenAI. 14 supported languages. Apache 2.0, on HF for you to try. Our first audio model and a key step in powering North experiences. huggingface.co/CohereLabs/coh…
[image]
3 replies · 23 reposts · 112 likes · 13.9K views
Julian Mack@Julianfmack·
We validated our quality in human preference evaluations. In head-to-head comparisons we come out ahead (>50% win-rate) against all competitors. Meaning preservation was the key criterion, but we also wanted well-formatted, verbatim responses with correctly cased proper nouns
[image]
1 reply · 0 reposts · 4 likes · 195 views
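The win-rate metric in the tweet above reduces to a simple count over pairwise judgments. A sketch with made-up judgment data (not Cohere's evaluation set), scoring ties as half a win as is common in preference evals:

```python
# Each judgment says which of two transcripts annotators preferred.
judgments = ["A", "A", "B", "tie", "A", "B", "A"]  # toy data

# Count ties as half a win for each side.
wins = judgments.count("A") + 0.5 * judgments.count("tie")
win_rate = wins / len(judgments)

print(f"win rate for A: {win_rate:.2%}")  # above 50% means A is preferred overall
```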
Julian Mack@Julianfmack·
@jeankaddour Maybe an annealed aux loss formatting term to put special tokens <start/stop_thinking> in the right place during sft? As the trajectory is ~unchanged after 250, the aux term isn't adding new knowledge vs the baseline
0 replies · 0 reposts · 0 likes · 112 views
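The "annealed aux loss" guess in the reply above can be sketched as a weighted sum whose auxiliary weight decays over training. Everything here is hypothetical (toy loss values, a linear anneal schedule chosen for illustration):

```python
def aux_weight(step, total_steps, w0=1.0):
    """Linearly anneal the auxiliary weight from w0 down to 0."""
    return w0 * max(0.0, 1.0 - step / total_steps)

def total_loss(task_loss, formatting_loss, step, total_steps):
    # Task loss is always on; the formatting term (e.g. penalizing misplaced
    # special tokens) fades out, so late training matches the baseline objective.
    return task_loss + aux_weight(step, total_steps) * formatting_loss

print(total_loss(2.0, 0.5, step=0, total_steps=1000))     # aux term fully on
print(total_loss(2.0, 0.5, step=1000, total_steps=1000))  # aux term annealed away
```

Once the weight reaches zero the objective is identical to the baseline, consistent with the observation that the trajectory is roughly unchanged after the anneal.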
Jean Kaddour@jeankaddour·
ML interview question: What is happening here?
[image]
156 replies · 19 reposts · 564 likes · 145.3K views
Julian Mack reposted
Davis Blalock@davisblalock·
🚀 Today we’re releasing FlashOptim: better implementations of Adam, SGD, etc, that compute the same updates but save tons of memory. You can use it right now via `pip install flashoptim`. 🚀 arxiv.org/abs/2602.23349 A bunch of cool ideas make this possible: [1/n]
[image]
30 replies · 228 reposts · 1.6K likes · 212.9K views
Julian Mack reposted
gavin leech (Non-Reasoning)@g_leech_·
New paper on a long-shot I've been obsessed with for a year: How much are AI reasoning gains confounded by expanding the training corpus 10000x? How much LLM performance is down to "local" generalisation (pattern-matching to hard-to-detect semantically equivalent training data)?
[image] [image]
32 replies · 133 reposts · 967 likes · 221.5K views
Julian Mack reposted
Siyan Zhao@siyan_zhao·
Introducing 💡On-Policy Self-Distillation💡, a simple method that enables an LLM to teach itself with dense per-token feedback on its own on-policy generations—achieving 4-8x more token efficiency vs. GRPO and outperforming both GRPO and SFT/Off-Policy Distillation.

Key insight: like a student reviewing solutions, rationalizing them, and correcting prior mistakes, an LLM can be conditioned on privileged info (e.g., a correct solution or a reasoning trace) and supervise its weaker self—the version without such access—by matching the privileged-info-induced distribution from itself.

🌐Blog: siyan-zhao.github.io/blog/2026/opsd/ 🧵👇
[image]
31 replies · 157 reposts · 921 likes · 131.6K views
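The core mechanism described above (match the privileged-conditioned distribution per token) can be sketched as a per-token KL loss. The distributions below are toy numbers over a 3-token vocabulary, not outputs of the actual method:

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Per-token next-token distributions: the same model conditioned on privileged
# info (e.g. the correct solution) acts as teacher for its unconditioned self.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # with privileged info
student = [[0.4, 0.4, 0.2], [0.2, 0.6, 0.2]]  # without privileged info

# Dense per-token feedback: one KL term per generated token, in contrast to a
# single scalar reward per sequence as in GRPO-style RL.
loss = sum(kl(t, s) for t, s in zip(teacher, student)) / len(teacher)
print(f"distillation loss: {loss:.4f}")
```

Minimizing this loss pulls the unprivileged model toward the distribution its privileged self induces, which is where the claimed token-efficiency gain over scalar-reward methods comes from.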
Julian Mack reposted
Cohere Labs@Cohere_Labs·
Global AI deserves reproducible and transparent evaluation. 🌎 With Global MMLU Lite now part of @kaggle Benchmarks, you can track the multilingual performance of top models as well as test your own! Check out the leaderboard and notebook linked below.
[image]
1 reply · 10 reposts · 19 likes · 7.4K views
Julian Mack reposted
Dwarak@DwaraknathG·
I am hiring highly skilled performance engineers for my team! You will be working on optimising pretraining for models >100B params on O(1000s) of GPUs, and hardware-aligned architecture design. We are cooking a lot of very exciting projects and I can safely say you will have a lot of fun! Link in thread. <3
14 replies · 45 reposts · 458 likes · 67.1K views