Kirill Solodskikh

282 posts

@GarchFather

Almost PhD, Almost Founder, Almost Team Lead, Almost Successful, married. @TheStageAI Co-founder, CEO, ex-Huawei P50 AI cameras

Joined October 2022
1.4K Following · 698 Followers
Pinned Tweet
Kirill Solodskikh @GarchFather
We updated TheWhisper. Open-source multilingual speech-to-text for noisy, real-world audio. 6.00 WER on Open ASR, beating NVIDIA Parakeet and OpenAI Whisper. Compressed and accelerated with @TheStageAI ANNA, Automated Neural Networks Accelerator. Try it on our GitHub →
4 replies · 8 reposts · 99 likes · 523K views
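For readers unfamiliar with the metric: WER (word error rate) is usually the word-level edit distance between a reference transcript and the model output, divided by the number of reference words, and reported as a percentage. The sketch below assumes plain whitespace tokenization and no text normalization; benchmark leaderboards apply their own normalization, and the function name `wer` is illustrative rather than part of TheWhisper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic dynamic-programming edit distance, kept to two rows.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    score = wer("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {100 * score:.2f}%")
```

Under this definition, a score of 6.00 would mean roughly six word errors per hundred reference words.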
Kirill Solodskikh retweeted
TheStage AI @TheStageAI
How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput. See the full case study ↓
0 replies · 3 reposts · 7 likes · 41 views
Kirill Solodskikh retweeted
TheStage AI @TheStageAI
Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine. Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private. ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization. SDK demo 👇
4 replies · 7 reposts · 18 likes · 841 views
Kirill Solodskikh @GarchFather
@juntao Did I miss something? You mention MLX with NPU support, but MLX does not support NPU inference.
0 replies · 0 reposts · 1 like · 147 views
Michael Yuan @juntao
Rust implementation for Speech-to-Text based on open-source Qwen3 models
* Self-contained binary build with no external dependencies
* Uses libtorch on Linux with optional Nvidia GPU support
* Uses MLX on macOS with Apple GPU/NPU support
🔨 CLI for AI agents and humans: github.com/second-state/q…
🖥️ OpenAI compatible API server: github.com/second-state/q…
🤖 OpenClaw skill: money.flows.network
Why and how: x.com/juntao/status/…
Shady Hollow, TX 🇺🇸
14 replies · 74 reposts · 546 likes · 31.8K views
vLLM @vllm_project
📈 vLLM community + @nvidia pushed gpt-oss-120b performance on Blackwell GPUs to new heights: ⚡ +38% max throughput 🎯 +13% min latency 📈 Entire Pareto frontier improved. Key ingredients: FlashInfer integration, torch.compile kernel fusions, async scheduling, and stream interval optimizations. Deep dive + deployment recipes: blog.vllm.ai/2026/02/01/gpt… Thanks to the teams at @NVIDIAAI, @RedHat_AI, @AIatMeta, and the vLLM community for the collaboration 🙏
7 replies · 24 reposts · 223 likes · 26K views
Kirill Solodskikh @GarchFather
There was also a 3.13 build with no GIL, but I agree this is big. Using processes, which don't share memory and add overhead, is not good. It would even be interesting to run dense (GPU) + sparse (CPU) inference for linear layers with true multithreading. Previously I used nogil with Cython, but it was allowed only for pure C types, not PyObject structures.
1 reply · 0 reposts · 2 likes · 3.2K views
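To make the threads-versus-processes point concrete, here is a minimal sketch of CPU-bound work split across threads that all read the same list, with no copying or pickling between workers. It assumes nothing about the setup discussed in the tweet; the helper names are illustrative. `sys._is_gil_enabled()` exists only on CPython 3.13 and later, so the check falls back to "GIL enabled" elsewhere. On a regular (GIL) build the code still runs correctly, but the threads do not execute Python bytecode in parallel.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def gil_enabled() -> bool:
    """Report whether the GIL is active; older interpreters always have it."""
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def sum_of_squares(data, start, stop):
    """CPU-bound work over a slice of a list shared by all threads."""
    total = 0
    for i in range(start, stop):
        total += data[i] * data[i]
    return total


def parallel_sum_of_squares(data, workers=4):
    """Split the index range into chunks and reduce the partial sums."""
    n = len(data)
    step = max(1, (n + workers - 1) // workers)
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: sum_of_squares(data, b[0], b[1]), bounds)
        return sum(parts)


if __name__ == "__main__":
    values = list(range(100_000))
    print("GIL enabled:", gil_enabled())
    print("sum of squares:", parallel_sum_of_squares(values))
```

With a process pool, each worker would need its own copy of `data` (or explicit shared memory), which is the overhead the tweet refers to.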
Guido Appenzeller @appenz
The GIL is dead, long live Python! Few non-programmers will understand how liberating this is.
39 replies · 68 reposts · 1.2K likes · 106.1K views
Kirill Solodskikh retweeted
TheStage AI @TheStageAI
Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR for live speech. It stays robust in heavy noise – even at SNR 0 dB. That’s why it understands speech where people struggle to hear. Use it for transcription, research, and multilingual apps
2 replies · 31 reposts · 359 likes · 131K views
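As context for the "SNR 0 dB" figure: a signal-to-noise ratio of 0 dB means the speech and the noise have equal average power. The sketch below shows one common way to build such a test mixture by scaling a noise track to a target SNR. It is an illustration under simple assumptions (equal-length sample lists, mean-square power) and is not taken from the model's evaluation code; the function names are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a signal, taken as the mean of squared samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches `snr_db`, then mix."""
    if not speech or len(speech) != len(noise):
        raise ValueError("speech and noise must be non-empty and equally long")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("both signals must have non-zero power")

    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the target noise power.
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain


if __name__ == "__main__":
    speech = [math.sin(0.1 * i) for i in range(1000)]
    noise = [0.3 * math.cos(0.37 * i) for i in range(1000)]
    mixed, gain = mix_at_snr(speech, noise, snr_db=0.0)
    print(f"noise gain for 0 dB SNR: {gain:.3f}")
```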
Kirill Solodskikh @GarchFather
Good weekend! I spent time testing our releases more extensively and writing usage guides during my tests. Suddenly @akshat_b and @charles_irl from @modal liked my notebook. While testing TheWhisper with @quaz1m, I found that @matiii started following me! Quietly motivating!
0 replies · 0 reposts · 6 likes · 186 views
Mati Staniszewski @matiii
New brand ideas for ElevenLabs product lines. We’ve grown a lot over the last year - today we combine our own audio AI models + platforms for Agents, Creative and API. We’re experimenting with colour and shorter names to make it easier to understand at a glance. Feedback welcome!
46 replies · 26 reposts · 750 likes · 36.7K views
Kirill Solodskikh @GarchFather
Mistral-Small-24B from @MistralAI with @nvidia cuDNN paged attention and W8A8 int8 quantization runs more than 2x faster on Nvidia B200. I just wrote a simple tutorial on building a custom image for @modal notebooks and running @TheStageAI ElasticModels there with integrated cuDNN paged attention and int8 W8A8 quantization (S size). Throughput went from 40 tok/s to 95 tok/s (actually faster, since it was measured while printing during streaming). Notebook link in the thread 👇
1 reply · 1 repost · 7 likes · 274 views
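For readers new to the term, W8A8 means that both weights and activations are stored as 8-bit integers, products are accumulated as integers, and the result is rescaled to floating point. The sketch below illustrates that idea in pure Python with symmetric per-tensor scales. It is an assumption-laden illustration, not TheStage AI's or cuDNN's implementation: production kernels typically use per-channel or per-token scales and run on GPU tensor cores, and the function names here are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (int8 codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def w8a8_dot(weights, activations):
    """Dot product with int8 weights and int8 activations."""
    q_w, scale_w = quantize_int8(weights)
    q_a, scale_a = quantize_int8(activations)
    # Integer multiply-accumulate (hardware would use an int32 accumulator).
    acc = sum(w * a for w, a in zip(q_w, q_a))
    # One floating-point rescale at the end recovers the real-valued result.
    return acc * scale_w * scale_a


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25]
    activations = [2.0, 1.0, -4.0]
    exact = sum(w * a for w, a in zip(weights, activations))
    print("exact:", exact, "w8a8:", w8a8_dot(weights, activations))
```

On the numbers in the tweet: going from 40 tok/s to 95 tok/s is a speedup of 95 / 40 ≈ 2.4x, consistent with "more than 2x".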
Kirill Solodskikh retweeted
TheStage AI @TheStageAI
We know what you mean @Adele
1 reply · 5 reposts · 14 likes · 38.6K views
Rishi @notnotrishi
@GarchFather do you also plan to do fine-tunes of smaller Whisper models?
1 reply · 0 reposts · 1 like · 19 views
Kirill Solodskikh @GarchFather
We updated TheWhisper, our open source speech-to-text engine for self-hosted/on-device use. It now supports NVIDIA H100, L40S, RTX 4090, and RTX 5090. Benchmarks vs other Whisper libs show the best Time to First Token and Real-Time Factor. Try it
15 replies · 17 reposts · 361 likes · 1M views
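The two metrics named in the tweet can be measured with a few lines of timing code. Under one common definition, Real-Time Factor (RTF) is processing time divided by audio duration, so values below 1 mean faster than real time; some benchmarks report the inverse instead. Time to First Token (TTFT) is the delay until the first output token appears. The sketch below assumes a hypothetical `transcribe_stream` callable that yields tokens; it is not TheWhisper's API.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_latency(
    transcribe_stream: Callable[[], Iterable[str]],
    audio_seconds: float,
) -> Tuple[float, float]:
    """Return (time to first token in seconds, real-time factor)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")

    start = time.perf_counter()
    ttft = None
    for _token in transcribe_stream():
        if ttft is None:
            ttft = time.perf_counter() - start
    total = time.perf_counter() - start

    if ttft is None:  # the stream produced no tokens at all
        ttft = total
    return ttft, total / audio_seconds


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a real streaming transcriber.
        for word in ["hello", "world"]:
            time.sleep(0.01)
            yield word

    ttft, rtf = measure_latency(fake_stream, audio_seconds=10.0)
    print(f"TTFT: {ttft * 1000:.1f} ms, RTF: {rtf:.4f}")
```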