Kirill Solodskikh

282 posts

@GarchFather

Almost PhD, Almost Founder, Almost Team Lead, Almost Successful, married. @TheStageAI Co-founder, CEO, ex Huawei P50 AI cameras

Joined October 2022

1.4K Following · 698 Followers

Pinned Tweet

Kirill Solodskikh@GarchFather·Jan 9

We updated TheWhisper. Open-source multilingual speech-to-text for noisy, real-world audio. 6.00 WER on Open ASR, beating NVIDIA Parakeet and OpenAI Whisper. Compressed and accelerated with @TheStageAI ANNA, Automated Neural Networks Accelerator. Try it on our GitHub →

523K
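For readers unfamiliar with the metric in the pinned tweet: word error rate is conventionally the number of word substitutions, deletions, and insertions divided by the number of reference words, so a WER of 6.00 presumably means about 6 errors per 100 reference words. The sketch below is a minimal illustration of that definition, not the Open ASR scoring code (which also normalizes text before comparing); the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words (Levenshtein over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer * 100:.2f}%")
```
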

Kirill Solodskikh retweeted

TheStage AI@TheStageAI·1d

How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput. See the full case study ↓

Kirill Solodskikh@GarchFather·3d

@yazins Cool! Please check out transcription on NPU: more than 5x lower power consumption. github.com/TheStageAI/The…

444

yazin@yazins·3d

OpenGranola now runs 100% locally with Ollama. LLM suggestions, knowledge base embeddings, transcription — all on your Mac, nothing hits the network. Just point it at Ollama instead of OpenRouter in settings and you're done. github.com/yazinsai/OpenG…

yazin@yazins

Introducing: OpenGranola 🔥 I built an open source meeting copilot for macOS. It transcribes both sides of your call on-device, searches your own notes in real time, and hands you talking points right when the conversation needs them. No audio leaves your Mac. Point it at a folder of markdown files, pick any LLM through OpenRouter (Claude, GPT-4o, Gemini, Llama), and it just works. It's invisible to screen share too — nobody knows you have it. The whole thing is open source. Link below

248

84.6K

Kirill Solodskikh retweeted

TheStage AI@TheStageAI·Mar 4

Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine. Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private. ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization. SDK demo 👇

841

Kirill Solodskikh@GarchFather·Mar 1

@juntao Did I miss something? You mention MLX with NPU support, but MLX does not support NPU inference.

147

Michael Yuan@juntao·Mar 1

Rust implementation for Speech-to-Text based on open-source Qwen3 models
* Self-contained binary build (no external dependencies)
* Uses libtorch on Linux with optional Nvidia GPU support
* Uses MLX on MacOS with Apple GPU/NPU support
🔨 CLI for AI agents and humans: github.com/second-state/q…
🖥️ OpenAI compatible API server: github.com/second-state/q…
🤖 OpenClaw skill: money.flows.network
Why and how x.com/juntao/status/…

Shady Hollow, TX 🇺🇸

546

31.8K

Kirill Solodskikh@GarchFather·Feb 5

@vllm_project @nvidia Hey guys, cool work! Please consider @TheStageAI for future collaboration. We have lossless compression techniques: huggingface.co/spaces/TheStag…

125

vLLM@vllm_project·Feb 4

📈 vLLM community + @nvidia pushed gpt-oss-120b performance on Blackwell GPUs to new heights: ⚡ +38% max throughput 🎯 +13% min latency 📈 Entire Pareto frontier improved Key ingredients: FlashInfer integration, torch.compile kernel fusions, async scheduling, and stream interval optimizations. Deep dive + deployment recipes: blog.vllm.ai/2026/02/01/gpt… Thanks to the teams at @NVIDIAAI , @RedHat_AI, @AIatMeta, and the vLLM community for the collaboration 🙏

223

26K

Kirill Solodskikh@GarchFather·Jan 23

There was also a 3.13 build with no GIL, but I agree this is big. Using processes, which don't share memory and add overhead, is not good. It would even be interesting to run dense (GPU) + sparse (CPU) inference for linear layers with true multithreading. Previously I used no-GIL code with Cython, but that was allowed only for pure C types, not for PyObject structures.

3.2K
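To illustrate the point about threads versus processes: threads read the same in-memory data directly, whereas worker processes need the data copied or pickled to them. The sketch below splits a dot product across a thread pool and reports whether the interpreter is running with the GIL. It assumes CPython; `sys._is_gil_enabled()` is only available on 3.13 and later, so the code falls back to "GIL enabled" elsewhere. The helper names are hypothetical, and on a regular GIL build the threads still run this pure-Python loop one at a time.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def gil_enabled() -> bool:
    # sys._is_gil_enabled() exists on CPython 3.13+; assume the GIL otherwise.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


def chunk_dot(weights, inputs, start, stop):
    # Every thread reads the same shared lists; nothing is copied or pickled.
    return sum(weights[i] * inputs[i] for i in range(start, stop))


def threaded_dot(weights, inputs, workers=4):
    n = len(weights)
    step = max(1, (n + workers - 1) // workers)
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: chunk_dot(weights, inputs, b[0], b[1]), bounds)
        return sum(parts)


weights = list(range(1000))
inputs = [2] * 1000
print("GIL enabled:", gil_enabled())
print("dot product:", threaded_dot(weights, inputs))
```

On a free-threaded build the chunks can execute in parallel on separate cores; on a standard build the result is the same but the work is serialized.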

Guido Appenzeller@appenz·Jan 22

The GIL is dead, long live Python! Few non-programmers will understand how liberating this is.

1.2K

106.1K

Kirill Solodskikh retweeted

TheStage AI@TheStageAI·Jan 22

Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR for live speech. It stays robust in heavy noise – even at SNR 0 dB. That’s why it understands speech where people struggle to hear. Use it for transcription, research, and multilingual apps

359

131K

Kirill Solodskikh@GarchFather·Jan 19

@tom_doerr Try this one too: github.com/TheStageAI/The…

Tom Dörr@tom_doerr·Jan 17

Transcribes YouTube videos and files locally via Whisper github.com/Br3n0k/transcr…

390

17.2K

Kirill Solodskikh@GarchFather·Jan 19

Good weekend! I spent time testing our releases more extensively and writing usage guides during my tests. Suddenly @akshat_b and @charles_irl from @modal liked my notebook. While testing TheWhisper with @quaz1m, I found that @matiii started following me! Quietly motivating!

186

Kirill Solodskikh@GarchFather·Jan 18

@matiii Noise with gradients is a bit old.

112

Mati Staniszewski@matiii·Jan 18

New brand ideas for ElevenLabs product lines. We’ve grown a lot over the last year - today we combine our own audio AI models + platforms for Agents, Creative and API. We’re experimenting with colour and shorter names to make it easier to understand at a glance. Feedback welcome!

750

36.7K

Kirill Solodskikh@GarchFather·Jan 17

Here is the notebook: modal.com/notebooks/thes…

151

Kirill Solodskikh@GarchFather·Jan 17

Mistral-Small-24B from @MistralAI with @nvidia cuDNN paged attention and w8a8 int8 quantization gives more than 2x acceleration on Nvidia B200. I just wrote a simple tutorial on building a custom image for @modal notebooks and running @TheStageAI ElasticModels there with integrated cuDNN paged attention and int8 w8a8 quantization (S size). Got acceleration from 40 tok/s -> 95 tok/s (actually faster, as it was measured with printing during streaming). Notebook link in the thread 👇

274
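As background on the terms in the tweet above: "w8a8" means both weights and activations are stored as 8-bit integers, so the multiply-accumulate runs in integer arithmetic and the result is rescaled to floating point once at the end. The sketch below shows that general idea with symmetric per-tensor scales in plain Python. It is an illustration only, not the ElasticModels implementation or a cuDNN kernel, and every name and value other than the 40 and 95 tok/s figures is hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def w8a8_dot(weights, activations):
    """Dot product with int8 weights and int8 activations."""
    w_q, w_scale = quantize_int8(weights)
    a_q, a_scale = quantize_int8(activations)
    acc = sum(w * a for w, a in zip(w_q, a_q))  # integer accumulation
    return acc * w_scale * a_scale              # dequantize once at the end


weights = [0.5, -1.0, 0.25, 0.75]
activations = [2.0, 1.0, -4.0, 0.5]
exact = sum(w * a for w, a in zip(weights, activations))
approx = w8a8_dot(weights, activations)
print(f"float result: {exact:.4f}, w8a8 result: {approx:.4f}")

# Speedup implied by the throughput figures quoted in the tweet.
speedup = 95 / 40
print(f"speedup: {speedup:.3f}x")
```

The quantized result differs slightly from the float result because each value is rounded to one of 255 levels; production schemes typically use per-channel or per-token scales to keep that error small.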

Kirill Solodskikh retweeted

Ruslan Aydarkhanov@rusaydar·Jan 15

At @TheStageAI, Elastic Models started with paged FlashAttention. This month we’re moving sequence generation to cuDNN Paged Attention to stay fast and speed up bring-up across newer @NVIDIA GPUs (including Jetson). Details: app.thestage.ai/blog/Integrati…

269

Kirill Solodskikh retweeted

TheStage AI@TheStageAI·Jan 14

We know what you mean @Adele

38.6K

Kirill Solodskikh@GarchFather·Jan 13

@notnotrishi We will produce smaller models soon!

Rishi@notnotrishi·Jan 12

@GarchFather do you also plan to do fine tunes on smaller models of whisper?

Kirill Solodskikh@GarchFather·Dec 3

We updated TheWhisper, our open source speech-to-text engine for self-hosted/on-device use. It now supports NVIDIA H100, L40S, RTX 4090, and RTX 5090. Benchmarks vs other Whisper libs show the best Time to First Token and Real-Time Factor. Try it

361
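The two benchmark metrics named in the tweet above can be measured along these lines. The sketch assumes the common convention that Real-Time Factor is processing time divided by audio duration (lower is better, and below 1.0 means faster than real time); some leaderboards report the inverse instead. The function names and the stand-in token stream are hypothetical and are not part of TheWhisper's API.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (assumed convention)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def time_to_first_token(token_stream):
    """Return the first token and the seconds spent waiting for it."""
    start = time.perf_counter()
    first = next(iter(token_stream))
    return first, time.perf_counter() - start


def fake_stream():
    # Hypothetical stand-in for a streaming transcription generator.
    time.sleep(0.01)
    yield "hello"
    yield "world"


first, ttft = time_to_first_token(fake_stream())
print(f"first token: {first!r}, TTFT: {ttft * 1000:.1f} ms")
print(f"RTF: {real_time_factor(3.0, 60.0):.3f}")
```
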

Kirill Solodskikh@GarchFather·Jan 9

There are a lot of ASR releases! One of them is open-weight and comes with optimized Apple inference engines: github.com/TheStageAI/The…

ElevenLabs@ElevenLabs

Today we’re introducing Scribe v2: the most accurate transcription model ever released. While Scribe v2 Realtime is optimized for ultra low latency and agents use cases, Scribe v2 is built for batch transcription, subtitling, and captioning at scale.

225

Kirill Solodskikh@GarchFather·Jan 9

@TheStageAI @grok, what is the best next release for TheWhisper?

354

Kirill Solodskikh@GarchFather·Jan 9

Hey @grok, please summarize this TheWhisper repo: github.com/TheStageAI/The…

108

Kirill Solodskikh@GarchFather·Dec 10

@BharatMhaskar Yep

223

Bharat Mhaskar@BharatMhaskar·Dec 10

@GarchFather Does it work with audio streaming input?

255
