TheStage AI

79 posts

@TheStageAI

Automated Enterprise Inference Stack & Research Lab

AI · Joined May 2023
556 Following · 375 Followers
Pinned Tweet
TheStage AI @TheStageAI
New SOTA TheWhisper checkpoint. Update is out. Open-source multilingual STT built for real-time streaming and noisy audio. 6.0 WER on Open ASR, ahead of Parakeet and Whisper. Optimized with our stack – ANNA, Automated Neural Networks Accelerator. Code is open. GitHub →
2
2
63
354.7K
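For context on the metric in the pinned post: WER (word error rate) is conventionally the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal, generic implementation of that textbook definition. It is not TheStage AI's or the Open ASR leaderboard's evaluation code; the function name and sample sentences are made up for illustration, and real evaluations usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {example_wer:.1%}")
```

Read this way, a score of 6.0 would correspond to roughly 6 word errors per 100 reference words, assuming the figure in the post is a percentage.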
TheStage AI @TheStageAI
How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with @MireloAI shows how inference optimization delivered up to 2.4× higher throughput. See the full case study ↓
0
3
7
41
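The "keep generation ahead of playback" condition can be made concrete with a real-time factor: generation time divided by the duration of audio produced. A value below 1 means the model outruns playback. The sketch below is a simplified illustration with hypothetical chunk sizes and timings; none of the numbers come from the case study, and it assumes constant per-chunk generation time and playback that never pauses to rebuffer.

```python
def real_time_factor(gen_seconds: float, audio_seconds: float) -> float:
    """Generation time per second of audio; below 1.0 is faster than real time."""
    return gen_seconds / audio_seconds


def playback_underruns(chunk_audio_s: float, chunk_gen_s: float,
                       n_chunks: int, prebuffer_chunks: int = 1) -> int:
    """Count chunks that are not yet generated when playback reaches them."""
    # Playback starts once the prebuffer has been generated.
    playback_start = prebuffer_chunks * chunk_gen_s
    underruns = 0
    for k in range(n_chunks):
        ready_at = (k + 1) * chunk_gen_s               # chunk k finishes generating
        needed_at = playback_start + k * chunk_audio_s  # playback reaches chunk k
        if ready_at > needed_at + 1e-12:
            underruns += 1
    return underruns


# Hypothetical timings for 1-second audio chunks.
slow = playback_underruns(chunk_audio_s=1.0, chunk_gen_s=1.2, n_chunks=10)
fast = playback_underruns(chunk_audio_s=1.0, chunk_gen_s=0.5, n_chunks=10)
print(f"RTF 1.2 -> {slow} late chunks; RTF 0.5 -> {fast} late chunks")
```

In this toy model, any real-time factor above 1 eventually starves playback no matter how large the prebuffer, which is why throughput gains matter for streaming generation.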
TheStage AI @TheStageAI
Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine. Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private. ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization. SDK demo 👇
4
7
18
840
TheStage AI @TheStageAI
Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR for live speech. It stays robust in heavy noise – even at SNR 0 dB. That’s why it understands speech where people struggle to hear. Use it for transcription, research, and multilingual apps.
2
31
359
131K
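As a reminder of what "SNR 0 dB" means: SNR in decibels is 10·log10(signal power / noise power), so 0 dB is noise with the same average power as the speech. The sketch below shows one common way to scale a noise track to a target SNR before mixing it into a test signal. It uses synthetic stand-in samples and hypothetical helper names, and it is not the procedure TheStage AI used for its robustness tests.

```python
import math


def mean_power(samples: list[float]) -> float:
    """Average power of a sampled signal."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(signal: list[float], noise: list[float]) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


def scale_noise_to_snr(signal: list[float], noise: list[float],
                       target_snr_db: float) -> list[float]:
    """Rescale the noise so that signal vs. noise hits the target SNR."""
    target_ratio = 10 ** (target_snr_db / 10)
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * target_ratio))
    return [gain * v for v in noise]


# Synthetic stand-ins: a sine wave as "speech" and a square wave as "noise".
speech = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(1000)]

noise_0db = scale_noise_to_snr(speech, noise, target_snr_db=0.0)
noisy_speech = [s + n for s, n in zip(speech, noise_0db)]
print(f"achieved SNR: {snr_db(speech, noise_0db):.2f} dB")
```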
TheStage AI @TheStageAI
At TheStage AI, we shipped @nvidia cuDNN Paged Attention in our Elastic Models library. We replaced paged FlashAttention for better integration. In our benchmarks, the cuDNN path shows nearly identical quality and latency vs the previous implementation. Early results on B200: INT8 Llama 8B ~200 tok/s per sequence @ bs16 (≈ 3,200 tok/s aggregate). The write-up also covers CUDA Graphs, graph caching, cuDNN Paged Attention, and INT8 LLMs. Next we are moving to native inference support across NVIDIA hardware, including Jetson. Check the blog for details: app.thestage.ai/blog/Integrati…
0
1
9
287
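The B200 figures in the post above relate per-sequence and aggregate throughput by the batch size. A minimal sketch of that arithmetic follows, using only the numbers quoted in the post; the function names are illustrative, and it assumes every sequence in the batch decodes at the same rate.

```python
def aggregate_tokens_per_second(per_sequence_tps: float, batch_size: int) -> float:
    """Total decode throughput when each sequence in the batch runs at the same rate."""
    return per_sequence_tps * batch_size


def milliseconds_per_token(per_sequence_tps: float) -> float:
    """Per-sequence time between tokens, in milliseconds."""
    return 1000.0 / per_sequence_tps


# Figures quoted in the post: ~200 tok/s per sequence at batch size 16.
total_tps = aggregate_tokens_per_second(200, 16)
token_latency_ms = milliseconds_per_token(200)
print(f"aggregate: {total_tps:.0f} tok/s, per-token latency: {token_latency_ms:.1f} ms")
```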
TheStage AI @TheStageAI
Significant speed and size gains in model inference are possible without hurting output quality. ANNA is our PyTorch framework for automated model acceleration, a new way to think about MLOps. Smaller checkpoints, lower cost, faster inference, no retraining. Try the demo or request access.
1
10
152
844.2K
TheStage AI @TheStageAI
We’ve made it easy to run text-to-image models on @Modal with the speed you’d expect from top inference providers. Follow our quick guide to deploy containers with an @OpenAI-compatible API and get 2× faster performance. Big thanks to @MireloAI for the soundtrack magic 🎶
1
4
24
424.3K
TheStage AI retweeted
Kirill Solodskikh @GarchFather
Great communities make great products. At @TheStageAI, we’re building ANNA, our Autonomous Neural Networks Accelerator, for faster, cheaper inference. We need a Community Manager now. Be part of the early story →
2
1
12
2.1K
TheStage AI @TheStageAI
TheStage AI is now SOC 2 Type I compliant. We did it to keep models, data, and IP secure. Clients get confidence, simpler procurement, and compliant AI deployment. This milestone sets us up to grow into enterprise, government, and regulated markets.
0
1
8
579
TheStage AI @TheStageAI
Excited to share our MLPerf Inference v5.1 results (@MLCommons). We ran @StabilityAI SDXL on 8×H100 via @nebiusai with our stack, ANNA. 18.1 img/s in target quality range. Fast, reproducible, world-class performance from our team, submitted alongside top AI players ↓
0
5
30
165.3K
TheStage AI retweeted
Azim K @quaz1m
Validation is a key step when compressing or accelerating models. It shows whether the network still performs well. Our research team @TheStageAI shared evaluation methods for sharpness, tone, color, object placement, and more.
1
3
35
93.5K
TheStage AI retweeted
Kirill Solodskikh @GarchFather
How do you measure the quality of text-to-image models? Our research team @TheStageAI put together a comprehensive guide to checking perceptual quality, sharpness, color, prompt alignment, and more. All the tricky image quality questions researchers usually ask are covered here ↓
0
6
55
258.7K
TheStage AI retweeted
Kirill Solodskikh @GarchFather
🚀 Early access to ANNA: Automated NNs Accelerator now available!
✨ Get your access here: app.thestage.ai/contact
Questions? DM or comment below! 💬
With ANNA, you can:
🔄 Simply upload your model, data, and desired metrics
🎛️ Fine-tune model size, latency, and quality with an intuitive slider
🔗 Combine multiple compression & acceleration algorithms in a single neural network
⚡ Boost performance by more than 2x with zero quality loss!
0
1
9
494
TheStage AI @TheStageAI
For AI builders and researchers: get early access to QLIP + ANNA for DNN optimization and acceleration – cloud, self-host, edge. Get a free commercial license. Collaborate with us on research, integrate your algorithms, or simplify deployment. Limited spots – apply today ↓
2
5
41
405.7K
TheStage AI retweeted
Kirill Solodskikh @GarchFather
Quantization delivers speedup but can reduce quality. Our researchers prepared a tutorial showing how ANNA automatically quantizes Flux and accelerates it 2× while keeping quality high. Orig. model latency: 6.4 s. Check the link. DM or comment for early access.
1
5
59
109.7K
TheStage AI retweeted
Kirill Solodskikh @GarchFather
Self-hosted text-to-image on H100 with @TheStageAI Elastic Models, accelerated from FLUX.1-schnell @bfl_ml. Our fastest model S generates a high-quality image in 0.5 s. Precompiled and ready-to-deploy – minimal cold start. Tutorial + access token inside if you want to try.
1
10
132
138.1K
TheStage AI @TheStageAI
Imagine paying $30 for 10k images when @SaladTech + ANNA does it for $1 💀
FLUX.1-schnell ~1.2 s/image, high-quality output
ANNA auto-tunes models to balance speed and quality
OpenAI-compatible API, fully self-hosted.
Quick guide shows how to run your own endpoint
1
3
23
116.3K
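The cost comparison in the post above can be checked with simple arithmetic. The sketch below uses only the figures quoted there ($30 vs $1 per 10k images, ~1.2 s per image); the function names are illustrative, and the compute-hours estimate assumes a single worker generating images back to back, which the post does not state.

```python
def cost_per_image(total_usd: float, n_images: int) -> float:
    """Average price of one generated image."""
    return total_usd / n_images


def compute_hours(seconds_per_image: float, n_images: int) -> float:
    """Wall-clock hours for one worker generating images sequentially."""
    return seconds_per_image * n_images / 3600


# Figures quoted in the post.
baseline = cost_per_image(30.0, 10_000)
optimized = cost_per_image(1.0, 10_000)
savings_ratio = baseline / optimized
hours_for_10k = compute_hours(1.2, 10_000)

print(f"baseline:  ${baseline:.4f} per image")
print(f"optimized: ${optimized:.4f} per image ({savings_ratio:.0f}x cheaper)")
print(f"10k images at 1.2 s each: {hours_for_10k:.2f} hours on one worker")
```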