TheStage AI

94 posts

@TheStageAI

Automated Enterprise Inference Stack & Research Lab

AI · Joined May 2023
539 Following · 407 Followers
Pinned Tweet
TheStage AI (@TheStageAI)
TheStage AI Platform is now open to everyone. Automatically accelerate your models and download them to run in the cloud or on smartphones.
billyG88 (@billyG881)
@TheStageAI I see NeuTTS mentioned in your Android SDK repo. Is a release coming soon? Is an example coming, or is it already stable? 🤓
TheStage AI (@TheStageAI)
Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine. Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, and your POV stays private. The stack: ANNA + GPU/NPU SDK + a memory manager for wake word, STT, TTS, and diarization. SDK demo 👇
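The tweet does not say how the memory manager works. As a rough illustration of the general problem (several on-device models sharing one fixed memory budget), here is a hypothetical least-recently-used sketch. `ModelMemoryManager`, `acquire`, the pinning option, and all sizes are invented for illustration and are not part of the TheStage AI SDK.

```python
from collections import OrderedDict


class ModelMemoryManager:
    """Hypothetical LRU sketch: keep on-device models resident within a fixed
    memory budget, evicting the least recently used unpinned model when needed."""

    def __init__(self, budget_mb):
        self.budget_mb = budget_mb
        self.resident = OrderedDict()  # model name -> size in MB, least recently used first
        self.pinned = set()            # models that must never be evicted (e.g. wake word)

    def used_mb(self):
        return sum(self.resident.values())

    def acquire(self, name, size_mb, pinned=False):
        """Make `name` resident and return the list of models evicted to fit it."""
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return []
        pinned_mb = sum(self.resident[n] for n in self.pinned)
        if pinned_mb + size_mb > self.budget_mb:
            raise MemoryError(f"{name} does not fit next to the pinned models")
        evicted = []
        while self.used_mb() + size_mb > self.budget_mb:
            # First unpinned entry in iteration order is the least recently used one.
            victim = next(n for n in self.resident if n not in self.pinned)
            del self.resident[victim]
            evicted.append(victim)
        self.resident[name] = size_mb
        if pinned:
            self.pinned.add(name)
        return evicted
```

For example, with a 300 MB budget, a pinned 10 MB wake-word model stays loaded while a 200 MB STT model and a 150 MB TTS model take turns: loading TTS evicts STT, and loading STT again evicts TTS. Pinning the always-on model is the main design choice here; plain LRU would otherwise evict the wake-word detector first.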
TheStage AI (@TheStageAI)
@sebuzdugan It's not just an idea; the team is already applying it to model compression. You can find benchmarks for compressed models here: github.com/TheStageAI/The… This month we are releasing many more benchmarks and an ablation study.
TheStage AI (@TheStageAI)
@Liqui_Sniper Yes, we are adding onboarding templates. They will be released next week.
Liquidity Sniper (@Liqui_Sniper)
@TheStageAI Love this, good on ya! Stoked to give it a go for mobile apps. Any tips for someone just getting started?
TheStage AI (@TheStageAI)
@Nau__One We have local engines, so it can run fully on-device. We also provide ready-to-go containers for inference on your GPUs. We are SOC 2 compliant, and you can easily scan the container for vulnerabilities.
ThisIsNoOne (@Nau__One)
@TheStageAI What about the privacy of conversations if we use this as an internal company solution?
TheStage AI (@TheStageAI)
Beyoncé heard cursing. TheWhisper heard Arsenal. The fastest Whisper in the world. Open-source real-time ASR. Top 5 on OpenASR benchmarks. 1800 RTFx. Built for live captions, transcription, and voice apps. See the repo
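RTFx is commonly defined (for example on the Hugging Face Open ASR Leaderboard) as seconds of audio processed per second of wall-clock time, i.e. the inverse of the real-time factor. Assuming that definition, the arithmetic below shows what 1800 RTFx means in practice. The helper names are illustrative, and the tweet does not state the batch size or audio set behind the figure, so this says nothing about single-stream latency.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, rtfx_value: float) -> float:
    """Wall-clock seconds needed to process `audio_seconds` of audio at a given RTFx."""
    return audio_seconds / rtfx_value


one_hour = 3600.0
print(processing_time(one_hour, 1800))  # 2.0 -> one hour of audio in about two seconds
```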
TheStage AI (@TheStageAI)
@Beyonce heard cursing. TheWhisper heard @Arsenal. Fastest open-source real-time ASR in the world. Top 5 on OpenASR. 1800 RTFx. Built for live captions, transcription, and voice apps. See the repo
Praveen Kumar Verma (@Alacritic_Super)
@TheStageAI For AI engineers, latency is the experience. If it takes too long, users leave. No matter how good the model is. 5s video in 34s, 1800x real-time speech, instant LoRA switching… this is insane.
TheStage AI (@TheStageAI)
For AI engineers, latency is the product. Wan 2.2 in Elastic Models now generates 5s of video in 34s on an H100. Elastic Models is a library of accelerated open-source models. Also new: TheWhisper at 1800 RTFx on a single H100, and instant FLUX LoRA switching. Try it.
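The tweet does not describe how instant LoRA switching is implemented in Elastic Models. For reference, the standard LoRA formulation admits two common strategies: merge the adapter into the base weight and swap it in place, or keep the adapter unmerged and swap only its small factors. Below, $W_0$ is a frozen base weight, adapter $i$ has factors $B_i, A_i$, rank $r_i$, and scaling $\alpha_i$; which strategy TheStage AI uses is not stated in the source.

```latex
\begin{aligned}
W_i &= W_0 + s_i B_i A_i, && s_i = \alpha_i / r_i,\; B_i \in \mathbb{R}^{d \times r_i},\; A_i \in \mathbb{R}^{r_i \times k} \\
W_j &= W_i - s_i B_i A_i + s_j B_j A_j && \text{(merged: swap adapters in place)} \\
h &= W_0 x + s_j B_j (A_j x) && \text{(unmerged: keep } W_0 \text{ frozen, swap only } A_j, B_j\text{)}
\end{aligned}
```

The merged form adds no per-token cost but rewrites full-size weights on every switch (and repeated subtract/add can accumulate rounding error in low precision); the unmerged form makes switching a pointer swap at the cost of an extra low-rank matmul per layer.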
dnu (@DnuLkjkjh)
on-device STT + diarization on glasses is the right call. the moment voice data hits a server for processing you've created a biometric data liability — voiceprints are as unique as fingerprints. curious what model size you're running for the wake word detection and whether the NPU handles the full diarization pipeline or just the VAD portion
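The question separates voice activity detection (VAD) from full diarization. To show how lightweight the VAD stage can be compared with speaker embedding and clustering, here is a classic energy-threshold VAD in pure Python. This is a generic textbook technique, not what runs on Halo's NPU; production systems typically use small learned models instead. `frame_len=160` assumes 16 kHz audio (10 ms frames), and the threshold and hangover values are illustrative.

```python
import math


def frame_energies_db(samples, frame_len):
    """RMS energy in dBFS for consecutive non-overlapping frames (samples in [-1, 1])."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        energies.append(20 * math.log10(max(rms, 1e-10)))  # floor avoids log10(0)
    return energies


def energy_vad(samples, frame_len=160, threshold_db=-40.0, hangover=3):
    """Return one bool per frame: True if the frame is treated as speech.

    `hangover` keeps the decision on for a few frames after energy drops,
    so short pauses inside words are not cut off.
    """
    decisions = []
    remaining = 0
    for energy in frame_energies_db(samples, frame_len):
        if energy > threshold_db:
            remaining = hangover
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

Only frames flagged here would need to reach the heavier speaker-embedding and clustering stages of a diarization pipeline.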
TheStage AI (@TheStageAI)
How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput. See the full case study ↓
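"Keeping generation ahead of playback" can be checked with a small simulation: chunks are generated one after another, playback starts once a prebuffer is ready, and an underrun happens if any chunk finishes after playback needs it. The helper `first_underrun` and all timings below are hypothetical and not taken from the Mirelo case study.

```python
def first_underrun(chunk_seconds, gen_seconds, prebuffer_chunks=1):
    """Simulate streaming playback of sequentially generated audio chunks.

    chunk_seconds: duration of audio contained in each chunk.
    gen_seconds: wall-clock time spent generating each chunk, in order.
    prebuffer_chunks: chunks that must be ready before playback starts (>= 1).
    Returns the index of the first chunk that is late, or None if playback never stalls.
    """
    ready_at = []  # wall-clock time at which each chunk finishes generating
    t = 0.0
    for g in gen_seconds:
        t += g
        ready_at.append(t)
    play_start = ready_at[prebuffer_chunks - 1]
    for i, ready in enumerate(ready_at):
        needed_at = play_start + i * chunk_seconds  # when playback reaches chunk i
        if ready > needed_at:
            return i
    return None
```

With 2 s chunks, generating each chunk in 2.5 s stalls at chunk 1, and a 3-chunk prebuffer only postpones the stall to chunk 11 of 20. Speeding generation up by 2.4x (about 1.04 s per chunk) removes the stall entirely: in steady state, generation time per chunk must stay below chunk duration, and prebuffering only buys time.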
TheStage AI (@TheStageAI)
Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR for live speech. It stays robust in heavy noise, even at 0 dB SNR, where speech and noise have equal average power. So it can understand speech in conditions where people struggle to hear. Use it for transcription, research, and multilingual apps.
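SNR in decibels is $10 \log_{10}(P_\text{signal} / P_\text{noise})$, so 0 dB means the noise is as powerful as the speech. Noise-robustness figures like this are usually produced by mixing clean speech with noise scaled to a target SNR; the sketch below shows that standard recipe, with a synthetic tone standing in for speech. `mix_at_snr` and the test signals are illustrative; the tweet does not say how TheStage AI built its noisy evaluation set.

```python
import math
import random


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(signal) / power(noise))


def mix_at_snr(signal, noise, target_db):
    """Scale `noise` so that signal + scaled noise has the requested SNR.

    Returns (mixture, scaled_noise).
    """
    # Required noise power is P_signal / 10**(target_db / 10).
    scale = math.sqrt(power(signal) / (power(noise) * 10 ** (target_db / 10)))
    scaled = [scale * n for n in noise]
    return [s + n for s, n in zip(signal, scaled)], scaled
```

At `target_db=0.0` the scaled noise ends up with exactly the same mean power as the clean signal; negative targets make the noise louder than the speech.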
TheStage AI (@TheStageAI)
At TheStage AI, we shipped @nvidia cuDNN Paged Attention in our Elastic Models library, replacing paged FlashAttention for better integration. In our benchmarks, the cuDNN path matches the previous implementation almost exactly in quality and latency. Early results on B200: INT8 Llama 8B runs at ~200 tok/s per sequence at batch size 16 (≈3,200 tok/s aggregate). The write-up also covers CUDA Graphs, graph caching, cuDNN Paged Attention, and INT8 LLMs. Next, we are moving to native inference support across NVIDIA hardware, including Jetson. See the blog for details: app.thestage.ai/blog/Integrati…
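Paged attention stores the KV cache in fixed-size blocks instead of one contiguous buffer per sequence, so memory is allocated on demand as each sequence grows. The tweet does not show the cuDNN API, so the following is only a hypothetical pure-Python sketch of the block-table bookkeeping that paged attention kernels read from; `PagedKVCache` and its methods are invented names, not cuDNN or Elastic Models APIs. (The quoted aggregate is simply per-sequence speed times batch size: 200 tok/s × 16 = 3,200 tok/s.)

```python
class PagedKVCache:
    """Hypothetical sketch of paged KV-cache bookkeeping: the cache is split into
    fixed-size physical blocks, and each sequence owns a block table that maps its
    logical blocks to physical block ids."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # free physical block ids
        self.tables = {}                     # seq_id -> list of physical block ids
        self.lengths = {}                    # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # first token, or the last block is full
            if not self.free:
                raise MemoryError("KV cache is out of blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are allocated only when a sequence crosses a block boundary, each sequence wastes at most `block_size - 1` slots, instead of reserving memory for its maximum possible length up front.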