Pinned Tweet
TheStage AI
94 posts

TheStage AI
@TheStageAI
Automated Enterprise Inference Stack & Research Lab
AI Joined May 2023
539 Following 407 Followers

@TheStageAI I see a NeuTTS mention on your Android SDK repo...
Release coming soon? Is an example coming soon, or is it stable already? 🤓

Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine.
Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private.
ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization.
SDK demo 👇

@sebuzdugan It's not just an idea; the team is already applying it to model compression. You can check some benchmarks for compressed models here: github.com/TheStageAI/The…
This month we are releasing a lot of benchmarks and ablation studies.

@TheStageAI cool idea but without transparent benchmarks this just reads like marketing

@Liqui_Sniper Yes, we are adding onboarding templates. They will be released next week.

@TheStageAI Love this, good on ya! Stoked to give it a go for mobile apps. Any tips for someone just getting started?

@Nau__One We have local engines, so it can run fully on-device. We also provide ready-to-go containers for inference on your GPUs. We are SOC 2 compliant, and you can easily scan the container for vulnerabilities.

@TheStageAI What about the privacy of the conversation if we think of this as an internal company solution?

@TheStageAI For AI engineers, latency is the experience.
If it takes too long, users leave.
No matter how good the model is.
5s video in 34s, 1800x real-time speech, instant LoRA switching…
this is insane.

@TvShowAU188166 Just follow the usage instructions: app.thestage.ai/models/Wan-2.2…

@TheStageAI that's impressive latency for video gen. excited to try elastic models on my setup

@TheStageAI @brilliantlabsAR @neuphonicspeech Revolutionizing wearable tech with cutting-edge privacy solutions.

@DnuLkjkjh @brilliantlabsAR @neuphonicspeech The NPU is used not only for VAD; it's also used for transcription and, partially, for TTS. We are using heterogeneous inference to deliver the best speed and lowest power consumption.

on-device STT + diarization on glasses is the right call. the moment voice data hits a server for processing you've created a biometric data liability — voiceprints are as unique as fingerprints. curious what model size you're running for the wake word detection and whether the NPU handles the full diarization pipeline or just the VAD portion

@TheStageAI @brilliantlabsAR @neuphonicspeech Exciting collaboration for cutting-edge privacy in smart glasses.

How do you make text-to-music run in real time in production?
The model has to keep audio generation ahead of playback.
Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput.
See the full case study ↓

At TheStage AI, we shipped @nvidia cuDNN Paged Attention in our Elastic Models library.
We replaced paged FlashAttention with it for better integration. In our benchmarks, the cuDNN path shows nearly identical quality and latency vs the previous implementation.
Early results on B200: INT8 Llama 8B ~200 tok/s per sequence @ bs16 (≈ 3,200 tok/s aggregate).
The write-up also covers CUDA Graphs, graph caching, cuDNN Paged Attention, and INT8 LLMs. Next we are moving to native inference support across NVIDIA hardware including Jetson.
Check the blog for details:
app.thestage.ai/blog/Integrati…

Multilingual, open-source STT built for real-time streaming ↓
github.com/TheStageAI/The…