TheStage AI

94 posts

@TheStageAI

Automated Enterprise Inference Stack & Research Lab

AI · Joined May 2023

539 Following · 407 Followers

Pinned Tweet

TheStage AI@TheStageAI·12 May

TheStage AI Platform is now open to everyone. Automatically accelerate your models and download them to run in the cloud or on smartphones.

154

3.6M

TheStage AI@TheStageAI·5d

@billyG881 Wait a few days for the announcement 😀

billyG88@billyG881·6d

@TheStageAI I see a NeuTTS mention in your Android SDK repo... Release coming soon? Is an example coming soon, or is it stable already? 🤓

TheStage AI@TheStageAI·4 Mar

Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine. Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private. ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization. SDK demo 👇

2.1K

TheStage AI@TheStageAI·5d

@sebuzdugan It's not just an idea; the team is already applying it to model compression. You can check some benchmarks for compressed models here: github.com/TheStageAI/The… This month we are releasing a lot of benchmarks and an ablation study.

Sebastian Buzdugan@sebuzdugan·6d

@TheStageAI cool idea but without transparent benchmarks this just reads like marketing

172

TheStage AI@TheStageAI·17 May

Try it yourself, thestage.ai

TheStage AI@TheStageAI

TheStage AI Platform is now open to everyone. Automatically accelerate your models and download them to run in the cloud or on smartphones.

234

TheStage AI@TheStageAI·15 May

@Liqui_Sniper Yes, we are adding onboarding templates. They will be released next week.

312

Liquidity Sniper@Liqui_Sniper·15 May

@TheStageAI Love this, good on ya! Stoked to give it a go for mobile apps. Any tips for someone just getting started?

1.7K

TheStage AI@TheStageAI·12 Apr

@Nau__One We have local engines, so it can run fully on-device. We also provide ready-to-go containers for inference on your GPUs. We are SOC 2 compliant, and you can easily scan the container for vulnerabilities.

ThisIsNoOne@Nau__One·12 Apr

@TheStageAI What about the privacy of the conversation if we think of this as an internal company solution?

177

TheStage AI@TheStageAI·10 Apr

Beyoncé heard cursing. TheWhisper heard Arsenal. The fastest Whisper in the world. Open-source real-time ASR. Top 5 on OpenASR benchmarks. 1800 RTFx. Built for live captions, transcription, and voice apps. See the repo

179

2.7M

TheStage AI@TheStageAI·10 Apr

@Beyonce heard cursing. TheWhisper heard @Arsenal. Fastest open-source real-time ASR in the world. Top 5 on OpenASR. 1800 RTFx. Built for live captions, transcription, and voice apps. See the repo

TheStage AI@TheStageAI·9 Apr

@Alacritic_Super Exactly!!!

450

Praveen Kumar Verma@Alacritic_Super·9 Apr

@TheStageAI For AI engineers, latency is the experience. If it takes too long, users leave. No matter how good the model is. 5s video in 34s, 1800x real-time speech, instant LoRA switching… this is insane.

883

TheStage AI@TheStageAI·8 Apr

For AI engineers, latency is product. Wan 2.2 in Elastic Models now generates 5s of video in 34s on H100. Elastic Models is a library of accelerated open-source models. Also new: TheWhisper at 1800 RTFx on a single H100 and instant FLUX LoRA switching. Try it

573

7.7M
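
To put the figures in the post above side by side, here is a minimal sketch of the arithmetic. It assumes RTFx means seconds of audio processed per second of compute (the inverse real-time factor); the function names are hypothetical and not part of any TheStage AI library.

```python
def speedup_vs_realtime(media_seconds: float, compute_seconds: float) -> float:
    """Seconds of media handled per second of wall-clock compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return media_seconds / compute_seconds


def compute_time(media_seconds: float, rtfx: float) -> float:
    """Wall-clock seconds needed to process media at a given RTFx (assumed definition)."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return media_seconds / rtfx


# Wan 2.2 figure from the post: 5 s of video generated in 34 s on an H100.
video_factor = speedup_vs_realtime(5.0, 34.0)

# TheWhisper figure from the post: 1800 RTFx, applied to one hour of audio.
hour_of_audio_seconds = compute_time(3600.0, 1800.0)

print(f"video generation runs at {video_factor:.3f}x real time")
print(f"one hour of audio takes {hour_of_audio_seconds:.1f} s at 1800 RTFx")
```

Under that assumed definition, video generation is still slower than playback, while transcription at 1800 RTFx would get through an hour of audio in about two seconds.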

TheStage AI@TheStageAI·9 Apr

@TvShowAU188166 Just follow the usage instructions: app.thestage.ai/models/Wan-2.2…

398

Poseidon-⚡(Ø,G)π²@TvShowAU188166·9 Apr

@TheStageAI that's impressive latency for video gen. excited to try elastic models on my setup

531

TheStage AI@TheStageAI·30 Mar

@Wendy_WendyU @brilliantlabsAR @neuphonicspeech 😎

Weendy Nosequee@Wendy_WendyU·7 Mar

@TheStageAI @brilliantlabsAR @neuphonicspeech Revolutionizing wearable tech with cutting-edge privacy solutions.

TheStage AI@TheStageAI·30 Mar

@DnuLkjkjh @brilliantlabsAR @neuphonicspeech The NPU is used not only for VAD; it's also used for transcription and, partially, for TTS. We use heterogeneous inference to deliver the best speed and lowest power consumption.

dnu@DnuLkjkjh·4 Mar

on-device STT + diarization on glasses is the right call. the moment voice data hits a server for processing you've created a biometric data liability — voiceprints are as unique as fingerprints. curious what model size you're running for the wake word detection and whether the NPU handles the full diarization pipeline or just the VAD portion

TheStage AI@TheStageAI·30 Mar

@billyG881 @brilliantlabsAR @neuphonicspeech Thank you! The SDK release is coming!

billyG88@billyG881·29 Mar

@TheStageAI @brilliantlabsAR @neuphonicspeech banger

TheStage AI@TheStageAI·30 Mar

@dimqtdl @brilliantlabsAR @neuphonicspeech Thank you!

0xww@dimqtdl·5 Mar

@TheStageAI @brilliantlabsAR @neuphonicspeech Exciting collaboration for cutting-edge privacy in smart glasses.

TheStage AI@TheStageAI·19 Mar

How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput. See the full case study ↓

355
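
The "ahead of playback" condition in the post above can be sketched as a simple rate check. The baseline timing below is a hypothetical number chosen for illustration, not a figure from the case study; only the 2.4x multiplier comes from the post, and the function names are assumptions.

```python
def generation_rate(chunk_audio_seconds: float, chunk_compute_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock compute."""
    if chunk_compute_seconds <= 0:
        raise ValueError("chunk_compute_seconds must be positive")
    return chunk_audio_seconds / chunk_compute_seconds


def keeps_ahead_of_playback(rate: float) -> bool:
    """Playback consumes 1 s of audio per second, so generation needs a rate of at least 1."""
    return rate >= 1.0


# Hypothetical baseline: 1.0 s of audio takes 1.5 s to generate.
baseline_rate = generation_rate(1.0, 1.5)

# Apply the "up to 2.4x higher throughput" figure from the post to that baseline.
optimized_rate = baseline_rate * 2.4

print(keeps_ahead_of_playback(baseline_rate))
print(keeps_ahead_of_playback(optimized_rate))
```

With these illustrative numbers the baseline falls behind playback and the optimized model stays ahead of it; a real deployment would also need a startup buffer to absorb variation in per-chunk latency.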

TheStage AI@TheStageAI·22 Jan

Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR for live speech. It stays robust in heavy noise – even at SNR 0 dB. That’s why it understands speech where people struggle to hear. Use it for transcription, research, and multilingual apps

346

131.2K

TheStage AI@TheStageAI·15 Jan

At TheStage AI, we shipped @nvidia cuDNN Paged Attention in our Elastic Models library. We replaced paged FlashAttention for better integration. In our benchmarks, the cuDNN path shows nearly identical quality and latency vs the previous implementation. Early results on B200: INT8 Llama 8B at ~200 tok/s per sequence @ bs16 (≈ 3,200 tok/s aggregate). The write-up also covers CUDA Graphs, graph caching, cuDNN Paged Attention, and INT8 LLMs. Next, we are moving to native inference support across NVIDIA hardware, including Jetson. Check the blog for details: app.thestage.ai/blog/Integrati…

408
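
The aggregate figure in the post above follows from the per-sequence rate and the batch size. A minimal sketch of that arithmetic, assuming every sequence in the batch decodes at the same rate (the function names are hypothetical):

```python
def aggregate_tokens_per_second(per_sequence_tps: float, batch_size: int) -> float:
    """Total decode throughput when all sequences in the batch run at the same rate."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return per_sequence_tps * batch_size


def inter_token_latency_ms(per_sequence_tps: float) -> float:
    """Average time between consecutive tokens of one sequence, in milliseconds."""
    if per_sequence_tps <= 0:
        raise ValueError("per_sequence_tps must be positive")
    return 1000.0 / per_sequence_tps


# Figures from the post: ~200 tok/s per sequence at batch size 16.
total_tps = aggregate_tokens_per_second(200.0, 16)
latency_ms = inter_token_latency_ms(200.0)

print(f"aggregate: {total_tps:.0f} tok/s, per-token latency: {latency_ms:.1f} ms")
```

Each user sees roughly 5 ms between tokens, while the GPU as a whole emits about 3,200 tokens per second across the batch.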

TheStage AI@TheStageAI·14 Jan

Multilingual, open-source STT built for real-time streaming ↓ github.com/TheStageAI/The…

10K

TheStage AI@TheStageAI·14 Jan

We know what you mean @Adele

38.7K
