Pinned Tweet
TheStage AI
79 posts

TheStage AI
@TheStageAI
Automated Enterprise Inference Stack & Research Lab
AI · Joined May 2023
556 Following · 375 Followers

How do you make text-to-music run in real time in production?
The model has to keep audio generation ahead of playback.
Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput.
See the full case study ↓
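
The "generation ahead of playback" condition can be written as a real-time factor (RTF) below 1. The sketch below is illustrative only: the 15 s / 10 s baseline is a made-up number, the function names are hypothetical, and it assumes the reported throughput gain applies as a per-stream speedup, which the post does not state.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Compute seconds spent per second of audio produced (lower is better)."""
    return generation_seconds / audio_seconds


def keeps_ahead(generation_seconds: float, audio_seconds: float,
                speedup: float = 1.0) -> bool:
    """True if generation finishes faster than the audio plays back."""
    return real_time_factor(generation_seconds / speedup, audio_seconds) < 1.0


# Hypothetical baseline: 15 s of compute for a 10 s audio chunk (RTF 1.5).
baseline_ok = keeps_ahead(15.0, 10.0)
# Same workload with the 2.4x figure from the post treated as a speedup.
optimized_ok = keeps_ahead(15.0, 10.0, speedup=2.4)
print(baseline_ok, optimized_ok)
```

With these assumed numbers the baseline falls behind playback, while the accelerated run has an RTF of 0.625 and stays ahead.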

Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine.
Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private.
ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization.
SDK demo 👇

At TheStage AI, we shipped @nvidia cuDNN Paged Attention in our Elastic Models library.
We replaced paged FlashAttention for better integration. In our benchmarks, the cuDNN path shows nearly identical quality and latency vs the previous implementation.
Early results on B200: INT8 Llama 8B ~200 tok/s per sequence @ bs16 (≈ 3,200 tok/s aggregate).
The write-up also covers CUDA Graphs, graph caching, cuDNN Paged Attention, and INT8 LLMs. Next we are moving to native inference support across NVIDIA hardware including Jetson.
Check blog for details:
app.thestage.ai/blog/Integrati…
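
The B200 figures above relate per-sequence and aggregate throughput through the batch size. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the per-token time assumes steady-state decoding.

```python
def aggregate_tokens_per_second(per_sequence_tps: float, batch_size: int) -> float:
    """Total decode throughput when every sequence in the batch advances together."""
    return per_sequence_tps * batch_size


def milliseconds_per_token(per_sequence_tps: float) -> float:
    """Time between consecutive tokens as seen by a single sequence."""
    return 1000.0 / per_sequence_tps


# Figures quoted in the post: ~200 tok/s per sequence at batch size 16.
total = aggregate_tokens_per_second(200, 16)
per_token_ms = milliseconds_per_token(200)
print(total, per_token_ms)
```

This reproduces the roughly 3,200 tok/s aggregate figure and implies about 5 ms per token for each sequence.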


Multilingual, open-source STT built for real-time streaming ↓
github.com/TheStageAI/The…
TheStage AI retweeted

Great communities make great products.
At @TheStageAI, we’re building ANNA, our Autonomous Neural Networks Accelerator, for faster, cheaper inference.
We need a Community Manager now. Be part of the early story →

Excited to share our MLPerf Inference v5.1 results (@MLCommons).
We ran @StabilityAI SDXL on 8×H100 via @nebiusai with our stack, ANNA.
18.1 img/s in target quality range.
Fast, reproducible, world-class performance from our team, submitted alongside top AI players ↓
TheStage AI retweeted

Validation is a key step when compressing or accelerating models.
It shows if the network still performs well.
Our research team @TheStageAI shared evaluation methods for sharpness, tone, color, object placement, and more.
TheStage AI retweeted

How to measure the quality of text-to-image models?
Our research team @TheStageAI put together a comprehensive guide to check perceptual quality, sharpness, color, prompt alignment, and more.
All the tricky image quality questions researchers usually ask are covered here↓
TheStage AI retweeted

🚀 Early access to ANNA: Automated NNs Accelerator now available! ✨
Get your access here: app.thestage.ai/contact
Questions? DM or comment below! 💬
With ANNA, you can:
🔄 Simply upload your model, data, and desired metrics
🎛️ Fine-tune model size, latency, and quality with an intuitive slider
🔗 Combine multiple compression & acceleration algorithms in a single neural network
⚡ Boost performance by more than 2x with zero quality loss!

TheStage AI retweeted

Self-hosted text-to-image on H100 with @TheStageAI Elastic Models, accelerated from FLUX.1-schnell @bfl_ml.
Our fastest model S generates a high-quality image in 0.5 s.
Precompiled and ready-to-deploy – minimal cold start.
Tutorial + access token inside if you want to try.

Imagine paying $30 for 10k images when @SaladTech + ANNA does it for $1 💀
FLUX.1-schnell ~1.2 s/image, high-quality output
ANNA auto-tunes models to balance speed and quality
OpenAI-compatible API, fully self-hosted. Quick guide shows how to run your own endpoint
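
The cost comparison above can be broken down per image. The sketch below uses only the numbers in the post ($30 vs $1 per 10k images, ~1.2 s/image); the implied hourly rate assumes a single worker generating images one after another, which the post does not state, and the function names are hypothetical.

```python
def cost_per_image(total_cost_usd: float, images: int) -> float:
    """Average price of one generated image."""
    return total_cost_usd / images


def implied_hourly_rate(total_cost_usd: float, images: int,
                        seconds_per_image: float) -> float:
    """Hourly compute price implied by the total cost, assuming one sequential worker."""
    hours = images * seconds_per_image / 3600.0
    return total_cost_usd / hours


baseline = cost_per_image(30.0, 10_000)   # price point the post compares against
self_hosted = cost_per_image(1.0, 10_000)  # price point quoted for SaladTech + ANNA
savings_ratio = baseline / self_hosted
hourly = implied_hourly_rate(1.0, 10_000, 1.2)
print(baseline, self_hosted, savings_ratio, hourly)
```

Under these assumptions the gap is about 30x per image, and 10k images take roughly 3.3 hours of compute, which would correspond to about $0.30 per hour.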
