TileRT

13 posts

@TileRT_AI

Tokens, in a blink.

Joined May 2026
2 Following · 561 Followers
Pinned Tweet
TileRT
TileRT@TileRT_AI·
Proud to core-build this with the MiMo team! Breaking 1,000 TPS on a 1T model with standard 8-GPU nodes is just the beginning of the Speed Scaling era. Technical deep dive coming on our channel! 🚀⚡️
Xiaomi MiMo@XiaomiMiMo

🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀
We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output speed on a 1 Trillion parameter model for the FIRST TIME!
Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.
Read the full technical deep dive: mimo.xiaomi.com/blog/mimo-tile…
Want to experience the future of real-time AI?
👉 Apply for UltraSpeed now: platform.xiaomimimo.com/ultraspeed
⏳ Limited-Time Access: Application-based · Jun 8 – Jun 23 (PDT)
💬 Chat Experience: Completely FREE for a limited time; try the blazing-fast web chat now.
⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.
🤝 Enterprise & Large-Scale Needs: business-mimo@xiaomi.com

English
7
7
36
4.9K
踏雪寻仙
踏雪寻仙@TaXue2025·
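@XiaomiMiMo @TileRT_AI This is an organization I had not come across before, though it reminds me of GLM's earlier high-speed version. The open-source ecosystem really is becoming more and more vibrant. Wishing the MiMo models continued success.
Chinese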
1
0
2
617
Xiaomi MiMo
Xiaomi MiMo@XiaomiMiMo·
🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀
We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI, breaking the 1,000 tokens/s output speed on a 1 Trillion parameter model for the FIRST TIME!
Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE.
Read the full technical deep dive: mimo.xiaomi.com/blog/mimo-tile…
Want to experience the future of real-time AI?
👉 Apply for UltraSpeed now: platform.xiaomimimo.com/ultraspeed
⏳ Limited-Time Access: Application-based · Jun 8 – Jun 23 (PDT)
💬 Chat Experience: Completely FREE for a limited time; try the blazing-fast web chat now.
⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience.
🤝 Enterprise & Large-Scale Needs: business-mimo@xiaomi.com
Xiaomi MiMo tweet media
English
124
286
2.2K
328.8K
TileRT
TileRT@TileRT_AI·
How did we push a 1 Trillion parameter MoE model past the 1,000 TPS barrier on a standard 8-GPGPU node with @XiaomiMiMo? 🚀 It’s not just a faster kernel. It’s a total execution model revolution. Key technical breakthroughs inside TileRT:
English
8
7
58
9.4K
TileRT
TileRT@TileRT_AI·
⚡️ System & Model Co-design: Deep technical synergy with the MiMo team on FP4/FP8 mixed quantization and production-grade DFlash.
English
0
0
5
1K
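To put the FP4/FP8 point in numbers, here is a back-of-the-envelope sketch of weight memory for a 1T-parameter model on an 8-GPU node. The thread does not say what share of the weights is FP4, so `frac_fp4` is a hypothetical parameter, and the estimate ignores quantization scale factors, activations, and KV cache.

```python
def weight_bytes(n_params: float, frac_fp4: float) -> float:
    """Approximate weight storage in bytes for an FP4/FP8 mix.

    Assumes 0.5 byte per FP4 weight and 1 byte per FP8 weight.
    Scale factors and other quantization metadata are ignored.
    """
    if not 0.0 <= frac_fp4 <= 1.0:
        raise ValueError("frac_fp4 must be between 0 and 1")
    return n_params * (frac_fp4 * 0.5 + (1.0 - frac_fp4) * 1.0)


N_PARAMS = 1e12  # "1T model" from the announcement
N_GPUS = 8       # "standard 8-GPU node" from the announcement

for frac in (0.0, 0.5, 1.0):  # hypothetical FP4 shares, not from the source
    per_gpu_gb = weight_bytes(N_PARAMS, frac) / N_GPUS / 1e9
    print(f"FP4 share {frac:.0%}: ~{per_gpu_gb:.1f} GB of weights per GPU")
```

Under these assumptions, all-FP8 weights come to about 125 GB per GPU and all-FP4 weights to about 62.5 GB per GPU, which is why the precision mix matters for fitting a model of this size on one node.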
TileRT
TileRT@TileRT_AI·
⚡️ Heterogeneous Workers & Warp Specialization: Breaking the serial pace to orchestrate specialized worker groups not just within a single SM, but scaling across the entire GPU execution domain.
English
0
0
5
957
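The idea of specialized worker groups can be illustrated with a CPU-side analogy. The sketch below is not GPU code and not TileRT's implementation: it uses two Python threads, one that only stages tiles and one that only computes on them, with a bounded queue standing in for a small on-chip staging buffer. The function name and the `depth` parameter are made up for illustration.

```python
import queue
import threading


def run_specialized_workers(tiles, depth=2):
    """Producer/consumer analogy for specialized worker groups.

    One worker only "loads" tiles and the other only "computes" on them.
    The bounded queue limits how far the loader can run ahead.
    """
    staged = queue.Queue(maxsize=depth)  # bounded buffer between the two roles
    results = []
    done = object()  # sentinel marking the end of the tile stream

    def loader():
        for tile in tiles:
            staged.put(list(tile))  # stand-in for a memory copy
        staged.put(done)

    def computer():
        while True:
            tile = staged.get()
            if tile is done:
                break
            results.append(sum(x * x for x in tile))  # stand-in for tensor math

    workers = [threading.Thread(target=loader), threading.Thread(target=computer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


print(run_specialized_workers([[1, 2], [3, 4], [5, 6]]))  # [5, 25, 61]
```

Because each role runs its own loop, the loader can fetch the next tile while the previous one is still being processed, which is the property the tweet attributes to heterogeneous workers.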
TileRT
TileRT@TileRT_AI·
⚡️ Tile-grained Pipelining: Deeply overlapping memory movement, tensor computation, and communication at the physical tile level.
English
0
0
4
859
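A simple timing model shows why overlapping the three stages per tile pays off. This is an idealized sketch, assuming every tile has the same load, compute, and communication cost and that stages never stall on buffers; the stage times below are arbitrary units, not measurements from TileRT.

```python
def serial_time(n_tiles, load, compute, comm):
    """Each tile finishes all three stages before the next tile starts."""
    return n_tiles * (load + compute + comm)


def pipelined_time(n_tiles, load, compute, comm):
    """Ideal three-stage pipeline with identical tiles.

    The first tile pays the full latency; after that, one tile completes
    every `max(load, compute, comm)` time units.
    """
    if n_tiles == 0:
        return 0
    return (load + compute + comm) + (n_tiles - 1) * max(load, compute, comm)


# Hypothetical stage costs in arbitrary time units.
print(serial_time(100, 2, 3, 1))     # 600
print(pipelined_time(100, 2, 3, 1))  # 303
```

In this model the pipelined schedule approaches the cost of the slowest stage alone as the tile count grows, so memory movement and communication are hidden behind compute whenever compute is the longest stage.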
TileRT
TileRT@TileRT_AI·
⚡️ Persistent Kernels: The entire compute pipeline runs continuously inside the GPU, enabling full-stack continuous prefetching and erasing operator boundaries.
English
0
0
5
664
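The effect of erasing operator boundaries can be sketched with a launch-overhead model. At 1,000 tokens/s a single stream averages 1 ms (1,000 µs) per token, so fixed per-operator costs add up quickly. The operator count and timings below are hypothetical, chosen only to show the shape of the trade-off, and the model ignores multi-token decoding steps.

```python
def per_op_launch_time(n_ops, op_time_us, launch_overhead_us):
    """Every operator is launched separately and pays the overhead each time."""
    return n_ops * (op_time_us + launch_overhead_us)


def persistent_time(n_ops, op_time_us, launch_overhead_us):
    """One long-running kernel pays the launch overhead once."""
    return launch_overhead_us + n_ops * op_time_us


# Hypothetical numbers: 200 operators per step, 4 us of work each, 5 us per launch.
print(per_op_launch_time(200, 4, 5))  # 1800
print(persistent_time(200, 4, 5))     # 805
```

With these made-up values, separate launches alone would exceed a 1,000 µs budget, while the persistent variant stays under it.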
TileRT
TileRT@TileRT_AI·
@XiaomiMiMo Proud to core-build this with the MiMo team! Breaking 1,000 TPS on a 1T model with standard 8-GPU nodes is just the beginning of the Speed Scaling era. Technical deep dive coming on our channel! 🚀⚡️
English
0
1
69
7.6K
TileRT
TileRT@TileRT_AI·
@zRdianjiao @Zai_org Huge milestone. Grateful to @Zai_org for the partnership. Flagship quality at 400 tok/s is just the start of what we can do together. 🔥
English
0
0
2
25
zR
zR@zRdianjiao·
🚀 GLM-5.1-HighSpeed is live: 400 tokens/s — a new speed ceiling for flagship-tier LLM APIs. Not a smaller model traded for speed. A flagship from @Zai_org that's also the fastest. 📖 Full technical deep-dive 👇 tilert.ai/blog/speed-as-…
GIF
English
44
78
946
69.1K
TileRT retweeted
jietang
jietang@jietang·
GLM-5.1-highspeed is coming, 400 tokens per second. Very expensive, but it brings a new possibility.
GIF
jietang tweet media
English
42
26
575
38.2K