mei

176 posts


@multiply_matrix

AGI forecaster. In 2022 I predicted an AGI timeline of 2027. MIT dropout.

Joined February 2021
211 Following · 2.4K Followers
mei reposted
Akshat Bubna @akshat_b
Raising $ is cool. What’s even cooler is getting to work every day with this incredible group of humans. We like solving hard problems and building things we can be proud of. If this is you, come join us! We’re just getting started :)
Modal@modal

x.com/i/article/2057…

19
20
278
47.7K
mei reposted
LMSYS Org @lmsysorg
🐋 DeepSeek V4 is now merged into SGLang main with v0.5.12.

What we shipped at launch:
🔹 ShadowRadix: native prefix caching for V4's hybrid attention
🔹 HiSparse: CPU-extended KV for sparse attention (up to 3× long-context throughput)
🔹 MTP speculative decoding with in-graph metadata preparation
🔹 W4A8 MegaMoE kernel
🔹 Flash Compressor + Lightning TopK kernels
🔹 Multiple parallelism methods: Tensor Parallelism/Expert Parallelism/Context Parallelism/Data Parallelism Attention
🔹 Prefill Decode Disaggregation
🔹 Hardware: H100, H200, B200, B300, GB200, GB300, MI35X

And what we added since:
🔹 HiCache for V4 under UnifiedRadixTree
🔹 W4A4 MegaMoE kernels for faster MegaMoE
🔹 Marlin/FlashInfer MXFP4 (W4A16) MoE on Hopper
🔹 Hierarchical multi-stream overlap for small-batch decode
🔹 Optimized mHC pipeline: DeepGemm + fused norm + fused hc_head
🔹 Faster KV Compression V2 kernel
🔹 Fused SiLU+clamp+FP8 quantization kernel
🔹 Support TP16 on H100/H20
🔹 Support Multiple Detokenizers
🔹 Pipeline Parallelism
🔹 One docker image for all supported Nvidia hardware

Thanks to @NVIDIAAI, @AMD, @ant_oss, @alibaba_cloud, ByteDance, @iFLYTEKLab, @radixark, and @pranjalssh for the work we shipped together on V4 🙌

More in 0.5.12 👇
9
34
202
14K
mei reposted
Zyphra @ZyphraAI
We present ZAYA1-8B-Diffusion-Preview, the first diffusion language model trained on @AMD. Autoregressive LLMs generate one token at a time; diffusion generates a block in parallel, speeding up inference. We show a 4.6-7.7x decoding speedup with minimal quality degradation 🧵
22
87
693
1.1M
mei reposted
Eric Alcaide @eric_alcaide
SGLang team is cracked. Respect 🫡
LMSYS Org@lmsysorg

🌊 SGLang now supports @poolsideai's Laguna-XS.2, a 33.4B-A3B hybrid SWA + MoE model purpose-built for agentic coding and long-horizon SWE work
☑️ SWE-bench Verified 68.2%; Multilingual 62.4%; Pro 44.5%; Terminal-Bench 2.0 30.1%
☑️ 131K-token context for long agent traces
☑️ Native poolside_v1 reasoning + tool-call parsers (OpenAI-compatible)
☑️ BF16, FP8, and NVFP4 quantizations
👉 Cookbook: docs.sglang.io/cookbook/autor…

0
5
19
3.3K
mei reposted
Zyphra @ZyphraAI
Today we’re announcing 15MW of AMD Instinct MI355 GPU capacity through Zyphra Cloud, our full-stack neocloud powered by @AMD.
2
35
361
862K
mei reposted
Lucas Atkins @latkins
I’ve been consistently impressed by Zyphra, and have always felt a kinship with their cause. Beautiful work across the board, and what a slate of releases this week. Western open weights are going to have a hell of a year.
Zyphra@ZyphraAI

Today we're releasing ZAYA1-VL-8B, our first vision-language model. ZAYA1-VL-8B is a 700M active / 8B total MoE built on our ZAYA1-8B base trained on @AMD. We achieve strong performance for our size resulting in leading intelligence density and inference efficiency.

2
4
64
4.3K
mei reposted
Beren Millidge @BerenMillidge
With this release we have rounded out our full suite of core modalities: Language, Vision, Audio, and Thought. This is the first step on our path to ubiquitous and efficient open visual understanding, and we have an exciting roadmap ahead. Congrats to the team. Amazing work!
Zyphra@ZyphraAI

Today we're releasing ZAYA1-VL-8B, our first vision-language model. ZAYA1-VL-8B is a 700M active / 8B total MoE built on our ZAYA1-8B base trained on @AMD. We achieve strong performance for our size resulting in leading intelligence density and inference efficiency.

2
4
54
4.8K
mei reposted
SemiAnalysis @SemiAnalysis_
Amazing work from the @sgl_project and @radixark teams on optimizing DeepSeek V4 inference on B200 and B300, and on the recent 4x iso-interactivity throughput improvements on GB300 by @ChengWan17! As @elonmusk said, the GB300 is the best AI computer, and software optimizations like this show its true potential!
8
36
263
35.3K
mei reposted
Robert Washbourne @rawsh0
new model! strong <1B active MoE. led data and posttraining for this release. cca goat @rishiiyer01 and the pretraining squad cooked x.com/ZyphraAI/statu…
Zyphra@ZyphraAI

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵

8
12
73
5.5K
mei reposted
Beren Millidge @BerenMillidge
Incredible work from the entire Zyphra team for this one! We never expected that our small ZAYA1 would be able to compete (at least in math) with the frontier giants. Our post-training and pre-training stacks are strong. More general thoughts on the ZAYA release, a 🧵
Zyphra@ZyphraAI

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵

9
8
87
6.3K
mei reposted
𝚐𝔪𝟾𝚡𝚡𝟾
Zyphra remains one of my favorite teams in the game because the releases all point in the same direction: capable AI that is cheaper to train, cheaper to run, and easier to deploy across modalities. ZAYA1-8B is the latest proof point for that pattern, extending Zyphra’s all-AMD ZAYA1 stack into post-training.

The base model showed AMD Instinct MI300 hardware could train a competitive MoE. ZAYA1-8B was pretrained, midtrained, and SFT’d on a 1,024-node MI300X cluster with AMD Pensando Pollara interconnect built with IBM. This release shows the reasoning side: 8.4B total parameters, only 760M active, trained end-to-end by Zyphra, then pushed into math/code-heavy reasoning where it competes with much larger open reasoning models.

The key is active-parameter efficiency. ZAYA1-8B is not a dense 8B; it is a small MoE with sub-1B active compute per token. Architecturally, Zyphra changed three pieces versus a standard MoE: Compressed Convolutional Attention for sequence mixing in a compressed latent space with 8× KV-cache compression, an MLP-based router with PID-controller bias balancing, and learned residual scaling.

The training recipe is the other major piece: ZAYA1-8B was trained from scratch for reasoning, with long-CoT data included from pretraining onward using answer-preserving trimming. Post-training then runs SFT followed by a four-stage RL cascade: reasoning warmup on math and puzzles, a 400-task RLVE-Gym adaptive curriculum, math/code RL with TTC traces and synthetic code environments, then behavioral RL for chat and instruction following.

The test-time-compute piece is Markovian RSA: multiple traces are generated in parallel, fixed-length tail segments are carried forward, and recursive aggregation prompts seed the next round. The point is bounded context during extended reasoning: with the 40K/4K configuration, ZAYA1-8B reaches 91.9 on AIME’25 and 89.6 on HMMT’25 while forwarding only a 4K-token tail.

Outside the TTC setup, what stands out is the reasoning density: AIME’26 89.1, HMMT Feb.’26 71.6, IMO-AnswerBench 59.3, LiveCodeBench-v6 65.8, GPQA-Diamond 71.0, and MMLU-Pro 74.2 from a 760M-active / 8.4B-total MoE. ZAYA1-8B is the small-active MoE reasoning recipe in practice: sparse active compute, efficient inference, and enough reasoning density to make local and test-time-compute deployments interesting.
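
The Markovian RSA loop described in the post (parallel traces, fixed-length tails carried forward, an aggregation prompt seeding the next round) might be sketched roughly as below. This is only an illustration of the control flow as the post words it, not Zyphra's implementation: the function name `markovian_rsa`, the `generate` callback, the parameter names, and the way tails are concatenated into the next context are all assumptions, and the traces are produced sequentially here for simplicity. The defaults read "40K/4K" as a 40K-token budget per trace and a 4K-token tail, which is also an assumption.

```python
from typing import Callable, List

Tokens = List[int]


def markovian_rsa(
    prompt: Tokens,
    generate: Callable[[Tokens, int], Tokens],  # hypothetical model call
    n_traces: int = 4,
    rounds: int = 3,
    max_new_tokens: int = 40_000,  # assumed meaning of "40K"
    tail_len: int = 4_000,         # assumed meaning of "4K"
) -> List[Tokens]:
    """Return the traces from the final round.

    Each round sees only the original prompt plus the tails of the
    previous round, so the context stays bounded by
    len(prompt) + n_traces * tail_len regardless of the number of rounds.
    """
    if tail_len <= 0:
        raise ValueError("tail_len must be positive")

    context = list(prompt)
    traces: List[Tokens] = []
    for _ in range(rounds):
        # A real system would sample these in parallel.
        traces = [
            generate(context, max_new_tokens)[:max_new_tokens]
            for _ in range(n_traces)
        ]
        # Keep only a fixed-length tail of each trace.
        tails = [trace[-tail_len:] for trace in traces]
        # Assumed aggregation prompt: original prompt followed by all tails.
        context = list(prompt)
        for tail in tails:
            context.extend(tail)
    return traces
```

The property the post emphasizes is visible in the loop: whatever the number of rounds, the model never receives more than the prompt plus `n_traces` tails, so reasoning can be extended without the context growing.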
Zyphra@ZyphraAI

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵

1
3
38
3.8K
mei reposted
samsja @samsja19
not many Western open-source labs are willing to take real research risks, but @ZyphraAI is one of the few that does. great release, interesting new arch, strong rl. Missing a bit of agentic but strong potential imo. I feel like the folks at @ZyphraAI are massively underrated
Zyphra@ZyphraAI

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵

8
12
207
28.7K