rishi

2.2K posts

rishi banner
rishi

rishi

@rishiiyer01

squeezing water out of stone

Katılım Ocak 2017
702 Takip Edilen709 Takipçiler
rishi
rishi@rishiiyer01·
@ishaans22 logo 3 early shot clock to tie the game before double ot? at 7'7?
English
0
0
2
54
rishi
rishi@rishiiyer01·
ive seen enough wemby is the goat
English
1
0
6
220
rishi retweetledi
Robert Washbourne
Robert Washbourne@rawsh0·
very excited about scaling ttc with diffusion. sparse active params + diffusion decode means reasoning models can punch above their weight class with competitive latency x.com/ZyphraAI/statu…
Zyphra@ZyphraAI

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵

English
3
4
45
2.9K
rishi retweetledi
Beren Millidge
Beren Millidge@BerenMillidge·
Diffusion is the endpoint of inference. Decode becomes as efficient as training. Everything sits on the roofline. Pure FLOPs will be the only bottleneck. Excited to push the frontier on language diffusion. Massive congrats to the team and an exciting time ahead.
Zyphra@ZyphraAI

We present ZAYA1-8B-Diffusion-Preview, the first diffusion language model trained on @AMD. Autoregressive LLMs generate one token at a time; diffusion generates a block in parallel, speeding up inference. We show a 4.6-7.7x decoding speedup with minimal quality degradation 🧵

English
2
5
46
6.4K
rishi
rishi@rishiiyer01·
@rawsh0 @JZWANG_T1 I am additionally excited for RL native in diffusion mode
English
0
0
2
51
rishi
rishi@rishiiyer01·
Leading the training for this model was a privilege. Training diffusion style models will be the future regardless of whether it is discrete/speculative or continuous.
Zyphra@ZyphraAI

We present ZAYA1-8B-Diffusion-Preview, the first diffusion language model trained on @AMD. Autoregressive LLMs generate one token at a time; diffusion generates a block in parallel, speeding up inference. We show a 4.6-7.7x decoding speedup with minimal quality degradation 🧵

English
7
8
78
7.7K
rishi retweetledi
Justus Mattern
Justus Mattern@MatternJustus·
Hosting a research meetup in our North Beach office on Thursday! Come by for food, drinks and talks: @jyangballin (MSL) will present ProgramBench @rawsh0 & @rishiiyer01 (Zyphra) will talk about ZAYA-8B @evan_j_chu and I will speak FrontierSWE and our research bets!
Justus Mattern tweet mediaJustus Mattern tweet media
English
6
10
147
34.2K
rishi
rishi@rishiiyer01·
@LLMenjoyer @rawsh0 i might nano banana all the pictures I have of you in my phone and put them on main
English
1
0
5
127
llm_enjoyer
llm_enjoyer@LLMenjoyer·
@rawsh0 If you are using ZAYA1-8B in openwebui, make sure you stop doing that and use a bigger model immediately.
English
2
0
6
230
Robert Washbourne
Robert Washbourne@rawsh0·
If you are using ZAYA1-8B in openwebui, make sure you set frequency_penalty 0, presence_penalty 0, repeat_penalty 1 (disabled), and temperature 1, top_p 0.95, top k 0 (disabled). default settings can cause looping 🔁
Robert Washbourne tweet media
English
1
2
14
943
rishi
rishi@rishiiyer01·
where do I get chunghwa from in sf
English
1
0
5
500
rishi
rishi@rishiiyer01·
@HudsonGouge Yes, this is exactly why we decided to release our results (agentic, math, code). Losing on GPQA-d and Mmlu-pro are not the end of the world here. Bigger models should win on these knowledge based evals, though we should still come quite close
English
1
0
2
63
Hudson Gouge
Hudson Gouge@HudsonGouge·
@rishiiyer01 Now, I don't mean that in a bad way. That's amazing for such a small lab.
English
1
0
2
29
rishi
rishi@rishiiyer01·
Lots of potential in this model to exploit in RL. Particularly proud of the context extension engineering I cooked on this one. I’m personally far more confident in scaling some of my crazy arch ideas now as well
Zyphra@ZyphraAI

Today we're releasing ZAYA1-74B-Preview, a major milestone in scaling pretraining on @AMD. ZAYA1-74B-Preview is a 4B active / 74B total MoE. This preview model is a strong pre-RL base checkpoint. The final post-trained reasoning model is coming soon. 🧵

English
5
3
66
4.5K