rishi

I liked this paper from late last year, which proposes a more compute-efficient and potentially more performant variant of multi-head latent attention, which they call compressed convolutional attention (CCA). A few inspiring takeaways:

- What they did: MLA compresses QKV into a low-rank latent space, but mainly for KV caching; the latents are projected back up to the full dimension before attention is computed. The authors instead do not only the caching but also the computation in the latent space. Concretely, after the low-rank projection, the whole softmax attention runs there, and only the output is projected back to the hidden dim (p1). Intuitively this shouldn't work well, because the low-rank projection loses a lot of information. So the authors apply a token-wise convolution and then a head-dim-wise convolution to Q and K; ablations show these two convolutions contribute a lot to the improved loss and evaluation results. As in my pseudocode (p2), they make a bunch of other interesting choices, such as concatenating the two most recent Vs and adding a "QK mean" to both Q and K to mix them together. This can be made even more efficient by adding grouped-query attention on top, i.e. expanding KV by num_groups so that even fewer parameters are used.

- How it performs: the authors run detailed evaluations against MHA, MLA, and GQA, matching the total parameter count, and for MLA and GQA also matching the KV-cache compression rate. They show that the GQA-enhanced CCA performs on par with lossless MHA on several benchmarks (HellaSwag, Winogrande, the usual stuff) as well as on the final training loss. They note this is especially handy for MoE models, because a smaller attention module leaves room for a bigger MoE module, and in particular for bigger individual experts.

Overall, this is very cool work, and it belongs to the line of research that lets attention stay quadratic but keeps its activation size in check.
It'll be interesting to calculate a FLOP breakdown between this (especially the GQA-enhanced one) and the new Deepseek Sparse Attention, since DSA is by design lossless and potentially also very efficient.
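To make the mechanism concrete, here is a minimal single-head numpy sketch of the core idea as I read it: project into a rank-r latent space, run a token-wise (causal, over the sequence) and a channel-wise (over the latent dim) convolution on Q and K, and do the whole softmax attention at width r before projecting back up. All function names and weights here are mine (random and untrained), and I'm leaving out the V-concat, QK-mean, and GQA pieces.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_conv_seq(z, w):
    """Depthwise causal convolution along the sequence axis of z: (T, r)."""
    T = z.shape[0]
    out = np.zeros_like(z)
    for i, wi in enumerate(w):       # each tap only looks at past tokens
        out[i:] += wi * z[:T - i]
    return out

def conv_channels(z, w):
    """Small convolution mixing neighboring latent channels, per token."""
    pad = len(w) // 2
    zp = np.pad(z, ((0, 0), (pad, pad)))
    return np.stack([np.convolve(row, w, mode="valid") for row in zp])

def cca_attention(x, W_dq, W_dkv, w_seq, w_chan, W_o):
    """Single-head sketch: the entire attention happens at latent width r."""
    T, _ = x.shape
    r = W_dq.shape[1]
    q = x @ W_dq                      # (T, r) low-rank query latent
    kv = x @ W_dkv                    # (T, r) shared KV latent (the cached tensor)
    k, v = kv, kv
    # the two convolutions that recover quality lost to the low-rank projection
    q = conv_channels(causal_conv_seq(q, w_seq), w_chan)
    k = conv_channels(causal_conv_seq(k, w_seq), w_chan)
    scores = q @ k.T / np.sqrt(r)     # scores computed at width r, not d_model
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    return softmax(scores) @ v @ W_o  # attend in latent space, then project up

# tiny demo with random weights (shapes only; nothing is trained)
rng = np.random.default_rng(0)
T, d, r = 8, 16, 4
x = rng.normal(size=(T, d))
out = cca_attention(x, rng.normal(size=(d, r)), rng.normal(size=(d, r)),
                    np.array([0.5, 0.3, 0.2]), np.array([0.25, 0.5, 0.25]),
                    rng.normal(size=(r, d)))
```

Note how the only tensor that would need caching is the (T, r) KV latent, and the QK^T matmul itself runs at width r, which is where the compute savings over MLA come from.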

FlexAttention now has a FlashAttention-4 backend.

FlexAttention has enabled researchers to rapidly prototype custom attention variants, with 1000+ repos adopting it and dozens of papers citing it. But users consistently hit a performance ceiling. Until now.

We've added a FlashAttention-4 backend to FlexAttention on Hopper and Blackwell GPUs. PyTorch now auto-generates CuTeDSL score/mask modifications and JIT-instantiates FlashAttention-4 for your custom attention variant. The result: 1.2× to 3.2× speedups over Triton on compute-bound workloads.

🖇️ Read our latest blog here: hubs.la/Q045FHPh0

No more choosing between flexibility and performance. #PyTorch #FlexAttention #FlashAttention #OpenSourceAI
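For anyone who hasn't used FlexAttention: the core abstraction is a user-written `score_mod` hook (in the real API it takes `(score, b, h, q_idx, kv_idx)`), which rewrites each attention score before softmax, and the compiler fuses it into the kernel. Here is a hedged numpy illustration of the eager semantics only, with batch/head indices dropped for brevity; the ALiBi-style penalty is just one example of a mod, and `eager_flex_attention` is my name, not the library's.

```python
import numpy as np

def eager_flex_attention(q, k, v, score_mod):
    """Eager-mode semantics of a FlexAttention-style score_mod: the user
    hook rewrites each score given its (q_idx, kv_idx) before softmax.
    The real FlexAttention compiles such hooks into a fused kernel
    (now including the FlashAttention-4 backend); this numpy version
    only shows the math."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    for qi in range(T):
        for ki in range(T):
            scores[qi, ki] = score_mod(scores[qi, ki], qi, ki)
    # causal masking (FlexAttention expresses this with a mask_mod)
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# example score_mod: an ALiBi-style linear distance penalty
alibi = lambda s, qi, ki: s - 0.1 * (qi - ki)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
out = eager_flex_attention(q, k, v, alibi)
```

The point of the backend work is exactly that you write the two-line `score_mod` above and never the double loop: the compiler specializes the fused kernel for it.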

Today I’m sharing a new research paper that explores a new idea in Mixture-of-Experts architectures, called “DynaMoE”. DynaMoE is a Mixture-of-Experts framework where:

- the number of active experts per token is dynamic;
- the total number of experts can be scheduled differently across layers.

From my findings, the best model has a descending expert schedule, where the earliest layers have the most experts and the final layer has the fewest (1 expert). This removes the rigid top-k routing used in most MoE models and improves parameter efficiency and training stability.

Paper: arxiv.org/abs/2603.01697
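The post doesn't spell out the routing rule, so here is one hedged numpy sketch of what a dynamic-k router plus a descending per-layer expert schedule could look like: experts are taken in order of gate probability until a cumulative mass threshold p is reached. The function name, the threshold scheme, and the schedule values are all my assumptions; the paper's actual mechanism may differ.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_topk_route(gate_logits, p=0.7, max_k=4):
    """Pick a variable number of experts for one token: take experts in
    descending gate probability until cumulative mass exceeds p, capped
    at max_k. Returns (chosen expert indices, renormalized gate weights)."""
    probs = softmax(gate_logits)
    order = np.argsort(-probs)
    k = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    k = min(k, max_k)
    chosen = order[:k]
    weights = probs[chosen] / probs[chosen].sum()  # renormalize the gates
    return chosen, weights

# descending expert schedule: many experts in early layers, 1 at the end
experts_per_layer = [8, 6, 4, 2, 1]
rng = np.random.default_rng(0)
routes = [dynamic_topk_route(rng.normal(size=n), max_k=min(4, n))
          for n in experts_per_layer]
```

With a schedule like this, the last layer degenerates to a single dense expert, which matches the post's finding that the final layer works best with 1 expert.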


It's still climbing higher: 459B input tokens and 2.6B output tokens, a 176:1 ratio. 🤔
