Ryan Yixiang Wang (@RyanYixiang) - Twitter Profile

Pinned Tweet

MoEs are everywhere in frontier models, and they are deployed as a monolith system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

Ai2@allen_ai

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

7

73

527

111.2K

Ryan Yixiang Wang retweeted

Turing Post@TheTuringPost·11 May

A new Mixture-of-Experts from @allen_ai – EMO. Finally, it brings real modularity to MoE architectures, and small groups of experts can work independently. ➡️ Tokens from the same document (which usually belong to the same domain) are routed through a shared pool of experts. The pool size controls how modular the model becomes. Here is how EMO works:

3

19

92

7.3K
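
The document-level pooling described in the tweet above can be illustrated with a small sketch. This is only a toy under stated assumptions, not EMO's actual router: the pool selection rule (the experts with the highest mean router probability over the document) and the names `document_pool`, `route_tokens`, `pool_size`, and `top_k` are hypothetical and chosen for illustration. See the paper and code linked in the thread for the real method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def document_pool(token_logits, pool_size):
    """ASSUMPTION: pick one pool per document, namely the pool_size experts
    with the highest mean router probability across the document's tokens."""
    probs = [softmax(row) for row in token_logits]
    n_experts = len(token_logits[0])
    mean_prob = [sum(p[e] for p in probs) / len(probs) for e in range(n_experts)]
    ranked = sorted(range(n_experts), key=lambda e: mean_prob[e], reverse=True)
    return sorted(ranked[:pool_size])


def route_tokens(token_logits, pool_size, top_k):
    """Each token picks its top_k experts, restricted to the document's pool."""
    pool = document_pool(token_logits, pool_size)
    routes = []
    for row in token_logits:
        best = sorted(pool, key=lambda e: row[e], reverse=True)[:top_k]
        routes.append(sorted(best))
    return pool, routes


# Toy document: 3 tokens, 6 experts (made-up router logits).
doc_logits = [
    [2.0, 0.1, 1.5, -1.0, 0.0, -0.5],
    [1.8, 0.0, 2.2, -0.5, 0.3, -1.0],
    [0.5, 0.2, 1.9, -1.2, 3.0, -0.8],
]
pool, routes = route_tokens(doc_logits, pool_size=3, top_k=2)
print(pool)    # [0, 2, 4]
print(routes)  # [[0, 2], [0, 2], [2, 4]]
```

In this toy, a smaller `pool_size` forces all tokens of a document onto fewer experts, which is one way to read "the pool size controls how modular the model becomes".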

Ryan Yixiang Wang retweeted

Sewon Min@sewon__min·8 May

As MoEs grow larger and sparser, they become memory-bottlenecked. What if experts were actually composable - so you only keep the subset relevant to your task? We show that this doesn't emerge in standard MoEs (their training makes this hard), but you can pre-train MoEs to support this kind of modularity! I hope everyone sees the right figure from @RyanYixiang 's original post - I was so excited when I saw this result!!

Ryan Yixiang Wang@RyanYixiang

MoEs are everywhere in frontier models, and they are deployed as a monolith system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

4

41

324

46.6K

Ryan Yixiang Wang@RyanYixiang·8 May

@AkshitaB93 @sewon__min + paper can be found here! arxiv.org/abs/2605.06663

0

3

15

735

Ryan Yixiang Wang@RyanYixiang·8 May

8/ For reproducibility and to enable further study of modularity in MoEs, we’re releasing EMO, baselines, and code: Models: hf.co/collections/al… Blog: allenai.org/blog/emo Code: github.com/allenai/EMO Viz: emovisualization.netlify.app Shoutout to @AkshitaB93 @sewon__min for making this possible!

1

3

14

1.4K

Ryan Yixiang Wang@RyanYixiang·8 May

MoEs are everywhere in frontier models, and they are deployed as a monolith system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

Ai2@allen_ai

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

7

73

527

111.2K

Ryan Yixiang Wang retweeted

Ai2@allen_ai·8 May

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

13

57

402

84.5K

Ryan Yixiang Wang retweeted

Johnny Tian-Zheng Wei@johntzwei·24 Oct

Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵

2

41

131

49.6K

Ryan Yixiang Wang retweeted

Wenjie Ma@wenjie_ma·17 Oct

LLMs solving math benchmarks with verifiable answers like AIME? ✅ LLMs solving math proofs? ❌ Still an open problem. RL works great for final-answer problems, but proofs are different: - Often no single checkable answer - Correct answers can hide flawed reasoning The key bottleneck: reliable proof evaluation. Without a good evaluator, we can't automatically evaluate or train better "provers." Our new work tackles this challenge step by step. 🧵 📄 Paper: arxiv.org/pdf/2510.13888

9

37

196

60.3K

Ryan Yixiang Wang retweeted

Tianyi Lorena Yan@LorenaYannnnn·27 Mar

When answering queries with multiple answers (e.g., listing cities of a country), how do LMs simultaneously recall knowledge and avoid repeating themselves? 🚀 Excited to share our latest work with @robinomial! We uncover a promote-then-suppress mechanism: LMs first recall all answers and then suppress previously generated ones. arxiv.org/abs/2502.20475 👇🧵

4

22

109

16.6K

Ryan Yixiang Wang retweeted

Tianyi Zhou@tianyi_zhou12·6 Feb

Great to see others discovering similar findings as we did in our Neurips2024 paper (arxiv.org/abs/2406.03445). We call these Fourier features instead of helix. How are these features useful for representing numbers? Stay tuned for our new number embedding paper coming soon!

Subhash Kantamneni@thesubhashk

(1/N) LLMs represent numbers on a helix? And use trigonometry to do addition? Answers below 🧵

0

5

17

2.8K

Ryan Yixiang Wang@RyanYixiang·11 Aug

Presenting our work at ACL on using data watermarks for detecting if an LM used your data during pretraining (with statistical guarantees)! Come find me and @johntzwei Monday 5:45 pm! Also happy to chat about how hot Bangkok is, or anything LM pre-training/memorization, etc

Ryan Yixiang Wang@RyanYixiang

We could detect SHA hashes that only occurred in BLOOM-176B pre-training data 90 times! For reference BLOOM has a training corpus of 341B tokens 🫣🥸

0

5

16

2.6K

Ryan Yixiang Wang@RyanYixiang·19 Feb

We could detect SHA hashes that only occurred in BLOOM-176B pre-training data 90 times! For reference BLOOM has a training corpus of 341B tokens 🫣🥸

Johnny Tian-Zheng Wei@johntzwei

To detect if your data was used for LLM pretraining, consider using data watermarks: arxiv.org/pdf/2402.10892… Detection can be framed as hypothesis testing (statistical guarantees!), if you contributed multiple training documents and watermarked them before public release. 🧵

0

1

11

6.8K
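
The hypothesis-testing framing mentioned in the tweet above can be sketched as an empirical one-sided test. This is a minimal illustration under stated assumptions, not necessarily the procedure in the linked paper: it assumes you can measure the model's loss on the watermark you actually published and on random watermarks you never released, and that lower loss on the published one is evidence of training. The function name `empirical_p_value` and all loss values are hypothetical.

```python
def empirical_p_value(observed_loss, null_losses):
    """One-sided empirical p-value.

    ASSUMPTION: null_losses are the model's losses on random watermarks that
    were never published, so they stand in for the 'not trained on my data'
    null hypothesis. Lower observed loss means stronger evidence of training.
    The +1 terms count the observed watermark itself, keeping the p-value
    strictly positive.
    """
    as_extreme = sum(1 for loss in null_losses if loss <= observed_loss)
    return (1 + as_extreme) / (1 + len(null_losses))


# Made-up losses for 19 unpublished random watermarks.
null_losses = [5.2, 5.0, 5.4, 5.1, 5.3, 4.9, 5.5, 5.2, 5.0, 5.1,
               5.3, 5.4, 5.2, 5.1, 5.0, 5.3, 5.2, 5.4, 5.1]
observed = 4.1  # made-up loss on the watermark that was actually published

p = empirical_p_value(observed, null_losses)
print(p)  # 0.05
```

With 19 null samples the smallest attainable p-value is 1/20, so a stricter significance level needs more random watermarks in the null set.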

Ryan Yixiang Wang@RyanYixiang·6 May

Chirp chirp

0

1

0
