Ryan Yixiang Wang

19 posts

@RyanYixiang

PhD-ing at @berkeleynlp with @sewon__min | prev @nlp_usc

Berkeley, CA · Joined June 2017
518 Following · 373 Followers
Pinned Tweet
Ryan Yixiang Wang @RyanYixiang
MoEs are everywhere in frontier models, and they are deployed as monolithic systems. But many applications only need a narrow slice of capabilities (e.g., math, code, biomedical). So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!
[Image]
Quoting Ai2 @allen_ai

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

7 replies · 73 retweets · 527 likes · 111.2K views
Ryan Yixiang Wang retweeted
Turing Post @TheTuringPost
A new Mixture-of-Experts from @allen_ai – EMO

Finally, it brings real modularity to MoE architectures, and small groups of experts can work independently.

➡️ Tokens from the same document (which usually belong to the same domain) are routed through a shared pool of experts. The pool size controls how modular the model becomes.

Here is how EMO works:
[Image]
3 replies · 19 retweets · 92 likes · 7.3K views
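The Turing Post summary describes document-level routing without spelling out how the shared pool is chosen. Below is a minimal Python sketch under an assumed, hypothetical rule: sum each token's router probabilities over the whole document, keep the `pool_size` highest-scoring experts as the pool, then run ordinary top-k routing for each token restricted to that pool. The names `document_pool` and `route_within_pool` and the pool-selection rule are illustrative assumptions, not EMO's published implementation.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def document_pool(doc_router_logits: List[List[float]], pool_size: int) -> List[int]:
    """Hypothetical pool selection: accumulate router probabilities over every
    token in the document and keep the `pool_size` highest-scoring experts."""
    num_experts = len(doc_router_logits[0])
    totals = [0.0] * num_experts
    for token_logits in doc_router_logits:
        for expert, prob in enumerate(softmax(token_logits)):
            totals[expert] += prob
    ranked = sorted(range(num_experts), key=lambda e: totals[e], reverse=True)
    return sorted(ranked[:pool_size])


def route_within_pool(token_logits: List[float], pool: List[int], k: int) -> List[Tuple[int, float]]:
    """Standard top-k routing for one token, but only over experts in `pool`.
    Returns (expert_id, gate_weight) pairs; weights are renormalized over the k picks."""
    probs = softmax([token_logits[e] for e in pool])
    top = sorted(range(len(pool)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(pool[i], probs[i] / norm) for i in top]


# Toy document: 3 tokens, 4 experts (made-up router logits).
doc = [
    [2.0, 0.1, 1.5, -1.0],
    [1.8, 0.0, 1.7, -0.5],
    [0.3, 0.2, 2.2, -2.0],
]
pool = document_pool(doc, pool_size=2)
print(pool)  # [0, 2]
for token_logits in doc:
    print(route_within_pool(token_logits, pool, k=1))
```

A smaller `pool_size` forces every token of a document onto fewer experts, which is the knob the thread says controls how modular the model becomes; with `pool_size` equal to the total number of experts, this reduces to ordinary top-k routing with renormalized gates.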
Ryan Yixiang Wang retweeted
Sewon Min @sewon__min
As MoEs grow larger and sparser, they become memory-bottlenecked. What if experts were actually composable, so you only keep the subset relevant to your task? We show that this doesn't emerge in standard MoEs (their training makes this hard), but you can pre-train MoEs to support this kind of modularity! I hope everyone sees the right figure from @RyanYixiang's original post; I was so excited when I saw this result!!
Quoting Ryan Yixiang Wang @RyanYixiang

MoEs are everywhere in frontier models, and they are deployed as monolithic systems. But many applications only need a narrow slice of capabilities (e.g., math, code, biomedical). So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

4 replies · 41 retweets · 324 likes · 46.6K views
Ryan Yixiang Wang retweeted
Ai2 @allen_ai
Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵
[Image]
13 replies · 57 retweets · 402 likes · 84.5K views
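The claim that EMO can run on a small subset of its experts for a given task suggests a simple deployment step: find the experts a task actually uses and drop the rest. The sketch below assumes a hypothetical frequency rule (count how often each expert lands in a token's top-k on a sample of task data and keep the `keep` most used); the thread does not say how EMO selects its subset, and Sewon Min's post notes that this kind of composability does not emerge in standard MoEs. `select_task_experts` and `masked_top_k` are illustrative names, not part of any released API.

```python
from collections import Counter
from typing import List, Sequence


def top_k(logits: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest logits (ties keep the lower index first)."""
    return sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]


def select_task_experts(task_router_logits: List[List[float]], k: int, keep: int) -> List[int]:
    """Hypothetical rule: keep the `keep` experts most often chosen in top-k on task data."""
    counts: Counter = Counter()
    for token_logits in task_router_logits:
        counts.update(top_k(token_logits, k))
    return sorted(expert for expert, _ in counts.most_common(keep))


def masked_top_k(token_logits: Sequence[float], kept: List[int], k: int) -> List[int]:
    """Top-k routing that can only pick experts that were kept in memory."""
    best = sorted(kept, key=lambda e: token_logits[e], reverse=True)
    return best[:k]


# Toy task sample: 4 tokens routed over 6 experts with k=2 (made-up logits).
task_sample = [
    [3.0, 2.0, 0.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 2.0, 0.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, 3.0, 0.0],
]
kept = select_task_experts(task_sample, k=2, keep=2)
print(kept)  # [0, 1]
print(masked_top_k(task_sample[3], kept, k=1))  # [1]
```

In a real deployment the dropped experts' weights would simply not be loaded, which is the memory saving Sewon Min's post points to; whether quality holds up depends on the model having been trained so that such subsets work on their own.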
Ryan Yixiang Wang retweeted
Johnny Tian-Zheng Wei @johntzwei
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
[Image]
2 replies · 41 retweets · 131 likes · 49.6K views
Ryan Yixiang Wang retweeted
Wenjie Ma @wenjie_ma
LLMs solving math benchmarks with verifiable answers like AIME? ✅
LLMs solving math proofs? ❌ Still an open problem.

RL works great for final-answer problems, but proofs are different:
- Often no single checkable answer
- Correct answers can hide flawed reasoning

The key bottleneck: reliable proof evaluation. Without a good evaluator, we can't automatically evaluate or train better "provers."

Our new work tackles this challenge step by step. 🧵
📄 Paper: arxiv.org/pdf/2510.13888
9 replies · 37 retweets · 196 likes · 60.3K views
Ryan Yixiang Wang retweeted
Tianyi Lorena Yan @LorenaYannnnn
When answering queries with multiple answers (e.g., listing cities of a country), how do LMs simultaneously recall knowledge and avoid repeating themselves? 🚀 Excited to share our latest work with @robinomial! We uncover a promote-then-suppress mechanism: LMs first recall all answers and then suppress previously generated ones. arxiv.org/abs/2502.20475 👇🧵
[Image]
4 replies · 22 retweets · 109 likes · 16.6K views
Ryan Yixiang Wang @RyanYixiang
Presenting our work at ACL on using data watermarks to detect whether an LM used your data during pretraining (with statistical guarantees)! Come find me and @johntzwei on Monday at 5:45 pm! Also happy to chat about how hot Bangkok is, or anything LM pre-training/memorization, etc.
Quoting Ryan Yixiang Wang @RyanYixiang

We could detect SHA hashes that occurred only 90 times in BLOOM-176B's pre-training data! For reference, BLOOM's training corpus has 341B tokens 🫣🥸

0 replies · 5 retweets · 16 likes · 2.6K views
Ryan Yixiang Wang @RyanYixiang
We could detect SHA hashes that occurred only 90 times in BLOOM-176B's pre-training data! For reference, BLOOM's training corpus has 341B tokens 🫣🥸
Quoting Johnny Tian-Zheng Wei @johntzwei

To detect if your data was used for LLM pretraining, consider using data watermarks: arxiv.org/pdf/2402.10892… Detection can be framed as hypothesis testing (statistical guarantees!), if you contributed multiple training documents and watermarked them before public release. 🧵

0 replies · 1 retweet · 11 likes · 6.8K views
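The quoted thread frames detection as a hypothesis test. A minimal sketch, assuming the watermark is a random SHA-256 hex string inserted into your documents before release: compare the model's loss on your inserted watermark against its losses on many random watermarks drawn the same way but never published. If the model never saw yours, its loss should look like a draw from that null distribution; an unusually low loss is evidence that it was trained on it. `loss_fn` is a hypothetical placeholder for a per-sequence model loss, and the choice of statistic (a z-score plus an empirical p-value) is an illustrative assumption; the linked paper describes the authors' actual procedure.

```python
import hashlib
import secrets
import statistics
from typing import Callable, List, Tuple


def random_hash_watermark() -> str:
    """A random 64-character SHA-256 hex string used as a watermark (assumed format)."""
    return hashlib.sha256(secrets.token_bytes(32)).hexdigest()


def watermark_test(
    loss_fn: Callable[[str], float], inserted: str, null_watermarks: List[str]
) -> Tuple[float, float]:
    """One-sided test of H0: 'the model never trained on `inserted`'.

    Returns (z_score, empirical_p_value). A very negative z and a small p mean the
    inserted watermark's loss is unusually low compared with unpublished ones.
    """
    null_losses = [loss_fn(w) for w in null_watermarks]
    observed = loss_fn(inserted)
    z = (observed - statistics.mean(null_losses)) / statistics.stdev(null_losses)
    # Fraction of null losses at least as low as observed, with +1 smoothing so p > 0.
    p = (1 + sum(loss <= observed for loss in null_losses)) / (1 + len(null_losses))
    return z, p


# Toy stand-in for a model that memorized the inserted watermark (hypothetical).
inserted = random_hash_watermark()
nulls = [random_hash_watermark() for _ in range(999)]


def toy_loss(text: str) -> float:
    if text == inserted:
        return 1.0
    return 4.0 + int(text[:2], 16) / 255  # unseen strings: loss between 4.0 and 5.0


z, p = watermark_test(toy_loss, inserted, nulls)
print(f"z = {z:.1f}, p = {p:.3f}")
```

With 999 null watermarks, the smallest achievable empirical p-value is 1/1000, so the number of null samples bounds how strong a claim the test can support.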