So Yeon (Tiffany) Min

584 posts

@SoYeonTiffMin

MTS @MicrosoftAI Superintelligence; Prev @AnthropicAI, @Meta, @Apple, PhD @mldcmu, B.S./M.Eng from @MIT.

Joined October 2021
277 Following · 1.3K Followers
Shrimai @shrimai_
Thrilled to be joining @MistralAI 😻! Looking forward to pushing the frontier of pretraining and advancing the next generation of LLMs. 💚Grateful for my time at @nvidia -- it's been an incredible foundation for this next chapter.
So Yeon (Tiffany) Min retweeted
Devendra Chaplot @dchaplot
I'm joining SpaceX and xAI, working closely with Elon and team to build superintelligence. Together SpaceX and xAI combine physical and digital intelligence under a leader who understands hardware at the deepest level. Add a high-agency culture with frontier-scale resources, and you get the possibility to achieve something truly unique. I’m excited to advance the fields I’ve obsessed over for years, from robotics research to building AI models on the founding teams of Mistral and TML. Both were extraordinary journeys with extraordinary people that shaped how I think about building intelligence from the ground up. Grateful for everything that brought me here and can’t wait to get started.
So Yeon (Tiffany) Min retweeted
Fahim Tajwar @FahimTajwar10
Are we done with new RL algorithms? Turns out we might have been optimizing the wrong objective. Introducing MaxRL, a framework to bring maximum likelihood optimization to RL settings. Paper + code + project website: zanette-labs.github.io/MaxRL/ 🧵 1/n
So Yeon (Tiffany) Min retweeted
Shrimai @shrimai_
Excited to be part of the launch of @nvidia Nemotron 3 Nano (30B) 🚀 A hybrid MoE reasoning model with 1M context, SWE-Bench-leading performance, and 1.5–3.3× faster inference. Super and Ultra are coming in the next few months. Open, fast, frontier-level 🔥
So Yeon (Tiffany) Min retweeted
Devendra Chaplot @dchaplot
Tinker is now open to everyone! We are also adding:
- Vision support with Qwen3-VL
- New model: Kimi K2 Thinking (1T params)
- OpenAI API-compatible inference
Start training models within minutes: thinkingmachines.ai/blog/tinker-ge…
Thinking Machines @thinkymachines

Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. thinkingmachines.ai/blog/tinker-ge…

So Yeon (Tiffany) Min retweeted
Jing Yu Koh @kohjingyu
I resigned from TBD Lab / MSL last Friday. It was a really fulfilling experience leading the computer use agents team over the past 1.45 years, and I learnt and grew a ton personally and professionally. It was a really fun opportunity to build CUA infra, data pipelines, evals, and models from scratch, and work together with a very talented team to discover how to get to frontier level CUA performance. I was also really impressed by the calibre of TBD Lab and how quickly everything came together. It was a difficult decision to leave: I will miss my colleagues and friends at Meta, but I'm sure our paths will cross again soon!
So Yeon (Tiffany) Min retweeted
Yutong (Kelly) He @electronickale
Diffusion/Flow-based models can sample in 1-2 steps now 👍 But likelihood? Still requires 100-1000 NFEs (even for these fast models) 😭 We fix this! Introducing F2D2: simultaneous fast sampling AND fast likelihood via joint flow map distillation. arxiv.org/abs/2512.02636 1/🧵
So Yeon (Tiffany) Min retweeted
Russ Salakhutdinov @rsalakhu
Check out new work: Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
Paper: arxiv.org/abs/2512.02636

Log-likelihood evaluation enables key capabilities in generative modeling: model comparison, certain fine-tuning objectives, and many downstream tasks. Yet state-of-the-art diffusion and flow models still require hundreds to thousands of NFEs to compute them. Recent distillation methods speed up sampling but break likelihood tractability.

This work develops Fast Flow Joint Distillation (F2D2), which cuts NFEs for both sampling and likelihood evaluation by two orders of magnitude. The key insight is that in continuous normalizing flows, the coupled ODEs for sampling and likelihood are computed from a shared underlying velocity field, allowing one to jointly distill sampling trajectories and cumulative divergence using a single model.

F2D2 preserves sample quality while enabling accurate few-step likelihoods, resolving a long-standing computational bottleneck. With a lightweight self-guidance technique, a 2-step MeanFlow model can even outperform its 1024-step teacher using just one extra backward NFE.

Check out an excellent detailed thread by @electronickale!
Yutong (Kelly) He @electronickale

Diffusion/Flow-based models can sample in 1-2 steps now 👍 But likelihood? Still requires 100-1000 NFEs (even for these fast models) 😭 We fix this! Introducing F2D2: simultaneous fast sampling AND fast likelihood via joint flow map distillation. arxiv.org/abs/2512.02636 1/🧵

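The shared-velocity-field insight described in the thread above rests on the standard continuous normalizing flow identities (the instantaneous change-of-variables formula, not anything specific to F2D2): sampling and likelihood follow two coupled ODEs driven by the same field. As a sketch:

```latex
% Coupled ODEs of a continuous normalizing flow: the sample x_t and
% its log-density evolve under the same velocity field v_\theta.
\frac{dx_t}{dt} = v_\theta(x_t, t),
\qquad
\frac{d \log p_t(x_t)}{dt} = -\nabla \cdot v_\theta(x_t, t)

% Integrating from the base distribution p_0 to the data end t = 1:
\log p_1(x_1) = \log p_0(x_0) - \int_0^1 \nabla \cdot v_\theta(x_t, t)\, dt
```

On this reading, distilling the trajectory and the cumulative divergence integral jointly, rather than the trajectory alone, is what keeps likelihoods tractable at few steps.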
So Yeon (Tiffany) Min retweeted
Russ Salakhutdinov @rsalakhu
Predicting AGI/ASI timelines has become trendy, so I’ll offer mine: AGI/ASI is 5–10 years away. It always has been. It always will be.
So Yeon (Tiffany) Min retweeted
Russ Salakhutdinov @rsalakhu
Check out the new @mldcmu blogpost: How to Explore to Scale RL Training of LLMs on Hard Problems?
blog.ml.cmu.edu/2025/11/26/how…

Current on-policy RL methods fail to learn from hard problems as they rarely generate a single correct rollout, producing no reward signal and no learning. Including easy problems can also be harmful, as models tend to overfit to them and fail to improve on harder tasks. Distilling human-written solutions is not only costly, but also provides difficult targets for fine-tuning.

This blogpost discusses various approaches and introduces a framework that uses existing human or model solutions as privileged guidance to unlock learning on hard problems. The key idea is simple: prepend a minimal solution prefix to difficult prompts, enabling on-policy RL to obtain reward and learn behaviors that generalize back to the original, unconditioned tasks. This expands the set of solvable problems and results in significant gains on challenging reasoning benchmarks.

With @QuYuxiao, @setlur_amrith, @gingsmith, @aviral_kumar2. Paper/Code is coming soon.
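The key idea summarized above (prepend a minimal solution prefix so on-policy rollouts can reach a reward signal) can be sketched in a few lines. This is a hypothetical illustration only: the function names, prompt format, and annealing schedule are my own assumptions, not the blogpost's actual code.

```python
def make_guided_prompt(problem: str, reference_solution: str,
                       prefix_frac: float = 0.25) -> str:
    """Prepend the first `prefix_frac` of a reference solution to a hard
    problem, so on-policy rollouts have a chance of earning reward.
    (Hypothetical prompt format, not from the blogpost.)"""
    cut = max(1, int(len(reference_solution) * prefix_frac))
    prefix = reference_solution[:cut]
    return f"{problem}\n\nPartial solution (continue from here):\n{prefix}"


def curriculum(problem: str, reference_solution: str):
    """Yield prompts with shrinking privileged guidance, ending at the
    original unconditioned task the policy must ultimately solve.
    (The annealing schedule here is an assumption for illustration.)"""
    for frac in (0.75, 0.5, 0.25):
        yield make_guided_prompt(problem, reference_solution, frac)
    yield problem  # original task, no guidance
```

Each yielded prompt would be fed to ordinary on-policy RL rollouts; the point is that early, heavily-guided prompts produce nonzero reward where the bare problem would not.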
So Yeon (Tiffany) Min retweeted
Shrimai @shrimai_
Thank you @rohanpaul_ai for highlighting our work!💫 Front-Loading Reasoning shows that inclusion of reasoning data in pretraining is beneficial, does not lead to overfitting after SFT, & has latent effect unlocked by SFT! Paper: arxiv.org/abs/2510.03264 Blog: research.nvidia.com/labs/adlr/Syne…
Rohan Paul @rohanpaul_ai

New @nvidia paper shows that teaching reasoning early during pretraining builds abilities that later fine-tuning cannot recover. Doing this early gives a 19% average boost on tough tasks after all post-training.

Pretraining is the long first stage where the model learns to predict the next word from lots of text. Supervised fine-tuning is a later stage where it studies step-by-step answers from labeled examples. Reinforcement learning then rewards better answers so the model improves further.

Diversity matters most in pretraining, while high quality matters most in supervised fine-tuning, roughly 11% vs 15% gains. Even doubling supervised fine-tuning on a base that skipped early reasoning could not catch up. Adding lots of mixed-quality supervised fine-tuning data even cut math by about 5%. High-quality reasoning added in pretraining looked small at first, then showed up strongly after supervised fine-tuning.

Teams should load diverse reasoning into pretraining, use a small high-quality set for supervised fine-tuning, then stabilize with rewards.

Paper: arxiv.org/abs/2510.03264
Paper Title: "Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data"

So Yeon (Tiffany) Min retweeted
Dylan Sam @dylanjsam
Very interesting insights into understanding when and why synthetic data (although imperfect and biased) can boost the performance of statistical inference!! 📈📈
Emily Byun @yewonbyun_

💡Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data.
