So Yeon (Tiffany) Min

584 posts

@SoYeonTiffMin

MTS @MicrosoftAI Superintelligence; Prev @AnthropicAI, @Meta, @Apple, PhD @mldcmu, B.S./M.Eng from @MIT.

Joined October 2021
277 Following · 1.3K Followers
Shrimai @shrimai_
Thrilled to be joining @MistralAI 😻! Looking forward to pushing the frontier of pretraining and advancing the next generation of LLMs. 💚Grateful for my time at @nvidia -- it's been an incredible foundation for this next chapter.
So Yeon (Tiffany) Min retweeted
Devendra Chaplot @dchaplot
I'm joining SpaceX and xAI, working closely with Elon and team to build superintelligence. Together SpaceX and xAI combine physical and digital intelligence under a leader who understands hardware at the deepest level. Add a high-agency culture with frontier-scale resources, and you get the possibility to achieve something truly unique. I’m excited to advance the fields I’ve obsessed over for years, from robotics research to building AI models on the founding teams of Mistral and TML. Both were extraordinary journeys with extraordinary people that shaped how I think about building intelligence from the ground up. Grateful for everything that brought me here and can’t wait to get started.
So Yeon (Tiffany) Min retweeted
Fahim Tajwar @FahimTajwar10
Are we done with new RL algorithms? Turns out we might have been optimizing the wrong objective. Introducing MaxRL, a framework to bring maximum likelihood optimization to RL settings. Paper + code + project website: zanette-labs.github.io/MaxRL/ 🧵 1/n
So Yeon (Tiffany) Min retweeted
Shrimai @shrimai_
Excited to be part of the launch of @nvidia Nemotron 3 Nano (30B) 🚀 A hybrid MoE reasoning model with 1M context, SWE-Bench-leading performance, and 1.5–3.3× faster inference. Super and Ultra are coming in the next few months. Open, fast, frontier-level 🔥
So Yeon (Tiffany) Min retweeted
Devendra Chaplot @dchaplot
Tinker is now open to everyone! We are also adding:
- Vision support with Qwen3-VL
- New model: Kimi K2 Thinking (1T params)
- OpenAI API-compatible inference
Start training models within minutes: thinkingmachines.ai/blog/tinker-ge…
Thinking Machines @thinkymachines

Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. thinkingmachines.ai/blog/tinker-ge…

So Yeon (Tiffany) Min retweeted
Jing Yu Koh @kohjingyu
I resigned from TBD Lab / MSL last Friday. It was a really fulfilling experience leading the computer use agents team over the past 1.45 years, and I learnt and grew a ton personally and professionally. It was a really fun opportunity to build CUA infra, data pipelines, evals, and models from scratch, and work together with a very talented team to discover how to get to frontier level CUA performance. I was also really impressed by the calibre of TBD Lab and how quickly everything came together. It was a difficult decision to leave: I will miss my colleagues and friends at Meta, but I'm sure our paths will cross again soon!
So Yeon (Tiffany) Min retweeted
Yutong (Kelly) He @electronickale
Diffusion/Flow-based models can sample in 1-2 steps now 👍 But likelihood? Still requires 100-1000 NFEs (even for these fast models) 😭 We fix this! Introducing F2D2: simultaneous fast sampling AND fast likelihood via joint flow map distillation. arxiv.org/abs/2512.02636 1/🧵
So Yeon (Tiffany) Min retweeted
Russ Salakhutdinov @rsalakhu
Check out new work: Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
Paper: arxiv.org/abs/2512.02636

Log-likelihood evaluation enables key capabilities in generative modeling: model comparison, certain fine-tuning objectives, and many downstream tasks. Yet state-of-the-art diffusion and flow models still require hundreds to thousands of NFEs to compute them. Recent distillation methods speed up sampling but break likelihood tractability.

This work develops Fast Flow Joint Distillation (F2D2), which cuts NFEs for both sampling and likelihood evaluation by two orders of magnitude. The key insight is that in continuous normalizing flows, the coupled ODEs for sampling and likelihood are computed from a shared underlying velocity field, allowing one to jointly distill sampling trajectories and cumulative divergence using a single model.

F2D2 preserves sample quality while enabling accurate few-step likelihoods, resolving a long-standing computational bottleneck. With a lightweight self-guidance technique, a 2-step MeanFlow model can even outperform its 1024-step teacher using just one extra backward NFE.

Check out an excellent detailed thread by @electronickale!
Yutong (Kelly) He @electronickale

Diffusion/Flow-based models can sample in 1-2 steps now 👍 But likelihood? Still requires 100-1000 NFEs (even for these fast models) 😭 We fix this! Introducing F2D2: simultaneous fast sampling AND fast likelihood via joint flow map distillation. arxiv.org/abs/2512.02636 1/🧵

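The shared-velocity-field insight described in the thread above rests on the standard continuous normalizing flow identities (the instantaneous change-of-variables formula, not anything specific to F2D2): sampling and likelihood follow two coupled ODEs driven by the same field. As a sketch:

```latex
% Coupled ODEs of a continuous normalizing flow: the sample x_t and
% its log-density evolve under the same velocity field v_\theta.
\frac{dx_t}{dt} = v_\theta(x_t, t),
\qquad
\frac{d \log p_t(x_t)}{dt} = -\nabla \cdot v_\theta(x_t, t)

% Integrating from the base distribution p_0 to the data end t = 1:
\log p_1(x_1) = \log p_0(x_0) - \int_0^1 \nabla \cdot v_\theta(x_t, t)\, dt
```

On this reading, distilling the trajectory and the cumulative divergence integral jointly, rather than the trajectory alone, is what keeps likelihoods tractable at few steps.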
So Yeon (Tiffany) Min retweeted
Russ Salakhutdinov @rsalakhu
Predicting AGI/ASI timelines has become trendy, so I’ll offer mine: AGI/ASI is 5–10 years away. It always has been. It always will be.
So Yeon (Tiffany) Min retweeted
Russ Salakhutdinov @rsalakhu
Check out the new @mldcmu blogpost: How to Explore to Scale RL Training of LLMs on Hard Problems?
blog.ml.cmu.edu/2025/11/26/how…

Current on-policy RL methods fail to learn from hard problems as they rarely generate a single correct rollout, producing no reward signal and no learning. Including easy problems can also be harmful, as models tend to overfit to them and fail to improve on harder tasks. Distilling human-written solutions is not only costly, but also provides difficult targets for fine-tuning.

This blogpost discusses various approaches and introduces a framework that uses existing human or model solutions as privileged guidance to unlock learning on hard problems. The key idea is simple: prepend a minimal solution prefix to difficult prompts, enabling on-policy RL to obtain reward and learn behaviors that generalize back to the original, unconditioned tasks. This expands the set of solvable problems and results in significant gains on challenging reasoning benchmarks.

With @QuYuxiao, @setlur_amrith, @gingsmith, @aviral_kumar2. Paper/Code is coming soon.
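The key idea summarized above (prepend a minimal solution prefix so on-policy rollouts can reach a reward signal) can be sketched in a few lines. This is a hypothetical illustration only: the function names, prompt format, and annealing schedule are my own assumptions, not the blogpost's actual code.

```python
def make_guided_prompt(problem: str, reference_solution: str,
                       prefix_frac: float = 0.25) -> str:
    """Prepend the first `prefix_frac` of a reference solution to a hard
    problem, so on-policy rollouts have a chance of earning reward.
    (Hypothetical prompt format, not from the blogpost.)"""
    cut = max(1, int(len(reference_solution) * prefix_frac))
    prefix = reference_solution[:cut]
    return f"{problem}\n\nPartial solution (continue from here):\n{prefix}"


def curriculum(problem: str, reference_solution: str):
    """Yield prompts with shrinking privileged guidance, ending at the
    original unconditioned task the policy must ultimately solve.
    (The annealing schedule here is an assumption for illustration.)"""
    for frac in (0.75, 0.5, 0.25):
        yield make_guided_prompt(problem, reference_solution, frac)
    yield problem  # original task, no guidance
```

Each yielded prompt would be fed to ordinary on-policy RL rollouts; the point is that early, heavily-guided prompts produce nonzero reward where the bare problem would not.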
So Yeon (Tiffany) Min retweeted
Shrimai @shrimai_
Thank you @rohanpaul_ai for highlighting our work!💫 Front-Loading Reasoning shows that inclusion of reasoning data in pretraining is beneficial, does not lead to overfitting after SFT, & has latent effect unlocked by SFT! Paper: arxiv.org/abs/2510.03264 Blog: research.nvidia.com/labs/adlr/Syne…
Rohan Paul @rohanpaul_ai

New @nvidia paper shows that teaching reasoning early during pretraining builds abilities that later fine-tuning cannot recover. Doing this early gives a 19% average boost on tough tasks after all post-training.

Pretraining is the long first stage where the model learns to predict the next word from lots of text. Supervised fine-tuning is a later stage where it studies step-by-step answers from labeled examples. Reinforcement learning then rewards better answers so the model improves further.

Diversity matters most in pretraining, while high quality matters most in supervised fine-tuning, roughly 11% vs 15% gains. Even doubling supervised fine-tuning on a base that skipped early reasoning could not catch up. Adding lots of mixed-quality supervised fine-tuning data even cut math by about 5%. High-quality reasoning added in pretraining looked small at first, then showed up strongly after supervised fine-tuning.

Teams should load diverse reasoning into pretraining, use a small high-quality set for supervised fine-tuning, then stabilize with rewards.

Paper: arxiv.org/abs/2510.03264
Paper Title: "Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data"

So Yeon (Tiffany) Min retweeted
Dylan Sam @dylanjsam
Very interesting insights into understanding when and why synthetic data (although imperfect and biased) can boost the performance of statistical inference!! 📈📈
Emily Byun @yewonbyun_

💡Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data.
