
GPT-5.2 Thinking evals
Song Mei
@Song__Mei
Assistant Professor at UC Berkeley, Department of Statistics and EECS. Researcher at OpenAI working on LLM training.

🧵GPT-5.2 is here – one small step in version number, one giant leap in capabilities. 🚀 With the *incredible* @Song__Mei @yaodong_yu @Yuf_Zh @ofirnachum and the rest of the @OpenAI team, we applied new techniques to bring our frontier reasoning model to the next level. GPT-5.2-Thinking is much stronger on intelligence, agentic coding, professional use, long-context understanding, and extended thinking. It's also better at science/theory research – try pairing with it! Congrats also to @yanndubs @ericmitchellai @.ishaan @christinahkim, and heartfelt thanks to the leadership @_aidan_clark_ @max_a_schwarzer @markchen90 @merettm @sama for making this come together!


GPT-5 is here. Rolling out to everyone starting today. openai.com/gpt-5/

I’m excited to start at OpenAI this May and help ship the oss model. More to come soon!

We released two open-weight reasoning models—gpt-oss-120b and gpt-oss-20b—under an Apache 2.0 license. Developed with open-source community feedback, these models deliver meaningful advancements in both reasoning capabilities & safety. openai.com/index/introduc…

🙌🎉Our 2025 recipient of the COPSS Presidents' Award is Lester Mackey! This award is given annually to a young member of the statistical community in recognition of outstanding contributions to the profession of statistics.

🚨 New Paper 🚨 An Overview of Large Language Models for Statisticians
📝: arxiv.org/abs/2502.17814

Dual perspectives on Statistics ➕ LLMs: Stat for LLM & LLM for Stat
- Stat for LLM: how statistical methods can improve LLM uncertainty quantification, interpretability, trustworthiness & more.
- LLM for Stat: how LLMs can enhance statistical workflows, from data collection, synthesis, and annotation to statistical modeling, with applications to medical research.

Presents key LLM advances in Architecture, Training, Reasoning, and Self-Alignment:
(1) 🧠 Evolution of LLM architectures with Transformers and Self-Attention
(2) LLM training pipeline from pre-training and SFT to RLHF and Preference Optimization
(3) 💭 System 2 Prompting and Chain-of-Thought for test-time scaling
(4) 🚀 LLM Self-Alignment for achieving super-human intelligence

Statisticians play a key role in the development of large-scale AI models:
(1) 💡 Statistical insights improve LLM uncertainty quantification & interpretability
(2) 🤖 Watermarking for AI-generated content detection
(3) ⚖️ Privacy & algorithmic fairness to ensure responsible AI adoption

LLMs can also empower statistical science by:
(1) 📈 Scaling up data collection, synthesis, and annotation
(2) 🖥️ Automating statistical coding & exploratory analysis
(3) 🔬 Facilitating medical research

By bridging statistics & AI, we can:
✅ Build better LLMs with statistical methodologies
✅ Leverage LLMs for statistical applications in high-stakes domains
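The watermarking point above is a nice example of a purely statistical idea. One common family of schemes (not necessarily the one surveyed in the paper) biases generation toward a pseudorandom "green list" of tokens keyed by the previous token; detection is then a one-sided z-test on the green-token count. Below is a minimal toy sketch under those assumptions — `is_green`, `GAMMA`, and the hash rule are all illustrative choices, not any production API:

```python
import hashlib
import math

GAMMA = 0.5  # expected fraction of "green" tokens under the no-watermark null

def is_green(prev_token: int, token: int) -> bool:
    # Toy green-list rule: hash the (prev_token, token) pair and call the
    # token "green" if the hash lands in the lower GAMMA fraction of the range.
    h = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64 < GAMMA

def watermark_z_score(tokens: list[int]) -> float:
    # Under the null, each transition is green with probability GAMMA
    # independently, so the green count is Binomial(T, GAMMA).
    # Watermarked text over-samples green tokens, inflating the z-score.
    T = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - GAMMA * T) / math.sqrt(GAMMA * (1 - GAMMA) * T)
```

Unwatermarked text yields a z-score near 0, while text generated to favor green tokens yields a large positive z, so a threshold like z > 4 gives a detector with a tiny false-positive rate.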

Congratulations to #FlatironCCM research scientist @JiequnH on being awarded @TheSIAMNews' 2025 SIAG/CSE Early Career Prize! Read more: simonsfoundation.org/2025/02/24/ccm… #math

🎉Congrats to the 126 early-career scientists who have been awarded a Sloan Research Fellowship this year! These exceptional scholars are drawn from 51 institutions across the US and Canada, and represent the next generation of groundbreaking researchers. sloan.org/fellowships/20…