Che-Ping Tsai

79 posts

Che-Ping Tsai

@chepingt

PhD @mldcmu, interpretability and representation learning, machine learning theories.

Katılım Kasım 2016

650 Takip Edilen155 Takipçiler

Che-Ping Tsai retweetledi

Christina Baek@_christinabaek·18 Mar

Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10–50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵

English

617

93.4K

Che-Ping Tsai retweetledi

Aakash Lahoti@aakash_lahoti·17 Mar

A year of cooking 👨‍🍳and we’re finally serving Mamba-3. What began as a small effort to revisit a few recurring limitations of SSMs grew into a much bigger project. Taking a more principled state space perspective ended up tying these threads together.

Albert Gu@_albertgu

The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!

English

8.6K

Che-Ping Tsai@chepingt·12 Mar

@dylanjsam @yus167 @OpenAI @zicokolter @andrew_ilyas @furongh Congrats!

English

Dylan Sam@dylanjsam·11 Mar

I defended my PhD thesis! Also, a very (~4 month) late life update, but I've joined @OpenAI to work on safety research and pretraining safer language models! 📈 Thank you to my advisor @zicokolter and my committee: Matt Fredrikson, @andrew_ilyas, and @furongh! 🙏

English

221

21.5K

Che-Ping Tsai retweetledi

Amrith Setlur@setlur_amrith·15 Şub

I'll admit, going in I was not 100% sure this was possible: we trained a tiny 4B model (QED-Nano) to prove math theorems at the Olympiad level! Today, we release the full recipe, from the data curation done for SFT to our RL algorithm that explicitly optimizes for test-time scaling over millions of tokens (i.e., we train QED-Nano to continually improve as we apply modern day test-time scaffolds like DeepSeekMath-agent over it). 🧵⬇️

English

149

23.6K

Che-Ping Tsai retweetledi

Yuda Song@yus167·3 Şub

RL on LLMs inefficiently uses one scalar per rollout. But users regularly give much richer feedback: "make it formal," "step 3 is wrong." Can we train LLMs on this human-AI interaction? We introduce RL from Text Feedback, with 1) Self-Distillation; 2) Feedback Modeling (1/n) 🧵

English

101

601

106.6K

Che-Ping Tsai retweetledi

Amrith Setlur@setlur_amrith·3 Şub

We run online RL on a mixture of problems: some are easy to explore (high pass rate), and some are very very hard (need to sample A LOT before we see any positive sample). Turns out RL on such a mixture can lead to a "rich-gets-richer" effect, where RL over-sharpens on the easy problems, at the cost of getting stuck in a "plateau" on harder ones, making it even harder to sample a correct trace on those. RL literature calls this "ray interference". In our recent work POPE, we show that using privileged info. to guide exploration on hard problems can tackle ray interference! 🧵⬇️

English

306

16.3K

Che-Ping Tsai retweetledi

Amrith Setlur@setlur_amrith·21 Oca

RL training of LLMs spends tons of compute on sampling rollouts 🤖💸 But most runs are YOLO 🤟, telling us little about how to scale sampling compute optimally. Given a fixed sampling compute budget, how should we allocate it across: • sequential iterations ⏩ • parallel rollouts 🎲 Answers to this with scaling laws 📈 and more in our new blog post ⬇️

English

201

12.9K

Che-Ping Tsai retweetledi

Chen Wu@ChenHenryWu·18 Ara

1/⚠️ Parallel test-time scaling (e.g., pass@k) usually wastes compute - models often repeat the same dominant failure❌ How should we effectively generate creative solutions? While typical methods such as increasing temperature 🌡️ usually fail, we put forward Mode‑Conditioning (ModC) - a simple yet powerful training and test-time framework that allocates compute across diverse reasoning modes🎨We show that ModC largely improves pass@k across SFT, distillation, and RL settings. With ModC, we get 4-8x efficiency gains in math reasoning using the same training data!

English

133

24.5K

Che-Ping Tsai@chepingt·30 Kas

Worked closely with Burak on SSL—super thoughtful, rigorous, responsible and supportive. Strongly recommend them for any academic position!

Burak Varıcı@VariciBurak

📢I’m on the academic job market!📢 I mainly work on representation learning and causality (CRL, identifiable SSL, causal discovery, robotics applications, rep. learning for tabular data) Also, I’ll be at #NeurIPS Dec. 2–7, reach out to chat about any of the above :)

English

240

Che-Ping Tsai@chepingt·12 Kas

New SOTA self-supervised learning objective with theoretical guarantees. Worth a look!

Randall Balestriero@randall_balestr

LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...) - 60+ arch., up to 2B params - 10+ datasets - in-domain training (>DINOv3) - corr(train loss, test perf)=95% Paper: arxiv.org/pdf/2511.08544 Code: github.com/rbalestr-lab/l…

English

220

Che-Ping Tsai retweetledi

I-Hung Hsu@IHung_Hsu·31 Eki

🧠🚀 Excited to introduce Supervised Reinforcement Learning—a framework that leverages expert trajectories to teach small LMs how to reason through hard problems without losing their minds. 🤯 Better than SFT && RLVR. Read more: huggingface.co/papers/2510.25… #llms #RL #reasoning

English

336

20.5K

Che-Ping Tsai retweetledi

Yuda Song@yus167·15 Eki

🤖 Robots rarely see the true world's state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)

English

140

30.1K

Che-Ping Tsai retweetledi

Emily Byun@yewonbyun_·9 Eki

💡Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data

English

143

31K

Che-Ping Tsai retweetledi

Nicholas Boffi@nmboffi·7 Eki

Consistency models, CTMs, shortcut models, align your flow, mean flow... What's the connection, and how should you learn them in practice? We show they're all different sides of the same coin connected by one central object: the flow map. arxiv.org/abs/2505.18825 🧵(1/n)

English

386

65.3K

Che-Ping Tsai retweetledi

Jeremy Cohen@deepcohen·1 Eki

Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.

English

212

1.3K

234.2K

Che-Ping Tsai retweetledi

Junhong Shen@JunhongShen1·19 Eyl

Excited to share two papers accepted to #NeurIPS2025 ! 1️⃣ Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction We introduce TTI, an RL algorithm that scales the number of interaction steps beyond thinking tokens per step. Our agents learn to act longer➡️richer exploration➡️better success Paper: arxiv.org/abs/2506.07976 2️⃣ Content-Adaptive Tokenizer (CAT) We develop an image tokenizer that adapts token count based on image complexity, offering flexible 8x, 16x, or 32x compression! Importantly, we use just captions (no pixels!) to guide tokenization, enabling adaptive representation for text-to-image generation. Paper: arxiv.org/abs/2501.03120 Look forward to seeing everyone in SD!

English

301

24.2K

Che-Ping Tsai retweetledi

Dylan Sam@dylanjsam·16 Eyl

🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)

English

357

62.5K

Che-Ping Tsai retweetledi

Yuda Song@yus167·11 Eyl

LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]

English

467

39.9K

Che-Ping Tsai retweetledi

Wen-Tse Chen@WenzeChen2·12 Ağu

[0/3] 🚀 Introducing Verlog – an open-source RL framework built specifically for training long-horizon, multi-turn LLM agents. 📊 Max episode length comparison: •VeRL / RAGEN → ~10 turns •verl-agent → ~50 turns •Verlog (ours) → 400+ turns 🔥 ⚙️ Technical foundation: •Built on top of the VeRL •Tested on the BALROG benchmark (BabyAI, BabaIsAI, Crafter) •Followed design principles from pytorch-a2c-ppo-acktr-gail 💡 Why Verlog? •For researchers: Skip the heavy engineering. We give you a strong, validated baseline for long-horizon, multi-turn LLM agent across diverse tasks. •For developers: Train on your own long-horizon environments with minimal setup. •Algorithmic edge: With a well-trained value function as an intermediate supervised signal, rollouts can be truncated at any point and still be used for learning. This reduces GPU idle time and boosts training efficiency. This is a genuine advantage of PPO over the GRPO family, widely recognized and leveraged in classic RL, yet often overlooked in LLM agent frameworks. Key features 🧵👇

GIF

English

398

36.2K

Che-Ping Tsai retweetledi

Lili@lchen915·8 Ağu

Self-Questioning Language Models: LLMs that learn to generate their own questions and answers via asymmetric self-play RL. There is no external training data – the only input is a single prompt specifying the topic.

English

181

1.1K

145.8K

Keşfet

@dylanjsam @yus167 @OpenAI @zicokolter @andrew_ilyas @furongh @alex_damian_ @elonmusk