ML@CMU

124 posts

@mlcmublog

Official Twitter account for the ML@CMU blog @mldcmu @SCSatCMU

Pittsburgh, PA · Joined February 2020

20 Following · 2.3K Followers

ML@CMU@mlcmublog·17 Mar

blog.ml.cmu.edu/2026/03/17/lum… The method used to segment textual content into ‘chunks’ in RAG pipelines can significantly impact dense retrieval quality. Read more about LumberChunker, a method for dynamically segmenting long-form narrative documents, in our new post!

438

ML@CMU@mlcmublog·24 Dec

We asked LLMs: Is Santa real? 🎅 GPT-4o says Yes at any age. Claude tells 5-year-olds the truth. What does this reveal about invisible assumptions in AI? Do LLMs believe in the tooth fairy or the Illuminati? New holiday post here: blog.ml.cmu.edu/2025/12/23/is-…

770

ML@CMU@mlcmublog·9 Dec

LLM-as-a-judge is used everywhere, but breaks under standard forced-choice ratings. Modeling rating indeterminacy can lead to more reliable judges. Read more in our latest blog post. blog.ml.cmu.edu/2025/12/09/val…

179

ML@CMU@mlcmublog·2 Dec

Check out our latest post on CMU @ NeurIPS 2025! blog.ml.cmu.edu/2025/12/01/car…

727

ML@CMU@mlcmublog·27 Nov

Why does LLM training plateau and how can we fix it? In a new blogpost, we discuss how we can improve LLM exploration using ideas from offline RL. blog.ml.cmu.edu/2025/11/26/how…

268

ML@CMU@mlcmublog·27 Oct

blog.ml.cmu.edu/2025/10/27/lea… The hardest problems have near-zero success rates and no positive examples during learning. BaNEL (Bayesian Negative Evidence Learning) post-trains using failed attempts only while minimizing the number of reward evaluations. Read more in our latest post!

1.5K

ML@CMU@mlcmublog·22 Sep

blog.ml.cmu.edu/2025/09/22/dif… Check out our new blog post on "Diffusion beats Autoregressive in Data-Constrained settings". The era of infinite internet data is ending. This research paper asks: What is the right generative modeling objective when data—not compute—is the bottleneck?

495

ML@CMU@mlcmublog·15 Sep

blog.ml.cmu.edu/2025/09/15/ver… Check out our latest blog post on Verlog, a multi-turn reinforcement learning framework built for long-horizon LLM-agentic tasks with highly variable episode lengths.

884

ML@CMU@mlcmublog·8 Jul

blog.ml.cmu.edu/2025/07/08/car… Check out our latest post on CMU @ ICML 2025!

3.7K

ML@CMU@mlcmublog·1 Jun

blog.ml.cmu.edu/2025/06/01/rlh… In this in-depth coding tutorial, @GaoZhaolin and @g_k_swamy walk through the steps to train an LLM via RL from Human Feedback!

4.6K

ML@CMU@mlcmublog·22 May

blog.ml.cmu.edu/2025/05/22/unl… Are your LLMs truly forgetting unwanted data? In this new blog post authored by @shengyuan_26734, Yiwei Fu, @zstevenwu, and @gingsmith, we discuss how benign relearning can jog an unlearned LLM's memory to recover knowledge that is supposed to be forgotten.

ML@CMU@mlcmublog·23 Apr

blog.ml.cmu.edu/2025/04/23/car… Check out our latest blog post on CMU @ ICLR 2025!

428

ML@CMU@mlcmublog·21 Apr

blog.ml.cmu.edu/2025/04/21/all… Check out our new blog post on ALLIE, a new chess AI that actually plays like a human! Unlike Stockfish or AlphaZero, which focus on winning at all costs, ALLIE uses a transformer model trained on human chess games to make moves, ponder, and resign like humans. With time-adaptive MCTS search at inference time (allocating more search budget to positions where humans spend more time), ALLIE can match player skill levels up to grandmaster-level opponents (2500 Elo) in online games, while learning exclusively from humans. Written by @yimingz0, @apjacob03, Vivian Lai, @dan_fried, @daphneipp

400

ML@CMU@mlcmublog·18 Apr

blog.ml.cmu.edu/2025/04/18/llm… 📈⚠️ Is your LLM unlearning benchmark measuring what you think it is? In a new blog post authored by @prthaker_, @shengyuan_26734, @neilkale, @yash_maurya01, @zstevenwu, and @gingsmith, we discuss why empirical benchmarks are necessary but not sufficient measures of success (SaTML 2025).

2.5K

ML@CMU@mlcmublog·9 Apr

blog.ml.cmu.edu/2025/04/09/cop… How do real-world developer preferences compare to existing evaluations? A CMU and UC Berkeley team led by @iamwaynechi and @valeriechen_ created @CopilotArena to collect user preferences on in-the-wild workflows. This blogpost overviews the design and deployment of Copilot Arena + new insights into developer code preferences.

3.9K

ML@CMU@mlcmublog·9 Jan

blog.ml.cmu.edu/2025/01/08/opt… How can we train LLMs to solve complex challenges beyond just data scaling? In a new blogpost, @setlur_amrith, @QuYuxiao, Matthew Yang, @LunjunZhang, @gingsmith, and @aviral_kumar2 demonstrate that Meta RL can help LLMs better optimize test-time compute.

18.2K
