Pranav Vaid (@pranav_vd) - Twitter Profile

behind the scenes at AC, featuring what we now call the pineapple shirt. excited to share some of the work @raymondmfeng and I have done, RMSD has proved super valuable in our customer engagements!

Applied Compute @appliedcompute

Some enterprise tasks are challenging to hill-climb with RL-based methods since they involve very out-of-distribution behavior. On-policy self-distillation (OPSD) gives a model learning signal for every token it writes, far richer than the single scalar reward of RL. But that channel is noisy: most tokens don't reflect the behavior you're after. We introduce Relevance-Masked Self-Distillation (RMSD), which uses a two-step filtered loss mask to cut through the noise and find the tokens with the highest signal. Compared to OPSD it trains more stably, provides higher data efficiency, and reaches a higher performance ceiling.
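
To make the idea of a filtered loss mask concrete, here is a minimal sketch of averaging a per-token loss over only the tokens that survive two successive mask steps. The post does not say how either filter is computed, so the function name `masked_mean_loss`, the two mask inputs, and every number below are hypothetical placeholders and not the authors' method.

```python
from typing import Sequence


def masked_mean_loss(
    token_losses: Sequence[float],
    first_mask: Sequence[bool],
    second_mask: Sequence[bool],
) -> float:
    """Average per-token losses over tokens kept by BOTH mask steps."""
    if not (len(token_losses) == len(first_mask) == len(second_mask)):
        raise ValueError("all inputs must have one entry per token")
    kept = [
        loss
        for loss, keep_a, keep_b in zip(token_losses, first_mask, second_mask)
        if keep_a and keep_b
    ]
    # If no token survives both filters, there is no signal; return 0.0.
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical per-token distillation losses for a 5-token response.
losses = [0.1, 2.0, 0.2, 1.0, 0.3]
# Placeholder masks: the real filtering criteria are not given in the post.
step_one = [False, True, True, True, False]
step_two = [True, True, False, True, True]

unmasked = sum(losses) / len(losses)  # every token contributes
masked = masked_mean_loss(losses, step_one, step_two)  # only tokens 1 and 3
```

In this toy case the masked average is driven by the two surviving tokens, while the unmasked average is diluted by the low-loss tokens around them.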

Pranav Vaid @pranav_vd · 16 Jun

this is my first tweet. 😭💀😤🫡👶🏽❌🥳🍾😡👀⚡️🙄😔‼️🪧👀⚡️😈😍🤨🙃🧍‍♂️🌖🙏🚨👏🥥🧍‍♂️✋💭
