Swarnashree

8 posts

@swarnashree_ms

ML MTS @reflection_ai ; Past: ML Research Engineer @scale_AI ; Grad @LTIatCMU ;

Sunnyvale, CA · Joined October 2019
298 Following · 40 Followers
Swarnashree retweeted
Reflection @reflection_ai
Reflection is partnering with Shinsegae Group to build a 250-megawatt sovereign AI factory for the Republic of Korea. Open intelligence. Built on trust between allies. Owned by the nations that need it most. The future of sovereign AI. Read more in the @WSJ.
16
31
206
154.9K
Swarnashree retweeted
Zihao Wang @wzihao12
On-policy distillation with reverse KL as reward works great—IF you have access to teacher logits. But what if you don't? What if you want to distill from multiple teachers? Our solution: distill teacher guidance into rubrics, then do on-policy RL. Check out our work: arxiv.org/abs/2509.21500
Quoting Thinking Machines @thinkymachines:

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-policy…

2
4
24
3.2K
Swarnashree retweeted
Bing Liu @vbingliu
🔄RLHF → RLVR → Rubrics → OnlineRubrics 👤 Human feedback = noisy & coarse 🧮 Verifiable rewards = too narrow 📋 Static rubrics = rigid, easy to hack, miss emergent behaviors 💡We introduce OnlineRubrics: elicited rubrics that evolve as models train. arxiv.org/abs/2510.07284
5
42
283
46.3K
Swarnashree retweeted
Bing Liu @vbingliu
New @Scale_AI paper! The culprit behind reward hacking? We trace it to misspecification in high-reward tail. Our fix: rubric-based rewards to tell “excellent” responses apart from “great.” The result: Less hacking, stronger post-training!  arxiv.org/pdf/2509.21500
4
40
178
17.6K
Swarnashree @swarnashree_ms
@sama A lone astronaut exploring a mysterious alien planet with a twilight sky, navigating bioluminescent forests, encountering creatures that are communicating through light
0
0
0
27
Sam Altman @sama
we'd like to show you what sora can do, please reply with captions for videos you'd like to see and we'll start making some!
13.8K
2K
28.2K
7.2M
Mistral AI @MistralAI
magnet:?xt=urn:btih:208b101a0f51514ecf285885a8b0f6fb1a1e4d7d&dn=mistral-7B-v0.1&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=https%3A%2F%https://t.co/HAadNvH1t0%3A443%2Fannounce RELEASE ab979f50d7d406ab8d0b07d09806c72c
239
421
3.8K
1.9M