Anton Baumann

8 posts

@_antonbaumann

Joined October 2016
46 Following · 15 Followers
Anton Baumann reposted
Ronak Malde @rronak_
We have been exploring new algorithmic frontiers and are excited to share our contributions to Self Distillation Policy Optimization (SDPO) for agentic continual learning. Check out our blog post here: trajectory.ai/field-notes/sc…
Anton Baumann reposted
Jonas Hübotter @jonashubotter
Today and tomorrow we’ll be presenting our self-distillation work as orals at ICLR in Rio 🇧🇷:
1. “Self-Distillation enables Continual Learning” at the lifelong agents workshop (Sun 11:30am)
2. “Reinforcement Learning via Self-Distillation” at the scaling post-training workshop (Mon 2:40pm)
3. “Test-Time Self-Distillation” at the test-time updates workshop (Mon 4:15pm)
Anton Baumann reposted
Jonas Hübotter @jonashubotter
Training LLMs with verifiable rewards uses a 1-bit signal per generated response. This hides why the model failed. Today, we introduce a simple algorithm that enables the model to learn from any rich feedback and then turns it into dense supervision! (1/n)
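A toy calculation can illustrate the contrast between a 1-bit reward and dense supervision. This is a minimal sketch under stated assumptions and is not the algorithm from the thread. It assumes that the dense signal comes from comparing the model's own token probabilities with and without the feedback in its context. All numbers and function names (`sparse_advantages`, `dense_advantages`, `student_probs`, `teacher_probs`) are hypothetical.

```python
import math

# Hypothetical toy data (not from any real run): probability the policy
# assigned to each token of one sampled response y = (y_1, ..., y_4).
# "student": p(y_t | prompt, y_<t), the policy as it generated the response.
# "teacher": p(y_t | prompt, feedback, y_<t), the same model re-scoring the
# response with rich feedback (e.g. an error message) added to its context.
# Using the feedback-conditioned model as "teacher" is an assumption here.
student_probs = [0.9, 0.6, 0.5, 0.8]
teacher_probs = [0.9, 0.6, 0.1, 0.8]


def sparse_advantages(reward: float, baseline: float, num_tokens: int) -> list[float]:
    """Verifiable-reward style: one pass/fail bit minus a baseline,
    copied unchanged to every token of the response."""
    return [reward - baseline] * num_tokens


def dense_advantages(student: list[float], teacher: list[float]) -> list[float]:
    """Per-token signal: log-ratio of teacher to student probability.
    Negative where the feedback-conditioned model disfavors the token,
    zero where both agree."""
    return [math.log(t) - math.log(s) for s, t in zip(student, teacher)]


# A failed response: reward 0 against a baseline of 0.5.
sparse = sparse_advantages(reward=0.0, baseline=0.5, num_tokens=len(student_probs))
dense = dense_advantages(student_probs, teacher_probs)
print("sparse:", sparse)                        # [-0.5, -0.5, -0.5, -0.5]
print("dense: ", [round(a, 3) for a in dense])  # [0.0, 0.0, -1.609, 0.0]
```

In the sparse version, all four tokens are penalized equally. In the dense version, the signal is zero everywhere except the third token, where the feedback-conditioned model disagrees. That localization is the kind of "why the model failed" information that a single bit per response cannot carry.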