Anton Baumann

8 posts

@_antonbaumann

Joined October 2016
44 Following · 15 Followers
Anton Baumann retweeted
Ronak Malde (@rronak_)
We have been exploring new algorithmic frontiers and are excited to share our contributions to Self Distillation Policy Optimization (SDPO) for agentic continual learning. Check out our blog post here: trajectory.ai/field-notes/sc…
Anton Baumann retweeted
Jonas Hübotter (@jonashubotter)
Today and tomorrow we’ll be presenting three self-distillation orals at ICLR in Rio 🇧🇷:
1. “Self-Distillation enables Continual Learning” at the lifelong agents workshop (Sun 11:30am)
2. “Reinforcement Learning via Self-Distillation” at the scaling post-training workshop (Mon 2:40pm)
3. “Test-Time Self-Distillation” at the test-time updates workshop (Mon 4:15pm)
Anton Baumann retweeted
Jonas Hübotter (@jonashubotter)
Training LLMs with verifiable rewards uses a 1-bit signal per generated response. This hides why the model failed. Today, we introduce a simple algorithm that enables the model to learn from any rich feedback, and then turns that feedback into dense supervision. (1/n)
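The last thread contrasts a one-bit reward per response with dense supervision. The snippet below is only a toy illustration of that contrast, not the algorithm from the thread. It assumes the dense signal is the per-token log-probability gap between the model scoring its own response with the feedback in context (a "teacher" view) and without it (a "student" view). The function names and all numbers are hypothetical.

```python
import math


def sparse_token_rewards(num_tokens: int, is_correct: bool) -> list[float]:
    """Verifiable-reward setting: one bit per response, copied to every token."""
    reward = 1.0 if is_correct else 0.0
    return [reward] * num_tokens


def dense_token_signal(student_logprobs: list[float], teacher_logprobs: list[float]) -> list[float]:
    """Hypothetical dense signal: per-token log-prob gap between the model
    scoring its own tokens with feedback in context (teacher) and without it (student)."""
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("both sequences must score the same tokens")
    return [t - s for s, t in zip(student_logprobs, teacher_logprobs)]


# Toy three-token response that was judged wrong; the probabilities are made up.
student = [math.log(0.5), math.log(0.9), math.log(0.2)]
teacher = [math.log(0.5), math.log(0.9), math.log(0.8)]  # feedback makes token 3 more likely

print(sparse_token_rewards(3, is_correct=False))  # prints [0.0, 0.0, 0.0]
print([round(x, 3) for x in dense_token_signal(student, teacher)])  # prints [0.0, 0.0, 1.386]
```

Under this toy assumption the single bit cannot say which token caused the failure, while the per-token gap is nonzero only at the third token. How SDPO actually builds its training targets is defined in the authors' work.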