Anton Baumann

8 posts

@_antonbaumann

Joined October 2016

46 Following · 15 Followers

Anton Baumann retweeted

Ronak Malde@rronak_·5d

We have been exploring new algorithmic frontiers and are excited to share our contributions to Self Distillation Policy Optimization (SDPO) for agentic continual learning. Check out our blog post here: trajectory.ai/field-notes/sc…

37.9K

Anton Baumann retweeted

Sasha Rush@srush_nlp·18 May

Been working on text feedback / OPSD in Composer. Really interesting space, and much more to be explored.

Cursor@cursor_ai

We improved Composer by scaling training, generating more complex RL environments, and introducing new learning methods. For example, we use text feedback during RL to learn faster by assigning credit in rollouts spanning hundreds of thousands of tokens.

276

39.2K

Anton Baumann retweeted

Jonas Hübotter@jonashubotter·18 May

Self-distillation for long-horizon training at scale!

Cursor@cursor_ai

Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model.

4.8K

Anton Baumann retweeted

Jonas Hübotter@jonashubotter·26 Apr

Today and tomorrow we’ll be presenting self-distillation with orals at ICLR in Rio 🇧🇷
1. “Self-Distillation enables Continual Learning” at lifelong agents workshop (Sun 11:30am)
2. “Reinforcement Learning via Self-Distillation” at scaling post-training workshop (Mon 2:40pm)
3. “Test-Time Self-Distillation” at test-time updates workshop (Mon 4:15pm)

429

101.6K

Anton Baumann retweeted

Jonas Hübotter@jonashubotter·15 Feb

Just came across this great discussion of self-distillation on @latentspacepod! Really good rundown by Ted Kyi, and we’re every bit as excited about what’s next as he is! m.youtube.com/watch?v=CrJp0s…

3.1K

Anton Baumann retweeted

Explainable Machine Learning@ExplainableML·12 Feb

3/ Post-hoc Probabilistic Vision-Language Models
@_antonbaumann, @ruili_pml, Marcus Klasson, Santeri Mentu, @ShyamgopalKart1, @zeynepakata, @arnosolin, Martin Trapp
[Paper]: arxiv.org/pdf/2412.06014
[Project]: aaltoml.github.io/BayesVLM/
[Code]: github.com/AaltoML/BayesV…

165

Anton Baumann retweeted

Jonas Hübotter@jonashubotter·29 Jan

Training LLMs with verifiable rewards uses a 1-bit signal per generated response. This hides why the model failed. Today, we introduce a simple algorithm that enables the model to learn from any rich feedback and then turns it into dense supervision. (1/n)

138

1.1K

210.8K

Anton Baumann@_antonbaumann·26 Oct

@OverwatchEU Have you knocked on Junkenstein's door yet? There's an Overwatch PS4 to be won! blizz.ly/2epuaZ7 #OWHalloween3
