Marco Bagatella

75 posts

Marco Bagatella

@mar_baga

ETH/CLS PhD candidate, interested in reinforcement learning and pizza making.

Joined November 2021

155 Following · 206 Followers

Marco Bagatella reposted

Anselm Paulus @AnselmPaulus · 16 Mar

Autodiff libraries compute gradients, but many operations (e.g. sorting, comparisons, logic, indexing) yield uninformative gradients. We release SoftJAX & SoftTorch, which bundle 40+ drop-in replacements that make these functions smoothly differentiable. github.com/a-paulus/softj…

Marco Bagatella reposted

Ariel @redtachyon · 5 Feb

Can we train LLMs with RL using the same next token prediction loss as pre-training? (yes) We conduct a study on (log)prob rewards and show they give a simple way to bridge verifiable and non-verifiable settings with a single reward, broadly applicable for fine-tuning LLMs.

Marco Bagatella reposted

Yarden As @yarden_as · 6 Feb

Happy to share that our paper “Safe Exploration via Policy Priors” has been accepted to ICLR 2026! tl;dr: we develop an online reinforcement learning algorithm that satisfies safety constraints throughout learning. More here: yardenas.github.io/sooper/

Marco Bagatella @mar_baga · 5 Feb

One more step towards expanding the capabilities of BFMs! Great work by @th_rupf as usual :)

Thomas Rupf @th_rupf

Excited to share that our paper "Optimistic Task Inference for Behavior Foundation Models" was accepted for ICLR 2026. BFMs are great at zero-shot RL, but task inference requires a dataset with reward labels. Our method OpTI-BFM offers an online alternative. (1/5)

Marco Bagatella reposted

Basile Terver @BasileTerv987 · 4 Feb

𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗘𝗕-𝗝𝗘𝗣𝗔 ⚡ An open-source library making JEPAs accessible, trainable on a single GPU in hours! 🚀 🔗 Paper: arxiv.org/abs/2602.03604 💻 Code: github.com/facebookresear…

Marco Bagatella @mar_baga · 30 Jan

TL;DR: LLMs are pretty good at credit assignment in hindsight!

Jonas Hübotter @jonashubotter

Training LLMs with verifiable rewards uses a 1-bit signal per generated response. This hides why the model failed. Today, we introduce a simple algorithm that enables the model to learn from any rich feedback and then turns that feedback into dense supervision. (1/n)

Marco Bagatella reposted

Yarden As @yarden_as · 22 Oct

This might be useful to other safe RL researchers out there. We've implemented Safety Gym (by OpenAI) using MJX, reducing training time from >4 hours to <30 minutes. Check it out: github.com/yardenas/mjx-s…

Marco Bagatella reposted

Núria Armengol @NriaArmengol2 · 14 Aug

Last week I presented our latest work, 🐝“Epistemically-guided forward backward exploration (FBEE)”🐝, at the @RL_Conference. TL;DR: active learning for unsupervised RL.

Marco Bagatella reposted

Thomas Rupf @th_rupf · 15 Jul

Zero-shot imitation from just a single sparse demonstration is hard. Goal-conditioned methods tend to “greedily” move from one state to the next and lose the big picture. We're presenting an alternative approach on Tuesday at #ICML2025. (1/3)

Marco Bagatella @mar_baga · 14 Jul

If this sounds interesting, drop by on Wednesday between 11am and 1.30pm at B2-B3 W-609! Joint work with @jonashuebotter, @GMartius, @arkrause 📝 arxiv.org/abs/2410.05026 (3/3)

Marco Bagatella @mar_baga · 14 Jul

We propose an algorithm that does this by actively maximizing information gain on the demonstrator, with a couple of tricks to estimate this quantity and mitigate forgetting. Interestingly, this solution is viable even when no information on pre-training is available (!) (2/3)

Marco Bagatella @mar_baga · 14 Jul

When multiple tasks need improvements, fine-tuning a generalist policy becomes tricky. How do we allocate a demonstration budget across a set of tasks of varied difficulty and familiarity? We are presenting a possible solution at ICML on Wednesday! (1/3)

Marco Bagatella reposted

Xin Chen, Cynthia @XinCynthiaChen · 2 Jun

🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models. We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in the LLM's representation space, with interpretable insights. 🧵

Marco Bagatella reposted

Georg Martius @GMartius · 23 May

EWRL News!⚡ We’re pleased to announce a new Call for Contributed Talks to bring novel and diverse viewpoints to EWRL2025. Early-career researchers are especially encouraged to submit proposals and share their work with the community. Full details: …p-on-reinforcement-learning.github.io/ewrl18/

Marco Bagatella @mar_baga · 4 May

@RemiCadene The Soul of a New Machine by Tracy Kidder might be interesting, although those computers are not really close to "personal".

Remi Cadene @RemiCadene · 4 May

I am looking for books on the history of personal computers (not just apple computers). I would like to draw the parallel with what's happening in robotics right now. Any recommendation? 🙏

Marco Bagatella reposted

mikel @mikel_zhobro · 2 Apr

Introducing 3DGSim🧩— an end-to-end 3D physics simulator trained only on multi-view videos. It achieves spatial & temporal consistency w/o ground truth 3D info or heavy inductive biases— enabling scalability & generalization🚀 🔗 mikel-zhobro.github.io/3dgsim 📺youtube.com/watch?v=3Ar36A…

Marco Bagatella reposted

Chenhao Li @breadli428 · 18 Feb

Join us for the Robotics, Vision, and Controls Talks starting this week! We curated a list of exceptional speakers for a series of talks focusing on robotics, computer vision, controls, and learning. Everyone is welcome! Web: robotics-talks.com Zoom: ethz.zoom.us/j/63716670526

Marco Bagatella reposted

Bhavya Sukhija @sukhijabhavy · 17 Dec

Excited to share MaxInfoRL! The core focus was developing simple, flexible, and scalable methods for principled exploration. Check the thread to see how MaxInfoRL meets these criteria and achieves SOTA results.

Carlo Sferrazza @carlo_sferrazza

🚨 New reinforcement learning algorithms 🚨 Excited to announce MaxInfoRL, a class of model-free RL algorithms that solves complex continuous control tasks (including vision-based!) by steering exploration towards informative transitions. Details in the thread 👇

Marco Bagatella reposted

Machine Learning Street Talk @MLStreetTalk · 1 Dec

Jonas Hübotter is a doctoral researcher at ETH Zurich, where he works on Active Fine-Tuning and Local Learning. His paper Transductive Active Learning: Theory and Applications arxiv.org/pdf/2402.15898 was published at NeurIPS24, and we discuss Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs arxiv.org/pdf/2410.08020 (submitted to ICLR 25). Live on MLST now.

Marco Bagatella reposted

Georg Martius @GMartius · 6 Nov

@IrisAndrussow is demonstrating our tactile sensor Minsight at @corl_conf over the next few days. Come by! We also have a cut-open version. #CoRL2024 #Robotics #Tactile
