Marco Bagatella

75 posts

@mar_baga

ETH/CLS PhD candidate, interested in reinforcement learning and pizza making.

Joined November 2021
155 Following · 206 Followers
Marco Bagatella retweeted
Anselm Paulus @AnselmPaulus:
Autodiff libraries compute gradients, but many operations (e.g. sorting, comparisons, logic, indexing) yield uninformative gradients. We release SoftJAX & SoftTorch, which bundle 40+ drop-in replacements that make these functions smoothly differentiable. github.com/a-paulus/softj…
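The idea behind such relaxations can be sketched with a sigmoid in place of a hard comparison and a softmax in place of argmax (a generic illustration of the technique, not SoftJAX/SoftTorch's actual API):

```python
import numpy as np

def soft_greater(x, y, temperature=1.0):
    # Smooth stand-in for the hard step (x > y): a sigmoid of the gap,
    # so the "comparison" has a non-zero gradient everywhere.
    return 1.0 / (1.0 + np.exp(-(x - y) / temperature))

def soft_argmax(x, temperature=1.0):
    # Smooth stand-in for argmax: the softmax-weighted average index.
    x = np.asarray(x, dtype=float)
    w = np.exp((x - x.max()) / temperature)  # shift by max for stability
    w /= w.sum()
    return float(w @ np.arange(len(x)))
```

As the temperature shrinks, both converge to their hard counterparts while remaining differentiable.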
Marco Bagatella retweeted
Ariel @redtachyon:
Can we train LLMs with RL using the same next token prediction loss as pre-training? (yes) We conduct a study on (log)prob rewards and show they give a simple way to bridge verifiable and non-verifiable settings with a single reward, broadly applicable for fine-tuning LLMs.
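A minimal sketch of the idea as described in the tweet (my reading, not the paper's exact objective): score a reference completion by the average log-probability the policy assigns to its tokens, and use that scalar as the RL reward.

```python
import math

def logprob_reward(token_logprobs):
    # Sequence-level reward: mean per-token log-probability the model
    # assigns to a reference completion. It is always <= 0 and increases
    # toward 0 as the model finds the reference more likely, so the same
    # scalar applies to verifiable and non-verifiable targets alike.
    return sum(token_logprobs) / len(token_logprobs)

# Toy example with hand-picked per-token probabilities.
probs = [0.9, 0.8, 0.95]
reward = logprob_reward([math.log(p) for p in probs])
```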
Marco Bagatella retweeted
Yarden As @yarden_as:
Happy to share that our paper "Safe Exploration via Policy Priors" has been accepted to ICLR 2026! tl;dr: we develop an online reinforcement learning algorithm that satisfies safety constraints throughout learning. More here: yardenas.github.io/sooper/
Marco Bagatella retweeted
Yarden As @yarden_as:
This might be useful to other safe RL researchers out there. We've implemented Safety Gym (by OpenAI) using MJX, reducing training time from >4 hours to <30 minutes. Check it out: github.com/yardenas/mjx-s…
Marco Bagatella retweeted
Núria Armengol @NriaArmengol2:
Last week I presented our latest work, 🐝"Epistemically-guided forward-backward exploration (FBEE)"🐝, at the @RL_Conference. TLDR: active learning for unsupervised RL.
Marco Bagatella retweeted
Thomas Rupf @th_rupf:
Zero-shot imitation from just a single sparse demonstration is hard. Goal-conditioned methods tend to "greedily" move from one state to the next and lose the big picture. We're presenting an alternative approach on Tuesday at #ICML2025. (1/3)
Marco Bagatella @mar_baga:
We propose an algorithm that does this by actively maximizing information gain on the demonstrator, with a couple of tricks to estimate this quantity and mitigate forgetting. Interestingly, this solution is viable even when no information on pre-training is available (!) (2/3)
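One generic way to estimate "information gain on the demonstrator" (a sketch under my own assumptions, not the paper's estimator) is disagreement between posterior samples of the demonstrator's policy: query at the state where sampled actions vary most.

```python
import numpy as np

def pick_query_state(candidate_states, action_samples):
    # action_samples[i]: array of shape (n_posterior_samples, action_dim)
    # holding the actions that sampled demonstrator models propose at
    # candidate state i. High variance across samples is a proxy for high
    # expected information gain from querying the demonstrator there.
    disagreement = [np.asarray(a, dtype=float).var(axis=0).sum()
                    for a in action_samples]
    return candidate_states[int(np.argmax(disagreement))]
```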
Marco Bagatella @mar_baga:
When multiple tasks need improvements, fine-tuning a generalist policy becomes tricky. How do we allocate a demonstration budget across a set of tasks of varied difficulty and familiarity? We are presenting a possible solution at ICML on Wednesday! (1/3)
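A natural baseline for this allocation problem (a greedy sketch for illustration, not the paper's method) assumes each task has a known, diminishing marginal gain per demonstration and hands out the budget one demo at a time:

```python
def allocate_budget(marginal_gains, budget):
    # marginal_gains[task]: list of expected improvements from the 1st,
    # 2nd, ... demonstration on that task (assumed diminishing).
    # Greedily give each next demo to the task with the largest next gain.
    counts = {task: 0 for task in marginal_gains}
    for _ in range(budget):
        def next_gain(task):
            used = counts[task]
            gains = marginal_gains[task]
            return gains[used] if used < len(gains) else float("-inf")
        best = max(counts, key=next_gain)
        counts[best] += 1
    return counts
```

With diminishing returns, this one-demo-at-a-time rule is the classic greedy heuristic for submodular-style budget allocation.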
Marco Bagatella retweeted
Xin Chen, Cynthia @XinCynthiaChen:
🎉 Announcing our ICML 2025 Spotlight paper: Learning Safety Constraints for Large Language Models. We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in the LLM's representation space, with interpretable insights. 🧵
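Geometrically, a polytope is an intersection of half-spaces, so enforcing it amounts to checking linear constraints on the representation. A concept sketch (with `W` and `b` as hypothetical learned parameters, not SaP's actual API):

```python
import numpy as np

def in_safety_polytope(h, W, b, tol=1e-9):
    # Each row w_i of W, with offset b_i, defines one facet of the
    # polytope via the half-space w_i @ h <= b_i. A representation h is
    # deemed safe iff it lies inside every half-space simultaneously.
    return bool(np.all(W @ h <= b + tol))

# Example: a triangle in a 2D representation space.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 0.0])
```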
Marco Bagatella retweeted
Georg Martius @GMartius:
EWRL News!⚡ We’re pleased to announce a new Call for Contributed Talks to bring novel and diverse viewpoints to EWRL2025. Early-career researchers are especially encouraged to submit proposals and share their work with the community. Full details: …p-on-reinforcement-learning.github.io/ewrl18/
Marco Bagatella @mar_baga:
@RemiCadene The Soul of a New Machine by Tracy Kidder might be interesting, although those computers are not really close to "personal".
Remi Cadene @RemiCadene:
I am looking for books on the history of personal computers (not just apple computers). I would like to draw the parallel with what's happening in robotics right now. Any recommendation? 🙏
Marco Bagatella retweeted
mikel @mikel_zhobro:
Introducing 3DGSim🧩— an end-to-end 3D physics simulator trained only on multi-view videos. It achieves spatial & temporal consistency w/o ground truth 3D info or heavy inductive biases— enabling scalability & generalization🚀 🔗 mikel-zhobro.github.io/3dgsim 📺youtube.com/watch?v=3Ar36A…
Marco Bagatella retweeted
Chenhao Li @breadli428:
Join us for the Robotics, Vision, and Controls Talks this week! We curated a list of exceptional speakers for a series of talks on robotics, computer vision, controls, and learning. Everyone is welcome! Web: robotics-talks.com Zoom: ethz.zoom.us/j/63716670526
Marco Bagatella retweeted
Bhavya Sukhija @sukhijabhavy:
Excited to share MaxInfoRL! The core focus was developing simple, flexible, and scalable methods for principled exploration. Check the thread to see how MaxInfoRL meets these criteria and achieves SOTA results.
Quoted tweet, Carlo Sferrazza @carlo_sferrazza:
🚨 New reinforcement learning algorithms 🚨 Excited to announce MaxInfoRL, a class of model-free RL algorithms that solves complex continuous control tasks (including vision-based!) by steering exploration towards informative transitions. Details in the thread 👇
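"Steering exploration toward informative transitions" is commonly instantiated as an intrinsic reward bonus; one standard proxy for a transition's informativeness (a generic sketch, not MaxInfoRL's exact estimator) is disagreement across an ensemble of learned dynamics models:

```python
import numpy as np

def ensemble_info_gain(predictions):
    # predictions: (n_models, state_dim) next-state predictions from an
    # ensemble. Their variance is a cheap proxy for how informative the
    # transition is: models disagree most where the agent knows least.
    preds = np.asarray(predictions, dtype=float)
    return float(preds.var(axis=0).sum())

def augmented_reward(extrinsic, predictions, beta=1.0):
    # Task reward plus a scaled information-gain bonus; beta trades off
    # exploitation against exploration.
    return extrinsic + beta * ensemble_info_gain(predictions)
```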
Marco Bagatella retweeted
Machine Learning Street Talk @MLStreetTalk:
Jonas Hübotter is a doctoral researcher at ETH Zurich, where he works on active fine-tuning and local learning. His paper "Transductive Active Learning: Theory and Applications" (arxiv.org/pdf/2402.15898) is published at NeurIPS 2024, and we discuss "Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs" (arxiv.org/pdf/2410.08020, submitted to ICLR 2025) live on MLST now.