Thomas Rupf

10 posts

@th_rupf

Master's student @ ETH interested in ML, RL, and Robotics.

Zurich · Joined July 2025
142 Following · 11 Followers
Thomas Rupf @th_rupf
OpTI-BFM bears similarities to LinUCB for bandits, which we use to prove sublinear regret in episodic settings under mild assumptions. Because it's online, OpTI-BFM can also adapt to time-varying (non-stationary) rewards by decaying the weight on older observations. (4/5)
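To make the LinUCB comparison concrete, here is a minimal sketch of a ridge-regression reward estimate with an optimism bonus and a decay factor on older observations. It is not the paper's implementation: the class name `DiscountedLinUCB` and the parameters `reg`, `beta`, and `decay` are assumptions chosen for illustration, and the bonus is a simplified form of the one used in discounted linear bandit algorithms.

```python
import numpy as np


class DiscountedLinUCB:
    """Hypothetical sketch: linear reward estimate with optimism and forgetting."""

    def __init__(self, dim, reg=1.0, beta=1.0, decay=1.0):
        self.dim = dim
        self.reg = reg      # ridge regularization strength
        self.beta = beta    # width of the optimism bonus
        self.decay = decay  # 1.0 keeps all data; < 1.0 down-weights old data
        self.S = np.zeros((dim, dim))  # decayed sum of feature outer products
        self.b = np.zeros(dim)         # decayed sum of reward-weighted features

    def update(self, x, reward):
        # Shrink the old statistics before adding the new observation.
        self.S = self.decay * self.S + np.outer(x, x)
        self.b = self.decay * self.b + reward * x

    def _gram(self):
        return self.S + self.reg * np.eye(self.dim)

    def estimate(self):
        # Regularized least-squares estimate of the reward weights.
        return np.linalg.solve(self._gram(), self.b)

    def ucb(self, x):
        # Optimistic value: predicted reward plus an uncertainty bonus.
        A = self._gram()
        mean = x @ np.linalg.solve(A, self.b)
        bonus = self.beta * np.sqrt(x @ np.linalg.solve(A, x))
        return mean + bonus
```

With `decay=1.0` this reduces to the standard LinUCB statistics. Setting `decay` below 1 makes the estimate track the most recent rewards more closely, which is the mechanism the post describes for non-stationary rewards.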
Thomas Rupf @th_rupf
Excited to share that our paper "Optimistic Task Inference for Behavior Foundation Models" was accepted for ICLR 2026. BFMs are great at zero-shot RL, but task inference requires a dataset with reward labels. Our method OpTI-BFM offers an online alternative. (1/5)
Thomas Rupf reposted
Núria Armengol @NriaArmengol2
Last week I presented our latest work, 🐝“Epistemically-guided forward backward exploration (FBEE)”🐝, at the @RL_Conference. TL;DR: Active learning for unsupervised RL.
Thomas Rupf reposted
Marco Bagatella @mar_baga
When multiple tasks need improvements, fine-tuning a generalist policy becomes tricky. How do we allocate a demonstration budget across a set of tasks of varied difficulty and familiarity? We are presenting a possible solution at ICML on Wednesday! (1/3)
Thomas Rupf @th_rupf
Our method tackles the occupancy matching objective directly at test-time by estimating the agent's occupancy with samples from a learned world model and matching it to the expert occupancy using Optimal Transport. (2/3)
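As a rough illustration of occupancy matching with Optimal Transport, the sketch below scores candidate behaviors by the entropic OT cost between their sampled states and the expert's states, then picks the closest one. It is an assumption-laden toy, not the paper's method: the function names `sinkhorn_cost` and `pick_best` are hypothetical, the state arrays stand in for samples from a learned world model, and the cost is plain squared Euclidean distance.

```python
import numpy as np


def sinkhorn_cost(x, y, eps=0.1, iters=200):
    """Entropic OT cost between uniform empirical measures on x and y."""
    # Pairwise squared Euclidean distances between the two sample sets.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    # Scale the temperature by the largest cost to keep the kernel well-conditioned.
    scale = max(C.max(), 1e-12)
    K = np.exp(-C / (eps * scale))
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones_like(a)
    for _ in range(iters):
        # Alternate scaling updates so the plan matches both marginals.
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())


def pick_best(candidate_states, expert_states):
    """Return the index of the candidate whose sampled states best match the expert."""
    costs = [sinkhorn_cost(c, expert_states) for c in candidate_states]
    return int(np.argmin(costs)), costs
```

Here each entry of `candidate_states` would hold states sampled for one candidate behavior, and the lowest transport cost marks the candidate whose occupancy estimate is closest to the expert's.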
Thomas Rupf @th_rupf
Zero-shot imitation from just a single sparse demonstration is hard. Goal-conditioned methods tend to “greedily” move from one state to the next and lose sight of the big picture. We're presenting an alternative approach on Tuesday at #ICML2025. (1/3)