Barna Pásztor

33 posts

@pasztorb

Doctoral Fellow @ETH_AI_Center | RLHF | LLM fine-tuning | Preference Optimisation | (Multi-Agent) Reinforcement Learning

Zurich, Switzerland · Joined May 2010
279 Following · 213 Followers
Pinned Tweet
Barna Pásztor @pasztorb
🚀 Two new papers from our team are now available on arXiv, both tackling core bottlenecks in RL post-training:
1. Annotating human preference datasets without spending a fortune
2. Quantifying uncertainty for reward models
🔗 lasgroup.github.io/rlhf
Barna Pásztor @pasztorb
@agarwl_ Great work! I often think of the weights the other way around. Model weights govern the immediate prompt-response connection (System 1), while prompt weights (or the harness) define the slow-thinking process through reasoning, tool calls, self-reflection, ... (System 2).
Rishabh Agarwal @agarwl_
Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights. So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA. Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models). I think this idea of learning both fast-slow weights would be a good foundation for continual learning. PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea. See more details here: gepa-ai.github.io/gepa/blog/2026…
Barna Pásztor @pasztorb
A Leader commits to an action, and a Follower refines it. This asymmetry captures richer preferences than scalar rewards and provides stable training. As a bonus, it offers inference-time refinement: two-turn rollouts deliver ~60% gains over single-turn rollouts.
Barna Pásztor @pasztorb
What do you do when reward models fail in RLHF? Scalar rewards flatten messy, context-dependent human preferences into a single number. The reward model learns a distortion, and the policy optimizes it faithfully. 🧵
Barna Pásztor @pasztorb
📄 RewardUQ (arxiv.org/abs/2602.24040) We rigorously compare UQ methods for reward models and draw practical insights for active learning and robust RL post-training. The results were immediately applied in ActiveUltraFeedback!
Barna Pásztor @pasztorb
📄 ActiveUltraFeedback (arxiv.org/abs/2603.09692) How much preference data do you really need? We show that active learning can match or beat static baselines using as little as 1/6 of the annotations across datasets and algorithms!
Barna Pásztor retweeted
Thomas Kleine Buening @thomasklbg
Deployed LLMs and users generate millions of conversations every day. These are full of useful learning signals, yet we don't use them for training. We introduce self-distillation for learning directly from user conversations – no rewards, no labels, no extra models.
Barna Pásztor @pasztorb
I am attending @NeurIPSConf 2025 next week in San Diego, CA! Reach out to chat about RLHF and preference optimisation! I am happy to discuss future collaborations and open positions in 2026. #NeurIPS2025
Barna Pásztor retweeted
ETH AI Center @ETH_AI_Center
Great to have @eldsjal visit with @shak & @piammichel yesterday! Many nice demo day interactions with our cutting-edge AI research projects & ventures. Their concluding message: now’s the time to build with massive impact - and ETH AI Center is one of the best places to start 🚀
Barna Pásztor @pasztorb
Amazing experience to be part of this project and work on post-training at scale with an exceptional team! More great things to come to push the open-source LLM community!
CSCS Lugano @cscsch

@EPFL, @ETH_en and #CSCS today released Apertus, Switzerland's first large-scale, multilingual language model (LLM). As a fully open LLM, it serves as a building block for developers and organizations to create their own applications: cscs.ch/science/comput… #Apertus #AI

Barna Pásztor retweeted
Paul Friedrich @pa_friedrich
At #AAMAS25 in Detroit this week and presenting my work with @pasztorb & @gio_ramponi Thursday afternoon - if you're here, let's connect and chat about learned algorithmic collusion, or go for a morning run!
Barna Pásztor @pasztorb
I am presenting two papers this week at #NeurIPS2024 focusing on preference-based RL!
1. Contextual Bilevel Reinforcement Learning for Incentive Alignment: #6505 West, 11AM, Thursday
2. Bandits with Preference Feedback: A Stackelberg Game Perspective: #5807 West, 11AM, Friday
Barna Pásztor retweeted
Giorgia Ramponi @gio_ramponi
I am not attending #NeurIPS this year, but Vinzenz Thoma and @pasztorb are :) Come chat about our recent work on "Contextual Bilevel Reinforcement Learning for Incentive Alignment" 🗓️ Thu 12 Dec, 11 a.m.
Barna Pásztor retweeted
ETH AI Center @ETH_AI_Center
🔬 Advance the frontiers of AI: @ETH_AI_Center Fellowship Programs – #PhD & #Postdoc Opportunities 🔬 💫 Push the boundaries of Reinforcement Learning and Data-driven Control 💫 ✍️ Apply by November 19, 2024: https://ai.ethz.ch/apply
Barna Pásztor retweeted
Gergely Neu @neu_rips
PLS SHARE: I'm hiring a PhD student to work on ML theory, to begin in Fall 2025. Topics include: generalization bounds & statistical inference via online prediction, representation learning via optimal transport, sequential decision making... More info: cs.bme.hu/~gergo/jobs.ht…
ELLIS @ELLISforEurope

The #ELLISPhD application portal is now open! Apply to top #AI labs & supervisors in Europe with a single application, and choose from different areas & tracks. The call for applications: ellis.eu/news/ellis-phd… Deadline: 15 November 2024 #PhD #PhDProgram #MachineLearning #ML
