Barna Pásztor

33 posts

@pasztorb

Doctoral Fellow @ETH_AI_Center | RLHF | LLM fine-tuning | Preference Optimisation | (Multi-Agent) Reinforcement Learning

Zurich, Switzerland · Joined May 2010

279 Following · 213 Followers

Pinned Tweet

Barna Pásztor@pasztorb·28 Mar

🚀 Two new papers from our team are now available on ArXiv, both tackling core bottlenecks in RL post-training 1. Annotating human preference datasets without spending a fortune 2. Quantifying uncertainty for reward models 🔗lasgroup.github.io/rlhf

Barna Pásztor@pasztorb·16 May

@agarwl_ Great work! I often think of the weights the other way around. Model-weights govern the immediate prompt-response connection (System 1) while prompt-weights (or the harness) define the slow-thinking process through reasoning, tool-calls, self-reflection,... (System 2).

Rishabh Agarwal@agarwl_·15 May

Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights. So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA. Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models). I think this idea of learning both fast-slow weights would be a good foundation for continual learning. PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea. See more details here: gepa-ai.github.io/gepa/blog/2026…

Barna Pásztor@pasztorb·21 Apr

If you're at ICLR 2026, come by 👇 🗓️ Saturday, April 25, 10.30 to 13.00 📍 Poster Session 5, Pavilion 4, #4808 📄 arxiv.org/abs/2512.16626 💻 github.com/lasgroup/stack… Joint work w/ @thomasklbg and @arkrause.

Barna Pásztor@pasztorb·21 Apr

A Leader commits to an action, and a Follower refines it. This asymmetry captures richer preferences than scalar rewards and provides stable training. As a bonus, it offers inference-time refinement: two-turn rollouts deliver ~60% gains over single-turn ones.

Barna Pásztor@pasztorb·21 Apr

What do you do when reward models fail in RLHF? Scalar rewards flatten messy, context-dependent human preferences into a single number. The reward model learns a distortion, and the policy optimizes it faithfully. 🧵

Barna Pásztor@pasztorb·28 Mar

Huge thanks to all contributing to these papers! @lenalibon @jessicalamjh Daniel Yang @Davit_Melikidze Florian Redhardt @Marian_Schn @Martin_Wertich Samuel Stante @pkassraie_ @idohakimi @arkrause

Barna Pásztor@pasztorb·28 Mar

📄 RewardUQ (arxiv.org/abs/2602.24040) We rigorously compare UQ methods for reward models and draw practical insights for active learning and robust RL post-training. The results were immediately applied in ActiveUltraFeedback!

Barna Pásztor@pasztorb·27 Mar

📄 ActiveUltraFeedback (arxiv.org/abs/2603.09692) How much preference data do you really need? We show that active learning can match or beat static baselines using as little as 1/6 of the annotations across datasets and algorithms!

Barna Pásztor retweeted

Thomas Kleine Buening@thomasklbg·19 Feb

Deployed LLMs and users generate millions of conversations every day. These are full of useful learning signals, yet we don't use them for training. We introduce self-distillation for learning directly from user conversations – no rewards, no labels, no extra models.

Barna Pásztor retweeted

ZurichAI@zurichnlp·12 Jan

ZurichNLP #19 is next Monday at @ETH_AI_Center! Sina Ahmadi (@sina_ahm, @UZH_en) on language for low-resource varieties, and Barna Pasztor (@pasztorb, @ETH_AI_Center) on sample-efficient dataset collection for RLHF. RSVP below! Spots limited as always.

Barna Pásztor@pasztorb·30 Nov

I am attending @NeurIPSConf 2025 next week in San Diego, CA! Reach out to chat about RLHF and preference optimisation! I am happy to discuss future collaborations and open positions in 2026. #NeurIPS2025

Barna Pásztor retweeted

ETH AI Center@ETH_AI_Center·24 Sep

Great to have @eldsjal visit with @shak & @piammichel, yesterday! Many nice demo day interactions with our cutting-edge AI research projects & ventures. Their concluding message: now’s the time to build with massive impact - and ETH AI Center is one of the best places to start 🚀

Zurich, Switzerland 🇨🇭

Barna Pásztor@pasztorb·3 Sep

Amazing experience to be part of this project and work on post-training at scale with an exceptional team! More great things to come to push the open-source LLM community!

CSCS Lugano@cscsch

@EPFL, @ETH_en and #CSCS today released Apertus, Switzerland's first large-scale, multilingual language model (LLM). As a fully open LLM, it serves as a building block for developers and organizations to create their own applications: cscs.ch/science/comput… #Apertus #AI

Barna Pásztor retweeted

Paul Friedrich@pa_friedrich·19 May

At #AAMAS25 in Detroit this week and presenting my work with @pasztorb & @gio_ramponi Thursday afternoon - if you're here, let's connect and chat about learned algorithmic collusion, or go for a morning run!

Barna Pásztor@pasztorb·12 Dec

I am presenting two papers this week at #NeurIPS2024 focusing on preference-based RL! 1. Contextual Bilevel Reinforcement Learning for Incentive Alignment: #6505 West, 11AM, Thursday 2. Bandits with Preference Feedback: A Stackelberg Game Perspective: #5807 West, 11AM, Friday

Barna Pásztor retweeted

Giorgia Ramponi@gio_ramponi·10 Dec

I am not attending #NeurIPS this year, but Vinzenz Thoma and @pasztorb are :) Come chat about our recent work on "Contextual Bilevel Reinforcement Learning for Incentive Alignment" 🗓️ Thu 12 Dec, 11 a.m.

Barna Pásztor retweeted

ETH AI Center@ETH_AI_Center·6 Nov

🔬 Advance the frontiers of AI: @ETH_AI_Center Fellowship Programs – #PhD & #Postdoc Opportunities 🔬 💫Push the boundaries of Reinforcement Learning and Data-driven Control💫 ✍️ Apply by November 19, 2024: https://ai.ethz.ch/apply

Barna Pásztor retweeted

Gergely Neu@neu_rips·4 Nov

PLS SHARE: I'm hiring a PhD student to work on ML theory, to begin in Fall 2025. Topics include: generalization bounds & statistical inference via online prediction, representation learning via optimal transport, sequential decision making... More info: cs.bme.hu/~gergo/jobs.ht…

ELLIS@ELLISforEurope

The #ELLISPhD application portal is now open! Apply to top #AI labs & supervisors in Europe with a single application, and choose from different areas & tracks. The call for applications: ellis.eu/news/ellis-phd… Deadline: 15 November 2024 #PhD #PhDProgram #MachineLearning #ML
