Lior Shani (@LiorShan) - Twitter Profile

Lior Shani retweeted

Ofir Nabati@ofirnabati·19 Dec

We're excited to share our new paper: "Personalized and Sequential Text-to-Image Generation"! Check out the paper and our new sequential human rater dataset! 👇 Paper: arxiv.org/pdf/2412.10419 Dataset: kaggle.com/datasets/googl… Details below. 1/N 🧵

Lior Shani@LiorShan·24 May

Check out our paper for more details on how we're pushing the boundaries of RLHF for more natural and effective multi-turn conversations! arxiv.org/abs/2405.14655 @LiorShan @aviv_rosenberg @AsafCassel (8/8)

Lior Shani@LiorShan·24 May

Experimental results showing MTPO outperforms single-turn RLHF baselines and a multi-turn generalization of RLHF. This demonstrates the effectiveness of our approach in improving the quality of multi-turn conversations. (7/8)

Lior Shani@LiorShan·24 May

Excited to share our latest research on aligning Large Language Models (LLMs) with human preferences! We're moving beyond single-turn interactions to improve multi-turn conversations. arxiv.org/abs/2405.14655 Joint work by @GoogleResearch @GoogleDeepMind (1/8)

Lior Shani retweeted

Guy Tennenholtz@guytenn·1 Jun

Check out our paper here: arxiv.org/pdf/2205.15376… Code can be found here: github.com/guytenn/Termin… And of course, thank you to everyone who collaborated with me on this project: @NadavMerlis, @LiorShan, @shiemannor, @ShalitUri, @GalChechik, Assaf Hallak, and @DalalGal. (9/n) n=9

Lior Shani@LiorShan·27 Feb

Finally, we discuss the need for exploration in imitation learning and argue that our Apprenticeship Learning-based approach, which relies on the MDP structure, is superior to supervised learning approaches such as behavioral cloning (BC).

Lior Shani@LiorShan·27 Feb

We show our approach is both theoretically efficient and practical: we provide regret guarantees and show how to avoid solving an MDP at each iteration, as prior works do. This allows us to devise a well-performing deep RL implementation of our algorithm (OAL).

Lior Shani@LiorShan·27 Feb

Glad to present our paper "Online Apprenticeship Learning" at #AAAI2022 arxiv.org/abs/2102.06924 with @TZahavy @MannorShie. We show how to efficiently reproduce experts' behavior from offline trajectory data by interacting with the MDP (when rewards are not specified).

Lior Shani retweeted

Guy Tennenholtz@guytenn·4 Jan

teleporting, swimming with sharks, and lots more. It turned out better than I've ever expected! Now, before releasing it out to the world, I'm looking for a partner that can help me market the game properly. You can help me out by retweeting! Promo:

Lior Shani retweeted

Ludwig Cancer@Ludwig_Cancer·7 Dec

Ludwig @Princeton Director Joshua Rabinowitz, a pioneer of metabolomics, has contributed to the development of a cancer therapy & undone enduring assumptions about metabolism. His work is opening new approaches to cancer therapy. Learn more: bit.ly/3Ex0ndx

Lior Shani@LiorShan·11 Dec

Come chat today at the deep RL workshop @NeurIPSConf!

Manan Tomar@manan_tomar

Overall, MDPO is an easily scalable policy optimization algorithm with minimal hyper-params/heuristics involved, and is nicely grounded in mirror descent theory :) Joint work with @LiorShan, Yonathan Efroni, Mohammad Ghavamzadeh Come chat on Dec 11, 11:30 am PST!

Lior Shani@LiorShan·7 Dec

Applying Mirror Descent in deep RL is nice and easy! slideslive.com/38941342/mirro…

Lior Shani@LiorShan·7 Dec

@manan_tomar slideslive.com/38941342/mirro…

Manan Tomar@manan_tomar·6 Dec

Overall, MDPO is an easily scalable policy optimization algorithm with minimal hyper-params/heuristics involved, and is nicely grounded in mirror descent theory :) Joint work with @LiorShan, Yonathan Efroni, Mohammad Ghavamzadeh Come chat on Dec 11, 11:30 am PST!

Manan Tomar@manan_tomar·6 Dec

Mirror Descent has been utilized in RL theory quite extensively, but can we build practical RL algorithms from it? Our contributed talk at the deep RL workshop @NeurIPSConf discusses exactly this! Paper: arxiv.org/abs/2005.09814 Code: github.com/manantomar/Mir… Details below!

Lior Shani@LiorShan·6 Jun

Prof. Shie Mannor is presenting our work at the great RL theory seminar this Tuesday! The talk will be about the connections between TRPO and convex optimization, possible practical implications and how to explore in policy optimization...

RL Theory Virtual Seminars@RLtheory

Our next talk: 06/09: Shie Mannor (Technion) "Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs" For details, please see the website: sites.google.com/view/rltheorys…

Lior Shani@LiorShan·6 Jun

Prof. Shie Mannor is presenting our work at the great RL theory seminar this Tuesday! The talk will be about the connections between TRPO and convex optimization, possible practical implications and how to explore in policy optimization...

Gergely Neu@neu_rips

our seminars are back this week with a real black-belt RL theorist!!
