Sebastian Raschka: "Why do latest language transformers (LLMs like ChatGPT etc.) use reinforcement l"

Why do the latest language transformers (LLMs like ChatGPT, etc.) use reinforcement learning (RL) for finetuning instead of regular supervised learning (SL)? There are at least 5 reasons ... [1/10]

David Nadeau @Pythonner · 28 Feb

@rasbt I don't have a good understanding of RL, so I was explaining it to myself as follows: annotation work for SL is slow and expensive, which makes it difficult to cover tasks and domains horizontally. RL is 'cheap' in comparison: thumbs up/down or answer ranking, which allows covering general language.

Sebastian Raschka @rasbt · 28 Feb

@Pythonner I would say it's similarly expensive in terms of annotation cost. The cost here is getting the labels to train the reward model in the first place. Once you have these labels, you can use the reward model for either RL or SL (or weakly supervised learning).
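
To make the last point concrete, here is a minimal sketch of one set of reward scores being consumed two ways. Everything in it is an assumption for illustration: `reward_model` is a hypothetical hand-written stand-in (a real one would be a network trained on human preference labels), and `rl_weights` and `sl_targets` are made-up helper names, not anything from the thread.

```python
from statistics import mean


def reward_model(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a learned reward model.

    This toy version prefers longer answers (capped at 10 words)
    and adds a bonus when the response mentions the prompt.
    """
    score = min(len(response.split()), 10) / 10.0
    if prompt.lower() in response.lower():
        score += 1.0
    return score


def rl_weights(prompt: str, responses: list[str]) -> list[float]:
    """RL-style use: reward minus a mean baseline.

    In a REINFORCE-like update, each value would scale the gradient
    of the log-probability of the corresponding sampled response.
    """
    rewards = [reward_model(prompt, r) for r in responses]
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def sl_targets(prompt: str, responses: list[str], keep: int = 1) -> list[str]:
    """SL-style use: keep the top-scoring samples as supervised targets
    (best-of-n / rejection sampling)."""
    ranked = sorted(responses, key=lambda r: reward_model(prompt, r), reverse=True)
    return ranked[:keep]


prompt = "tea"
responses = ["no idea", "Green tea is steeped at about 80 C", "tea"]

weights = rl_weights(prompt, responses)  # per-sample update weights
targets = sl_targets(prompt, responses)  # filtered finetuning targets
print(weights)
print(targets)
```

In both cases the expensive part is the same: the preference labels needed to fit the reward model. The sketch only shows where its scores would plug in; it does not train a policy.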
