Sebastian Raschka (@rasbt)
Why do the latest language transformers (LLMs like ChatGPT, etc.) use reinforcement learning (RL) for finetuning instead of regular supervised learning (SL)? There are at least 5 reasons ... [1/10]
David Nadeau (@Pythonner)
@rasbt I don't have a good understanding of RL, so I was explaining it to myself as follows: annotation work for SL is slow and expensive, which makes it difficult to cover tasks and domains horizontally. RL is 'cheap' in comparison: thumbs up/down or answer ranking, which allows covering general language.
Sebastian Raschka (@rasbt)
@Pythonner I would say it's similarly expensive in terms of annotation cost. The cost here is in getting the labels to train the reward model in the first place. Once you have these labels, you can use the reward model for either RL or SL (or weakly supervised learning).
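To make the reply above concrete, here is a minimal sketch of one way a reward model could feed a supervised pipeline. It is an illustration under stated assumptions, not the method used for ChatGPT: `pairwise_loss` is a Bradley-Terry style objective commonly used to fit reward models on ranked answers, `toy_reward` is a hypothetical stand-in for a trained reward model, and `best_of_n` / `to_sl_example` are made-up helper names showing how reward scores could pick targets for ordinary supervised finetuning.

```python
import math


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss on one human ranking: -log(sigmoid(r_chosen - r_rejected)).

    The human labels (which answer was preferred) enter here; this is
    where the annotation cost discussed in the thread is paid.
    """
    return math.log1p(math.exp(-(r_chosen - r_rejected)))


def toy_reward(response: str) -> float:
    """Hypothetical stand-in for a trained reward model.

    Assumption for illustration only: score = number of distinct words.
    """
    return float(len(set(response.split())))


def best_of_n(candidates: list[str]) -> str:
    """Return the candidate with the highest reward score."""
    return max(candidates, key=toy_reward)


def to_sl_example(prompt: str, candidates: list[str]) -> tuple[str, str]:
    """Turn sampled candidates into one (prompt, target) pair for supervised finetuning."""
    return prompt, best_of_n(candidates)


if __name__ == "__main__":
    # The loss is lower when the preferred answer already scores higher.
    print(pairwise_loss(2.0, 0.0) < pairwise_loss(0.0, 2.0))
    print(to_sl_example("Is RL cheaper?", ["yes yes yes", "a clear answer"]))
```

The same reward scores could instead serve as the reward signal in an RL loop; in both cases the human labels are spent once, on fitting the reward model.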