
@karpathy So RLHF will always work for absolutes, because there is a predefined and measurable objective.
While for arts and humanities, RLHF relies on the creativity of the human trainer in a defined and often short time interval (which is not optimal for encouraging creative thinking)
English























