
1. take a model and a prompt, e.g. "name this color: #D02027"
2. generate multiple outputs
3. score outputs
4. calculate the group's average score
5. encourage outputs that beat the average
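
a minimal sketch of steps 4 and 5 in Python, assuming the scores from step 3 are plain numbers. the helper name `group_advantages`, the sample outputs, and the sample scores are all made up for illustration. the sketch only subtracts the group average; full GRPO implementations typically also divide by the group's standard deviation and feed the result into a policy-gradient update, both left out here.

```python
from statistics import mean


def group_advantages(scores):
    """Return each score minus the group average (hypothetical helper)."""
    baseline = mean(scores)  # step 4: the group average is the baseline
    # step 5: positive values mark outputs that beat the average
    return [s - baseline for s in scores]


# Hypothetical outputs and scores for the prompt "name this color: #D02027".
outputs = ["red", "crimson", "blue", "fire engine red"]
scores = [0.5, 0.75, 0.0, 0.75]

advantages = group_advantages(scores)
for text, adv in zip(outputs, advantages):
    print(f"{text}: {adv:+.2f}")
```

outputs with a positive value beat the group average and would be encouraged, those with a negative value would be discouraged.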
full version with interactive visualizations: adaptive-ml.com/post/grpo-simp…






