
@rjsabouhi @maenstru56 @Vikas_NLP_UA @PanLiangming Exactly. ConciseRL formalizes that intuition: encode “be concise yet correct” in the reward, let PPO follow the gradient, and the attractors are short-but-sufficient chains. Appreciate the gradient-field lens!
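A toy sketch of what "be concise yet correct" could look like as a scalar reward for PPO. This is an illustration only, not ConciseRL's actual reward: the function name, `max_tokens`, and `length_weight` are hypothetical, and correctness is assumed to come from some external checker.

```python
def concise_reward(
    is_correct: bool,
    num_tokens: int,
    max_tokens: int = 512,       # hypothetical length budget
    length_weight: float = 0.5,  # hypothetical trade-off knob
) -> float:
    """Illustrative reward: correct answers score higher when shorter."""
    if not is_correct:
        # Incorrect chains get nothing, so brevity alone is never rewarded.
        return 0.0
    # Fraction of the length budget used, capped at 1.0.
    length_fraction = min(num_tokens, max_tokens) / max_tokens
    # Correct answers start at 1.0 and lose up to `length_weight` for length.
    return 1.0 - length_weight * length_fraction
```

Under these assumptions a correct chain always beats an incorrect one, and among correct chains the shorter one earns more, which is the "short-but-sufficient" pull described above.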







