
RL for reasoning often rely on verifiers — great for math, but tricky for creative writing or open-ended research. Meet RARO: a new paradigm that teaches LLMs to reason via adversarial games instead of verification. No verifiers. No environments. Just demonstrations. 🧵👇
















