
Babak Hodjat
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning.

Paper: arxiv.org/pdf/2509.24372
Code: github.com/VsonicV/es-fin…
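To make "exploring in parameter space" concrete, here is a minimal sketch of a vanilla Evolution Strategies update (in the style of Salimans et al.'s natural ES estimator), not the paper's exact implementation. The reward function, population size, and learning rate below are illustrative placeholders: perturb the full parameter vector with Gaussian noise, score each candidate with a black-box reward, and move along the reward-weighted average of the noise. No backpropagation through the model is needed.

```python
import numpy as np

def es_step(theta, reward_fn, rng, pop_size=32, sigma=0.1, lr=0.01):
    """One ES update: explore directly in parameter space, gradient-free.

    Samples Gaussian perturbations of the full parameter vector, evaluates
    each perturbed candidate with a black-box reward, and moves theta along
    the reward-weighted average of the noise directions.
    """
    eps = rng.standard_normal((pop_size, theta.size))  # noise in parameter space
    rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
    # Normalize rewards (z-score) for stability, as is standard in ES.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + lr / (pop_size * sigma) * eps.T @ adv

# Toy usage: maximize a simple concave "reward" (a stand-in for an LLM score).
target = np.array([1.0, -2.0, 0.5])
reward = lambda w: -np.sum((w - target) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(500):
    theta = es_step(theta, reward, rng)
```

In practice, LLM-scale ES adds tricks such as mirrored (antithetic) sampling and shared random seeds so that only scalar rewards, not parameter vectors, need to be communicated across workers.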