
Frontier LLMs are doing too much when it comes to editing code. I'm excited to share this work on the Over-Editing problem, which refers to models modifying code beyond what is asked of them. The main findings:

- Many frontier models Over-Edit, with GPT 5.4 being the biggest culprit.
- Reasoning models have a higher natural tendency to Over-Edit than their non-reasoning counterparts.
- Compared to SFT, DPO, and Rejection Sampling, RL is the best approach for training models to perform minimal code edits while preventing catastrophic forgetting.

Blog and details below!
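
To make the problem concrete, here is a toy sketch of what Over-Editing looks like (my own illustrative example, not taken from the blog): the user asks only for a one-line bug fix, but an over-editing model also renames variables, adds type hints, and rewrites the loop. Measuring the diff size against the original makes the difference visible.

```python
import difflib

# The user's request: "fix the off-by-one bug in count_evens". Nothing else.
original = """\
def count_evens(nums):
    total = 0
    for i in range(len(nums) - 1):  # bug: skips the last element
        if nums[i] % 2 == 0:
            total += 1
    return total
"""

# A minimal edit: change only the buggy line.
minimal_fix = """\
def count_evens(nums):
    total = 0
    for i in range(len(nums)):  # fixed: include the last element
        if nums[i] % 2 == 0:
            total += 1
    return total
"""

# An Over-Edit: the bug is fixed, but the model also renamed the argument,
# added type hints, and rewrote the loop -- none of which was requested.
over_edit = """\
def count_evens(numbers: list[int]) -> int:
    return sum(1 for n in numbers if n % 2 == 0)
"""

def changed_lines(before: str, after: str) -> int:
    """Count added/removed lines in a unified diff (ignoring headers)."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )

print(changed_lines(original, minimal_fix))  # → 2 (one line removed, one added)
print(changed_lines(original, over_edit))    # much larger: the whole function was rewritten
```

Both versions compute the correct answer, which is exactly why diff-based metrics matter here: functional correctness alone cannot distinguish a minimal edit from an Over-Edit.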