
🔥 Excited to share the other key technology of WizardLM-2! 📙AutoEvol: Automatic Instruction Evolving for Large Language Models 🚀We built a fully automated Evol-Instruct pipeline to create high-quality, highly complex instruction-tuning data.

-------- 🧵 --------

👉Motivation first: Over the past six months, we have dedicated ourselves to exploring methods for scaling up synthetic training of LLMs. Although Evol-Instruct has demonstrated excellent performance in creating powerful post-training data, it relies too heavily on human experts to design specific evolution methods for specific tasks. Whenever Evol-Instruct is applied to an entirely new complex task, the methods for executing evolution must be redesigned. This limitation makes scaling up extremely challenging, prompting us to develop a new method, 💻Auto Evol-Instruct💻, that evolves instruction data automatically. Auto Evol allows WizardLM-2 to be trained on a nearly unlimited number and variety of synthetic examples. Let's see: 🧐

1. Limitations of Evol-Instruct

Evol-Instruct takes high-quality data as a starting point and iteratively refines it with LLMs, improving its complexity and diversity. It has demonstrated superior performance across a broad range of public benchmarks covering diverse capabilities, including instruction following (WizardLM), code generation (WizardCoder), and mathematical reasoning (WizardMath). Despite this outstanding performance, its heavy reliance on heuristic effort presents notable challenges: every completely new task requires redesigning the evolution methods, which demands a high level of expertise and considerable cost, hindering adaptation to a wider spectrum of capabilities.

2.
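To make the contrast concrete, here is a minimal sketch of what classic Evol-Instruct looks like in code, assuming a generic `llm` call (a stub here) and a few hand-crafted evolution prompts. The prompt templates and function names are illustrative, not the actual WizardLM implementation; the point is that `EVOLUTION_PROMPTS` is the part human experts must rewrite for every new task.

```python
import random

# Hand-designed evolution operations — the expert effort that must be
# redone for each new task, and that Auto Evol-Instruct removes.
EVOLUTION_PROMPTS = [
    "Add one more constraint to this instruction: {ins}",
    "Deepen the reasoning required by this instruction: {ins}",
    "Rewrite this instruction to cover a rarer, more complex case: {ins}",
]

def llm(prompt: str) -> str:
    """Stub LLM call; a real pipeline would query a model here."""
    return f"<evolved>{prompt}</evolved>"

def evol_instruct(seed_instructions, rounds=2, rng=random):
    """Iteratively evolve seed data, keeping every generation for tuning."""
    pool = list(seed_instructions)
    current = list(seed_instructions)
    for _ in range(rounds):
        # Each round applies a randomly chosen hand-written evolution op.
        current = [llm(rng.choice(EVOLUTION_PROMPTS).format(ins=i))
                   for i in current]
        pool.extend(current)
    return pool
```

Each round compounds complexity on the previous round's outputs, which is why the method works well once the prompts are tuned, and why retuning them per task is the bottleneck.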
We want to build a fully automated Evol-Instruct pipeline

Auto Evol-Instruct automatically designs evolving methods that make given instruction data more complex, enabling almost cost-free adaptation to different tasks simply by changing the input data of the framework.

The figure below shows the iterative process of optimizing the initial evolving method e_0 into the optimal evolving method e*, specifically the transition from e_{t-1} to e_t. We refer to the model used for evolution as the evol LLM, and the model used for optimization as the optimizer LLM. The optimization process involves two critical stages:

(1) Evol Trajectory Analysis: the optimizer LLM carefully analyzes the potential issues and failures exposed in the instruction evolution performed by the evol LLM, generating feedback for subsequent optimization.

(2) Evolving Method Optimization: the optimizer LLM optimizes the evolving method by addressing the issues identified in that feedback.

These two stages alternate and repeat, progressively developing an effective evolving method using only a small subset of the instruction data. Once the optimal evolving method is identified, it directs the evol LLM to convert the entire instruction dataset into more diverse and complex forms, facilitating improved instruction tuning.

3. Fully AI-driven Evol-Instruct can outperform the Evol-Instruct designed by human experts

Our experiments show that the evolving methods designed by Auto Evol-Instruct outperform the Evol-Instruct methods designed by human experts for instruction tuning across various capabilities, including instruction following, mathematical reasoning, and code generation.
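The two-stage loop above can be sketched as follows. This is a runnable toy, not the paper's implementation: `evol_llm`, `optimizer_llm_analyze`, and `optimizer_llm_optimize` are hypothetical stand-ins (simple string stubs) for the real evol-LLM and optimizer-LLM calls, so only the control flow is faithful to the description.

```python
def evol_llm(method: str, instruction: str) -> str:
    """Apply the current evolving method to one instruction (stub)."""
    return f"{instruction} [evolved via: {method}]"

def optimizer_llm_analyze(trajectories: list[str]) -> str:
    """Stage 1 — Evol Trajectory Analysis: inspect evolved samples and
    return feedback on issues/failures (stub)."""
    return f"feedback on {len(trajectories)} trajectories"

def optimizer_llm_optimize(method: str, feedback: str) -> str:
    """Stage 2 — Evolving Method Optimization: revise the method using
    the feedback (stub)."""
    return f"{method} + fix({feedback})"

def auto_evol_instruct(seed_method: str, dev_subset: list[str],
                       steps: int = 3) -> str:
    """Iterate e_0 -> e* on a small subset of the instruction data."""
    method = seed_method  # e_0
    for _ in range(steps):
        # Evolve the dev subset with the current method e_{t-1}.
        trajectories = [evol_llm(method, ins) for ins in dev_subset]
        # Alternate the two stages to obtain e_t.
        feedback = optimizer_llm_analyze(trajectories)
        method = optimizer_llm_optimize(method, feedback)
    return method  # e*

def evolve_dataset(method: str, dataset: list[str]) -> list[str]:
    """Once e* is found, it directs the evol LLM over the full dataset."""
    return [evol_llm(method, ins) for ins in dataset]
```

The key design choice the thread describes is that the expensive optimization runs only on a subset; the resulting e* is then applied cheaply to the whole corpus.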
As shown in the table below:
- Instruction following: Auto Evol-Instruct achieves a 10.44% improvement on MT-Bench over the Evol method used by WizardLM-1.
- Code: a 12% improvement on HumanEval over the method used by WizardCoder.
- Math: a 6.9% improvement on GSM8K over the method used by WizardMath.

4. Scaling Evol-Instruct to various domains and tasks

With the new Auto Evol-Instruct technology, the evolutionary synthetic data of WizardLM-2 has scaled up from the three domains of WizardLM-1 (chat, code, and math) to dozens of domains, covering tasks across all aspects of large language models. This allows Arena Learning to train and learn from an almost infinite pool of high-difficulty instruction data, fully unlocking its potential.

For more details, please refer to:
Paper: arxiv.org/pdf/2406.00770
Project: github.com/nlpxucan/Wizar…

We are working with our legal team to publicly release the code of Auto Evol-Instruct.