Jierui Lin retweeted

Excited to share our new pre-print arxiv.org/pdf/2502.01600
We train a digital agent that solves diverse day-to-day tasks from the AppWorld benchmark by interacting with its stateful environment through API calls. AppWorld is hard! The previous best open-weight agent (Llama 3 70B) reached only a 7% success rate on the hardest test split. Our RL algorithm, LOOP, a PPO variant with Monte Carlo baselines, achieves a 45.7% success rate: 24 percentage points above the base Qwen 2.5 32B and 9 points higher than the much larger OpenAI o1.

