
🔥 Autonomous AI assistants (e.g., #googleio2024, #WWDC24) and coding agents (e.g., #Devin, #SWEAgent) have garnered a lot of attention recently. We can envision coding agents autonomously completing complex day-to-day tasks across apps using APIs on our behalf. But how can we develop & benchmark them in a rigorous & reproducible manner?

🚀 Introducing AppWorld:
🌎 a simulated world environment where agents can write code to interact with many apps via APIs on behalf of people,
📊 a benchmark of complex tasks defined on it, and
🧪 a robust evaluation framework for assessing agents' goal completion.

📢 To appear as an #ACL2024 paper:
🌎💻🧑‍🤝‍🧑 “AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents”
#NLProc #ai #AIagents

📜 arxiv.org/abs/2407.18901 (paper)
🌐 appworld.dev for code, blog, data (tasks, APIs, trajectories) explorer, interactive playground, leaderboard & more!























