
Bobby







Thrilled to announce the @Kaggle Game Arena, a new leaderboard testing how modern LLMs perform on games (spoiler: not very well atm!). AI systems play each other, making it an objective & evergreen benchmark that will scale in difficulty as they improve. kaggle.com/game-arena










TextArena is now on Hugging Face. An open-source collection of competitive text-based games for LLMs, spanning 57+ unique environments.



Some intense fighting between Gemini Flash 2.0 and GPT-4o-mini. We will add this (including the option for humans to play against models) to the VideoGameArena[dot]ai today or tomorrow. If you have other game suggestions, please let us know!


More likely than not this will still change. But so far, Claude is crushing everybody in text-based games. (Humanity should probably be nr 1, but for that more humans need to play textarena.ai/play)

