
David Foster
@davidADSP
Author of Generative Deep Learning: Teaching Machines how to Paint, Write, Compose and Play (O'Reilly), #generativeAI, Founding Partner of ADSP.


Does this mean the ARC-AGI benchmark has saturated? Yes: v1 of the benchmark is starting to saturate. There were already signs of this in this year's Kaggle competition, where an ensemble of all submissions would score 81%. Next year's competition will run on ARC-AGI-2, an updated version of the dataset that keeps the same format as v1 but features fewer tasks that can be easily brute-forced. Early indications are that ARC-AGI-2 will represent a complete reset of the state of the art, and it will remain extremely difficult for o3. Meanwhile, a smart human or a small panel of average humans would still be able to score >95%.


@DavidSHolz @willdepue in your heart do you believe we’ve solved that one or no?


Yi-Lightning is now in Chatbot Arena! The latest and most capable model from @01AI_Yi. Come chat and vote at lmarena.ai. The leaderboard will be updated soon.


We are happy to announce a new site for Chatbot Arena! Over the past year, with the incredible support of our community, Chatbot Arena has evolved into a mature ecosystem and platform. We believe it's time for it to graduate and stand on its own. By giving Chatbot Arena its own platform, we aim to provide it with more independence and ensure its long-term growth. With a strong partnership with LMSys, we're expanding the platform to evaluate frontier models, not only for chatbots but also in areas like coding, complex tasks, and red-teaming. LMSys has been a research collective dedicated to a variety of projects, such as Vicuna, Chatbot Arena, SGLang, S-LoRA, RouteLLM, and more, rather than just one initiative. Moving forward, LMSys will continue to serve as an incubator for new projects and as a platform for open research and development. Come join us!
Chatbot Arena: lmarena.ai
New blog site: blog.lmarena.ai
Blog: lmsys.org/blog/2024-09-2…

