
Lucio Dery Jnr Mwinm




We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.


The standard for frontier coding evals is changing with model maturity. We now recommend reporting SWE-bench Pro and are sharing more detail on why we’re no longer reporting SWE-bench Verified as we work with the industry to establish stronger coding eval standards. SWE-bench Verified was a strong benchmark, but we’ve found evidence it is now saturated due to test-design issues and contamination from public repositories. openai.com/index/why-we-n…









Introducing Gemini 2.5 Flash Image (aka nano-banana), our SOTA image generation and editing model 🍌 As you might have already seen, this model excels at character consistency and creative edits, and has Gemini's world knowledge!



30+ accepted papers, 6 oral papers, 6 guest speakers: join us at @iclr_conf on the 27th in Hall 4 #3 for a full-day workshop on Modularity for Collaborative, Decentralized, and Continual Learning. sites.google.com/corp/view/mcdc… @derylucio, Fengyuan Liu, and I will be organizing that day in person.


Workshop alert 🚨 We'll host a workshop on modularity at ICLR 2025, encompassing collaborative + decentralized + continual learning. Those topics are on the critical path to building better AIs. Interested? Submit a paper and join us in Singapore! sites.google.com/corp/view/mcdc…