

Hanze Dong
332 posts

@hendrydong
Research @MSFTResearch. RL Science | Generative Models | Autonomous Systems





🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built 🚀✨ #LLM #AI #Agent #CocoaBench More details in the thread 👇

"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." aisle.com/blog/ai-cybers…







🚨 SPIGM is back at ICML 2026: Call for Papers 🚨
SPIGM: Structured Probabilistic Inference & Generative Modeling, beyond scaling & benchmarks
📍 Seoul 🇰🇷
🗓️ Submit by April 24 (AoE)
👇 Submission link below.



Gemma 4 is here!
🧠 31B and 26B A4B for models with impressive intelligence per parameter
🤏 E2B and E4B for mobile and IoT
🤗 Apache 2.0
🤖 Base and IT checkpoints available
Available in AI Studio, Hugging Face, Ollama, Android, and your favorite OS tools
🚀 Download it today!




For the next couple years at least, the entire AI industry is going to be defined by this fact: demand is going to wildly outstrip supply, and so what matters is which companies / products have margin to pay for tokens. Those products will then rapidly improve because latency drives retention, and retention creates data to spin flywheels that improve the product and drive more adoption.
