

Petro Snieda
1.2K posts

@PetroSnieda
Senior Tech Leader | 9 yrs as Developer (ex-Netflix) 🔥 Launching a new vibe-coded project every week ❌ 13 failed projects and 🚀 2 successes ($7k MRR)











GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-world agentic work benchmark
GLM-5.2 from @Zai_org scores 1524 Elo on GDPval-AA, which measures performance on real-world, economically valuable knowledge work through long-horizon, multi-turn tasks.
Key takeaways:
➤ #3 overall, behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615), and level with GPT-5.5 (xhigh, 1509)
➤ The leading open weights model by a wide margin: the next open model, MiniMax-M3, scores 1408
➤ Ahead of many proprietary models, including Google's Gemini 3.5 Flash (1357), Qwen 3.7 Max (1289), Muse Spark (1158)
➤ The tasks are agentic. GLM-5.2 averaged ~31 turns per task across 1,999 matches
➤ Consistent with the rest of its launch, GLM-5.2 also leads open weights on the Artificial Analysis Intelligence Index, ranks #3 on the Agentic Index, and #3 on AA-Briefcase









🚀 Introducing our latest research: AgentCL: Toward Rigorous Evaluation of Continual Learning in Language Agents
Continual learning for language agents has not yet been clearly defined. How should we evaluate their ability to continually learn from experience and improve themselves on complex, long-horizon tasks?
- Traditional continual learning provides a useful perspective on the plasticity–stability trade-off, but its formulation does not naturally extend to non-parametric learning paradigms today.
- Many recent works on agent memory still focus on retrieval and reasoning over long (static) contexts, rather than on how agents can reuse experience from complex agentic tasks.
Across coding, deep research, and language understanding and reasoning tasks, we show that carefully designed task streams and metrics are essential for understanding continual learning in language agents.
📄 Paper: arxiv.org/abs/2606.02461
🤗 Dataset: huggingface.co/datasets/osunl…



Three announcements from our keynote at Compile, including how we're training a new model with SpaceX.



I’ve been finding absolutely unreal AI psychosis content on reels lately





