
Zeno



An @SCSatCMU team has released a new interactive platform for data management and machine learning (ML) evaluation called Zeno. It empowers users to explore, visualize and analyze data and ML model performance across custom use cases. cmu.is/zeno




We've teamed up with @AiEleuther to make it super easy to visualize your evaluation results in Zeno! Try it out the next time you run a benchmark: github.com/EleutherAI/lm-…



🤖There have been recent exciting demos of agents that navigate the web and perform tasks for us. But how well do they work in practice? 🔊To answer this, we built WebArena, a realistic and reproducible web environment with 4+ real-world web apps for benchmarking useful agents🧵

Quadratic attention has been indispensable for information-dense modalities such as language... until now. Announcing Mamba: a new SSM architecture that has linear-time scaling, ultra-long context, and, most importantly, outperforms Transformers everywhere we've tried. With @tri_dao 1/

Since some of you might be wondering whether Mamba 2.8B can serve as a drop-in replacement for some of the larger models, we've compared the Mamba model family to some of the most popular 7B models in @try_zeno. Report: hub.zenoml.com/report/2443/Ma… 🧵 1/5

Google just released 𝑮𝒆𝒎𝒊𝒏𝒊, their long-awaited GPT-4 competitor. Their report shows comparisons across multiple common benchmarks, but 𝐡𝐨𝐰 𝐫𝐞𝐥𝐢𝐚𝐛𝐥𝐞 𝐚𝐫𝐞 𝐭𝐡𝐞𝐬𝐞 𝐫𝐞𝐬𝐮𝐥𝐭𝐬? 🧵 on potential issues with the benchmark scores

⚠️ We are removing DROP from the Open LLM Leaderboard! With leaderboard evaluation data openly shared on 2000+ models, we did a deep dive with our friends @AiEleuther and @try_zeno, & found out that its original implementation is unfair to many models 😱 huggingface.co/blog/leaderboa…

We loved collaborating with the @huggingface and @AiEleuther teams to investigate the odd behavior on the DROP benchmark! Check out the blog post and supporting Zeno report & project: Report: hub.zenoml.com/report/1255/DR… Project: hub.zenoml.com/project/2f5dec…





Zeno now supports 3D 🧊 data! We've uploaded over 1M @allen_ai ObjaverseXL models to a Zeno project to showcase how you can explore 3D data in a Zeno Project: hub.zenoml.com/project/d7fddd…

