Zeno (@try_zeno) - Twitter Profile

Pinned Tweet

Zeno@try_zeno·20 Dec

We've teamed up with @AiEleuther to make it super easy to visualize your evaluation results in Zeno! Try it out the next time you run a benchmark: github.com/EleutherAI/lm-…

2

11

49

12.6K

Zeno retweeted

TwelveLabs (twelvelabs.io)@twelve_labs·23 Jan

@a13xba @cmuhcii @a13xba will give a presentation about @try_zeno, an interactive AI evaluation platform for exploring, debugging, and sharing how your AI systems perform. (co-founded with @a_a_cabrera) twitter.com/CarnegieMellon…

Carnegie Mellon University@CarnegieMellon

An @SCSatCMU team has released a new interactive platform for data management and machine learning (ML) evaluation called Zeno. It empowers users to explore, visualize and analyze data and ML model performance across custom use cases. cmu.is/zeno

1

3

6

811

Zeno@try_zeno·10 Jan

We just sent out the first issue of Zeno's Notes, our newsletter on AI evaluation. In case you're not on the recipient list yet, read it here: zenoml.com/blog/newslette…

0

1

4

506

Zeno retweeted

Alex Cabrera@a_a_cabrera·9 Jan

Lots of predictions of synthetic data for AI being big this year. Decided to look at the OG Alpaca dataset: hub.zenoml.com/project/f192ed… Impressive for being GPT-4 generated w/ 1 prompt, but begs the question of how to generate diverse, OOD data

1

2

9

864

Zeno retweeted

Slator@slatornews·8 Jan

Researchers from @CarnegieMellon, BerriAI explore the translation capabilities of Google’s Gemini and suggest Gemini Pro could be a valuable tool for MT. @SNAT02792153 @yu_zichun52802 @AashiqMuhamed @tianyue_01 @a13xba @a_a_cabrera @krrish_dh @XiongChenyan slator.com/is-google-gemi…

0

6

692

Zeno retweeted

Alex Cabrera@a_a_cabrera·4 Jan

In case you missed it over the break - you can now visualize the outputs of any Eleuther LM Eval Harness run in @try_zeno with one command! python scripts/zeno_visualize.py

Zeno@try_zeno

We've teamed up with @AiEleuther to make it super easy to visualize your evaluation results in Zeno! Try it out the next time you run a benchmark: github.com/EleutherAI/lm-…

1

5

13

4K

Zeno retweeted

Graham Neubig@gneubig·19 Dec

Google’s Gemini recently made waves as a major competitor to OpenAI’s GPT. Exciting! But we wondered: How good is Gemini really? At CMU, we performed an impartial, in-depth, and reproducible study comparing Gemini, GPT, and Mixtral. Paper: arxiv.org/abs/2312.11444 🧵

29

252

1.4K

493.5K

Zeno@try_zeno·13 Dec

💠

Greg Brockman@gdb

evals are surprisingly often all you need

0

1

3

352

Zeno retweeted

Shuyan Zhou@shuyanzh36·11 Dec

Since the initial release, we have significantly improved the usability of WebArena, accuracy of the evaluation, and provided interactive result analysis with @try_zeno I am attending #NeurIPS2023 , say hi 👋 if you are interested in AI agent, code gen and their evaluations!

Shuyan Zhou@shuyanzh36

🤖There have been recent exciting demos of agents that navigate the web and perform tasks for us. But how well do they work in practice? 🔊To answer this, we built WebArena, a realistic and reproducible web environment with 4+ real-world web apps for benchmarking useful agents🧵

2

10

45

10K

Zeno retweeted

Alex@a13xba·7 Dec

Since some of you might be wondering whether Mamba 2.8B can serve as a drop-in replacement of some of the larger models, we've compared the Mamba model family to some of the most popular 7B models in @try_zeno Report: hub.zenoml.com/report/2443/Ma… 🧵 1/5

Albert Gu@_albertgu

Quadratic attention has been indispensable for information-dense modalities such as language... until now. Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and most importantly--outperforms Transformers everywhere we've tried. With @tri_dao 1/

3

22

127

98.6K

Zeno retweeted

Graham Neubig@gneubig·7 Dec

Recently there were some great results from the new Mamba architecture (arxiv.org/abs/2312.00752) by @_albertgu and @tri_dao. We did a bit of third-party validation, and 1. The results are reproducible 2. Mamba 2.8B is competitive w/ some 7B models (!) 3. Mistral is still strong

Alex@a13xba

Since some of you might be wondering whether Mamba 2.8B can serve as a drop-in replacement of some of the larger models, we've compared the Mamba model family to some of the most popular 7B models in @try_zeno Report: hub.zenoml.com/report/2443/Ma… 🧵 1/5

3

30

264

48.4K

Zeno retweeted

MMitchell@mmitchell_ai·7 Dec

Useful thread wrt understanding Google's new Gemini reported results.

Alex Cabrera@a_a_cabrera

Google just released 𝑮𝒆𝒎𝒊𝒏𝒊, their long-awaited GPT-4 competitor. Their report shows comparison across multiple common benchmarks, but 𝐡𝐨𝐰 𝐫𝐞𝐥𝐢𝐚𝐛𝐥𝐞 𝐚𝐫𝐞 𝐭𝐡𝐞𝐬𝐞 𝐫𝐞𝐬𝐮𝐥𝐭𝐬? 🧵 on potential issues with the benchmark scores

1

6

17

6.3K

Zeno retweeted

Alex Cabrera@a_a_cabrera·6 Dec

Google just released 𝑮𝒆𝒎𝒊𝒏𝒊, their long-awaited GPT-4 competitor. Their report shows comparison across multiple common benchmarks, but 𝐡𝐨𝐰 𝐫𝐞𝐥𝐢𝐚𝐛𝐥𝐞 𝐚𝐫𝐞 𝐭𝐡𝐞𝐬𝐞 𝐫𝐞𝐬𝐮𝐥𝐭𝐬? 🧵 on potential issues with the benchmark scores

6

28

174

76.3K

Zeno retweeted

Alex@a13xba·2 Dec

Awesome blogpost by our friends @huggingface and @AiEleuther and a demonstration of how @try_zeno can be used to systematically spot issues with benchmark results. Give it a read! Also: Zeno Report: hub.zenoml.com/report/1255/DR… Zeno Project: hub.zenoml.com/project/2f5dec…

Clémentine Fourrier 🍊 is off till Dec 2026 hiking@clefourrier

⚠️ We are removing DROP from the Open LLM Leaderboard! With leaderboard evaluation data openly shared on 2000+ models, we did a deep dive with our friends @AiEleuther and @try_zeno, & found out that its original implementation is unfair to many models 😱 huggingface.co/blog/leaderboa…

0

1

7

701

Zeno retweeted

Teknium 🪽@Teknium·1 Dec

Lots of info here on the several problems benchmarks can face

Alex Cabrera@a_a_cabrera

We loved collaborating with the @huggingface and @AiEleuther teams to investigate the odd behavior on the DROP benchmark! Check out the blog post and supporting Zeno report & project: Report: hub.zenoml.com/report/1255/DR… Project: hub.zenoml.com/project/2f5dec…

1

3

46

7.4K

Zeno retweeted

Alex Cabrera@a_a_cabrera·1 Dec

We loved collaborating with the @huggingface and @AiEleuther teams to investigate the odd behavior on the DROP benchmark! Check out the blog post and supporting Zeno report & project: Report: hub.zenoml.com/report/1255/DR… Project: hub.zenoml.com/project/2f5dec…

Clémentine Fourrier 🍊 is off till Dec 2026 hiking@clefourrier

⚠️ We are removing DROP from the Open LLM Leaderboard! With leaderboard evaluation data openly shared on 2000+ models, we did a deep dive with our friends @AiEleuther and @try_zeno, & found out that its original implementation is unfair to many models 😱 huggingface.co/blog/leaderboa…

0

11

60

22.7K

Zeno retweeted

Clémentine Fourrier 🍊 is off till Dec 2026 hiking@clefourrier·1 Dec

⚠️ We are removing DROP from the Open LLM Leaderboard! With leaderboard evaluation data openly shared on 2000+ models, we did a deep dive with our friends @AiEleuther and @try_zeno, & found out that its original implementation is unfair to many models 😱 huggingface.co/blog/leaderboa…

8

31

147

84.7K

Zeno retweeted

Alex Cabrera@a_a_cabrera·28 Nov

𝐂𝐚𝐧 𝐀𝐈 𝐝𝐨 𝐲𝐨𝐮𝐫 𝐭𝐚𝐱𝐞𝐬? Probably not 💸 Quick @try_zeno report on @danielgross' benchmark of tax Qs. LLMs struggle with any math, and it's hard to validate text answers without external references Report: hub.zenoml.com/report/1717/Ca… Explore: hub.zenoml.com/project/e7519c…

1

5

11

5.7K

Zeno@try_zeno·23 Nov

Happy Thanksgiving! If you don't have one already, there's some turkey on Zeno: hub.zenoml.com/project/d7fddd…

Zeno@try_zeno

Zeno now supports 3D 🧊 data! We've uploaded over 1M @allen_ai ObjaverseXL models to a Zeno project to showcase how you can explore 3D data in a Zeno Project: hub.zenoml.com/project/d7fddd…

0

4

243

Zeno@try_zeno·22 Nov

Zeno now supports 3D 🧊 data! We've uploaded over 1M @allen_ai ObjaverseXL models to a Zeno project to showcase how you can explore 3D data in a Zeno Project: hub.zenoml.com/project/d7fddd…

0

2

10

912
