Yuyao Wang (@YuyaoStarling) - Twitter Profile

Yuyao Wang retweeted

Jiawei Liu @JiaweiLiu_ · Dec 5

Interested in program synthesis for creating random DNNs and its application to automated testing? Check out our new work: “NeuRI: Diversifying DNN Generation via Inductive Rule Inference” with a Distinguished Paper Award @FSEconf! w/ Jinjun, @YuyaoStarling, @LingmingZhang

5

6

71

10.4K

Yuyao Wang retweeted

Jiawei Liu @JiaweiLiu_ · Nov 28

Over the past 6 months of HumanEval+ releases, we have been improving its toolchain usability and dataset quality from v0.1.0 to v0.1.7. 🔥 Now we release MBPP+, a new benchmark in EvalPlus v0.2.0: tinyurl.com/4pw82wb8 🧵

2

7

50

7.7K

Yuyao Wang retweeted

Jiawei Liu @JiaweiLiu_ · Oct 16

Introducing the EvalPlus leaderboard! evalplus.github.io/leaderboard.ht… 🔥 28 models have been evaluated on HumanEval & HumanEval+ 🔥 7B CodeLlama outperforms ~16B models, e.g. StarCoder & CodeGen 🔥 Phind-CodeLlama-34B-v2 and WizardCoder-Python-34B-V1, both open models, beat ChatGPT 🧵

4

18

141

40.4K

Yuyao Wang retweeted

Univers Tennis 🎾 @UniversTennis · Jun 11

🔴 NOVAK DJOKOVIC ALONE HOLDS THE MEN'S RECORD FOR GRAND SLAM TITLES. 🐐🇷🇸

91

1.3K

5.9K

436.5K

Yuyao Wang retweeted

Wenhu Chen @WenhuChen · May 23

New arXiv: arxiv.org/abs/2305.12524 GPT-4/PaLM-2 have both shown almost perfect performance on existing grade school math datasets. What about more challenging STEM questions, especially the ones which require specific theorems, like Stokes' theorem, the Wiener process, etc.?

13

112

478

152.7K

Yuyao Wang retweeted

Jiawei Liu @JiaweiLiu_ · May 6

We welcome everyone to try out 📚 HumanEval+! A dataset to reflect the "real" correctness of LLM-generated code. Using 📚 HumanEval+ is the same as using HumanEval. You can easily pip install it and evaluate in our prepared sandbox (optional). github.com/evalplus/evalp…

AK @_akhaliq

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. Extensive evaluation across 14 popular LLMs (including GPT-4 and ChatGPT) demonstrates that HUMANEVAL+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing pass@k by 15.1% on average! For example, the pass@k of widely studied open-source models like CODEGEN-16B can drop by over 18.0%, while the performance of state-of-the-art commercial models like ChatGPT and GPT-4 can also drop by at least 13.0%, largely affecting the result analysis for almost all recent work on LLM-based code generation. abs: arxiv.org/abs/2305.01210 github: github.com/evalplus/evalp…

2

17

103

15K

Yuyao Wang retweeted

Steven Xia @steven_xia_ · May 3

🚨 Evaluating LLM-generated code on datasets with just "3 test-cases" is NOT enough! 🚨 We built ✨HumanEval+✨: improving HumanEval with up to thousands of new tests to fully evaluate functional correctness of LLM generated code! @JiaweiLiu_ @YuyaoStarling @LingmingZhang

AK @_akhaliq

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. Extensive evaluation across 14 popular LLMs (including GPT-4 and ChatGPT) demonstrates that HUMANEVAL+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing pass@k by 15.1% on average! For example, the pass@k of widely studied open-source models like CODEGEN-16B can drop by over 18.0%, while the performance of state-of-the-art commercial models like ChatGPT and GPT-4 can also drop by at least 13.0%, largely affecting the result analysis for almost all recent work on LLM-based code generation. abs: arxiv.org/abs/2305.01210 github: github.com/evalplus/evalp…

1

12

39

10.8K

Yuyao Wang retweeted

Ilya Sergey @ilyasergey · Jan 4

Strong PhD candidate in ML/AI: “I have published four NeurIPS papers, two are first-author, one of which was a spotlight”. Strong PhD candidate in PL: “I have solved all exercises from the first two volumes of Software Foundations”.

7

35

597

101.6K

Yuyao Wang retweeted

Jeremy Cohen @deepcohen · Jan 28

I tell new PhD students to pick a research topic according to three criteria: (1) the problem should be important, (2) it should have a reasonable chance of being solvable, and (3) you should personally have a unique edge.

Nassim Nicholas Taleb @nntaleb

The only writing advice I've ever given: write the book that nobody else can write. If there is a single person on Planet Earth who can write anything close to it, find a hobby. Generalize to every line you write. Those who didn't follow such a guideline are punished by ChatGPT.

6

42

232

48.3K
