Yuyao Wang

9 posts

@YuyaoStarling

PhD Student @ UW

Seattle, WA, USA · Joined November 2022
18 Following · 25 Followers
Yuyao Wang retweeted
Jiawei Liu @JiaweiLiu_
Interested in program synthesis for creating random DNNs? and its application on automated testing? Check our new work: “NeuRI: Diversifying DNN Generation via Inductive Rule Inference” with a Distinguished Paper Award @FSEconf! w/ Jinjun, @YuyaoStarling, @LingmingZhang
5 replies · 6 reposts · 71 likes · 10.4K views
Yuyao Wang retweeted
Jiawei Liu @JiaweiLiu_
In the past 6-mon release of HumanEval+ we have been improving its toolchain usability and dataset quality from v0.1.0 to v0.1.7 releases. 🔥 Now we release MBPP+, a new benchmark in EvalPlus v0.2.0: tinyurl.com/4pw82wb8 🧵
2 replies · 7 reposts · 50 likes · 7.7K views
Yuyao Wang retweeted
Jiawei Liu @JiaweiLiu_
Introducing the EvalPlus leaderboard! evalplus.github.io/leaderboard.ht… 🔥28 models have been evaluated on coding HumanEval & HumanEval+ 🔥7B CodeLlama outperforms ~16B models e.g. StarCoder&CodeGen 🔥Phind-CodeLlama-34B-v2 and WizardCoder-Python-34B-V1 as open models both beat ChatGPT 🧵
4 replies · 18 reposts · 141 likes · 40.4K views
Yuyao Wang retweeted
Univers Tennis 🎾 @UniversTennis
🔴 NOVAK DJOKOVIC ALONE HOLDS THE MEN'S RECORD FOR GRAND SLAM TITLES. 🐐🇷🇸
91 replies · 1.3K reposts · 5.9K likes · 436.5K views
Yuyao Wang retweeted
Wenhu Chen @WenhuChen
New arXiv: arxiv.org/abs/2305.12524 GPT-4/PaLM-2 have both shown almost perfect performance on existing grade school math datasets. What about more challenging STEM questions, especially the ones which require specific theorems, like Stokes' theorem, Wiener process, etc.?
13 replies · 112 reposts · 478 likes · 152.7K views
Yuyao Wang retweeted
Jiawei Liu @JiaweiLiu_
We welcome everyone to try out 📚𝐇𝐮𝐦𝐚𝐧𝐄𝐯𝐚𝐥+! A dataset to reflect the "real" correctness of LLM-generated code. Using 📚𝐇𝐮𝐦𝐚𝐧𝐄𝐯𝐚𝐥+ is the same as HumanEval. You can easily pip install it and evaluate in our prepared sandbox (optional). github.com/evalplus/evalp…
AK @_akhaliq

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
extensive evaluation across 14 popular LLMs (including GPT-4 and ChatGPT) demonstrates that HUMANEVAL+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing the pass@k by 15.1% on average! For example, the pass@k of widely studied open-source models like CODEGEN-16B can drop by over 18.0%, while the performance of state-of-the-art commercial models like ChatGPT and GPT-4 can also drop by at least 13.0%, largely affect the result analysis for almost all recent work on LLM-based code generation
abs: arxiv.org/abs/2305.01210
github: github.com/evalplus/evalp…

2 replies · 17 reposts · 103 likes · 15K views
Yuyao Wang retweeted
Steven Xia @steven_xia_
🚨 Evaluating LLM-generated code on datasets with just "3 test-cases" is NOT enough! 🚨 We built ✨HumanEval+✨: improving HumanEval with up to thousands of new tests to fully evaluate functional correctness of LLM generated code! @JiaweiLiu_ @YuyaoStarling @LingmingZhang
AK @_akhaliq

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
extensive evaluation across 14 popular LLMs (including GPT-4 and ChatGPT) demonstrates that HUMANEVAL+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing the pass@k by 15.1% on average! For example, the pass@k of widely studied open-source models like CODEGEN-16B can drop by over 18.0%, while the performance of state-of-the-art commercial models like ChatGPT and GPT-4 can also drop by at least 13.0%, largely affect the result analysis for almost all recent work on LLM-based code generation
abs: arxiv.org/abs/2305.01210
github: github.com/evalplus/evalp…

1 reply · 12 reposts · 39 likes · 10.8K views
Yuyao Wang retweeted
Ilya Sergey @ilyasergey
Strong PhD candidate in ML/AI: “I have published four NeurIPS papers, two are first-author, one of which was a spotlight”. Strong PhD candidate in PL: “I have solved all exercises from the first two volumes of Software Foundations”.
7 replies · 35 reposts · 597 likes · 101.6K views
Yuyao Wang retweeted
Jeremy Cohen @deepcohen
I tell new PhD students to pick a research topic according to three criteria: (1) the problem should be important, (2) it should have a reasonable chance of being solvable, and (3) you should personally have a unique edge.
Nassim Nicholas Taleb @nntaleb

The only writing advice I've ever given: write the book that nobody else can write. If there is a single person on Planet Earth who can write anything close to it, find a hobby. Generalize to every line you write. Those who didn't follow such a guideline are punished by ChatGPT.

6 replies · 42 reposts · 232 likes · 48.3K views