
Option to present your work at FSE and optionally publish a short paper.
Explore the details and join us. Together we can build transparent, reliable benchmarks for code LLMs. poisonedchalice.github.io
Ali Al-Kaswan 🍉

@aalkaswan1
PhD candidate at SERG Delft. Working on Machine Learning for Software Engineering #ML4SE #NLP #SE

Introducing 🌸BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks! BigCodeBench goes beyond simple evals like HumanEval and MBPP and tests LLMs on more realistic and challenging coding tasks.
