Sachin Grover
77 posts

@sachingrover
Post-Doctoral Researcher @ Interactive Robotics Lab, @ASU. AI systems for humans, by humans.





github.com/mlfoundations/… I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval etc as well as standard pre-training metrics like MMLU. This requires you to download and install more than 10 repos, each with different dependencies and issues. This is, as you might expect, an actual nightmare. (1/n)

The video (youtu.be/4kuoPR9zuJU) first shows the core technology and the pipeline, followed by the demo. This work was done in collaboration with @shiwalimohan at PARC, part of #SRI. 2/7





