
Hi @stevibe, thank you for the great benchmarks! Could you automatically publish a leaderboard for CLI-40 and the other benchmarks, showing how each LLM (commercial and open-weight) has performed on each benchmark over time? It could live in the repository's wiki, for example.











