Long Phan retweeted

1/ A year ago, we released Humanity’s Last Exam, a benchmark to measure reasoning in LLMs.
One year later, almost exactly on the day of my one-year anniversary, it’s incredibly rewarding to see this work published in @Nature under open access and to see how much reasoning performance has progressed since.


















