
We've released annotated slides for a talk titled "Evaluating LLMs is a minefield". Current ways of evaluating chatbots/LLMs don't work well, especially for questions about societal impact. There are no quick fixes. More research is needed. w/ @sayashk 🧵cs.princeton.edu/~arvindn/talks…
















