Ryan Kwon retweeted

📢 New benchmark out!
We introduce CLASH, a benchmark of 345 💥 high-stakes dilemmas and 3,795 perspectives to evaluate how well LLMs handle complex value reasoning.
GPT-4 and Claude? Not quite there.
📄 arxiv.org/pdf/2504.10823
🤗 huggingface.co/datasets/launc…

