Lav Varshney retweeted

Are human creativity tests actually good predictors of creativity for large language models?
These tests are now widely applied to assess how “creative” large language models are, but their validity as measures of *machine* creativity has never actually been established.
Our new paper studies this question in detail:
💡 We ran a large-scale study correlating LLM performance on human creativity tests with performance on creative writing, divergent thinking, and scientific ideation benchmarks.
Main findings:
✍️ The Divergent Association Test (“name 10 words as different from each other as possible”) is the best predictor of creative writing ability.
💭 The Conditional Divergent Association Test (“name 10 words as different from each other as possible, while staying relevant to a given cue word”) is the best predictor of divergent thinking.
🚫 However, no single test predicts all three aspects (creative writing, divergent thinking, scientific ideation) well.
🚫 Moreover, and contrary to popular belief, none of the tests is a reliable predictor of scientific ideation ability!
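This post does not say how the tests are scored. For intuition about "as different from each other as possible", here is a minimal sketch that assumes the scoring used by the original human Divergent Association Test: the mean pairwise cosine distance between word embeddings, scaled by 100. The function names and the tiny 2-D vectors are made up for illustration; a real run would use pretrained word embeddings, and the paper's own scoring may differ.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_score(words, embeddings):
    """Mean pairwise cosine distance over all word pairs, scaled by 100.

    Assumption: this mirrors the human DAT scoring; it is not taken
    from the paper announced in this post.
    """
    pairs = list(combinations(words, 2))
    total = sum(cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs)
    return 100.0 * total / len(pairs)


# Hypothetical toy embeddings, chosen only so the arithmetic is easy to follow.
toy = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "galaxy": [0.0, 1.0]}

related = dat_score(["cat", "dog"], toy)    # semantically close pair
spread = dat_score(["cat", "galaxy"], toy)  # semantically distant pair
```

Under this scoring, a word list whose members point in very different directions in embedding space earns a higher score than a list of near-synonyms.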
Solution:
We introduce the Divergent Remote Association Test, a novel creativity test that assesses both divergent *and* convergent thinking ability at the same time.
✅ The Divergent Remote Association Test is the first test to reach statistical significance on both evaluation criteria for predicting scientific creativity: validity (𝑟 = +0.57, 𝑝 ≈ 0.008) and specificity (𝑟|𝑔 = +0.50, 𝑝 ≈ 0.02).
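To make the two statistics concrete, here is a rough sketch. It assumes that validity is a plain Pearson correlation between test scores and benchmark scores across models, and that 𝑟|𝑔 is a first-order partial correlation that controls for a single covariate 𝑔. The post does not define 𝑔, so that reading is a guess, and all numbers below are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def partial_corr(x, y, g):
    """Correlation of x and y after removing the linear effect of g."""
    r_xy, r_xg, r_yg = pearson(x, y), pearson(x, g), pearson(y, g)
    return (r_xy - r_xg * r_yg) / math.sqrt((1 - r_xg ** 2) * (1 - r_yg ** 2))


# Invented per-model scores, one entry per hypothetical LLM.
test_scores = [1.0, 2.0, 3.0, 4.0]   # creativity test
benchmark = [1.0, 3.0, 2.0, 4.0]     # scientific ideation benchmark
covariate = [1.0, -1.0, -1.0, 1.0]   # stand-in for g (assumed meaning)

validity = pearson(test_scores, benchmark)
specificity = partial_corr(test_scores, benchmark, covariate)
```

In this toy data the covariate is uncorrelated with both score lists, so the partial correlation equals the plain correlation. When the covariate explains part of the shared variance, the two values diverge, which is why a test can be valid without being specific.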
————
Thanks to my co-authors @AlexiGlad, @jonahablack, @hengjinlp, as well as other colleagues who provided helpful feedback on the manuscript: @Roger_Beaty, @BabakHemmatian, @lrvarshney.
Paper link: arxiv.org/abs/2605.13450
