Samuel Schapiro (@samschapiro) - Twitter Profile

Pinned Tweet

As someone who pivoted from ML theory -> cognitive science in 2024 (but is now working to bridge the two), I cannot emphasize this enough. At the beginning of my career, like many other theorists, I was captivated by the mathematical elegance of learning-theoretic frameworks (PAC, ERM, etc.) and excited to use tools from probability theory to characterize the generalization abilities of learning algorithms. However, I soon realized there was a type of generalization (creativity) unexplained by prevailing frameworks. This demanded a return to first principles—philosophy, psychology, epistemology, even metaphysics—transitioning to what Thomas Kuhn would call "extraordinary science." Two years in, the journey has been arduous but rewarding. It's evident that certain problems command an interdisciplinary approach, with creativity clearly being one of them. Along the way, I read many, many books, and one quote from Popper continues to be a strong source of inspiration: "We are not students of some subject matter but students of problems. And problems may cut right across the border of any subject matter or discipline." -Karl Popper, Conjectures and Refutations: The Growth of Scientific Knowledge (p. 141)

Misha Teplitskiy | Science of Science@MishaTeplitskiy

New paper on PhD admissions and pivots! Scientific communities need new ideas to stay productive and relevant. One source of new ideas is students who pivot from other fields. Do such pivots pay off for the student or the community? 🤔 1/3

Samuel Schapiro@samschapiro·2d

Happy to share 2/2 papers accepted at the #ICML2026 workshop on Generative AI & Creativity! Paper #2, Do Semantic Distance Tests Actually Predict Creativity in LLMs?, is released in the form of an extended preprint (see thread below)⬇️ Paper #1, CreativityNeuro: Steering Language Model Weights to Improve Divergent Thinking, will be released to arXiv soon - stay tuned! Thanks to all my co-authors for their contributions and support! (@AlexiGlad @jonahablack @lrvarshney @corefpark @flxsosa @hengjinlp)

Samuel Schapiro@samschapiro·14 May

Are human creativity tests actually good predictors of creativity for large language models? These tests are now widely applied to assess how “creative” large language models are, but their validity as measures of *machine* creativity has never actually been established. Our new paper studies this question in detail:
💡 We ran a large-scale study correlating LLM performance on human creativity tests with creative writing, divergent thinking, and scientific ideation benchmarks.
Main findings:
✍️ The Divergent Association Test (“name 10 words as different from each other as possible”) is the best predictor of creative writing ability.
💭 The Conditional Divergent Association Test (“name 10 words as different from each other as possible, while staying relevant to a given cue word”) is the best predictor of divergent thinking.
🚫 However, no single test predicts all three aspects (creative writing, divergent thinking, scientific ideation) well.
🚫 Moreover, and contrary to popular belief, none of the tests is a reliable predictor of scientific ideation ability!
Solution: We introduce the Divergent Remote Association Test, a novel creativity test that assesses both divergent *and* convergent thinking ability at the same time.
✅ The Divergent Remote Association Test is the first test to achieve significance in both evaluation criteria, validity (𝑟 = +0.57, 𝑝 ≈ 0.008) and specificity (𝑟 |𝑔 = +0.50, 𝑝 ≈ 0.02), for predicting scientific creativity.
Thanks to my co-authors @AlexiGlad, @jonahablack, @hengjinlp, as well as other colleagues who provided helpful feedback on the manuscript: @Roger_Beaty, @BabakHemmatian, @lrvarshney.
Paper link: arxiv.org/abs/2605.13450
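
For readers unfamiliar with semantic distance tests, here is a minimal sketch of how a word list like the one in the Divergent Association Test might be scored. It assumes the score is the mean pairwise cosine distance between word embeddings, scaled by 100. The function names and the tiny three-dimensional embedding table are hypothetical and made up for illustration; they are not the paper's code or data, and a real setup would use pretrained word vectors.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_distance_score(words, embeddings):
    """Mean pairwise cosine distance between the words' vectors, times 100."""
    vectors = [embeddings[w] for w in words]
    pairs = list(combinations(vectors, 2))
    return 100.0 * sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings (assumption): related words point the same way.
TOY_EMBEDDINGS = {
    "cat": (1.0, 0.0, 0.0),
    "dog": (0.9, 0.1, 0.0),
    "galaxy": (0.0, 1.0, 0.0),
    "justice": (0.0, 0.0, 1.0),
}

similar = semantic_distance_score(["cat", "dog"], TOY_EMBEDDINGS)
diverse = semantic_distance_score(["cat", "galaxy", "justice"], TOY_EMBEDDINGS)
print(f"similar words: {similar:.2f}, diverse words: {diverse:.2f}")
```

Under these assumptions, a list of unrelated words scores higher than a list of near-synonyms, which is the property the test relies on.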

Samuel Schapiro@samschapiro·28 Apr

@AlexiGlad Alison Gopnik is a famous proponent of the "theory theory:" "The basic idea is that children develop their everyday knowledge of the world by using the same cognitive devices that adults use in science." (qtd. in alisongopnik.com/Papers_Alison/…)

Alexi Gladstone@AlexiGlad·27 Apr

the hallmark of a great scientist is to never stop being a child children are the greatest scientists

Samuel Schapiro@samschapiro·1 Nov

“The penalty that we pay for the use of statistical principles in the design of the system is a probability that we may get a wrong response in any particular case…” - Rosenblatt (1957)

Samuel Schapiro@samschapiro·28 Oct

Interesting example from @blaiseaguera's book "What is Intelligence?" where he argues "the 'Jen' example requires higher-order theory of mind." I would say TOM is a sufficient but not necessary condition for successful NTP here. Seems possible to solve w/out. @flxsosa thoughts?

Samuel Schapiro@samschapiro·23 Oct

@AlexiGlad Outliers with opposing signals is what first comes to mind: arxiv.org/abs/2311.04163

Alexi Gladstone@AlexiGlad·23 Oct

ive always seen this and wondered... if anyone knows please tell!

zed@zmkzmkz

does anyone have any pointers on what this "hump" is in the gradient norm at the beginning of training a transformer? I've seen this happen at all scales, even in different architectural variants, even with or without warmup/decay lr

Samuel Schapiro@samschapiro·22 Oct

Modern "conceptual blending" theory posits a fundamental cognitive ability which underlies artistic, scientific, and technological performance. Interesting that as early as 1873, even Nietzsche recognized the importance of constructing blends and metaphors for human cognition.

Samuel Schapiro retweeted

Kasra Jalaldoust@causalkasra·10 Oct

I do agree with the sentiment of this note for the most part. The existing generalization theory fails to explain real-world generalization, especially since the iid assumption is often violated. Below is my perspective.
There are two big surprises alluded to by the note:
Surprise 1. Sometimes generalization seems quite plausible when a human is taken as the baseline, yet the algorithms have always failed, e.g., under some adversarial distribution shifts that are often meticulously designed via simulations.
Surprise 2. Sometimes we observe machine generalizations that cannot be explained by the existing theory, e.g., some LLMs are IMO-grade competent in novel math questions.
Surprise 1 suggests either of the following options:
Option 1.1. Simulated distribution shifts don’t mimic reality, and it’s not a machine’s failure in generalization per se. It’s our cognitive flaw to believe in the possibility of generalization in those adversarial situations, and comparing algorithms applied to narrowly scoped data with a human is unscientific. Even the best of humans can be fooled by adversarial pattern changes, so what constitutes a good generalization doesn’t need to be so pessimistic about the unseen data.
Option 1.2. Machine generalization is not good enough yet. There is an ingredient in our algorithms that mismatches with that of a good learner (e.g., humans); some call it an “inductive bias”. A good inductive bias shields performance against a wide class of adversarial shifts that we regard as sensible, while a less good inductive bias might allow generalization only in situations that we do not consider particularly interesting. The study of “useful” inductive biases is inherently tied to the real world and to the somewhat subjective notion of interesting/uninteresting generalizations. Is it an ML researcher’s job to study the “what”, “why” and “how” of useful inductive biases? Some might say this is their sole job after Vapnik’s ERM, while others seem to believe in a more objective study of inductive biases, possibly dismissing the applied premise.
Surprise 2 suggests either of the following options:
Option 2.1. Much of the generalization-like behavior of machines is not generalization at all. It might again be a cognitive flaw that we do not expect such math skills from an LLM trained with resources larger than the whole economy of some countries participating in IMO. Considering the sheer amount and diversity of the data used to train the model, what appears as a generalization might as well be merely an interpolation, not too unexpected from the perspective of Cover and Hart’s nearest neighbor guarantee. Indeed, the existing generalization theory seems unable to explain such a phenomenon at any granularity, and there doesn’t seem to be an easy resolution -> generalization theory back on the shelf.
Option 2.2. It is necessary to continue the efforts anyway. Having a better theory of generalization is not just an intellectual fetish, but an absolute necessity for a society that is already seriously reliant on cybernetics. It is true that generalization theory hasn’t succeeded, but out of necessity, it still deserves public funding, must welcome less traditional/mainstream perspectives, and indulge in the peer-reviewed work of adjacent schools of thought, in hopes of a resolution.
My personal choices, if not clear yet, are 1.2 and 2.2. Causal inference is the formal study of a very nontrivial inductive bias that enables some intelligent human behaviors. My job has been (and hopefully continues to be) to build from the existing causal transportability theory to understand the role of causal inductive biases in generalization, developing both positive and negative results. Fortunate to be alive in this exciting time, I am not going to put it back on the shelf, and would bet my life on generalization theory as a collective human effort becoming a successful project, soon!

Samuel Schapiro@samschapiro·30 Sep

1/N Large language models (LLMs) have been widely adopted for closed-ended tasks like reasoning, but can they truly be creative? 📚 Excited to announce our new work — Combinatorial Creativity: A New Frontier in Generalization Abilities. 📝 Paper arxiv.org/abs/2509.21043

Samuel Schapiro retweeted

Alexi Gladstone@AlexiGlad·1 Oct

*Human-Like* Creativity is perhaps the most out-of-reach task for modern LLMs.
I'm super excited to share our new work evaluating LLMs with a creativity framework! We develop a synthetic creativity task to measure LLMs' capabilities in generating novel, creative combinations, and benchmark current LLMs.

Samuel Schapiro retweeted

Sumuk@sumukx·30 Sep

turns out making models very robust has the undesired side effect of making them far less creative than they otherwise would be, making outputs feel mode collapsed and mundane (a novelty-utility tradeoff) this paper aims to mathematically define and quantify why that is (!)

Samuel Schapiro@samschapiro·30 Sep

6/N A huge thanks to all collaborators: @sumukx, @AlexiGlad, Jonah Black, @compulsi0n, @dilekhakkanitur, @lrvarshney. Stay tuned for more exciting work about the nature of creativity and how we can make AI models more creative! 📝 Read more: arxiv.org/abs/2509.21043

Samuel Schapiro@samschapiro·30 Sep

5/N ... (b) models that are too deep and *narrow*, which suffer from restricted representational capacity -- and may be unable to *associate semantically distant concepts* needed for novel combinations.

Samuel Schapiro retweeted

Samuel Schapiro@samschapiro·2 Sep

Read more about how @spiralworks_ai plans to change science forever: samuelschapiro.substack.com/p/towards-a-ne…