Quentin André

3.8K posts


@andre_quentin

Assistant Prof. of Marketing @ CU Boulder. Open science, research methods, managerial and numerical cognition. ❤️Python 🐍.

Boulder, CO · Joined May 2014
737 Following · 1.9K Followers
Pinned Tweet
Quentin André@andre_quentin·
Woke up to the good news that our paper (w/ @nreinholtz) on group sequential designs is accepted at @JCRNEWS ! Want to learn more about designing more efficient and more informative studies? This blog post summarizes the key insights from our paper: quentinandre.net/post/more-effi…
7 replies · 10 retweets · 65 likes · 31.6K views
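The pinned tweet above points to a paper on group sequential designs. As a hedged illustration (my own sketch, not taken from that paper), the simulation below shows why interim "peeking" at data needs an adjusted per-look threshold: naively testing at α = .05 at each of two looks inflates the false-positive rate, while a Pocock-style constant bound (z ≈ 2.178 for two looks, a standard textbook value assumed here) keeps it near 5%. All function names and parameters are hypothetical.

```python
# Illustrative sketch: false-positive rates under two-look sequential testing.
# Null-effect two-group experiments, tested at an interim look (n=50/group)
# and a final look (n=100/group), stopping at the first "significant" result.
import numpy as np

def false_positive_rate(z_crit, n_interim=50, n_final=100, n_sims=20_000, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        a = rng.standard_normal(n_final)  # control group, true effect = 0
        b = rng.standard_normal(n_final)  # treatment group, true effect = 0
        for n in (n_interim, n_final):
            diff = a[:n].mean() - b[:n].mean()
            z = diff / np.sqrt(2 / n)     # known-variance z statistic
            if abs(z) > z_crit:
                hits += 1
                break                     # stop at first significant look
    return hits / n_sims

naive = false_positive_rate(z_crit=1.96)    # unadjusted peeking at each look
pocock = false_positive_rate(z_crit=2.178)  # Pocock bound for two looks
print(f"naive peeking FPR:   {naive:.3f}")   # inflated above .05
print(f"Pocock-adjusted FPR: {pocock:.3f}")  # close to .05
```

The design choice here is the simplest possible stopping rule; real group sequential designs also tune the number and timing of looks.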
Quentin André@andre_quentin·
@AJThurston I hate that I'm in the middle for approximately a week every year.
1 reply · 0 retweets · 1 like · 35 views
Quentin André@andre_quentin·
@ShaneFs5cents Highly heterogeneous across people and highly non-linear across the range is my guess!
1 reply · 0 retweets · 0 likes · 66 views
Shane Frederick@ShaneFs5cents·
In my class, grades are nearly fully determined by performance on a 500 point exam. While I did not directly sell points, I did use a procedure to infer their monetary valuation. What's your best guess?
1 reply · 0 retweets · 1 like · 351 views
Quentin André retweeted
Dobryi@Dobryi_vozhdik·
[image]
5 replies · 80 retweets · 4.3K likes · 61.8K views
Quentin André retweeted
Neil Renic@NC_Renic·
Please give a warm welcome to our panellists: guy who speaks over time, guy who speaks about an unrelated topic, guy who speaks over time about an unrelated topic, and radically over-prepared PhD student.
21 replies · 667 retweets · 10.6K likes · 216.7K views
Quentin André@andre_quentin·
My 3-year-old daughter, proudly brandishing a bowl of mud and grass: "Dad, I'm making slop!" ... Aren't we all, sweetie, aren't we all...
0 replies · 0 retweets · 2 likes · 132 views
Quentin André retweeted
Matt Darling 🌐🏗️@besttrousers·
@wwwojtekk I went to five different gas stations and they were all up in price compared to two weeks ago. The probability of that happening is 0.5^5 = 0.03125. That's statistically significant (<0.05). It means it didn't happen by chance. Collusion.
2 replies · 2 retweets · 102 likes · 3K views
Quentin André retweeted
Oleg Urminsky@OlegUrminsky·
When you collect data online, are the results from humans or AI? In a project led by Booth PhD student Grace Zhang, we estimate the prevalence of AI agents on commonly used survey platforms: osf.io/preprints/psya… 🧵
6 replies · 62 retweets · 197 likes · 35.9K views
Quentin André retweeted
Meysam Alizadeh@MeysamAIizadeh·
Can AI coding agents reproduce published social science findings? In new work with @_mohsen_m, Fabrizio Gilardi, and @j_a_tucker, we introduce SocSci-Repro-Bench — a benchmark of 221 reproducibility tasks from 54 papers — and evaluate two frontier coding agents: Claude Code and Codex. The results reveal both remarkable capabilities and new risks for AI-assisted science.

GOAL
A key design goal was separating two different problems:
1️⃣ Are replication materials themselves reproducible?
2️⃣ Can AI agents reproduce results when materials are executable?
To isolate agent performance, we only included tasks whose outputs were identical across three independent manual executions.

DESIGN
Agents received:
• anonymized data + code
• a sandboxed execution environment
They had to autonomously:
• install dependencies
• debug broken code
• execute the pipeline
• extract the requested results
In short: end-to-end computational reproduction.

RESULTS
Both agents reproduced a large share of published findings. But Claude Code substantially outperformed Codex.
Task-level accuracy:
• Claude Code: 93.4%
• Codex: 62.1%
Paper-level reproduction (all tasks correct):
• Claude Code: 78.0%
• Codex: 35.8%

WHY THE GAP?
Replication packages often contain problems:
• missing dependencies
• hard-coded file paths
• incomplete environment specifications
Claude Code frequently repaired these issues autonomously. Codex often failed to recover the execution pipeline.

IS THIS JUST MEMORIZATION?
We tested this by asking agents to infer paper metadata (title, authors, journal, year) from anonymized replication materials. Recovery rates were very low, suggesting agents primarily relied on code execution, not memorization of papers.

REASONING TEST
We also tested a harder task: can agents infer the research question of a study from code and data alone? Both agents performed surprisingly well.

CONFIRMATION BIAS
When agents were given the paper PDF, a new problem emerged. Sometimes they copied reported results from the text instead of executing the code. Accuracy on non-reproducible tasks dropped sharply. Context helps execution — but reduces independence of verification.

SYCOPHANCY
Inspired by @ahall_research, we tested adversarial prompt framing, nudging agents to "explore alternative analyses that align with the paper's reported results." Accuracy increased. But agents also became more likely to fabricate results when reproduction was impossible.

THE PARADOX
Pressure to produce an answer can help agents repair execution pipelines. But it simultaneously erodes their ability to say: "This result cannot be reproduced." Recognizing when reproduction is impossible may be the most important scientific capability.

NOTES
• This is work in progress — feedback is welcome.
• Benchmark available on GitHub.
• Replication materials hosted on Dataverse.
Paper + repository in the reply below.
[image]
6 replies · 47 retweets · 188 likes · 25.8K views
Quentin André@andre_quentin·
Largely align with my experience. I've been using Claude Code to run reproducibility checks on my manuscripts prior to publication, and it's extremely good.
Meysam Alizadeh@MeysamAIizadeh

[quoted thread above]
2 replies · 1 retweet · 6 likes · 758 views
Quentin André retweeted
Sam Rosenfeld@sam_rosenfeld·
the modern professor's feeling of warm relief and gratitude at encountering artisanal, human-crafted bad writing while grading papers
89 replies · 1.3K retweets · 16.2K likes · 434.7K views
Quentin André@andre_quentin·
There is a cruel irony to history sometimes. AI has the potential to induce large-scale, rapid changes to our economy as we know it... at a time when our democracies have rarely looked so fragile, with populists and incompetent leaders wrecking institutions, countries, and markets.
0 replies · 0 retweets · 1 like · 124 views
Quentin André@andre_quentin·
I don't know who is handling the scenographic aspects of Macron's "nuclear deterrence" speech, but he or she deserves a raise. This is pretty cool.
[image]
0 replies · 1 retweet · 1 like · 220 views
Quentin André@andre_quentin·
@squig You can also argue that they function as an information mechanism, in that the insiders' knowledge is trickling through the price signals. But yes, sure-fire way of losing money if you view it as a gambling platform!
1 reply · 0 retweets · 1 like · 38 views
Ellen Evers@squig·
Prediction markets are not supposed to be some fun and fair gambling platform; the entire point of them is that those with better (insider) knowledge can make a lot of money at a cost to those more ignorant, leading to better predictions. 2/2
2 replies · 0 retweets · 1 like · 74 views
Ellen Evers@squig·
I believe public prediction markets to be terrible; they should not exist. That said, a lot of the backlash seems to come from a fundamental misunderstanding of what they are supposed to be (largely due to how the markets position themselves). 1/2
Chris Stokel-Walker@stokel

There's increasing unhappiness at prediction market platforms Polymarket and Kalshi after bettors lost their wagers - so what could happen next? My latest for @FastCompany fastcompany.com/91501163/iran-…

1 reply · 1 retweet · 3 likes · 504 views