Gary King

5.2K posts

@kinggary

Harvard Professor (social scientist, statistician). Co-founder Crimson Hexagon (now Brandwatch), Learning Catalytics, Perusall, Thresher (now Two Six Tech),...

Cambridge, MA · Joined April 2009
7.8K Following · 34.3K Followers

Gary King @kinggary
Please help me congratulate Harvard's newest Ph.D., María Ballesteros @m_ballesteross! ... A photo with a few of her fans (& examiners).

Gary King @kinggary
New paper: "Who's to Blame for Survey Instability: Respondents with Nonexistent Preferences or Researchers with Flawed Measures?" with @LibbyJenke. Comments welcome! GaryKing.org/instability

Gary King @kinggary
If you're asking a question with a correct answer, then everyone should get the same answer. But if you're asking for creative or diverse thoughts, out-of-the-box concepts, or contrarian or heterodox ideas, then you raise a real problem. Here's a solution: GaryKing.org/quest

SuspendedCap @ContrarianCurse
The speed at which things move through Fintwit and the market now is insane. Moves are priced in 1/10th the time they were pre-LLM.

Gary King @kinggary
Thanks everyone at @JohnsHopkins for coming to my talk & for the fabulous conversations. Great to see all the wonderful changes (with more in progress) in the social sciences there... Slides from my talk on "Who’s to Blame for Survey Instability: Respondents With Random Preferences or Researchers With Flawed Measures": gking.harvard.edu/presentations/…

Gary King @kinggary
@SeanLangenfeld You might have a look at this article which explains "why propensity scores should not be used for matching" j.mp/psnot It really is not ambiguous.

Sean Langenfeld @SeanLangenfeld
I asked ChatGPT to explain to me why biostatisticians are so critical of propensity score matching, and I thought the answer was very organized and straightforward. I always summarized it as false confidence, sample bias, and ongoing baseline differences in unmeasured variables. But, I am the farthest thing from being a statistician. Here's the better answer, though:

"Great question. Propensity score matching (PSM) is widely used, but many biostatisticians are indeed critical of it, not because it’s useless, but because it’s often misused, misunderstood, or oversold. Here are the main reasons, in a structured way.

1. It cannot fix unmeasured confounding
Core limitation: Propensity scores only balance observed covariates. If an important confounder is not measured or is poorly measured, PSM does nothing to fix that. This is especially problematic in retrospective clinical data (e.g., registry, EHR), where disease severity, frailty, physician judgment, social factors, etc. are often missing or crude. In surgery and oncology (which you deal with a lot), this is a huge issue: “who gets which treatment” is often driven by factors not captured in the dataset. So PSM can produce beautifully balanced tables and still be badly biased.

2. It can increase bias if done poorly
Counterintuitively, matching can make things worse. If the propensity model:
- is misspecified
- omits important confounders
- uses bad functional forms
then matching can:
- amplify imbalance in key covariates
- induce collider bias
- create a sample that is less representative and more biased
This is known as bias amplification.

3. It throws away data (often a lot of it)
Typical 1:1 matching often discards many treated patients and many controls, especially if the groups are very different initially. Consequences:
- Lower statistical power
- Larger standard errors
- Estimates apply only to a restricted subpopulation (the “matched” patients)
So you trade sample size and generalizability for “balance”. And many papers forget to state that they changed the estimand.

4. People confuse “balance” with “causality”
A very common mistake: “The groups are balanced, therefore the effect is causal.” No. Balance on observed variables ≠ exchangeability. You can balance 100 variables and still be confounded by 1 unmeasured one. This leads to false confidence in the result.

5. The propensity model is often built incorrectly
Common problems:
- Using stepwise variable selection
- Using the c-statistic to judge PS quality (wrong metric)
- Including post-treatment variables
- Omitting weak but important confounders
- Not checking standardized mean differences properly
PS models should:
- Include all confounders, not just predictors of treatment
- Optimize covariate balance, not prediction accuracy
Many clinical papers get this wrong.

6. It answers a different question than people think
After matching, you are no longer estimating “What is the effect in the whole population?” You are estimating “What is the effect in the subset of patients who could plausibly have received either treatment?” This is often:
- Not stated
- Not understood
- Not what the clinician thinks they’re reading

7. There are often better methods
In many settings, these are statistically superior:
- Doubly robust methods
- Inverse probability weighting
- Targeted maximum likelihood (TMLE)
- Modern outcome regression with flexible models
- Causal forests / g-methods
These:
- Use all the data
- Can achieve a better bias–variance tradeoff
- Are less sensitive to arbitrary matching decisions

8. Matching hides modeling decisions
Matching involves many degrees of freedom:
- Caliper size
- With or without replacement
- Ratio (1:1, 1:2, 1:5…)
- Which variables go in the PS
- How ties are handled
Each choice:
- Can change the result
- Is rarely pre-specified
- Enables subtle p-hacking

9. In medical literature, it is often used as a “ritual”
Unfortunately, PSM has become a checkbox: “Do PSM → claim causal inference”. Many statisticians view this as methodological cargo cult science.

The fair, balanced conclusion
Propensity scores are:
✅ Useful when done carefully
✅ Better than crude regression in some settings
❌ Not a magic fix for confounding
❌ Often misapplied in clinical research
❌ Frequently over-interpreted

A practical rule of thumb
If:
- treatment choice is strongly influenced by unrecorded clinical judgment, or
- frailty, anatomy, tumor resectability, or performance status are not well captured,
then PSM probably does not rescue the study from confounding by indication."
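Points 5 and 7 of the quoted answer mention checking standardized mean differences (SMDs) and inverse probability weighting (IPW). The Python sketch below illustrates both under loudly stated assumptions: the data are simulated with a single observed confounder `x` and a true treatment effect of 2, and the propensity score is taken as known rather than estimated, which no real study can do. The helper names (`standardized_mean_difference`, `ipw_ate`, `simulate`) are hypothetical, the SMD uses one common pooled-SD convention, and the code illustrates the quoted answer's points, not the argument of the article King links in his reply.

```python
import math
import random
import statistics


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled SD sqrt((s_t^2 + s_c^2) / 2)."""
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def ipw_ate(y, t, p):
    """Normalized (Hajek) inverse probability weighting estimate of the ATE.

    Treated units get weight 1/p and controls 1/(1 - p), so each group is
    reweighted toward the full sample on whatever drives the propensity p.
    """
    w1 = [ti / pi for ti, pi in zip(t, p)]
    w0 = [(1 - ti) / (1 - pi) for ti, pi in zip(t, p)]
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0


def simulate(n=20000, effect=2.0, seed=1):
    """Hypothetical data: x raises both the chance of treatment and the outcome."""
    rng = random.Random(seed)
    x, t, p, y = [], [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        pi = 1.0 / (1.0 + math.exp(-xi))  # true propensity score, logistic in x
        ti = 1 if rng.random() < pi else 0
        yi = effect * ti + 3.0 * xi + rng.gauss(0.0, 1.0)
        x.append(xi)
        t.append(ti)
        p.append(pi)
        y.append(yi)
    return x, t, p, y


x, t, p, y = simulate()
x_t = [xi for xi, ti in zip(x, t) if ti == 1]
x_c = [xi for xi, ti in zip(x, t) if ti == 0]
y_t = [yi for yi, ti in zip(y, t) if ti == 1]
y_c = [yi for yi, ti in zip(y, t) if ti == 0]

smd_x = standardized_mean_difference(x_t, x_c)  # imbalance on x before any adjustment
naive = statistics.mean(y_t) - statistics.mean(y_c)
ipw = ipw_ate(y, t, p)
print(f"SMD of x before weighting: {smd_x:.2f}")
print(f"naive difference in means: {naive:.2f}")
print(f"IPW estimate of the ATE:   {ipw:.2f}")
```

In this simulation the naive difference should land well above the true effect of 2, because treated units tend to have higher `x`, while the IPW estimate should land near 2. That only works because `p` is correct and `x` is the only confounder, which is exactly the caveat in point 1.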

Gary King @kinggary
New paper: "Experimental Evidence on the (Limited) Influence of Reputable Media Outlets" w/ Bharat Anand, Kiran Misra, Sascha Riaz at GaryKing.org/reputable

Gary King @kinggary
Big congratulations to Chris Kenny, Harvard's newest PhD, with a few of his fans.

Gary King @kinggary
@rmkubinec @namalhotra @carlislerainey @matt_motta Our behavioral models (our observation mechanisms) can be tremendously important. Economists originally liked it because they can still get (stochastic) rationality, but only by assuming humans have no stable preferences. Much evidence rejects this model.

Gary King @kinggary
@namalhotra @matt_motta Thanks for the comment, Neil. The random utility model is an assumption (humans have random preferences & never make mistakes in survey responses) not supported by the evidence. This often doesn't matter but does here. Have a look at our supplementary appendix, which discusses this.

Neil Malhotra @namalhotra
@matt_motta I may just be really dumb, but I don't understand this paper. Normally, one would write out a random utility choice model (e.g., Luce/Thurstone), which have error terms. But that doesn't mean you inflate up estimates as if it was a compliance problem.
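Malhotra and King are discussing random utility choice models of the Luce/Thurstone type. The Python sketch below shows the textbook version under stated assumptions: three options with hypothetical fixed utilities, and i.i.d. standard Gumbel error terms added each time a respondent answers. Each simulated respondent picks the option with the highest noisy utility, and the simulated choice shares should approach the Luce (multinomial logit) probabilities exp(V_j) / Σ_k exp(V_k). Thurstone's variant uses normal errors instead, which gives a probit model. The function names are hypothetical, and this illustrates only the generic model under discussion, not the model in King and Jenke's paper.

```python
import math
import random


def gumbel(rng):
    """Draw a standard Gumbel variate by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0; log(0) would fail
        u = rng.random()
    return -math.log(-math.log(u))


def logit_probabilities(utilities):
    """Luce / multinomial logit choice probabilities."""
    m = max(utilities)  # subtract the max for numerical stability
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]


def simulate_choices(utilities, n=50000, seed=7):
    """Each simulated respondent chooses argmax_j (V_j + Gumbel error)."""
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(n):
        noisy = [v + gumbel(rng) for v in utilities]
        counts[noisy.index(max(noisy))] += 1
    return [c / n for c in counts]


utilities = [1.0, 0.5, 0.0]  # hypothetical systematic utilities
for j, (sim, theory) in enumerate(
    zip(simulate_choices(utilities), logit_probabilities(utilities))
):
    print(f"option {j}: simulated {sim:.3f}, logit {theory:.3f}")
```

The utilities never change across draws, yet answers vary from draw to draw; in this model all of that variation is attributed to random preferences (the error terms), which is the assumption King says the evidence does not support.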