Michelle Yin

21 posts

@MichelleYinPhD

Joined April 2026
11 Following · 5 Followers
Michelle Yin@MichelleYinPhD·
Two more papers and several policy briefs on this topic are coming. But if there is one takeaway right now, it is what I told the Wall Street Journal: “I personally would not rely on just one measure to say, ‘Oh, I should change my job,’ or ‘I should change my kid’s major.’”
0
0
0
6
Michelle Yin@MichelleYinPhD·
I care about this because I have sat across the table from workers whose career decisions depend on what researchers like me put into the world. If those numbers are not credible, we are failing the people we are supposed to serve.
0
0
0
12
Michelle Yin@MichelleYinPhD·
AI companies are profit-driven, and their models reflect their training data and design choices. Researchers who use these proprietary outputs as scientific instruments have an obligation to verify what they produce before families, communities, and governments act on it.
0
0
0
13
Michelle Yin@MichelleYinPhD·
When I discovered the instability in AI exposure scores, I was working with workforce programs in Maine and Virginia trying to help real people navigate a changing labor market, and I realized the numbers we were relying on gave fundamentally different answers!
0
0
0
15
Michelle Yin@MichelleYinPhD·
I want to share why this research is personal to me, not just professional. I came to this country as an immigrant and built my career as a labor economist because I believe how we measure work shapes how we value workers.
0
0
0
17
Michelle Yin@MichelleYinPhD·
New Wall Street Journal piece on paper number one of a series. Same task list, four AI models. The share of US occupations flagged "high AI exposure" runs from 14% under one model to 51% under another, on identical content. A 19-fold spread. Thread. wsj.com/tech/ai/ai-mod…
0
0
0
461
Michelle Yin@MichelleYinPhD·
@arindube @ClaudiaLPersico Also, the rubric treats each task as independent. Work is not independent. It is embedded in organizations, norms, and power structures. That is partly why we argue the field needs multi-model sensitivity as a floor, not a ceiling.
0
0
1
6
Arin Dube@arindube·
Very interesting work. My perspective: LLMs can't possibly answer that question because there are unknown unknowns. And it's not just because we don't know the technological trajectory of LLMs. It's because we have under-appreciated the economic and sociological foundation of work. And how that mediates AI use.
Claudia Persico (@claudiapersico.bsky.social)@ClaudiaLPersico

We asked four LLMs how exposed your job is to AI. They could NOT agree. Management: 15% vs 90% Legal: 10% vs 75% Healthcare: 5% vs 60% Same rubric. Same jobs. Same data. Different AI, completely different answer. New @nberpubs working paper with @MichelleYinPhD & @hoa_vuxuan! 1/

4
13
63
12.7K
Michelle Yin@MichelleYinPhD·
@arindube @ClaudiaLPersico Great point and thank you! The deeper issue is sociological. How work is organized, who adopts AI and why, which tasks get restructured versus automated, none of that is visible to a model rating tasks against a rubric.
0
0
1
5
Michelle Yin@MichelleYinPhD·
@karthiktadepall @ClaudiaLPersico @nberpubs @hoa_vuxuan (2) Even if part of the shift is real capability expansion, the scores enter the downstream literature as fixed occupational characteristics. If the measure moves with the technology, it’s not a stable treatment variable, which is the core problem for causal inference.
0
0
1
4
Michelle Yin@MichelleYinPhD·
Two things are going wrong. First: each AI has a different calibration. Second: a feedback loop. Tasks where AI is advancing fastest generate the most training data, so newer models rate those tasks as more exposed. @hoa_vuxuan @ClaudiaLPersico
0
0
1
21
Michelle Yin@MichelleYinPhD·
Here is every occupation, all 95, sorted by how much the four models disagree. Top of the chart: 87 percentage points of disagreement for a single occupation. One AI sees the job as almost fully exposed. Another sees it as barely exposed. Find your job.
Michelle Yin tweet media
0
0
1
30
Michelle Yin@MichelleYinPhD·
These scores are not academic exercises. The ILO uses them. The IMF uses them. The BLS uses them. Acemoglu (2025), Brynjolfsson et al. (2025), and Eisfeldt et al. (2023) are built on them. Nobody was checking whether a different model would give a different answer.
0
0
1
5
Michelle Yin@MichelleYinPhD·
Then we asked: does this matter for the conclusions economists are actually drawing? We plugged each model’s scores into a standard labor economics analysis. With one model’s scores: significant job losses. With another’s: no detectable effect. The entire finding flipped.
Michelle Yin tweet media
0
0
1
60
Michelle Yin@MichelleYinPhD·
We replicated the most widely used AI exposure rubric (Eloundou et al. 2024) with four frontier models: GPT-4, ChatGPT-5, Gemini 2.5, and Claude 4.5. Same instructions. Same O*NET task data. Same pipeline. Mean exposure ranged from 14% to 51%. A 3.6x gap on identical jobs.
Michelle Yin@MichelleYinPhD

We asked four LLMs how exposed your job is to AI. They could NOT agree. Management: 15% vs 90% Legal: 10% vs 75% Healthcare: 5% vs 60% Same rubric. Same jobs. Same data. Different AI, completely different answer. New @NBER working paper!

0
0
1
10
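
For readers who want to check the arithmetic behind the figures quoted in these posts, here is a minimal sketch. It assumes each quoted pair (for example, Management: 15% vs 90%) is the lowest and highest score among the four models; the posts do not say so explicitly. The names `quoted_ranges`, `spread_points`, and `fold_gap` are hypothetical and are not taken from the paper's pipeline.

```python
# Figures as quoted in the posts: (lowest, highest) exposure score in percent.
# Assumption: each pair is the minimum and maximum across the four models.
quoted_ranges = {
    "Management": (15, 90),
    "Legal": (10, 75),
    "Healthcare": (5, 60),
}


def spread_points(low, high):
    """Disagreement in percentage points between the lowest and highest score."""
    return high - low


def fold_gap(low, high):
    """Ratio of the highest score to the lowest score."""
    return high / low


# Sort occupation groups by disagreement, largest first.
by_disagreement = sorted(
    quoted_ranges.items(),
    key=lambda item: spread_points(*item[1]),
    reverse=True,
)

for group, (low, high) in by_disagreement:
    print(f"{group}: {spread_points(low, high)} points, {fold_gap(low, high):.1f}x")

# Mean exposure across occupations: 14% under one model, 51% under another.
mean_gap = fold_gap(14, 51)
print(f"Mean exposure gap: {mean_gap:.1f}x")
```

The sketch reports both the percentage-point spread and the ratio because the two can rank the same groups differently: a pair with a small low score can have a modest point spread but a large fold gap.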