Michelle Yin (@MichelleYinPhD) - Twitter Profile

Michelle Yin@MichelleYinPhD·11 May

Read the full paper here: nber.org/papers/w35110 coauthored with @ClaudiaLPersico @hoa_vuxuan

Michelle Yin@MichelleYinPhD·11 May

Two more papers and several policy briefs on this topic are coming. But if there is one takeaway right now, it is what I told the Wall Street Journal: “I personally would not rely on just one measure to say, ‘Oh, I should change my job,’ or ‘I should change my kid’s major.’”

Michelle Yin@MichelleYinPhD·11 May

I care about this because I have sat across the table from workers whose career decisions depend on what researchers like me put into the world. If those numbers are not credible, we are failing the people we are supposed to serve.

Michelle Yin@MichelleYinPhD·11 May

AI companies are profit-driven, and their models reflect their training data and their design choices. Researchers who use these proprietary outputs as scientific instruments have an obligation to verify what they produce before families, communities, and governments act on it.

Michelle Yin@MichelleYinPhD·11 May

When I discovered the instability in AI exposure scores, I was working with workforce programs in Maine and Virginia trying to help real people navigate a changing labor market, and I realized the numbers we were relying on gave fundamentally different answers!

Michelle Yin@MichelleYinPhD·11 May

I want to share why this research is personal to me, not just professional. I came to this country as an immigrant and built my career as a labor economist because I believe how we measure work shapes how we value workers.

Michelle Yin@MichelleYinPhD·11 May

New Wall Street Journal piece on paper number one of a series. Same task list, four AI models. The share of US occupations flagged "high AI exposure" runs from 14% under one model to 51% under another, on identical content. A roughly 3.6-fold spread. Thread. wsj.com/tech/ai/ai-mod…

Michelle Yin@MichelleYinPhD·29 Apr

@nberpubs @hoa_vuxuan @ClaudiaLPersico Thank you! Adding my handle: @MichelleYinPhD

NBER@nberpubs·29 Apr

Using large language models to self-assess occupational exposure is prone to bias, from Michelle Yin, @hoa_vuxuan, and @ClaudiaLPersico nber.org/papers/w35110

Michelle Yin@MichelleYinPhD·29 Apr

@nberpubs @ClaudiaLPersico @hoa_vuxuan Thank you for the shoutouts! Adding my handle here @MichelleYinPhD

Michelle Yin@MichelleYinPhD·29 Apr

@arindube @ClaudiaLPersico Also, the rubric treats each task as independent. Work is not independent. It is embedded in organizations, norms, and power structures. That is partly why we argue the field needs multi-model sensitivity as a floor, not a ceiling.

Arin Dube@arindube·29 Apr

Very interesting work. My perspective: LLMs can't possibly answer that question because there are unknown unknowns. And it's not just because we don't know the technological trajectory of LLMs. It's because we have under-appreciated the economic and sociological foundation of work. And how that mediates AI use.

Michelle Yin@MichelleYinPhD·29 Apr

@arindube @ClaudiaLPersico Great point and thank you! The deeper issue is sociological. How work is organized, who adopts AI and why, which tasks get restructured versus automated, none of that is visible to a model rating tasks against a rubric.

Michelle Yin@MichelleYinPhD·29 Apr

@karthiktadepall @ClaudiaLPersico @nberpubs @hoa_vuxuan (2) Even if part of the shift is real capability expansion, the scores enter the downstream literature as fixed occupational characteristics. If the measure moves with the technology, it’s not a stable treatment variable, which is the core problem for causal inference.

Michelle Yin@MichelleYinPhD·29 Apr

@karthiktadepall @ClaudiaLPersico @nberpubs @hoa_vuxuan Fair point. Two reasons it’s not just capability growth: (1) the three 2026 models, tested in the same window, still disagree with each other at 57% agreement; that’s cross-sectional, not temporal.

Claudia Persico (@claudiapersico.bsky.social)@ClaudiaLPersico·28 Apr

We asked four LLMs how exposed your job is to AI. They could NOT agree. Management: 15% vs 90% Legal: 10% vs 75% Healthcare: 5% vs 60% Same rubric. Same jobs. Same data. Different AI, completely different answer. New @nberpubs working paper with @MichelleYinPhD & @hoa_vuxuan! 1/

Michelle Yin@MichelleYinPhD·29 Apr

@robseamans @ClaudiaLPersico @soumitrashukla9 @nberpubs @hoa_vuxuan @marthagimbel They definitely complement each other. Great piece. We find the same pattern holds even within a single rubric: just swapping the rating model produces a 3.6x divergence in exposure scores on identical tasks. The instability isn’t only across metrics; it’s within them.

Rob Seamans@robseamans·29 Apr

@ClaudiaLPersico @soumitrashukla9 @nberpubs @MichelleYinPhD @hoa_vuxuan Yale budget lab has a recent blog post that is a nice complement to this. cc @marthagimbel budgetlab.yale.edu/research/labor…

Michelle Yin@MichelleYinPhD·28 Apr

Paper (free): nber.org/papers/w35110 With @hoa_vuxuan and @ClaudiaLPersico

Michelle Yin@MichelleYinPhD·28 Apr

Two things are going wrong. First: each AI has a different calibration. Second: a feedback loop. Tasks where AI is advancing fastest generate the most training data, so newer models rate those tasks as more exposed. @hoa_vuxuan @ClaudiaLPersico

Michelle Yin@MichelleYinPhD·28 Apr

Here is every occupation, all 95, sorted by how much the four models disagree. Top of the chart: 87 percentage points of disagreement for a single occupation. One AI sees the job as almost fully exposed. Another sees it as barely exposed. Find your job.
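
A minimal sketch of the ranking described here, assuming "disagreement" means the gap in percentage points between the lowest and highest model score for an occupation. The chart's 95 occupations are not reproduced in the thread, so the only inputs are the three occupation groups and endpoint scores quoted in the headline tweet; the names `endpoints` and `spread` are illustrative, not from the paper.

```python
# Lowest vs highest model score (in percent) for the three groups quoted
# in the thread. Only these endpoints are given there; the paper's chart
# covers 95 occupations and four models.
endpoints = {
    "Legal": (10, 75),
    "Healthcare": (5, 60),
    "Management": (15, 90),
}


def spread(low_high):
    """Disagreement in percentage points: highest score minus lowest."""
    low, high = low_high
    return high - low


# Sort groups from most to least cross-model disagreement.
ranked = sorted(endpoints.items(), key=lambda item: spread(item[1]), reverse=True)

for name, pair in ranked:
    print(f"{name}: {spread(pair)} percentage points")
```

With these three groups the ordering is Management (75), Legal (65), Healthcare (55); the 87-point maximum mentioned in the tweet belongs to a single occupation that the thread does not name.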

Michelle Yin@MichelleYinPhD·28 Apr

These scores are not academic exercises. The ILO uses them. The IMF uses them. The BLS uses them. Acemoglu (2025), Brynjolfsson et al. (2025), and Eisfeldt et al. (2023) are built on them. Nobody was checking whether a different model would give a different answer.

Michelle Yin@MichelleYinPhD·28 Apr

Then we asked: does this matter for the conclusions economists are actually drawing? We plugged each model’s scores into a standard labor economics analysis. With one model’s scores: significant job losses. With another’s: no detectable effect. The entire finding flipped.

Michelle Yin@MichelleYinPhD·28 Apr

We replicated the most widely used AI exposure rubric (Eloundou et al. 2024) with four frontier models: GPT-4, ChatGPT-5, Gemini 2.5, and Claude 4.5. Same instructions. Same O*NET task data. Same pipeline. Mean exposure ranged from 14% to 51%. A 3.6x gap on identical jobs.

Michelle Yin@MichelleYinPhD

We asked four LLMs how exposed your job is to AI. They could NOT agree. Management: 15% vs 90% Legal: 10% vs 75% Healthcare: 5% vs 60% Same rubric. Same jobs. Same data. Different AI, completely different answer. New @NBER working paper!
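
The 3.6x figure quoted in the thread can be checked from the two mean-exposure endpoints it reports (14% and 51%). The snippet below is only an arithmetic check, assuming the gap is the simple ratio of the highest to the lowest mean; `fold_gap` is an illustrative helper name, not something from the paper.

```python
def fold_gap(low_pct, high_pct):
    """Ratio of the highest to the lowest mean exposure score."""
    return high_pct / low_pct


# Mean exposure endpoints reported in the thread, in percent.
gap = fold_gap(14, 51)
print(f"{gap:.1f}x")  # 3.6x
```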
