Shaoshi Zhang (@ZShaoshi) - Twitter Profile

Pinned Tweet

For years, we've known that running a standard t-test on cross-validation folds violates sample independence. We wanted to see how widespread this issue actually is. The result? 97% of the studies used an invalid statistical test. 🧵👇

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

1

5

11

3.4K
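
To make the independence problem concrete, here is a minimal Python sketch contrasting the naive paired t statistic on per-fold score differences with the Nadeau-Bengio corrected resampled t statistic (the "corrected t-test" mentioned elsewhere in this feed). The fold differences, the train/test sizes, and the function names are hypothetical illustration values, not taken from the preprint. The correction was derived for repeated random train/test splits, so whether it suits a given cross-validation setup is a separate question that the thread itself raises.

```python
import math
import statistics

def naive_t(diffs):
    """Paired t statistic that treats the per-fold differences as independent."""
    k = len(diffs)
    return statistics.mean(diffs) / math.sqrt(statistics.variance(diffs) / k)

def corrected_t(diffs, n_train, n_test):
    """Nadeau-Bengio corrected resampled t statistic.

    The variance factor 1/k is inflated to 1/k + n_test/n_train to account
    for the overlap between training sets across splits.
    """
    k = len(diffs)
    scale = 1.0 / k + n_test / n_train
    return statistics.mean(diffs) / math.sqrt(scale * statistics.variance(diffs))

# Hypothetical accuracy differences (model A minus model B) over 10 folds.
diffs = [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.04, 0.02, 0.01, 0.03]
t_naive = naive_t(diffs)
t_corr = corrected_t(diffs, n_train=900, n_test=100)
print(f"naive t = {t_naive:.2f}, corrected t = {t_corr:.2f} (df = {len(diffs) - 1})")
```

The corrected statistic is always smaller in magnitude than the naive one for the same differences, which is why the naive test tends to overstate significance.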

Shaoshi Zhang retweeted

Tal Golan@TalGolanNeuro·3d

This looks like a straightforward, highly applicable solution to the long-standing problem of valid inference for K-fold CV performance differences. The trade-off is smaller training sets from the split-half step and having to rerun K-fold CV many times.

Thomas Yeo@bttyeo

So we propose SHARP, which involves repeated split-half to generate pairs of independent statistics. There are still 3 unknowns — mean, variance, between-repetition correlation — but the independent pairs provide a 3rd information source to estimate all 3 unknowns. 7/N

English

2

6

33

3.6K
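
As background for why the between-repetition correlation matters, the following is a standard result for the mean of R statistics, stated under the simplifying assumption that they share a common variance σ² and a common pairwise correlation ρ. It is not the preprint's derivation, only an illustration of how the three unknowns interact.

```latex
\operatorname{Var}\!\left(\frac{1}{R}\sum_{r=1}^{R} X_r\right)
  = \frac{\sigma^2}{R}\,\bigl(1 + (R-1)\rho\bigr)
```

With ρ = 0 this reduces to the familiar σ²/R. With ρ > 0 the variance approaches ρσ² as R grows, so adding repetitions alone cannot drive it to zero, and ρ has to be estimated rather than ignored.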

Shaoshi Zhang retweeted

Imaging Neuroscience@ImagingNeurosci·3d

New paper in Imaging Neuroscience by Ru Kong, B.T. Thomas Yeo, et al: Network-based near-scalp personalized brain stimulation targets doi.org/10.1162/IMAG.a…

English

0

5

10

1.1K

Shaoshi Zhang retweeted

Thomas Yeo@bttyeo·3d

Here are bonus slides on cross-validation tests, separate from our preprint. Covering:
1. paired (sign-flip) permutation test
2. label-swap permutation test
3. sample-level vs fold-averaged stats
4. a common misapplication of the corrected t-test
5. three bootstrap variants
1/N

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

1

25

41

7.2K
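
For readers unfamiliar with the first item on that list, here is a minimal Python sketch of the mechanics of a paired sign-flip permutation test. The input differences and the function name are hypothetical. The test assumes the paired differences are independent and symmetric around zero under the null hypothesis. Whether that assumption holds for cross-validation folds is the very question the slides and preprint discuss, so this shows only how the test is computed, not when it is valid.

```python
import itertools

def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip permutation p-value for paired differences.

    Under the null, every pattern of sign flips is treated as equally likely.
    All 2**n patterns are enumerated, so this is only practical for small n.
    """
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    count = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            count += 1
        total += 1
    return count / total

# Hypothetical paired score differences (model A minus model B).
diffs = [0.02, 0.01, 0.03, 0.01, 0.02, 0.01, 0.04, 0.02]
p = sign_flip_p_value(diffs)
print(f"two-sided sign-flip p-value = {p:.4f}")
```

For larger n, the usual approach is to sample random sign patterns instead of enumerating all of them.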

Shaoshi Zhang retweeted

Francisco Pereira@fpereira·3d

@bttyeo @tianchuzeng @kkli20111 @ZShaoshi @ten_photos This is fantastic! I'm glad to have something to point people to in reviews beyond Demšar, 2006 (and Benavoli 2017 for the Bayesian perspective).

English

1

2

5

509

Shaoshi Zhang retweeted

Shan Siddiqi @shansiddiqi.bsky.social@shansiddiqi·4d

Apparently I was doing cross-validation wrong. Thanks @bttyeo and @ZShaoshi for helping us fix it.

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

0

3

15

3.8K

Shaoshi Zhang retweeted

Gary Marcus@GaryMarcus·4d

Biomedical AI may be headed for a replication crisis. (This work below is not about AI-generated reports; it’s about studies of biomedicine that use ML in their methods, and how they are evaluated.)

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

9

5

54

10.8K

Shaoshi Zhang retweeted

Jake Vogel@_JakeVogel_·4d

Omg I've been commenting about this in manuscript reviews for years. Thank goodness there's actually a paper to cite now!! Thanks @bttyeo !

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

2

3

18

4.7K

Shaoshi Zhang retweeted

Rajan Kashyap@Rajankashya·4d

Eye opener 👀

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

0

3

5

880

Shaoshi Zhang retweeted

Lijun AN | 安丽军@anlijuncn·4d

Proud to have participated in this study! We should stay rigorous in AI-biomedical research; we have also observed some concerning trends in AI+biomarker studies… Congratulations @tianchuzeng, Tian Fang, and @ZShaoshi

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

1

5

12

1.9K

Shaoshi Zhang retweeted

Juan (Helen) Zhou@HelenJuanZhou·4d

Important work. Worth a look if you are doing AI in biomedical research.

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

0

5

17

2.1K

Shaoshi Zhang retweeted

Thomas Yeo@bttyeo·4d

Once again, @ten_photos came to the rescue - we prayed to him for a better statistical test for k-shot learning (since the corrected t-test is overly conservative in that scenario), and he answered our prayers with a new test that also covers classical cross-validation.

Thomas Yeo@bttyeo

So we propose SHARP, which involves repeated split-half to generate pairs of independent statistics. There are still 3 unknowns — mean, variance, between-repetition correlation — but the independent pairs provide a 3rd information source to estimate all 3 unknowns. 7/N

English

1

10

17

2.7K

Shaoshi Zhang retweeted

Sina Mansour L.@Sina_Mansour_L·4d

@bttyeo @tianchuzeng @kkli20111 @ZShaoshi @ten_photos Can't stress this enough 👇 If you use ML to compare predictive models in your research (neuroscience, genetics, you name it), this paper is a must read! 👀 The majority of work in this space (mine included 🙋) misses critical nuances when reporting comparative stats.

English

0

2

8

932

Shaoshi Zhang retweeted

Dhurandhar B@bornspectator42·4d

My quibble: This is traditional ML, *not* AI in the generative sense it means now till eternity. But yeah, this is a thing. Metric chasing brought this on. Reviewers reward higher metric values & not well cross-validated results. We've been told AUC < 0.8 is not worth submitting. 🙄

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

1

3

652

Shaoshi Zhang retweeted

Crémieux@cremieuxrecueil·4d

Oh my god, almost no biomedical AI papers had proper validations. This feels like field-wide malpractice.

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

7

29

391

53K

Shaoshi Zhang retweeted

Tianchu@tianchuzeng·4d

So glad this is finally public. Grateful to my wonderful co-authors for the long journey.

Thomas Yeo@bttyeo

In a meta-analysis of 210 biomedical AI studies that statistically compared models under cross-validation, 97% used invalid statistical tests. Here's our new preprint doi.org/10.64898/2026.… led by @tianchuzeng @kkli20111 @ZShaoshi @ten_photos 1/N

English

0

5

9

1.3K

Shaoshi Zhang@ZShaoshi·4d

It’s incredible to see this study come to fruition! Shout out to the amazing @tianchuzeng and @kkli20111 who spearheaded this work and huge thank you to all other coauthors!

English

0

2

81

Shaoshi Zhang retweeted

Hesheng Liu@hesheng3·6d

Lesion network mapping (LNM) has been powerful in linking symptoms and brain functional circuits, but ongoing debates highlight that it is still hard to isolate symptom-specific effects. We came up with a new method, robust LNM (rLNM) — a unified framework combining null models and selective specificity to reveal reliable, symptom-specific networks from background structure. biorxiv.org/content/10.648… @bttyeo @foxmdphd @ndosenbach @club_scan

English

5

41

113

20.1K

Shaoshi Zhang retweeted

Nico Dosenbach@ndosenbach·Apr 23

Function & cytoarchitecture don't overlap ... they're orthogonal. Prefrontal cortex is tiled with chains of functional patches mostly known from face processing. Multi-modal parcellations are wrong ... & other insights hidden by group-averaging fMRI data: bsky.app/profile/gordon…

English

2

35

104

13.1K

Shaoshi Zhang retweeted

Shan Siddiqi @shansiddiqi.bsky.social@shansiddiqi·Apr 22

We often treat functional connectivity as if it’s an established measure of TMS target engagement. I’m not so sure. Check out our new paper by Samantha Baldi in @npp_journal on rsfMRI data from a head-to-head TMS trial. free: rdcu.be/feIya full: lnkd.in/eSdthbSg

English

2

16

36

4.4K
