Judith Bernett

26 posts

@judith_bernett

Bioinformatics PhD student at TUM, find me on Bluesky: https://t.co/tJTwov7PMr

Munich, Germany · Joined November 2015
103 Following · 91 Followers
Judith Bernett (@judith_bernett):
Very proud to present our latest work: 7 guiding questions to avoid data leakage in biological machine learning applications ✨🔍 We hope that reflecting on these questions helps researchers to identify issues or shortcuts leading to overly optimistic performance estimates. 📈🧑‍🔬
Nature Methods (@naturemethods):

A Perspective from @itisalist @judith_bernett @RomanJoeres @ok55991 @FloHasee @dg_grimm @bit_tumcs & @dbblumenthal discusses the issue of data leakage in machine learning models and presents 7 questions to identify and avoid problems as a result. nature.com/articles/s4159…

Judith Bernett (@judith_bernett):
So happy to announce that my paper with @itisalist and @dbblumenthal, "Cracking the black box of deep sequence-based protein–protein interaction prediction", is finally published in Briefings in Bioinformatics: doi.org/10.1093/bib/bb… ! So what is it about? 1/13 🧵
Judith Bernett (@judith_bernett):
@lipido @itisalist @dbblumenthal @hlfernandez Thank you, that's great to hear! In our tests, Topsy-Turvy was the method with the highest performance on our gold standard dataset. Since then, some models have been published that beat its performance, e.g., 10.1101/2023.11.09.566187 or TUnA (10.1101/2024.02.19.581072, 65% accuracy).
Judith Bernett (@judith_bernett):
What is the takeaway? 📈 High accuracy can be reached with simple methods for known proteins -> know your prediction task and try baselines first! 🔮 Current sequence-based methods aren't made for predicting the "dark interactome". ✅ We made a leakage-free dataset for future development.
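To make the "try baselines first" point concrete, here is a minimal hypothetical sketch of a baseline that ignores sequences entirely and scores a protein pair only by how often its two proteins appear in positive training pairs (their node degree). The function names and the `(protein_a, protein_b, label)` tuple format are assumptions for illustration, not the authors' code.

```python
from collections import Counter


def degree_scores(train_pairs, test_pairs):
    """Score each test pair by the summed positive-training degree of its proteins.

    Hypothetical baseline: it never looks at sequences, only at how often each
    protein occurs in interacting (label == 1) training pairs.
    """
    degree = Counter()
    for a, b, label in train_pairs:
        if label == 1:
            degree[a] += 1
            degree[b] += 1
    # Counter returns 0 for proteins never seen in a positive training pair.
    return [degree[a] + degree[b] for a, b in test_pairs]


def predict(scores, threshold):
    """Call a pair interacting (1) if its degree score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy example: P1 is a "hub" in the training data, P5 only has a negative pair.
train = [("P1", "P2", 1), ("P1", "P3", 1), ("P1", "P4", 1), ("P5", "P6", 0)]
test = [("P1", "P7"), ("P5", "P7")]
print(predict(degree_scores(train, test), threshold=1))  # [1, 0]
```

On proteins already seen during training, a score like this can separate frequently observed hub proteins from rarely observed ones; for a pair of entirely unseen proteins the score is always 0, which is one way to see why benchmarks on known proteins can look much easier than the "dark interactome".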
Judith Bernett (@judith_bernett):
12/13 🧵 Because this strategy rendered most datasets too small for proper deep learning, we designed a larger gold standard dataset with training (163,192), validation (59,260), and test (52,048) splits using the same partitioning strategy. The best method achieved 56% accuracy on it. doi.org/10.6084/m9.fig…
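The post does not spell out the partitioning strategy itself. As a hedged illustration of why strict splits shrink datasets, here is one common, simpler approach (not necessarily the authors'): assign every protein to exactly one of the training, validation, or test blocks and keep only pairs whose two proteins fall in the same block. The function name `protein_disjoint_split` and its parameters are hypothetical. The sketch only prevents exact protein overlap; the sequence-similarity leakage mentioned elsewhere in this feed would additionally require grouping similar sequences before splitting, which it does not do.

```python
import random


def protein_disjoint_split(pairs, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (a, b, label) pairs so that no protein appears in more than one split.

    Hypothetical sketch: proteins are randomly assigned to blocks, and pairs whose
    two proteins land in different blocks are discarded.
    """
    proteins = sorted({p for a, b, _ in pairs for p in (a, b)})
    rng = random.Random(seed)
    rng.shuffle(proteins)

    # Assign each protein to exactly one block by its position in the shuffled list.
    n = len(proteins)
    cut1 = int(fractions[0] * n)
    cut2 = cut1 + int(fractions[1] * n)
    block = {}
    for i, p in enumerate(proteins):
        block[p] = 0 if i < cut1 else (1 if i < cut2 else 2)

    train, val, test = [], [], []
    splits = (train, val, test)
    for a, b, label in pairs:
        # Cross-block pairs are dropped; this is what makes strict splits small.
        if block[a] == block[b]:
            splits[block[a]].append((a, b, label))
    return train, val, test
```

With 10 proteins and all 45 possible pairs, only pairs within a block survive, so most of the data is discarded even though every remaining split is protein-disjoint.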
Judith Bernett (@judith_bernett):
Happy and excited to finally share this project! We show conclusively that high accuracies of deep learning-based PPI prediction models are exclusively due to data leakage via sequence similarities and node degree information. biorxiv.org/content/10.110… @itisalist @dbblumenthal
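One simple diagnostic related to the leakage described here is to check how many test pairs involve proteins the model has already seen during training, since such pairs can be predicted from node-degree patterns alone. The helper below is a hypothetical sketch assuming pairs are tuples whose first two entries are protein identifiers; it checks exact identity only and does not detect similar (homologous) sequences.

```python
def overlap_classes(train_pairs, test_pairs):
    """Count test pairs by how many of their two proteins occur in the training pairs."""
    seen = set()
    for pair in train_pairs:
        seen.update(pair[:2])

    counts = {2: 0, 1: 0, 0: 0}
    for pair in test_pairs:
        a, b = pair[0], pair[1]
        # Both proteins seen -> 2, exactly one seen -> 1, neither seen -> 0.
        counts[int(a in seen) + int(b in seen)] += 1
    return counts


train = [("A", "B", 1), ("C", "D", 0)]
test = [("A", "B", 1), ("A", "X", 0), ("X", "Y", 1)]
print(overlap_classes(train, test))  # {2: 1, 1: 1, 0: 1}
```

Reporting performance separately for each class, rather than as one pooled accuracy, can make it visible when a high score is driven mostly by pairs of already-seen proteins.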