Frances Ding (@FrancesDing) - Twitter Profile

Frances Ding@FrancesDing·8 Jul

Our framework lets you: ✅ Benchmark any protein sequence model with just a few lines of code ✅ Get interpretable Elo ratings showing which species your model favors ✅ Compare against baselines in our paper ✅ Add your model results to expand the benchmark Happy benchmarking!
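For readers unfamiliar with Elo ratings, here is a minimal sketch of the standard Elo update rule. It assumes, hypothetically, that each "match" is a pairwise comparison in which the model assigns a higher likelihood to one species' sequence than to another's. The function names, the K-factor of 32, the starting rating of 1500, and the species labels are illustrative assumptions, not the framework's actual API or settings.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0):
    """Update ratings in place after one comparison won by `winner`.

    `k` is the usual Elo step size; 32 is a common default, assumed here.
    """
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


# Hypothetical species labels, both starting from the same assumed rating.
ratings = {"species_A": 1500.0, "species_B": 1500.0}

# Assumed outcome: the model scored species_A's sequence higher.
update_elo(ratings, "species_A", "species_B")
print(ratings)
```

Under this reading, a species whose sequences repeatedly receive higher likelihoods accumulates a higher rating, which is what would make the ratings interpretable as a measure of which species a model favors.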


Frances Ding@FrancesDing·8 Jul

Really excited by the response to our protein model species bias paper! Based on researcher interest and requests, we're releasing our benchmarking framework to make it easy to evaluate bias on any protein model of interest --> github.com/francesding/pr…

Frances Ding@FrancesDing

Protein language models (pLMs) can give protein sequences likelihood scores, which are commonly used as a proxy for fitness in protein engineering. But what do likelihoods encode? In a new paper (w/ @JacobSteinhardt) we find that pLM likelihoods have a strong species bias! 1/


Frances Ding@FrancesDing·3 Apr

@miangoar @JacobSteinhardt Hug et al. is super interesting! Unfortunately their tree did not include all the species we studied, so that's why we created our own to visualize our dataset in particular.


Frances Ding@FrancesDing·3 Apr

@miangoar @JacobSteinhardt Thanks! To create our phylogenetic tree we used timetree.org to get estimates of the time to last common ancestor between each pair of species we studied. Then we used hierarchical clustering to turn those estimates into a full tree.
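The two steps described above (pairwise divergence-time estimates, then hierarchical clustering) can be sketched roughly as follows. This is a pure-Python illustration that assumes average linkage (UPGMA-style); the tweet does not say which linkage was used. The species labels and divergence times are made-up placeholders, not values from timetree.org, and `average_linkage_tree` is a hypothetical helper name.

```python
from itertools import combinations


def average_linkage_tree(names, dist):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters.

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) to the pairwise distance between leaves
    """
    # Each cluster is (subtree, leaves): the nested tuple and its flat leaf list.
    clusters = [(name, [name]) for name in names]

    def cluster_distance(c1, c2):
        # Average linkage: mean of all leaf-to-leaf distances between clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the indices of the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Placeholder divergence times (arbitrary units), NOT real TimeTree estimates.
names = ["A", "B", "C", "D"]
divergence = {
    frozenset({"A", "B"}): 10.0,
    frozenset({"A", "C"}): 50.0,
    frozenset({"B", "C"}): 50.0,
    frozenset({"A", "D"}): 200.0,
    frozenset({"B", "D"}): 200.0,
    frozenset({"C", "D"}): 200.0,
}

tree = average_linkage_tree(names, divergence)
print(tree)
```

With these placeholder distances, the closest pair (A, B) merges first, then C joins, and D attaches last, so the nesting of the tuples mirrors the order of divergence.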


Frances Ding@FrancesDing·13 Mar

Protein language models (pLMs) can give protein sequences likelihood scores, which are commonly used as a proxy for fitness in protein engineering. But what do likelihoods encode? In a new paper (w/ @JacobSteinhardt) we find that pLM likelihoods have a strong species bias! 1/


Frances Ding@FrancesDing·13 Mar

@Juli_Bla Thank you! And thanks for the pointer, that's exciting work!


Jude@Juli_Bla·13 Mar

@FrancesDing Loving your paper! Especially as it goes hand in hand with: biorxiv.org/content/10.110…


Frances Ding@FrancesDing·13 Mar

@biorxiv_bioinfo You can find a tweetorial overview of the paper here!: x.com/FrancesDing/st…



bioRxiv Bioinfo@biorxiv_bioinfo·12 Mar

Protein language models are biased by unequal sequence sampling across the tree of life biorxiv.org/cgi/content/sh… #biorxiv_bioinfo


Frances Ding@FrancesDing·13 Mar

Thanks for reading! For more details, check out the full paper here: biorxiv.org/content/10.110… 14/14


Frances Ding@FrancesDing·13 Mar

More broadly, how should we structure and curate databases of biological data, which are not only repositories of knowledge, but now serve to define distributions over data? 13/
