Frances Ding
27 posts

Frances Ding
@FrancesDing
PhD student in EECS, UC Berkeley. ML fairness and interpretability. ML for protein design.
Joined June 2018
117 Following · 271 Followers

Really excited by the response to our protein model species bias paper! Based on researcher interest and requests, we're releasing our benchmarking framework to make it easy to evaluate bias on any protein model of interest --> github.com/francesding/pr…
Frances Ding@FrancesDing
Protein language models (pLMs) can give protein sequences likelihood scores, which are commonly used as a proxy for fitness in protein engineering. But what do likelihoods encode? In a new paper (w/ @JacobSteinhardt) we find that pLM likelihoods have a strong species bias! 1/

@miangoar @JacobSteinhardt Hug et al. is super interesting! Unfortunately their tree did not include all the species we studied, so that's why we created our own to visualize our dataset in particular.

@miangoar @JacobSteinhardt Thanks! To create our phylogenetic tree we used timetree.org to get estimates of the time to last common ancestor between each pair of species we studied. Then we used hierarchical clustering to turn those estimates into a full tree.
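
A minimal sketch of the procedure described above: take pairwise divergence-time estimates and merge species bottom-up into a tree. The thread does not say which linkage method was used, so average linkage (UPGMA) is an assumption here. The species labels `A` to `D`, the function name `upgma`, and the divergence times are hypothetical placeholders, not real TimeTree values.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering over pairwise divergence times.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> estimated divergence time.
    Returns (tree, merges): tree is a nested tuple of labels, merges is a
    list of (cluster_label, merge_time) in the order clusters were joined.
    """
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)  # copy so the caller's dict is not modified
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest divergence time.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        time = d[frozenset((a, b))]
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d[frozenset((new, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
        merges.append((new, time))
    ((tree, _),) = clusters.values()
    return tree, merges


# Hypothetical divergence times (arbitrary units), for illustration only.
names = ["A", "B", "C", "D"]
times = {
    frozenset(("A", "B")): 10,
    frozenset(("C", "D")): 20,
    frozenset(("A", "C")): 100,
    frozenset(("A", "D")): 100,
    frozenset(("B", "C")): 100,
    frozenset(("B", "D")): 100,
}
tree, merges = upgma(names, times)
print(tree)
for label, time in merges:
    print(label, time)
```

With real data, the `times` dictionary would hold one entry per species pair. SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` offers the same linkage for larger datasets.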

Protein language models (pLMs) can give protein sequences likelihood scores, which are commonly used as a proxy for fitness in protein engineering. But what do likelihoods encode?
In a new paper (w/ @JacobSteinhardt) we find that pLM likelihoods have a strong species bias!
1/


@Juli_Bla Thank you! And thanks for the pointer, that's exciting work!

@FrancesDing Loving your paper! Especially as it goes hand in hand with: biorxiv.org/content/10.110…

@biorxiv_bioinfo You can find a tweetorial overview of the paper here!: x.com/FrancesDing/st…

Protein language models are biased by unequal sequence sampling across the tree of life biorxiv.org/cgi/content/sh… #biorxiv_bioinfo

Thanks for reading! For more details, check out the full paper here:
biorxiv.org/content/10.110…
14/14