Arpan Sarkar

10 posts

@arpsark

PhD candidate in the Eddy Lab @Harvard interested in protein ML for modeling protein domains and predicting homology

Joined May 2024
243 Following · 41 Followers
Arpan Sarkar reposted
Liana Merk @BacteriYay
(1/7) Very excited to share my first PhD preprint on the interactions of two of my favorite mobile genetic elements: phages and group II introns! biorxiv.org/content/10.110…
Arpan Sarkar reposted
Dr. Jenny Chen @jjennychenn
Super excited to share my postdoc work investigating how mating and parental behaviors evolve using wild species of mice combined with single nucleus RNA-sequencing of the hypothalamus 🐭🧠🧬! biorxiv.org/content/10.110…
Arpan Sarkar @arpsark
@sokrypton It's also important to look at the homology between any domain in any seq in test to any domain in any seq in training, and I think we do a good job of assessing performance at various domain-level max % identity thresholds in our work on PSALM (biorxiv.org/content/10.110…)
Sergey Ovchinnikov @sokrypton
📈 One standard graph in all bio deep learning papers should be: max similarity to anything in training set vs performance. (Reviewers shouldn't have to guess if there might be overfitting issues).
Arpan Sarkar @arpsark
We're excited to share PSALM -- keep an eye out for subsequent releases!
Arpan Sarkar @arpsark
PSALM annotates sequences with greater sensitivity and specificity than profile HMM-based methods (on identical training sets). PSALM has a very low residue-level FPR, benefits strongly from additional examples, and annotates clans even at low percent identity to training data
Arpan Sarkar @arpsark
PSALM uses a hierarchical approach that considers both individual protein domain families and clans (determined by Pfam). Modeling clans is an interpretable intermediate step that helps identify functional regions that lack clear family-level annotations
Arpan Sarkar @arpsark
We propose PSALM, which extends ESM-2 to predict *residue*-level protein sequence annotations. PSALM accurately annotates domain boundaries, multi-domain proteins, and even domains that are currently unannotated in sequence databases