Michael (Jin Sub) Lee

7 posts

@mjslee0921

Research @ Biohub | prev. @EvoScaleAI @UofT @ETH_BSSE @yonsei_u

Toronto, Ontario · Joined October 2021
121 Following · 53 Followers
Michael (Jin Sub) Lee @mjslee0921
@001TMF @alexrives Unfortunately we were unable to access Protenix-v2 weights from their official repository - we would love to add this baseline when it becomes openly available!
Alex Rives @alexrives
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.

The new model delivers state-of-the-art performance on protein interactions, especially antibodies, a critical modality for therapeutics. We have designed and validated miniprotein binders and single-chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity. We’re also releasing an atlas of 6.8 billion proteins and 1.1 billion predicted structures.

ESMFold2 is built on a state-of-the-art language model that has been trained on billions of protein sequences. A world model of protein biology emerges through language modeling. We’ve used the techniques of mechanistic interpretability, developed to understand large language models, to understand the concepts ESM uses to represent proteins. The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction that mirrors the understanding of protein biology developed through a century of empirical science. This understanding emerges without prior knowledge, just from language modeling of protein sequences.

Language models are becoming a powerful substrate to understand and program biology. The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient-based search with the model was able to discover high-affinity protein binders.

I'm excited by the potential this has to accelerate basic science and the understanding of proteins, and especially by the new avenues it opens up for therapeutic design and medicine.
Michael (Jin Sub) Lee reposted
biohub @biohub
Proteins are the machinery of life. Scientists have cataloged billions of protein sequences—but their biology is still mostly unknown. Today we're releasing a world model of protein biology: a scientific engine for prediction, design, and discovery that consists of ESMFold2, ESMC, and ESM Atlas. Together, they're helping to open up a new way for researchers to design proteins and speed up scientific discovery. Our mission is to cure or prevent disease. To do that, we need to accelerate science. That's why we're releasing all three openly. bit.ly/3PGf1dk
Michael (Jin Sub) Lee @mjslee0921
So happy to see this finally out! It was a wild ride 😆 Everything is open-sourced, so try it out and let me know how it works :)
Quoting Alex Rives (@alexrives): the ESMFold2 announcement reproduced in full above.
Michael (Jin Sub) Lee reposted
Biology+AI Daily @BiologyAIDaily
Design of peptides with non-canonical amino acids using flow matching

1. This groundbreaking study introduces NCFlow, a novel flow-based generative model that can incorporate any arbitrary non-canonical amino acid (ncAA) into a given protein, significantly expanding the chemical space for peptide engineering and drug discovery.
2. NCFlow addresses the limitations of traditional protein design tools by pretraining on millions of small molecule structures and protein-ligand complexes, and then finetuning on native non-canonical amino acids found in the Protein Data Bank. This approach enhances the model’s ability to accurately predict the structure of unseen non-canonical amino acids.
3. The study presents an innovative peptide design pipeline that uses deep mutational scanning and a combination of deep learning-based and molecular dynamics-based alchemical binding free energy calculations to identify improved peptide variants. This method has been validated on four protein-peptide complex test cases, demonstrating significant improvements in binding affinity.
4. NCFlow outperforms AlphaFold3-based methods in structure prediction of non-canonical amino acids, showcasing its potential for integrating into existing protein design platforms to enhance properties beyond what is achievable with standard amino acids.
5. The authors highlight the scarcity of ncAA data in the Protein Data Bank and propose effective strategies to augment training data, including the use of small molecule structures and protein-ligand complexes. This approach not only improves model performance but also generalizes to ncAAs not found in the PDB.
6. The study demonstrates that incorporating non-canonical amino acids can significantly improve binding affinity by up to -7.0 kcal/mol, highlighting the potential of NCFlow for developing novel peptide drugs with enhanced properties.
@mjslee0921 📜Paper: biorxiv.org/content/10.110… #ProteinDesign #NonCanonicalAminoAcids #PeptideEngineering #DrugDiscovery #DeepLearning #MolecularDynamics #FlowMatching