Michael (Jin Sub) Lee (@mjslee0921) - Twitter Profile

Go design binders now!!

We’re excited to share the full binder design protocol. Check it out here: github.com/Biohub/esm/blo…. The notebook includes support for @modal to easily scale up binder generation. Give it a try and let us know how it works! You can read more about ESMFold2, ESMC, ESM Atlas, and the full results in the paper here: biohub.ai/papers/esm_pro….

Michael (Jin Sub) Lee@mjslee0921·28 May

@001TMF @alexrives Unfortunately we were unable to access Protenix-v2 weights from their official repository - we would love to add this baseline when it becomes openly available!

Tristan Farmer@001TMF·27 May

@alexrives Why protenix-v1 and not protenix-v2?

Alex Rives@alexrives·27 May

Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology. The new model delivers state-of-the-art performance on protein interactions, especially antibodies, a critical modality for therapeutics.

We have designed and validated miniprotein binders and single-chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity. We’re also releasing an atlas of 6.8 billion proteins and 1.1 billion predicted structures.

ESMFold2 is built on a state-of-the-art language model that has been trained on billions of protein sequences. A world model of protein biology emerges through language modeling. We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins. The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction that mirrors the understanding of protein biology developed through a century of empirical science. This understanding emerges without prior knowledge, just from language modeling of protein sequences.

Language models are becoming a powerful substrate to understand and program biology. The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient-based search with the model was able to discover high-affinity protein binders.

I'm excited by the potential this has to accelerate basic science and the understanding of proteins, and especially by the new avenues it opens up for therapeutic design and medicine.

Michael (Jin Sub) Lee retweeted

biohub@biohub·27 May

🎙️@alexrives on 'AI for Science' with @latentspacepod breaking down our world model of protein biology: ESMFold2, ESMC, and ESM Atlas. latent.space/p/esmfold2

Michael (Jin Sub) Lee retweeted

biohub@biohub·27 May

Proteins are the machinery of life. Scientists have cataloged billions of protein sequences—but their biology is still mostly unknown. Today we're releasing a world model of protein biology: a scientific engine for prediction, design, and discovery that consists of ESMFold2, ESMC, and ESM Atlas. Together, they're helping to open up a new way for researchers to design proteins and speed up scientific discovery. Our mission is to cure or prevent disease. To do that, we need to accelerate science. That's why we're releasing all three openly. bit.ly/3PGf1dk

Michael (Jin Sub) Lee@mjslee0921·27 May

So happy to see this finally out! It was a wild ride 😆 Everything is open-sourced, so try it out and let me know how it works :)

Quoting Alex Rives (@alexrives), ESMFold2 announcement of 27 May.

Michael (Jin Sub) Lee retweeted

bioRxiv Biophysics@biorxiv_biophys·2 Dec

Making invisible excited state protein structures visible by combining NMR and machine learning biorxiv.org/content/10.110… #biorxiv_biophys

Michael (Jin Sub) Lee retweeted

Biology+AI Daily@BiologyAIDaily·1 Aug

Design of peptides with non-canonical amino acids using flow matching

1. This groundbreaking study introduces NCFlow, a novel flow-based generative model that can incorporate any arbitrary non-canonical amino acid (ncAA) into a given protein, significantly expanding the chemical space for peptide engineering and drug discovery.
2. NCFlow addresses the limitations of traditional protein design tools by pretraining on millions of small molecule structures and protein-ligand complexes, and then finetuning on native non-canonical amino acids found in the Protein Data Bank. This approach enhances the model’s ability to accurately predict the structure of unseen non-canonical amino acids.
3. The study presents an innovative peptide design pipeline that uses deep mutational scanning and a combination of deep learning-based and molecular dynamics-based alchemical binding free energy calculations to identify improved peptide variants. This method has been validated on four protein-peptide complex test cases, demonstrating significant improvements in binding affinity.
4. NCFlow outperforms AlphaFold3-based methods in structure prediction of non-canonical amino acids, showcasing its potential for integrating into existing protein design platforms to enhance properties beyond what is achievable with standard amino acids.
5. The authors highlight the scarcity of ncAA data in the Protein Data Bank and propose effective strategies to augment training data, including the use of small molecule structures and protein-ligand complexes. This approach not only improves model performance but also generalizes to ncAAs not found in the PDB.
6. The study demonstrates that incorporating non-canonical amino acids can significantly improve binding affinity by up to -7.0 kcal/mol, highlighting the potential of NCFlow for developing novel peptide drugs with enhanced properties.

@mjslee0921 📜Paper: biorxiv.org/content/10.110… #ProteinDesign #NonCanonicalAminoAcids #PeptideEngineering #DrugDiscovery #DeepLearning #MolecularDynamics #FlowMatching
