Yu Zhang

122 posts

@October

Postdoctoral Research Fellow @ Collins Lab, @wyssinstitute, @broadinstitute & @MIT_IMES | AI in drug discovery and synthetic biology

Cambridge, MA · Joined June 2008

563 Following · 160 Followers

Yu Zhang retweeted

Pushmeet Kohli @pushmeet · 17 Mar

At @GoogleDeepMind, we believe AI is the ultimate catalyst for science. 🧬 The best example of this has been the AlphaFold database (AFDB) of protein structure predictions, which has been used free of charge by more than 3.3 million researchers across the world! Today, in collaboration with @emblebi, @Nvidia and @SeoulNatlUni, we are expanding the database by adding millions of AI-predicted protein complex structures to the AlphaFold Database. To maximise global health impact, we’ve prioritised proteins that are important for understanding human health and disease, including homodimers from 20 of the most studied organisms (humans among them), as well as proteins from the @WHO's bacterial priority pathogens list. Read more here: embl.org/news/science-t…

Yu Zhang @October · 20 Dec

mols2img: paste SMILES/CXSMILES or upload a CSV → instant RDKit structure sheets you can print/save as PDF. Live app: yuzhang-io.github.io/mols2img/

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 22 Sep

AI-Guided Design of Cyclic Peptide Binders Targeting TREM2 Using CycleRFdiffusion and Experimental Validation 1. A novel study presents an AI-driven pipeline for designing cyclic peptide binders targeting TREM2, a key receptor in neurodegenerative diseases like Alzheimer's. The integration of CycleRFdiffusion, ProteinMPNN, and HighFold enables the generation and screening of 1,500 peptide–target complexes, identifying four promising candidates. 2. The study demonstrates the first proof-of-concept for AI-designed cyclic peptides capable of binding TREM2. TP4, one of the designed peptides, consistently showed binding activity in spectral shift, microscale thermophoresis, and surface plasmon resonance assays, with a sub-millimolar binding affinity. 3. TP4 exhibited favorable pharmacokinetic properties, including stability in human plasma and simulated intestinal fluid, moderate metabolic stability in rat liver microsomes, and measurable permeability across Caco-2 cells. These characteristics support its potential as a lead scaffold for further optimization. 4. The innovative CycleRFdiffusion model, tailored for cyclic peptide backbone generation, addresses the unique challenges of designing cyclic peptides. This model, combined with ProteinMPNN for sequence design and HighFold for structural prediction, forms a robust platform for cyclic peptide discovery. 5. The study highlights the potential of AI-driven design to expand therapeutic modalities beyond antibodies and small molecules, offering a systematic approach to explore cyclic peptides as immunomodulators for CNS targets. This work paves the way for the development of novel treatments for neurodegenerative diseases. 📜Paper: biorxiv.org/content/10.110… #AIDrivenDesign #CyclicPeptides #TREM2 #AlzheimersDisease #NeurodegenerativeDisease #Pharmacokinetics #BiophysicalValidation

Yu Zhang retweeted

NVIDIA Healthcare @NVIDIAHealth · 9 Sep

🔥nvMolKit landed today🔥 Morgan fingerprinting, Tanimoto/cosine similarity, MMFF geometry optimization, and conformer generation on GPU, 10-3000x faster. Screen millions of SMILES before coffee & upsize your QSAR pipelines. 🚀 Which dataset operation will you accelerate first? #GPU #cheminformatics #drugdiscovery
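The tweet mentions Tanimoto and cosine similarity on Morgan fingerprints. As a CPU-only illustration of what those two scores compute (this is not nvMolKit's API, which is not shown here), the sketch below assumes each fingerprint is already available as the set of its "on" bit indices, for example taken from an RDKit Morgan bit vector. The `top_k` helper and the toy fingerprints are hypothetical.

```python
import math


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def cosine(a: set[int], b: set[int]) -> float:
    """Cosine similarity of two binary fingerprints: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def top_k(query: set[int], library: dict[str, set[int]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force ranking of library entries by Tanimoto similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    query = {1, 5, 9, 12}
    library = {"mol_a": {1, 5, 9, 12}, "mol_b": {1, 5, 40}, "mol_c": {100, 200}}
    print(top_k(query, library, k=2))
```

For binary fingerprints, Tanimoto divides the shared bits by the union, while cosine divides by the geometric mean of the two bit counts, so cosine is never lower than Tanimoto for the same pair.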

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 20 Aug

LABind: Identifying Protein Binding Ligand-Aware Sites via Learning Interactions Between Ligand and Protein @NatureComms 1. LABind is a novel structure-based method that predicts protein-ligand binding sites in a ligand-aware manner, utilizing a graph transformer and cross-attention mechanism to capture binding patterns and learn distinct interactions between proteins and ligands. This approach significantly improves the prediction accuracy for both seen and unseen ligands compared to existing methods. 2. The study addresses the limitations of traditional experimental methods and existing computational approaches by proposing a unified model that incorporates ligand information explicitly. LABind outperforms other advanced methods on multiple benchmark datasets, demonstrating its superior ability to generalize to new ligands and maintain robust performance even with predicted protein structures. 3. LABind’s architecture includes a graph converter module that encodes protein structures into graphs, capturing spatial features essential for binding site prediction. The cross-attention mechanism allows the model to effectively integrate ligand properties, enhancing its ability to distinguish between different ligands and their corresponding binding sites. 4. The application of LABind extends beyond binding site prediction to tasks like binding site center localization and molecular docking. It shows strong potential in improving docking accuracy and can be applied to proteins without experimentally determined structures by leveraging predicted structures from tools like ESMFold. 5. The study includes comprehensive experiments and ablation studies that validate the importance of each component in LABind, such as the protein representation and ligand features. The visualization of residue representations highlights how LABind captures crucial information about protein-ligand interactions, leading to more accurate predictions. 6. LABind demonstrates practical applicability by accurately predicting binding sites for the SARS-CoV-2 NSP3 macrodomain with unseen ligands, showcasing its potential in real-world scenarios. This method provides a valuable tool for understanding protein functions and aiding drug design efforts. 📜Paper: nature.com/articles/s4146… 💻Code: github.com/ljquanlab/LABi… #ProteinLigandBinding #ComputationalBiology #MachineLearning #DrugDiscovery #Bioinformatics
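The summary credits a cross-attention mechanism with letting residue representations take ligand properties into account. The sketch below shows generic single-head scaled dot-product cross-attention in NumPy, with residues as queries and ligand tokens as keys and values; the dimensions, the random projections, and the logistic `binding_probability` head are hypothetical stand-ins, not LABind's released code or trained weights.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(residues, ligand, w_q, w_k, w_v):
    """Single-head cross-attention: residues (queries) attend over ligand tokens (keys/values).

    residues: (n_res, d_res)   ligand: (n_lig, d_lig)
    w_q: (d_res, d_k)   w_k: (d_lig, d_k)   w_v: (d_lig, d_v)
    Returns ligand-aware residue features (n_res, d_v) and attention weights (n_res, n_lig).
    """
    q = residues @ w_q
    k = ligand @ w_k
    v = ligand @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return weights @ v, weights


def binding_probability(features, w_out, b_out=0.0):
    """Hypothetical per-residue logistic head: probability that each residue lies in a binding site."""
    return 1.0 / (1.0 + np.exp(-(features @ w_out + b_out)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_res, n_lig, d_res, d_lig, d_k, d_v = 50, 4, 32, 16, 8, 8
    feats, attn = cross_attention(
        rng.normal(size=(n_res, d_res)),  # stand-in for graph-encoded residue features
        rng.normal(size=(n_lig, d_lig)),  # stand-in for ligand token features
        rng.normal(size=(d_res, d_k)),
        rng.normal(size=(d_lig, d_k)),
        rng.normal(size=(d_lig, d_v)),
    )
    probs = binding_probability(feats, rng.normal(size=d_v))
    print(feats.shape, attn.shape, probs.shape)
```

Because the ligand enters through the keys and values, swapping in a different ligand changes every residue's output features, which is one way a single model can give ligand-specific binding-site predictions.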

Yu Zhang retweeted

Cell @CellCellPress · 16 Aug

Now online! A generative deep learning approach to de novo antibiotic design dlvr.it/TMWc69

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 15 Aug

A Generative Deep Learning Approach to De Novo Antibiotic Design @CellCellPress 1. A new generative AI framework has been developed for designing de novo antibiotics, yielding lead compounds with selective antibacterial activity, distinct mechanisms of action, and in vivo efficacy against multidrug-resistant strains of N. gonorrhoeae and S. aureus. This innovative approach could significantly aid in combating the antimicrobial resistance crisis. 2. The study utilized a fragment-based method to screen over 10^7 chemical fragments in silico against N. gonorrhoeae or S. aureus, expanding promising fragments using genetic algorithms and variational autoencoders. Additionally, an unconstrained de novo compound generation approach was employed, showcasing the potential of AI in exploring vast chemical spaces. 3. Out of 24 synthesized compounds, seven demonstrated selective antibacterial activity. Two lead compounds, NG1 and DN1, exhibited unique modes of action and efficacy against multidrug-resistant strains in mouse models. NG1 showed bactericidal efficacy against N. gonorrhoeae, while DN1 displayed broad-spectrum activity against Gram-positive bacteria. 4. The mechanism of action for NG1 was investigated, revealing that it may act by decreasing membrane fluidity and compromising membrane integrity in N. gonorrhoeae. This was supported by experimental results showing increased membrane permeability and morphological changes in treated cells. Furthermore, NG1 exhibited low toxicity and was effective in a mouse model of N. gonorrhoeae vaginal infection. 5. The study also explored the design of compounds active against S. aureus using a similar fragment-based approach. One of the synthesized compounds, EN1, showed activity against both methicillin-susceptible and methicillin-resistant S. aureus strains. Additionally, the de novo design approach generated compounds without the need for specific fragments as starting points, further expanding the chemical space explored. 6. The generative AI models used in this study demonstrated the ability to produce realistic and synthesizable compounds with promising antibacterial properties. The platform enables the efficient exploration of uncharted regions of chemical space, providing a valuable tool for antibiotic discovery. Future work could focus on optimizing the lead compounds and exploring additional chemical starting points to enhance the diversity and efficacy of generated antibiotics. 📜Paper: cell.com/cell/abstract/… #AntibioticDesign #GenerativeAI #DeepLearning #AntimicrobialResistance #DrugDiscovery

Yu Zhang retweeted

Eric Topol @EricTopol · 14 Aug

An important application of generative A.I. is facilitating discovery to override antimicrobial resistance with newly designed antibiotics, as demonstrated here for in vivo effectiveness vs S. aureus and N. gonorrhoeae @MIT @broadinstitute @wyssinstitute @MITdeptofBE cell.com/cell/fulltext/…

Yu Zhang retweeted

James Pethokoukis ⏩️⤴️ @JimPethokoukis · 15 Aug

With help from artificial intelligence, MIT researchers have designed novel antibiotics that can combat two hard-to-treat infections: drug-resistant Neisseria gonorrhoeae and multi-drug-resistant Staphylococcus aureus (MRSA). news.mit.edu/2025/using-gen…

Yu Zhang retweeted

Eric Betzig @Eric_Betzig · 14 Aug

MOSAIC lattice light sheet xy projection over 2+ hrs in human retinal pigment epithelial cells of endoplasmic reticulum remodeling (cyan) and transport of vesicles (yellow) containing β4-galactosyltransferase....

Yu Zhang retweeted

Deniz Kavi @kavi_deniz · 14 Aug

New Experimentally Validated De Novo Peptide Design Tool, no structure needed! PepMLM is now fully open-source, try it out on @tamarindbio today. 1/🧵

Yu Zhang retweeted

Phare Bio @PhareBio · 14 Aug

AI vs. superbugs: who wins? A new Cell study shows how #AI can design new drugs to outsmart deadly bacteria. “We’re building what we think is the most novel and robust pipeline of antibiotics in the world,” says Dr. @AkhilaKosaraju, Phare Bio CEO. 🔗 spectrum.ieee.org/ai-drug-design…

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 7 Aug

MD-LLM-1: A Large Language Model for Molecular Dynamics 1. Researchers have developed MD-LLM-1, a novel framework that leverages large language models (LLMs) to simulate molecular dynamics. This approach enables the prediction of protein conformational states not seen during training, offering a powerful new tool for exploring protein dynamics with reduced computational resources. 2. MD-LLM-1 is based on the Mistral 7B architecture, fine-tuned using Low-Rank Adaptation (LoRA). The model uses a unique tokenization scheme called FoldToken to represent protein structures as sequences of discrete numerical tokens, which can be processed by LLMs. This method allows the model to learn the temporal evolution of protein conformations. 3. The study demonstrates that MD-LLM-1 can discover low-population states in proteins such as T4 lysozyme and Mad2. For example, training on the native state of T4 lysozyme enabled the model to predict its excited state, and vice versa. This bidirectional cross-state discovery highlights the model’s ability to learn fundamental conformational relationships. 4. The model’s ability to bypass kinetic barriers and explore conformational landscapes that are difficult to sample with traditional MD simulations is a significant innovation. This capability could accelerate the study of long-timescale processes and rare conformational transitions in proteins. 5. MD-LLM-1 represents a step towards integrating deep learning with molecular dynamics. While the current implementation is system-specific, future work could focus on developing more generalizable models trained on diverse protein datasets, potentially enabling the prediction of complex free energy landscapes. 📜Paper: arxiv.org/abs/2508.03709 #MolecularDynamics #LargeLanguageModels #ProteinDynamics #DeepLearning #ComputationalBiology
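The summary says MD-LLM-1 fine-tunes Mistral 7B with Low-Rank Adaptation (LoRA). The NumPy sketch below shows the general LoRA idea for a single linear layer: the pretrained weight stays frozen and only a rank-r update, scaled by alpha/r, is trained. The rank, alpha, and initialization scale are hypothetical choices for illustration, not the settings used for MD-LLM-1.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w0: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                     # frozen pretrained weight (d_out, d_in)
        self.a = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
        self.b = np.zeros((d_out, r))                    # trainable, zero init: no change at step 0
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); the low-rank path is added to the frozen path
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def merged_weight(self) -> np.ndarray:
        """Fold the update into one matrix so inference costs the same as the base layer."""
        return self.w0 + self.scale * self.b @ self.a

    def trainable_params(self) -> int:
        return self.a.size + self.b.size


def lora_param_fraction(d_out: int, d_in: int, r: int) -> float:
    """Trainable LoRA parameters r * (d_in + d_out) relative to a full d_out x d_in matrix."""
    return r * (d_in + d_out) / (d_in * d_out)


if __name__ == "__main__":
    print(f"hypothetical 4096 x 4096 projection, r=8: {lora_param_fraction(4096, 4096, 8):.2%} trainable")
```

For a 4096 x 4096 matrix at r = 8, only 1/256 (about 0.39%) of the weights are trained, which is why LoRA fine-tuning of a 7B-parameter model fits on modest hardware.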

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 9 Aug

Embedding is (Almost) All You Need: Retrieval-Augmented Inference for Generalizable Genomic Prediction Tasks 1. A new study explores the use of embedding-based pipelines for genomic prediction tasks, challenging the necessity of task-specific fine-tuning of large pre-trained DNA language models. The research shows that fixed embeddings combined with lightweight classifiers can achieve competitive performance while significantly reducing inference time and carbon emissions. 2. The study demonstrates that retrieval-augmented methods, which leverage fixed transformer embeddings and lightweight sequence features, outperform fine-tuning in several tasks with different data distributions. This suggests that embedding extraction is a strong baseline and a more generalizable alternative to fine-tuning for diverse genomic contexts. 3. The proposed framework combines pretrained transformer embeddings (DNABERT-2, Nucleotide Transformer, HyenaDNA) with biologically motivated features such as GC content, z-curve components, and AT/GC ratio. It uses FAISS L2 indexing for fast k-nearest-neighbor retrieval and weighted voting for classification, eliminating the need for full model fine-tuning. 4. In independent evaluations, embedding-based methods achieved up to 10× higher carbon efficiency while maintaining competitive accuracy. For example, in enhancer classification, embedding with zCurve using HyenaDNA achieved 0.68 accuracy with an 88% reduction in inference time and over 8× lower carbon emissions compared to fine-tuning. 5. The study also highlights the generalizability of the embedding-based approach across nine genomic tasks, showing consistent performance without task-specific training. This suggests that pre-trained embeddings capture discriminative features that are transferable across different datasets and tasks. 6. The research concludes that embedding-based pipelines offer a sustainable and efficient alternative to fine-tuning for genomic sequence classification. Future work may explore hybrid models that combine task-specific fine-tuning with fixed embeddings to balance adaptability and efficiency. 📜Paper: arxiv.org/abs/2508.04757… #Genomics #AI #GreenAI #Bioinformatics #Embeddings #GenomicPrediction
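The retrieval pipeline described above (fixed features, L2 k-nearest-neighbour search, weighted voting) can be sketched in plain Python. Several details here are assumptions rather than the paper's exact recipe: the Z-curve is summarized by its length-normalized endpoint, the AT/GC ratio uses a +1 pseudocount, votes are weighted by inverse distance, and the language-model embedding is an optional precomputed vector passed in by the caller. The brute-force search stands in for a FAISS `IndexFlatL2`, which performs the same exact L2 search at scale; the toy sequences and labels are invented.

```python
import math
from collections import Counter


def sequence_features(seq: str) -> list[float]:
    """GC content, AT/GC ratio and length-normalized Z-curve endpoint (x, y, z) of a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    c = Counter(seq)
    a, t, g, cy = c["A"], c["T"], c["G"], c["C"]
    gc = (g + cy) / n
    at_gc = (a + t + 1) / (g + cy + 1)  # +1 pseudocount avoids division by zero
    x = ((a + g) - (cy + t)) / n        # purine vs pyrimidine
    y = ((a + cy) - (g + t)) / n        # amino vs keto
    z = ((a + t) - (g + cy)) / n        # weak vs strong hydrogen bonding
    return [gc, at_gc, x, y, z]


def featurize(seq: str, embedding=()) -> list[float]:
    """Concatenate a (hypothetical, precomputed) language-model embedding with the sequence features."""
    return [float(v) for v in embedding] + sequence_features(seq)


def knn_predict(query, index, k=3, eps=1e-9):
    """Exact L2 k-nearest-neighbour search with inverse-distance weighted voting.

    index: list of (feature_vector, label) pairs; brute force stands in for a FAISS index.
    """
    scored = ((math.dist(query, vec), label) for vec, label in index)
    neighbours = sorted(scored, key=lambda pair: pair[0])[:k]
    votes: dict[str, float] = {}
    for dist, label in neighbours:
        votes[label] = votes.get(label, 0.0) + 1.0 / (dist + eps)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    train = [("GCGCGCGGCC", "enhancer"), ("GGCCGCGCAT", "enhancer"),
             ("ATATTTAAAT", "background"), ("TTATAAATAT", "background")]
    index = [(featurize(s), label) for s, label in train]
    print(knn_predict(featurize("GCGCGGCCAT"), index, k=3))
```

No model weights are updated at any point: adding a new task only means building a new index of labeled feature vectors, which is the source of the inference-time and carbon savings the summary reports.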

Yu Zhang retweeted

Steven Salzberg 💙💛 @StevenSalzberg1 · 4 Aug

Uh oh: deep learning models in genomics lose badly to very simple linear models. Could they have been over-hyped? (new paper by @s_anders_m @const_ae and Wolfgang Huber) nature.com/articles/s4159…

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 26 Jul

Generative Design of High-Affinity Peptides Using BindCraft 1. A groundbreaking study leverages BindCraft, an AlphaFold-based platform, to design high-affinity peptide ligands directly from protein structures. This approach marks a significant step forward in de novo peptide design, offering a powerful alternative to traditional screening methods. 2. The study demonstrates BindCraft’s ability to generate functional peptides for challenging protein targets. For MDM2, a well-known oncoprotein, BindCraft produced 70 unique peptides, with 15 synthesized and 7 showing nanomolar binding affinities. This success rate is remarkable, especially considering the complexity of the target. 3. BindCraft’s structural predictions enable rational optimization of peptides. In a notable example, the researchers designed a stapled variant of a WDR5 binder, achieving a 6-fold improvement in potency. This highlights the platform’s potential for enhancing peptide stability and binding affinity through structure-guided modifications. 4. The study also explores BindCraft’s performance on WDR5, a protein with two distinct binding sites. While no validated hits were found for the WIN site, six peptides showed sub-micromolar binding affinities for the MYC site. This underscores BindCraft’s versatility in targeting different protein interfaces. 5. Unlike traditional methods, BindCraft allows users to define specific binding sites, enabling targeted inhibitor design rather than relying on random selection. This feature is particularly valuable for researchers with limited screening capacity, as it increases the likelihood of identifying high-affinity binders with fewer candidates. 6. The study concludes that BindCraft is a mature and accessible tool for peptide-based drug discovery. Despite some limitations, its high hit rates and ease of use make it a promising alternative to display technologies like phage or mRNA display, especially for initial hit discovery. 📜Paper: biorxiv.org/content/10.110… #BindCraft #PeptideDesign #AlphaFold #DrugDiscovery #StructuralBiology #DeNovoDesign

Yu Zhang retweeted

Marios Georgakis @MariosGeorgakis · 18 Jul

While UK Biobank enabled access to population-based proteomics at scale, most omics studies in disease-focused cohorts still suffer from small sample sizes. The Global Neurodegeneration Proteomics Consortium brought together 35,000 serum, plasma, and CSF samples from neurodegeneration-focused cohorts. They released 4 papers at @NatureMedicine earlier this week👇

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 20 Dec

Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation 1. This paper introduces E3WAE, a novel E(3)-equivariant Wasserstein autoencoder designed for explicit control in 3D molecule generation. It factors the latent space into two disentangled aspects: molecular properties and structural context. 2. The standout feature of E3WAE is its dual capability: property-targeting generation, enabling precise design of molecular properties, and context-preserving generation, which allows optimization of certain properties without altering the molecule’s structural core. 3. To overcome challenges in equivariant 3D molecule generation, E3WAE employs a new coordinate loss function with structural alignment, enabling autoregressive de novo molecule generation without external references. 4. Experimental results demonstrate E3WAE's superiority in generating drug-like molecules with desired attributes, outperforming state-of-the-art models like EDM and HierDiff in property-targeting tasks across benchmarks like GEOM-Drugs and CrossDocked2020. 5. The disentangled latent space of E3WAE allows targeted control, crucial for real-world drug discovery applications, such as improving synthetic accessibility while maintaining binding affinity to target proteins. 6. A t-SNE visualization confirms effective disentanglement of latent spaces, showcasing E3WAE’s ability to independently manipulate molecular properties and structural contexts. 7. The model leverages fragment-based generation with E(3)-equivariant encoders and decoders, balancing computational efficiency with comprehensive molecular representation. 8. E3WAE opens new avenues for explicit control in other applications, such as protein design, by extending the disentangled representation framework to broader contexts. @leetx1010 @LuoYouzhi 📜Paper: arxiv.org/abs/2412.15086 #MoleculeGeneration #DrugDiscovery #AI
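Point 3 mentions a coordinate loss "with structural alignment" but does not give its form. One standard way to compare two 3D conformations independently of rotation and translation is RMSD after optimal Kabsch superposition, sketched below in NumPy. Treat this as an illustrative stand-in chosen for this example; E3WAE's actual loss may be defined differently.

```python
import numpy as np


def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) coordinate sets after optimal rotation and translation (Kabsch)."""
    p = pred - pred.mean(axis=0)           # remove translation
    q = ref - ref.mean(axis=0)
    h = p.T @ q                            # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation (no reflection) mapping p onto q
    aligned = p @ r.T
    return float(np.sqrt(((aligned - q) ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(12, 3))
    theta = 0.7
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    moved = ref @ rot.T + np.array([1.0, -2.0, 0.5])
    print(kabsch_rmsd(moved, ref))  # close to 0: same shape, different pose
```

Aligning before comparing means a generator is not penalized for producing the right shape in a different orientation, which is what an E(3)-equivariant model needs from its reconstruction loss.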

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 20 Dec

Efficient Fine-Tuning of Single-Cell Foundation Models Enables Zero-Shot Molecular Perturbation Prediction @genentech 1. A breakthrough for drug discovery: the study introduces scDCA, a novel drug-conditional adapter, enabling single-cell foundation models (FMs) to predict cellular responses to drugs. It excels in few-shot and zero-shot scenarios, even for unseen cell lines. 2. Why it matters: scDCA uses less than 1% of the trainable parameters of the original FM, ensuring efficiency while preserving the rich biological knowledge from pre-training. This innovation allows precise predictions even with scarce training data. 3. Key advantage: Unlike existing methods limited to seen drugs or drug-cell combinations, scDCA generalizes across new drugs, unseen cell lines, and combinations, offering state-of-the-art performance. 4. How it works: By integrating molecule embeddings into trainable adapters within frozen transformer layers, scDCA bridges the gap between gene expressions and chemical modalities without overfitting. 5. Real-world potential: scDCA supports applications in personalized medicine and virtual screening by accurately predicting transcriptional changes due to drug perturbations at single-cell resolution. 6. Robust evaluation: The model demonstrates superior accuracy in unseen drug and cell line tasks, achieving 31% higher performance than traditional fine-tuning in zero-shot cell line predictions. 7. Evidence-based performance: scDCA's predictions align closely with experimental uncertainty, ensuring reliability across diverse datasets and molecular targets. 8. Challenges addressed: The method overcomes the scarcity of single-cell perturbation data by leveraging extensive pre-training and efficient fine-tuning, paving the way for scalable drug testing. @tbyanc @gabo_scalia @kangway @jchuetter 📜Paper: arxiv.org/abs/2412.13478… #SingleCell #DrugDiscovery #AI
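Point 4 describes molecule embeddings fed into trainable adapters inside frozen transformer layers. The NumPy sketch below shows one hypothetical way to do that: a residual bottleneck adapter whose hidden activation is shifted by a projection of the drug embedding. The conditioning scheme, dimensions, and initialization are assumptions for illustration and are not taken from the scDCA paper.

```python
import numpy as np


class DrugConditionalAdapter:
    """Residual bottleneck adapter: h + relu(h @ W_down + m @ W_drug) @ W_up.

    Hypothetical design: only these three matrices would be trained; the
    transformer layer producing `h` stays frozen.
    """

    def __init__(self, d_model: int, d_bottleneck: int, d_mol: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
        self.w_drug = rng.normal(scale=0.02, size=(d_mol, d_bottleneck))
        self.w_up = np.zeros((d_bottleneck, d_model))  # zero init: adapter starts as the identity

    def __call__(self, h: np.ndarray, mol: np.ndarray) -> np.ndarray:
        # h: (n_tokens, d_model) hidden states from a frozen layer; mol: (d_mol,) drug embedding,
        # broadcast so every token sees the same drug-dependent shift
        z = np.maximum(h @ self.w_down + mol @ self.w_drug, 0.0)
        return h + z @ self.w_up

    def n_params(self) -> int:
        return self.w_down.size + self.w_drug.size + self.w_up.size


if __name__ == "__main__":
    adapter = DrugConditionalAdapter(d_model=512, d_bottleneck=16, d_mol=256)
    frozen_layer = 12 * 512 ** 2  # rough weight count of one standard transformer layer (4d^2 + 8d^2)
    print(adapter.n_params(), f"{adapter.n_params() / frozen_layer:.2%}")
```

With these made-up sizes the adapter adds 20,480 trainable weights per layer, roughly 0.65% of the approximately 3.1M weights in a standard transformer layer of width 512. That is the same order as the "less than 1%" budget reported in the summary, although the paper's real dimensions may differ.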

Yu Zhang retweeted

Biology+AI Daily @BiologyAIDaily · 23 Oct

Generative AI for Drug Discovery: A GPT-2 and LSTM Based Models for Designing EGFR Inhibitors • This study explores the use of GPT-2 and LSTM architectures to generate novel inhibitors targeting the Epidermal Growth Factor Receptor (EGFR), a key therapeutic target in cancer. • Approximately 500,000 bioactive molecules from the ChEMBL database were used to train the models, focusing on the generation of SMILES strings representing drug-like compounds. • Post-generation filtering ensured only valid drug candidates were retained, applying Lipinski’s rule of five, Quantitative Estimate of Drug-likeness (QED), and Synthetic Accessibility Scores (SAS). • LSTM outperformed GPT-2 with a 90.98% validity rate compared to GPT-2’s 52.27%, producing more chemically accurate molecules with high binding affinities. • Both models generated unique structures, with LSTM achieving a nearly perfect uniqueness rate of 99.88%, indicating its robustness for diverse molecule generation. • Docking studies performed with AutoDock Vina on EGFR’s kinase domain (PDB ID: 1M17) revealed strong binding affinities for LSTM-generated molecules, ranging from -9.4 to -10.4 kcal/mol. • LSTM’s superior handling of chemical dependencies makes it ideal for tasks requiring accurate sequential predictions, such as SMILES generation. • This research highlights the potential of generative AI to streamline the drug discovery process, offering efficient tools to discover new inhibitors for EGFR-driven cancers. 📜Paper: biorxiv.org/content/10.110…
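The post-generation filtering and the validity/uniqueness figures above can be expressed compactly. The sketch assumes descriptors were computed beforehand (in practice RDKit provides molecular weight, logP, donor/acceptor counts and QED; the SA score usually comes from RDKit's contrib `sascorer`). It also assumes SMILES are canonicalized before uniqueness is counted. The QED and SA cutoffs, the one-violation allowance, and the toy validator are hypothetical, not the study's settings. Validity is taken as valid/generated and uniqueness as distinct valid/valid, which are common definitions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Descriptors:
    mol_wt: float
    logp: float
    h_donors: int
    h_acceptors: int
    qed: float
    sa_score: float  # 1 (easy to synthesize) .. 10 (hard)


def lipinski_violations(d: Descriptors) -> int:
    """Count Rule-of-Five violations: MW > 500, logP > 5, H-bond donors > 5, H-bond acceptors > 10."""
    return sum([d.mol_wt > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])


def passes_filters(d: Descriptors, max_violations: int = 1,
                   min_qed: float = 0.5, max_sa: float = 6.0) -> bool:
    """Keep a molecule with few Ro5 violations that also clears the (hypothetical) QED and SA cutoffs."""
    return lipinski_violations(d) <= max_violations and d.qed >= min_qed and d.sa_score <= max_sa


def validity_uniqueness(smiles: Iterable[str], is_valid: Callable[[str], bool]) -> tuple[float, float]:
    """Validity = valid / generated; uniqueness = distinct valid / valid (inputs assumed canonical)."""
    generated = list(smiles)
    valid = [s for s in generated if is_valid(s)]
    validity = len(valid) / len(generated) if generated else 0.0
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


if __name__ == "__main__":
    # With RDKit, is_valid would typically be: lambda s: Chem.MolFromSmiles(s) is not None
    toy_valid = lambda s: "X" not in s  # hypothetical stand-in validator
    print(validity_uniqueness(["CCO", "CCO", "c1ccccc1", "C(X"], toy_valid))
```

Because uniqueness is measured only among valid molecules, a model can show near-perfect uniqueness alongside a modest validity rate. That is why the two percentages reported for each model are best read together.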
