Naail Kashif-Khan

532 posts

@NKhan212

Making proteins in the lab and with AI 🔬💻 Wielder of petri dish and keyboard 🧫⌨️ Love heavy metal and the Arsenal 🎸🔴

London, England · Joined August 2015

356 Following · 220 Followers

Naail Kashif-Khan retweeted

Biology+AI Daily@BiologyAIDaily·10 Sep

Optimizing Molecular Glues Using Free Energy Perturbation and Cofolding Methods
1. This study presents a comprehensive evaluation of Free Energy Perturbation (FEP) and Boltz-2 for predicting the binding affinity of molecular glues to protein complexes. The results show that FEP outperforms Boltz-2 in terms of correlation and RMSE, highlighting the need for more accurate high-throughput methods.
2. Molecular glues are small molecules that induce protein-protein interactions, offering access to new biology and protein targets. However, their rational design and optimization are challenging due to the dynamic nature of their binding sites. This study addresses this challenge by providing a detailed comparison of computational methods.
3. The study assessed 93 compounds across six diverse target/effector complexes, yielding 140 unique protein-compound measurements. This large-scale evaluation provides valuable insights into the capabilities and limitations of FEP and Boltz-2 in the context of molecular glue optimization.
4. FEP demonstrated good absolute predictability with RMSE values within 0.3-1.25 kcal/mol and strong correlations, making it a valuable tool for molecular glue optimization despite its higher computational cost. In contrast, Boltz-2 exhibited poor absolute predictability and generally poor correlations.
5. The poor performance of Boltz-2 suggests it is not suitable for high-throughput screening of molecular glues. This highlights the need for more accurate, high-throughput machine learning methods for pre-FEP screening to accelerate the discovery of molecular glues.
6. The study underscores the importance of accurate computational methods in the challenging field of molecular glue optimization. The findings provide a foundation for future research aimed at developing more efficient and accurate tools for drug discovery.
📜Paper: doi.org/10.26434/chemr…
#MolecularGlues #FreeEnergyPerturbation #Boltz2 #DrugDiscovery #ComputationalBiology

Naail Kashif-Khan@NKhan212·8 Nov

@liambai21 Yup folds fine with the ESM Atlas API, so must be the visualizer being difficult!

Liam Bai@liambai21·8 Nov

Ah if 166aa doesn’t work something is definitely up. I just tried a sequence and it worked so it might be a transient issue. The API we’re using is the same as ESMAtlas so I’m curious if the sequence can be folded there. If that errors out (has happened to me before) then it’s definitely an issue with the API. Otherwise it’s probably a bug in our visualizer. esmatlas.com/resources?acti…

Liam Bai@liambai21·8 Nov

Ever wondered how a protein language model sees your favorite protein? Check out our SAE visualizer where you can now search any sequence for activating features.

Naail Kashif-Khan@NKhan212·8 Nov

@liambai21 I was just testing a few of the example sequences shown, some of which are pretty long! But even trying with a 163aa sequence it still doesn't work for me, it does the loading animation and then gives just a blank space. I'll tinker around and maybe try a different browser!

Liam Bai@liambai21·8 Nov

@NKhan212 Is your sequence under 400 residues (limit for ESMFold API)? If so, I’ve also experienced temporary hiccups with the API but it usually works if you retry in a bit!

Naail Kashif-Khan@NKhan212·8 Nov

@DdelAlamo This is so cool! Not only really interesting to see what pLMs are "thinking", but super visually satisfying too 👀

Diego del Alamo@DdelAlamo·8 Nov

This is a great tool. Off the bat, feature 4000 of looks like it recognizes residues lining the inside of transmembrane beta-barrels. Using de novo designed proteins (PDBs 6X1K 6X9Z) and some natural proteins (PDB 2MLH)

Liam Bai@liambai21

Ever wondered how a protein language model sees your favorite protein? Check out our SAE visualizer where you can now search any sequence for activating features.

Naail Kashif-Khan@NKhan212·29 Oct

@DdelAlamo I always just assumed they were LLM generated using the text of the paper lol

Diego del Alamo@DdelAlamo·29 Oct

I appreciate that this account exists but sometimes wonder where these summaries come from exactly

Biology+AI Daily@BiologyAIDaily

ProtSCAPE: Mapping the landscape of protein conformations in molecular dynamics
1. ProtSCAPE is a deep learning architecture designed to map protein conformations from molecular dynamics (MD) simulations, using a novel combination of learnable geometric scattering with dual attention mechanisms.
2. The model employs geometric scattering to capture both local and global protein structures, representing proteins as graphs, which are then processed by a transformer with dual attention—focusing on residues and amino acids.
3. ProtSCAPE’s latent representations are temporally coherent, allowing it to capture conformational transitions, such as phase changes between open and closed states, and stochastic switching between meta-stable conformations.
4. Unlike conventional MD trajectory analysis that often misses complex transitions, ProtSCAPE excels in generating detailed low-dimensional representations that retain structural and temporal context, enabling enhanced visualization and downstream analysis of protein dynamics.
5. ProtSCAPE effectively generalizes from short to long trajectories and from wild-type to mutant proteins, offering insights into how mutations can affect the protein conformational landscape.
6. The model can interpolate between states to reconstruct intermediate conformations, validated with case studies on proteins like MurD, which showed hinge-like transitions consistent with experimental data.
7. ProtSCAPE outperformed traditional graph-based methods (GNNs) in predicting pairwise distances and dihedral angles, demonstrating its superior ability to decode the dynamics of protein conformations.
8. This tool holds significant promise for studying complex protein functions, such as allostery, binding, and enzymatic catalysis, by providing a comprehensive view of protein motion across various temporal scales.
@egbertcastro @dbhaskar92 @KrishnaswamyLab @Siddharth2814
💻Code: github.com/KrishnaswamyLa…
📜Paper: arxiv.org/abs/2410.20317
#ProteinDynamics #DeepLearning #Bioinformatics #MolecularDynamics #Transformer #ProteinConformation #MachineLearning #ComputationalBiology

Naail Kashif-Khan@NKhan212·18 Sep

@klausenhauser I guess that's my overall concern here - you can't trust (or even really know) the data that will go into these models, let alone anything else about them (no code or weights, no details on training or experiments) so who will trust these over something like ESM-2?

Kelvin Lau 🧬🧪💎@klausenhauser·18 Sep

@NKhan212 If you saw their second announcement, their foundry is now a massive data generation platform. However they’ll do so many different assays that I don’t even know if they have the expertise in them. Even if they don’t analyze, if you don’t trust the generator then it’s GIGO 🗑️🚮

Naail Kashif-Khan@NKhan212·18 Sep

So Ginkgo has just announced its own protein language model - my first question is, why would anyone want to use a closed-source proprietary pLM over one of the open-source and better understood models already out there?

SynBioBeta@SynBioBeta

@Ginkgo’s protein language model, built on @googlecloud technology, offers unprecedented insights for researchers, accelerating the development of life-saving medicines. #DrugDevelopment #ProteinLLM #AIinBiotech #GinkgoBioworks #GoogleCloud loom.ly/RGh0ORc

Naail Kashif-Khan@NKhan212·3 Sep

Why release the source code/weights yourself when skeptical scientists will reproduce it for you? 900 IQ plays from @GoogleDeepMind

Sergey Ovchinnikov@sokrypton

AlphaFold3 reproduced and params/code released. 🤩

Naail Kashif-Khan@NKhan212·2 Sep

Ah yes it's every protein scientist's favourite amino acid, the shiny golden orb one

Naail Kashif-Khan@NKhan212·25 Aug

@egor__marin @DdelAlamo Interesting...anyone wanna make some structure problems?

Egor Marin@egor__marin·25 Aug

@DdelAlamo there's Rosalind project (rosalind.info/problems/locat…), but sadly there are no structural bioinformatics problems

Diego del Alamo@DdelAlamo·25 Aug

There should be leetcode for comp bio just to brush up our skills on all the random small bullshit we do as part of our jobs. I refuse to believe rosetta partial_thread is the best way to isolate specific subregions of a PDB file from a sequence alignment
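
As a rough illustration of the kind of small task described above, here is a minimal plain-Python sketch (not Rosetta, and not anyone's established method) for pulling a subregion out of a PDB file given a range of alignment columns. The function names are hypothetical, and it assumes the gapped sequence corresponds one-to-one with the residues present in the ATOM records, i.e. no missing residues, which often will not hold for real structures.

```python
def columns_to_residue_indices(aligned_seq, col_start, col_end):
    """Map alignment columns [col_start, col_end) to 0-based residue indices.

    Gap characters ('-') occupy a column but do not advance the residue index.
    """
    indices = []
    residue_idx = 0
    for col, char in enumerate(aligned_seq):
        if char == "-":
            continue
        if col_start <= col < col_end:
            indices.append(residue_idx)
        residue_idx += 1
    return indices


def residue_key(line):
    """Identify a residue by chain ID (column 22) and resSeq + iCode (columns 23-27)."""
    return (line[21], line[22:27])


def extract_subregion(pdb_lines, aligned_seq, col_start, col_end):
    """Keep only ATOM lines whose residue falls in the requested alignment columns.

    Assumption: the i-th non-gap character of aligned_seq corresponds to the
    i-th distinct residue encountered in the ATOM records.
    """
    ordered_keys = []
    seen = set()
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = residue_key(line)
            if key not in seen:
                seen.add(key)
                ordered_keys.append(key)

    wanted = columns_to_residue_indices(aligned_seq, col_start, col_end)
    selected = {ordered_keys[i] for i in wanted if i < len(ordered_keys)}
    return [
        line
        for line in pdb_lines
        if line.startswith("ATOM") and residue_key(line) in selected
    ]
```

Reading the residue identifiers straight from the fixed PDB columns keeps the sketch dependency-free; a real workflow would more likely use a structure-parsing library and handle missing residues explicitly.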

Naail Kashif-Khan@NKhan212·23 Aug

@KevinKaichuang It's so visually satisfying!

Kevin K. Yang 楊凱筌@KevinKaichuang·23 Aug

New favorite protein structure: alpha helix inside a beta barrel!

Naail Kashif-Khan@NKhan212·24 Jul

People training biological "foundation" models - beware!

Eric Topol@EricTopol

On GenAI model collapse with training sets of excessive #AI-generated content nature.com/articles/s4158… nature.com/articles/d4158… @iliaishacked @yaringal @OATML_Oxford

Naail Kashif-Khan@NKhan212·15 Jul

@sokrypton This is super interesting, is there some code we can play with to experiment with this?

Sergey Ovchinnikov@sokrypton·15 Jul

Ever wondered how many amino acids you can mutate to alanine and AlphaFold2 still predicts same structure? 🤔For denovo design Top7 (1QYS), single-sequence mode, it's 60%. (1/2)
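
For anyone wanting to play with the idea, the sequence-editing half of the experiment can be sketched in a few lines of Python. This is an illustrative sketch only, not the code behind the result above: the function name, the random choice of positions, and the toy sequence are all assumptions, and the structure-prediction step (running AlphaFold2 and comparing structures) is not shown.

```python
import random


def mutate_to_alanine(sequence, fraction, seed=0):
    """Return a copy of `sequence` with a fraction of its positions set to 'A'.

    Assumptions (not from the original thread): `fraction` is measured against
    the full sequence length, positions are chosen uniformly at random, and
    only positions that are not already alanine are eligible.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    rng = random.Random(seed)  # seeded so a given run can be repeated
    candidates = [i for i, aa in enumerate(sequence) if aa != "A"]
    n_mutate = min(round(fraction * len(sequence)), len(candidates))
    chosen = set(rng.sample(candidates, n_mutate))
    return "".join("A" if i in chosen else aa for i, aa in enumerate(sequence))


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKTLLVGDEW"
print(mutate_to_alanine(toy_sequence, 0.6))
```

Sweeping `fraction` upward and re-predicting the structure at each step would be one way to look for the point where the prediction stops matching, though which positions get mutated will likely matter as much as how many.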

Naail Kashif-Khan@NKhan212·3 Jul

@jacopo_gab "ise" vs "ize" isn't always just about British vs American English altexta-editing.com/its-all-about-…

Naail Kashif-Khan@NKhan212·3 Jul

@pengzhangzhi1 @LTEnjoy Something just doesn't sit right with me - train an inverse folding model on AF2 structures (which we think might not be a great idea), and then train a big language model on those inverse folded sequences, and also on AF2 predicted structures

Naail Kashif-Khan@NKhan212·3 Jul

@pengzhangzhi1 @LTEnjoy This is exactly what I'm concerned about. ProteinMPNN is trained only on the PDB and has been widely experimentally validated. In contrast, ESM-IF is trained on the PDB and 12m AF2 structures and no one's got it to work in the literature yet.

Jin Su@LTEnjoy·1 Jul

Just evaluated the inverse folding ability of the released ESM3 (esm3_sm_open_v1) on the CATH test set (around 1100 proteins). ESM3 performed better than Saprot but surprisingly inferior to ProteinMPNN🧐. PS: The overall predictions took ~1.5h on one A40 GPU.

Naail Kashif-Khan@NKhan212·1 Jul

@LTEnjoy Cool stuff! Perhaps not so surprising given that ESM3 is trained on a bunch of predicted structures and inverse folded sequences. I reckon there's a load of junk in that data and it's probably not as good as training on just the PDB which are all "real" structures

Naail Kashif-Khan@NKhan212·28 Jun

@julian_englert @diffuse_bio @EvoscaleAI This is really cool! Protein design is very shiny and flashy but the nitty gritty lab characterization is my favourite part :) Any ideas why the non-binder from the paper looked like it binds in your experiments?

Julian Englert@julian_englert·27 Jun

Benchmarking AI-designed proteins in our lab
New AI models for designing proteins are coming out at a faster and faster pace! Just in the past two weeks, two new models were released: @diffuse_bio's DSG-1 and @EvoscaleAI’s ESM3.
As the number of AI models increases, it becomes important for protein designers to know what model actually works best for their application.
At @adaptyvbio we just launched a series of real-world benchmarks to understand how state-of-the-art protein design models perform when tested in the lab.
For this first case study, we’re validating some de-novo designed binders from RFdiffusion by @UWproteindesign.
Read more: adaptyvbio.com/blog/rfdiff_il…

Naail Kashif-Khan@NKhan212·26 Jun

@jakublala It also seems that the "open" model isn't actually the best or biggest one they trained so they've definitely nerfed the publicly available stuff to keep the best ones for commercial use I'd imagine

Naail Kashif-Khan@NKhan212·26 Jun

@jakublala Based on this snippet from their press release I think they're going to pull a DeepMind and keep some secret sauce to themselves for licensing to pharma for drug discovery. I believe the currently available model is non commercial use only too but have to double check

Jakub Lála 👨🏻‍🍳🥯@jakublala·26 Jun

so how much will this esm3 api cost? any ideas? and what about the IP associated with the generations?
