Phillip Kyriakakis
@breakliquid
Senior Research Scientist at Stanford Bioengineering | Taking apart and building biology.

PlasmidGPT: a generative framework for plasmid design and annotation

1/ PlasmidGPT introduces a framework for designing plasmid DNA with a transformer-based generative model. It is trained on 153,000 plasmid sequences and generates new designs that have low sequence similarity to the training data but similar structural properties.

2/ One of the most exciting features is PlasmidGPT's ability to generate plasmid sequences conditioned on specific user inputs, enabling controlled, custom designs. This opens up new possibilities for efficient plasmid creation in biotechnology applications.

3/ The model uses sequence embeddings to predict attributes like lab of origin, species, and vector type, outperforming conventional models like CNNs. It reached 81% top-1 accuracy in predicting the lab of origin, a significant improvement over previous approaches.

4/ PlasmidGPT also extends to natural plasmids, where it performs comparably to CNNs but with much shorter computation time. This could facilitate rapid analysis of both engineered and natural plasmid sequences.

5/ A fascinating aspect is its performance on unseen plasmids. In out-of-distribution tests, PlasmidGPT accurately predicted the lab of origin and growth strain, demonstrating robustness to new data.

6/ By treating DNA sequences like language, PlasmidGPT provides an intuitive, efficient tool for plasmid analysis and design. It can be fine-tuned for specific applications (e.g., mammalian vs. bacterial plasmids), which highlights its versatility.

@ShaoBin_phy
💻Code: github.com/lingxusb/Plasm…
📜Paper: biorxiv.org/content/10.110…
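
For readers unfamiliar with "top-1 accuracy" on embedding-based predictions (3/), here is a minimal sketch of the idea. It is not PlasmidGPT's code or API: the k-mer frequency "embedding", the nearest-centroid classifier, the lab names, and the toy sequences are all assumptions made up for illustration, standing in for the transformer embeddings and the real labels.

```python
from collections import Counter
from itertools import product
import math

K = 3  # hypothetical k-mer size, chosen only for this illustration
VOCAB = ["".join(p) for p in product("ACGT", repeat=K)]

def embed(seq):
    """Stand-in embedding: normalized k-mer frequency vector (not a transformer)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in VOCAB) or 1  # avoid division by zero
    return [counts[k] / total for k in VOCAB]

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def predict(seq, centroids):
    """Return the label whose centroid is closest to the sequence embedding."""
    v = embed(seq)
    return min(centroids, key=lambda label: math.dist(v, centroids[label]))

def top1_accuracy(examples, centroids):
    """Fraction of examples whose single best guess matches the true label."""
    hits = sum(predict(seq, centroids) == label for seq, label in examples)
    return hits / len(examples)

# Toy, made-up data: two hypothetical labs with very different sequence composition.
train = {
    "lab_A": ["ATATATATATAT", "TATATATATATA"],
    "lab_B": ["GCGCGCGCGCGC", "CGCGCGCGCGCG"],
}
centroids = {
    label: centroid([embed(s) for s in seqs]) for label, seqs in train.items()
}
held_out = [("ATATATATTATA", "lab_A"), ("GCGCGGCGCGCG", "lab_B")]

print(top1_accuracy(held_out, centroids))
```

Top-1 here means only the single highest-ranked label counts as a hit, which is the metric behind the 81% figure quoted in the thread.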