The Align Foundation

232 posts

The Align Foundation

@Align_Bio

Improving life science with programmable experiments. 🧬 🤖 Home of the Protein Engineering Tournament & Open Datasets Initiative

Katılım Ocak 2022

279 Takip Edilen1.7K Takipçiler

The Align Foundation@Align_Bio·1h

Benchmarking generative protein models at scale with GROQ-SEQ 📊 We built a unified benchmarking framework and experimentally tested alignment-based, language, and structure-based models on TEV protease. Structure-based models achieved up to 74% hit rates, a remarkably high level of experimental success, while computational predictions showed weak agreement with experimental activity. This collaboration with the @Deborah Marks Lab enabled large-scale experimental benchmarking of today’s leading generative protein models.🔬 🔗 Dive into the details: hubs.la/Q04gpnfg0 #ProteinEngineering #ProteinDesign #MachineLearning #SyntheticBiology #Biotech #AIinBiology #DataScience #GROQseq

English

209

The Align Foundation@Align_Bio·6 May

🧬 How Much Diversity Can Proteases Handle? GROQ-seq Has Answers We profiled 11,722 sequence-diverse protease homologs, including AI-designed minimized variants, and uncovered robust activity across surprisingly distant sequences. Even across extreme sequence divergence, protease function against the canonical TEV protease substrate persists, with distinct homologs showing activity at as low as 19% sequence identity. Datasets like this are key to unlocking the next generation of machine learning models for protein engineering 🤖 🔗 Read more: hubs.la/Q04fz4l80 #ProteinEngineering #MachineLearning #SyntheticBiology #Biotech #EnzymeEngineering #HighThroughput #DataScience #AIinBiology #GROQseq

English

7.9K

The Align Foundation@Align_Bio·28 Nis

📊 Engineering better PETases isn’t just a modeling problem, it’s a data problem. In the PETase Engineering Tournament, we partnered to develop three independent assay platforms to measure expression and activity across temperature and pH: cell-free systems, E. coli + Rapid Fire Mass Spec, and microfluidic droplets. The takeaways: → High-throughput ≠ high-quality → Realistic assays ≠ scalable assays → Generating reliable, ML-ready data is still the bottleneck Though challenging, this is the kind of groundwork needed to actually move the field forward. 🔗Full methods + learnings: hubs.ly/Q04dFDg00 💪 Many thanks to our sponsor Twist Bioscience for DNA synthesis and to Adaptyv for their valuable collaboration on assay development. #ProteinEngineering #SyntheticBiology #EnzymeEngineering #MachineLearning #Biotech #AssayDevelopment #DataScience #Bioengineering #Sustainability #PlasticRecycling

English

The Align Foundation@Align_Bio·21 Nis

👉 GROQ-seq is Live: Quantitative Protein Function at Scale Each dataset is powered by a function specific genetic circuit in E. coli: • Transcription factors → regulate DHFR expression via operator binding • T7 RNA polymerase → drives transcription from a T7 promoter • TEV protease → cleaves a split DHFR reporter to modulate growth Function → growth → sequencing All calibrated. All comparable. This is how we start making protein function datasets that are quantitative and generalizable. Dive into the data: hubs.la/Q04cNQpM0 More about each circuit: 📊Transcription factors →hubs.la/Q04cNNft0 📊T7 RNA polymerase →hubs.la/Q04cNBJ60 📊TEV protease →hubs.la/Q04cNKQ30 #SyntheticBiology #ProteinEngineering #OpenScience #AI #Biotech #ProteinML #GROQSEQ

English

1.5K

The Align Foundation@Align_Bio·14 Nis

🚀 GROQ-seq: Scale Meets Reproducibility You’ve seen The Align Foundation's recent GROQ-seq data releases on transcription factors, T7 polymerase and TEV protease (hubs.la/Q04bSt3Z0). But protein ML doesn’t just need more data. It needs data you can trust. GROQ-seq delivers both: 📈 Scale (still a critical bottleneck) 🔁 Reproducibility (what makes that scale usable) We stress-tested it: • Same sequence, different barcodes → consistent results (Spearman 0.875) • Same protocol, different facilities → strong agreement (Spearman ~0.8) • Indistinguishable measurements between sites (AUC ≈ 0.56) • Recover top variants from both sites (fold enrichment >5) The Align Foundation is delivering the difference between “more data” and usable data. Because if your measurements don’t reproduce, your models don’t generalize. GROQ-seq is built for: ✔ aggregation across datasets ✔ reliable model training ✔ real-world protein design This is how we move from fragmented assays to foundation datasets for biology. 🔗Check out all the details: hubs.la/Q04bSvsR0 #OpenScience #ProteinEngineering #BioAI #MachineLearning #Reproducibility #ProteinML #SyntheticBiology #AIforBiology #GROQSEQ

English

9.1K

The Align Foundation@Align_Bio·8 Nis

📢 HiBiT Feasibility Study: Toward Scalable Protein Expression Measurement In our new report, @Align_Bio explores how to address one of the key bottlenecks in protein engineering: reliable prediction of soluble protein expression. In collaboration with @Ginkgo Bioworks and with leading contributions from Kasia Baranowski, we evaluate the HiBiT luminescence-based reporter as a scalable, quantitative approach to measure protein expression across hosts. Tested in both E. coli and P. pastoris, the assay shows strong potential as a standardized, high-throughput method for generating reproducible expression data. This work contributes to the development of large-scale datasets required to train more accurate models aimed at predicting protein expression. 🔗 Access the report: hubs.la/Q04b2kgV0 #SyntheticBiology #ProteinEngineering #BioAI #MachineLearning #HiBiT #Ecoli #Ppastoris #OpenScience

English

630

The Align Foundation@Align_Bio·31 Mar

📢 Data Release Tuesday: Align TEV Protease Dataset 📊 We’re expanding the The Align Foundation data ecosystem again with ~30,000 high-quality GROQ-seq data points capturing TEV protease sequence–function relationships at scale. To our knowledge, this is the largest mutational dataset on TEV protease to date. Notably, no comprehensive deep mutational scanning study across the full protein has been reported in over three decades, leaving key aspects of its functional landscape unexplored. TEV protease is a cornerstone tool in biotechnology, known for its high substrate specificity, and this dataset provides a rich resource for enzyme engineering and ML-driven protein design. This release was made possible by a strong cross-team effort, with key contributions from Erika Alden DeBenedictis, @Anjali Chadha, @Dave Ross’s team at National Institute of Standards and Technology (NIST) and the DAMP Lab at Boston University 🔗 Access the dataset: hubs.la/Q048Z3890 #OpenScience #SyntheticBiology #ProteinEngineering #BioAI #MachineLearning #AlignData #GROQSEQ #Protease #TEV

English

1.4K

The Align Foundation@Align_Bio·24 Mar

📢 Public Data Release: Align T7 RNA Polymerase Dataset. 📊The data keeps coming at @Align_Bio! We’re excited to release our T7 RNA polymerase dataset, adding ~35,000 unique GROQ-seq data points to the growing Align data ecosystem, capturing sequence–function relationships across variants at scale. To our knowledge, this is the largest mutational dataset on T7 RNA polymerase to date! 🔗 Access the dataset on the Align Data Portal: hubs.la/Q04802KK0 #OpenScience #SyntheticBiology #ProteinEngineering #BioAI #MachineLearning #AlignData #GROQSEQ #RNApolymerase

English

1.3K

The Align Foundation@Align_Bio·17 Mar

🔗 Access the dataset on the Align Data Portal: hubs.la/Q0470NBn0 👏 Huge thanks to our collaborators and everyone involved in making this dataset possible.

English

144

The Align Foundation@Align_Bio·17 Mar

This is the result of a fantastic collaboration with David Ross's Lab at @NIST, @simonsnitz at @Harvard and @Damp_Lab. We’re thrilled to make the data publicly accessible to support the broader research community.

English

167

The Align Foundation@Align_Bio·17 Mar

🚀 Public Data Release: GROQ-seq Transcription Factor Dataset We’re excited to announce the public release of The Align Foundation's first GROQ-seq dataset, one of the largest high-resolution transcription factor datasets of its kind.

English

2.7K

The Align Foundation@Align_Bio·24 Şub

Align and @GoogleDeepMind are partnering to build AI-ready datasets & evaluations for the future of predictive #AMR biology. Researchers worldwide can submit concepts through March 31 w/ roadmapping workshops coming to North America + APAC this spring. 🔗 hubs.la/Q044rDpL0

English

27.9K

The Align Foundation@Align_Bio·15 Oca

Designed to be scalable and with closed-loop experimentation, Tesseract will power AI to map sequence → function, predict gene transfer success, and assign function to unknown genes from rich phenotypic signatures. Big thanks to the collaborators+reviewers who helped shape this!

English

276

The Align Foundation@Align_Bio·15 Oca

In partnership with @Pioneer__Labs, we’re proposing Tesseract: a large-scale, open microbial phenomics dataset to functionally annotate microbial genomes at scale. 🧬🤖 ✅5M diverse genes x 50 host strains × 100 conditions 🔗 Read the proposal: zenodo.org/records/179902…

English

710

The Align Foundation@Align_Bio·17 Ara

🚀 Team growth: 2x growth in team size vs last year to support all this open science. Huge thanks to collaborators, participants, and the entire Align team. You make this possible!

English

197

The Align Foundation@Align_Bio·17 Ara

🔬 ALIGN WRAPPED: our year in open science 🔬 Thread 🧵 We turned up the impact. Here are highlights from the past 12 months at The Align Foundation. Want to collaborate or learn more? ✉️ contact@alignbio.org #AIforBiology #OpenScience #ProteinEngineering #Microbes #Data

English

581

The Align Foundation@Align_Bio·17 Ara

🎤 Outreach: Attended 30+ conferences to present our work, build partnerships, and spread open science worldwide.

English

132

The Align Foundation@Align_Bio·17 Ara

📝 4 peer-reviewed publications: • Data Scaling • Results of the Protein Engineering Tournament • Can protein expression be ‘solved’? • The influence of automation and AI on biology

English

156

The Align Foundation@Align_Bio·17 Ara

📦 Data releases: 5 public datasets (>2M data points) live in our Data Portal, plus 7 datasets under embargo (>500K data points) arriving next year and many more in the pipeline.

English

Keşfet

@Deborah @Ginkgo @Dave @NIST @simonsnitz @Harvard @Damp_Lab @GoogleDeepMind