Martin Mayta

3.3K posts

@MartinMayta2

BSc Biotechnology - PhD in Biological Sciences. Teaching assistant professor at @UNRoficial & @UAPArgentina (Argentina)🇦🇷.

Joined October 2020
579 Following, 218 Followers
Jason Sheltzer @JSheltzer
AI is cool and all... but a new paper in @ScienceMagazine kind of figured out the origin of life? The paper reports the discovery of a simple 45-nucleotide RNA molecule that can perfectly copy itself.
Martin Mayta @MartinMayta2
Interesting...
Biology+AI Daily @BiologyAIDaily

TerraBind: Fast and Accurate Binding Affinity Prediction through Coarse Structural Representations
1. TerraBind achieves 26-fold faster inference than state-of-the-art methods like Boltz-2 while improving binding affinity prediction accuracy by approximately 20%, addressing a critical computational bottleneck in structure-based drug design.
2. The core innovation challenges the prevailing assumption that full all-atom diffusion is necessary for accurate predictions. Instead, TerraBind uses a coarse pocket-level representation with only protein Cβ atoms and ligand heavy atoms, eliminating expensive generative modeling.
3. The architecture combines frozen pretrained encoders (COATI-3 for molecular representations and ESM-2 for protein sequences) with a lean 48-layer pairformer trunk of just 27M parameters, compared to Boltz-2's 509M parameters.
4. For pose generation, TerraBind employs a diffusion-free optimization module that produces 3D coordinates in under 0.2 seconds, matching diffusion-based baselines on the FoldBench, PoseBusters, and Runs N' Poses benchmarks.
5. The binding affinity module operates directly on structural pairformer representations without requiring coordinate generation, outperforming Boltz-2 on 15 of 18 proprietary drug discovery targets and achieving superior Pearson correlation on CASP16.
6. A built-in uncertainty quantification system uses pairwise distance entropy as a zero-shot confidence metric, validated to correlate with both pose accuracy and binding strength without separate training.
7. The epistemic neural network (epinet) module provides calibrated affinity uncertainty estimates, enabling a continual learning framework that achieves 6× greater affinity improvement than greedy selection strategies in simulated drug discovery cycles.
8. Structural fine-tuning on minimal proprietary crystallographic data (as few as 3-6 structures) yields a 17% affinity improvement on held-out compounds, demonstrating practical adaptability for specific drug programs.
📜Paper: arxiv.org/abs/2602.07735
#TerraBind #DrugDiscovery #MachineLearning #ProteinLigand #BindingAffinity #StructurePrediction #ComputationalBiology #AIforScience
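The "pairwise distance entropy" confidence metric in point 6 can be made concrete with a small Python sketch. It assumes, as a hypothetical setup not taken from the paper, that the model outputs a probability distribution over distance bins for every pair of positions (a distogram-style output) and that confidence is the mean Shannon entropy over all pairs, with lower entropy read as higher confidence. The function names and toy inputs are illustrative only; TerraBind's actual binning, atom selection, and aggregation may differ.

```python
import math
from typing import Sequence


def pair_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one pair's distance-bin distribution."""
    total = sum(probs)
    h = 0.0
    for p in probs:
        q = p / total  # renormalize in case the bins do not sum exactly to 1
        if q > 0:
            h -= q * math.log(q)
    return h


def mean_pair_entropy(distogram: Sequence[Sequence[Sequence[float]]]) -> float:
    """Average entropy over all off-diagonal pairs (i, j).

    distogram[i][j] holds a hypothetical probability vector over distance bins
    for pair (i, j). A lower mean entropy means sharper distance predictions,
    which this sketch reads as higher confidence.
    """
    n = len(distogram)
    values = [pair_entropy(distogram[i][j]) for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


# Toy input: 3 positions, 4 distance bins per pair.
sharp = [[[1.0, 0.0, 0.0, 0.0] for _ in range(3)] for _ in range(3)]
flat = [[[0.25, 0.25, 0.25, 0.25] for _ in range(3)] for _ in range(3)]
print(mean_pair_entropy(sharp))  # 0.0 (every pair is certain)
print(mean_pair_entropy(flat))   # ln(4), about 1.386 (maximally uncertain)
```

Near one-hot bin distributions push the score toward 0, while uniform ones push it toward ln(number of bins), which is why an entropy of this kind can act as a confidence signal without training a separate confidence head.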

Martin Mayta @MartinMayta2
Cool!
Biology+AI Daily @BiologyAIDaily

IntelliFold-2: Surpassing AlphaFold 3 via Architectural Refinement and Structural Consistency
1. The authors present IntelliFold-2, an open-source biomolecular structure prediction model that outperforms AlphaFold 3 on therapeutically relevant tasks, particularly antibody-antigen interactions and protein-ligand co-folding.
2. On antibody-antigen docking, IntelliFold-2 achieves a 54.5% success rate (v2) and 58.2% (v2-Pro), substantially exceeding AlphaFold 3's 47.9% on the FoldBench benchmark.
3. For protein-ligand co-folding, the model reaches 66.7% (v2) and 67.7% (v2-Pro) success rates, compared to AlphaFold 3's 64.9%, demonstrating consistent improvements in small-molecule binding prediction.
4. The architecture introduces latent space scaling in Pairformer blocks, increasing hidden dimensions to enhance representational capacity and hardware efficiency, achieving approximately 30% model FLOPs utilization in the v2-Plus variant.
5. A revised atom-attention mechanism with stochastic atomization enforces more principled multiscale structural representations, improving robustness at the atomic level while maintaining global structural coherence.
6. The authors apply Proximal Policy Optimization (PPO) to fine-tune the diffusion sampling module, framing the sampler as a stochastic policy to encourage physically plausible trajectories and reduce random sampling failures.
7. Difficulty-aware loss reweighting using a focal-loss-style approach emphasizes hard examples such as flexible loops and ambiguous side-chain configurations, leading to more stable optimization dynamics.
8. Three model variants are released: IntelliFold-2-Flash for efficient academic use and fine-tuning, IntelliFold-2 as the most accurate open-source version with 48 widened Pairformer blocks, and IntelliFold-2-Pro as the server-side flagship with exclusive PPO-enhanced sampling.
9. The training pipeline includes re-processed Protein Data Bank curation and scaled self-distillation datasets to improve generalization across complex biomolecular systems.
💻Code: github.com/IntelliGen-AI/… 📜Paper: biorxiv.org/content/10.648…
#AlphaFold3 #ProteinStructurePrediction #Bioinformatics #ComputationalBiology #DeepLearning #StructuralBiology #DrugDiscovery #AntibodyDesign #OpenSource #MachineLearning
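Point 7 mentions "focal-loss-style" reweighting. The standard focal loss scales each example's cross-entropy by (1 - p_t)^γ, where p_t is the probability the model assigned to the correct answer. The Python sketch below shows only that standard form; how IntelliFold-2 maps its structural losses onto p_t, and which γ it uses, are not stated in the summary, so γ = 2 and the example loss values are assumptions.

```python
import math


def focal_weight(ce_loss: float, gamma: float = 2.0) -> float:
    """Focal modulating factor (1 - p_t) ** gamma, with p_t = exp(-ce_loss).

    p_t is the probability assigned to the correct answer, so a confident,
    correct prediction (small loss) gets a weight close to 0.
    """
    p_t = math.exp(-ce_loss)
    return (1.0 - p_t) ** gamma


def focal_loss(ce_losses: list[float], gamma: float = 2.0) -> float:
    """Mean of per-example cross-entropy losses, each scaled by its focal weight."""
    return sum(focal_weight(loss, gamma) * loss for loss in ce_losses) / len(ce_losses)


easy = 0.05  # p_t is about 0.95, e.g. a hypothetical well-ordered core residue
hard = 2.0   # p_t is about 0.14, e.g. a hypothetical flexible loop predicted badly
print(focal_weight(easy), focal_weight(hard))
print(focal_loss([easy, hard]), (easy + hard) / 2)
```

Setting γ = 0 recovers plain cross-entropy; larger γ shrinks the contribution of already-easy examples, so the gradient is dominated by hard cases.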

Martin Mayta @MartinMayta2
In cellulo recording of a transcriptome! 😲 A genetically encoded device for transcriptome storage in mammalian cells | Science science.org/doi/10.1126/sc…
Martin Mayta @MartinMayta2
interesting...
Niko McCarty. @NikoMcCarty

New Blog: A new paper shows that enzymes can be made by mixing just four molecules together, none of which are amino acids. The four molecules randomly link together to form long polymer chains, some of which catalyze chemical reactions. Though this sounds impressive, the paper itself is quite strange. For one, it is extremely short (only about 2,700 words) and has no discussion section. The text is also absurdly dense, likely designed to be read by materials or physics people rather than biologists. And lastly, I think the paper is most interesting for the things it leaves unwritten: the ideas left out rather than put in. Understanding why this paper matters, then, is mostly an exercise in speculation.

For context, scientists have been trying to design new enzymes for decades. But this "design" has traditionally been done by searching for amino acid sequences that fold into a 3D shape with some desired function. Computational biologists tend to fixate on the sequence; they consider proteins as individuals rather than as populations of molecules.

Enzyme design is also a really hard problem. An enzyme's interior holds amino acids in precise positions, such that the amino acid(s) in the active site latch onto substrates and convert them into new molecules. This "active site" is surrounded by other amino acids that create a microenvironment suited to the reaction. If the substrate is negatively charged, for example, the microenvironment works to exclude positively charged molecules.

Despite this complexity, biologists have designed viable enzymes computationally. Last year, David Baker's group at the University of Washington designed a serine hydrolase that breaks down esters, compounds made by joining an acid and an alcohol. This AI-designed enzyme has an active site made from three amino acids (a "catalytic triad") that work together to catalyze the reaction.
But it was quite slow, completing just one reaction per second, compared to the thousands of reactions per second that are typical of natural serine hydrolases. Enzyme design thus remains a mostly unsolved problem.

This new Nature paper, though, took a completely different approach. The key breakthrough, in my eyes, is its focus on populations of polymers rather than on trying to create one perfect polymer. The authors created enzymes using a statistical or probabilistic approach, rather than a deterministic one.

The researchers focused on metalloenzymes, which are arguably simpler than serine hydrolases because they only have a single amino acid in their active site, rather than a "triad". Metalloenzymes hold metal ions (often zinc, iron, or copper) in that active site; hence the name. The researchers made two types of metalloenzymes: terpene cyclases, which take a string of carbons as substrate and "loop" it into a ring, and peroxidases, which use the iron in heme and hydrogen peroxide to oxidize substrates. I'll just focus on the terpene cyclase, as the approach taken was largely identical in both cases.

In nature, terpene cyclases take a straight chain of ten carbon atoms (a molecule called citronellal) and fold it into a ring. If all goes well, the enzyme makes isopulegol, which is a carbon ring with one alcohol group. But if water gets into the active site, this reaction is disrupted and the enzyme instead makes menthoglycol, which is the same carbon ring but with two alcohol groups.

Natural terpene cyclases have aspartate in their active site. The aspartate donates a proton to citronellal, making one of its carbon atoms positively charged. This triggers cyclization, as the "activated" carbon joins a carbon at the other end of the chain. The aspartate is surrounded by a hydrophobic shell, which keeps water out so that isopulegol gets made selectively instead of menthoglycol.
Seeking to create random polymers that could mimic a terpene cyclase, the researchers first analyzed 1,300 metalloproteins, looking for commonalities between them. They found two things. First, metalloproteins tend to have one "key" amino acid in their interior, often histidine or aspartate, that latches onto the metal ion and locks it in place so that it can perform the chemical reaction. Second, metalloenzymes tend to surround their active sites with hydrophobic amino acids, which exclude water molecules. To make a metalloenzyme, then, one basically just needs to situate a single amino acid, or electron donor, inside a hydrophobic shell.

Next, the authors scoured chemical databases for molecules with these same properties, meaning molecules that are hydrophobic or similar in shape and charge to histidine or aspartate. They ultimately settled on four molecules:
1. Methyl methacrylate (MMA), a hydrophobic molecule.
2. 2-ethylhexyl methacrylate (EHMA), an even more hydrophobic molecule.
3. Oligo(ethylene glycol) methyl ether methacrylate (OEGMA), a hydrophilic molecule.
4. 3-sulfopropyl methacrylate potassium salt (SPMA), which mimics aspartate as an electron donor; the active-site surrogate.

(Note: You need some hydrophilic molecules, even when trying to build a hydrophobic active site, because the polymers won't dissolve in water without them. Instead, they will aggregate or precipitate out of solution. Hence the inclusion of OEGMA.)

Then, the researchers mixed these four molecules together, and each molecule randomly linked with others to create long and unique polymers. The hope was that some of these "pseudo-random" polymers would position an SPMA amid hydrophobic molecules, thus creating a terpene cyclase mimic.

Initially, things did not go to plan. In their first trial, the researchers mixed 50% MMA, 20% EHMA, 25% OEGMA, and 5% SPMA and added the resulting polymers to citronellal. After 24 hours, the polymers cyclized citronellal, but poorly.
About half of the citronellal molecules were converted, and only 55 percent of products were isopulegol. In other words, the polymers could slowly catalyze reactions, but not selectively.

So the authors iterated. To optimize their reaction, they used a Monte Carlo algorithm to generate 100,000 polymer sequences based on each molecule's ratio and reactivity. By tinkering with the molecular ratios and re-running these simulations, they figured out they could improve the odds that SPMA would be surrounded by hydrophobic residues, and thus act like a terpene cyclase, if they increased SPMA's concentration (to 15%) while decreasing OEGMA (to 5%). This yielded much better results. In a second round, the polymers converted 91 percent of citronellal after 24 hours, with a selectivity for isopulegol of 76 percent.

So why does any of this matter? Well, the paper doesn't really say, outside of some vague or indirect commentary. So what follows is mostly speculation…

I think one reason this paper is important is that it does away with the outdated notion that enzymes must be tuned at the sequence level. The study shows, rather, that enzymes can be made spontaneously using pseudo-random populations of molecules, much like the earliest cells on Earth probably did. Early lifeforms didn't need to evolve the perfect enzyme; they just needed to find concoctions of molecules that were "good enough" for a particular function. The study also suggests that the 20 amino acids used by cells are not particularly special, and their functions can be replaced with other molecules carrying the same properties, like "charged" or "hydrophobic" or "flexible" and so on.

When I first discussed this paper with a friend, a protein biochemist, they urged me not to write about it. They said that metalloenzymes are not particularly difficult to make, and so this paper's outcomes aren't all that surprising.
They pointed to another study demonstrating that it's possible to make functional metalloenzymes simply by mixing purified phenylalanine with zinc ions. My retort to their criticism, though, is that the authors have already used this same "random polymer" approach to make other types of proteins. In 2020, for example, they made protein channels that were exquisitely sensitive to protons and, during our conversation, hinted that they have also made other classes of enzymes, including hydrolases.

But still, this paper leaves so much unsaid. I suspect many protein biochemists reading this blog still won't find the work impressive or useful or surprising or whatever. It takes a long time to overturn dogma, after all, and it'll be an uphill battle to change people's perceptions of enzymes and how one can make them.

The paper itself also took seven years of work, according to a corresponding author, and involved many back-and-forth debates with a "hostile" reviewer. The manuscript was cut nearly in half (from nearly 5,000 words to 2,700), losing much of its philosophical framing. "This was the hardest paper I've ever published," the authors told me. And after spending a week wrestling with whether to write about it, I understand why.
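The Monte Carlo step in the blog post (generating many sequences from the monomer ratios to estimate how often SPMA ends up surrounded by hydrophobic units) can be illustrated with a toy Python simulation. This is a simplified sketch, not the authors' code: it assumes ideal random copolymerization, where each position is drawn independently in proportion to its feed fraction, whereas the post says the real simulation also used each monomer's reactivity. It lumps MMA and EHMA into one hydrophobic class "H" (70% in round one; 80% in round two, i.e. the remainder after 15% SPMA and 5% OEGMA, since the post does not give the second-round MMA/EHMA split), uses a hypothetical chain length of 100, and only looks at immediate sequence neighbors rather than 3D folding.

```python
import random


def simulate(feed: dict[str, float], n_chains: int = 2000, length: int = 100, seed: int = 0) -> float:
    """Fraction of interior SPMA units flanked on both sides by hydrophobic ("H") units.

    Assumes ideal random copolymerization: every position is drawn independently
    with probability equal to the monomer's feed fraction (reactivity ratios ignored).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    monomers = list(feed)
    weights = [feed[m] for m in monomers]
    flanked = total = 0
    for _ in range(n_chains):
        chain = rng.choices(monomers, weights=weights, k=length)
        for i in range(1, length - 1):  # skip chain ends, which have only one neighbor
            if chain[i] == "SPMA":
                total += 1
                if chain[i - 1] == "H" and chain[i + 1] == "H":
                    flanked += 1
    return flanked / total if total else 0.0


# "H" lumps MMA + EHMA together (hypothetical simplification, see above).
round1 = {"H": 0.70, "OEGMA": 0.25, "SPMA": 0.05}
round2 = {"H": 0.80, "OEGMA": 0.05, "SPMA": 0.15}
print(f"round 1: {simulate(round1):.2f}")  # close to 0.70**2 = 0.49
print(f"round 2: {simulate(round2):.2f}")  # close to 0.80**2 = 0.64
```

Under these assumptions the flanked fraction is roughly the hydrophobic share squared, so moving from 70% to 80% hydrophobic monomer raises it from about 0.49 to about 0.64, while tripling SPMA also triples the number of candidate active sites per chain.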

Martin Mayta @MartinMayta2
Wow so cool!
Biology+AI Daily @BiologyAIDaily

AI-assisted protein design to rapidly convert antibody sequences to intrabodies targeting diverse peptides and histone modifications @ScienceAdvances
1. A new AI-driven pipeline has been developed to convert antibody sequences into functional intrabodies, significantly improving the success rate of intrabody design. This approach leverages AlphaFold2, ProteinMPNN, and live-cell screening to optimize antibody frameworks while preserving epitope-binding regions.
2. The study successfully converted 19 out of 26 antibody sequences into functional single-chain variable fragment intrabodies, including those targeting histone modifications for real-time imaging of chromatin dynamics. Notably, 18 of these sequences had previously failed using standard methods.
3. The pipeline integrates advanced AI tools to predict and optimize the folding and stability of intrabodies in the intracellular environment. This addresses key challenges such as misfolding and aggregation that often hinder intrabody functionality.
4. The method was applied to create intrabodies targeting histone modifications, tripling the number of available intrabodies for this purpose. This advancement enables more detailed studies of chromatin dynamics and gene regulation in living cells.
5. The authors provide open-source code for their pipeline, along with useful metrics and tables to predict which designs will retain functionality inside cells. This resource will facilitate further research and development in intrabody design.
6. As antibody sequence databases continue to expand, this AI-driven approach is expected to accelerate intrabody design, making it easier, more cost-effective, and broadly accessible for biological research.
💻Code: github.com/jbderoo/scFv_P… 📜Paper: science.org/doi/10.1126/sc…
#AIProteinDesign #IntrabodyEngineering #HistoneModifications #LiveCellImaging #Bioengineering #ComputationalBiology

Martin Mayta retweeted
Biology+AI Daily @BiologyAIDaily
Explaining How Mutations Affect AlphaFold Predictions
1. A new study explores the inner workings of AlphaFold models, particularly how mutations influence their protein structure predictions. The research introduces CAAT (Conformational Attention Analysis Tool), a novel algorithm that identifies critical amino acids affecting AlphaFold's predictions.
2. The study reveals that AlphaFold relies on simple, sparse patterns of amino acids to select protein conformations. For example, in the human chemokine XCL1, which switches between monomeric and dimeric states, AlphaFold's predictions are heavily influenced by amino acids at positions 14, 43, and 48. This finding highlights both the strengths and limitations of AlphaFold's approach.
3. CAAT successfully identifies key amino acids important for protein structure by analyzing AlphaFold's attention patterns. The tool was tested on multiple fold-switching proteins, including XCL1, KaiB, and RfaH. In each case, CAAT's predictions were validated experimentally, demonstrating its potential to guide effective mutations and improve protein design.
4. The research underscores the importance of understanding AI models' mechanisms to address their limitations. AlphaFold's reliance on sparse patterns can lead to oversights, such as ignoring the effects of mutations in low-attention regions. This insight suggests that incorporating broader physical concepts like solvent interaction and temperature could enhance the accuracy of protein structure predictions.
5. The study's findings extend beyond protein structure prediction, offering a framework for interpreting other transformer-based models. By converting AI interpretability into a functional tool, this work paves the way for more transparent and reliable AI applications in computational biology.
📜Paper: biorxiv.org/content/10.648… 💻Code: github.com/prameshsharma2…
#AlphaFold #ProteinStructure #AIInterpretability #ComputationalBiology #MachineLearning
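The summary says CAAT works by analyzing AlphaFold's attention patterns but does not spell out the algorithm. As a rough, hypothetical illustration of the general idea only (this is not CAAT), the Python sketch below ranks residues by the total attention they receive in a single attention map, for example one averaged over heads. The input format, the scoring rule, and the function name are all assumptions.

```python
def top_attended_positions(attn: list[list[float]], k: int = 3) -> list[int]:
    """Return the 1-based residue positions that receive the most total attention.

    attn[i][j] is a hypothetical attention weight from query residue i to key
    residue j. Residue j's score is its column sum excluding self-attention,
    i.e. how strongly all other residues attend to it.
    """
    n = len(attn)
    scores = [sum(attn[i][j] for i in range(n) if i != j) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return [j + 1 for j in ranked[:k]]


# Toy 5-residue map in which every residue attends mostly to residue 3.
attn = [[0.1, 0.1, 0.6, 0.1, 0.1] for _ in range(5)]
print(top_attended_positions(attn, k=1))  # [3]
```

Real attention maps are far larger and differ by layer and head, so an actual tool would also have to decide which layers and heads to aggregate before ranking.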
Martin Mayta @MartinMayta2
WOW so cool!
Jorge Bravo Abad @bravo_abad

AI-designed DNA switches that turn genes on in exactly one cell type

Every cell in your body carries the same genome, yet a liver cell behaves nothing like a neuron. The difference lies in regulatory DNA: short sequences that act as switches, controlling which genes are active in which tissues. These enhancers can drive strong expression in one cell type while staying completely silent in another. Designing synthetic versions of these switches (sequences that reliably activate a target gene in diseased cells while leaving healthy tissue untouched) has remained a central challenge for gene therapy. And they need to be compact: the viral vectors used to deliver therapeutic genes have strict size limits.

Lucas Ferreira DaSilva and coauthors tackle this with DNA-Diffusion, a generative AI framework that applies diffusion models, the same class of generative model behind image generators like DALL-E 2, to DNA sequence space. The model trains on regions of open, accessible chromatin from three human cell lines (B-lymphocytes, leukemia cells, and liver cancer cells), learning what sequence patterns correspond to regulatory activity in each cell type.

The generated sequences aren't copies of training data: only 4.7% share even a 20 base pair (bp) match with it. Yet they contain the right binding sites for cell-type-specific transcription factors, the proteins that read these switches. A tunable parameter lets users dial between sequences resembling natural enhancers and sequences optimized for maximum activation.

Validation spans three levels: computational predictions of chromatin accessibility and gene expression; a library of 5,850 synthetic sequences tested for enhancer activity across all three cell lines; and, critically, modulation of an actual gene in its native chromosomal location. The team targeted AXIN2, a gene that protects against leukemia progression but is often silenced in malignant B cells. A naturally occurring mutation upstream of AXIN2 modestly reactivates it and correlates with better patient survival. Multiple AI-designed sequences surpassed this protective variant's activation levels.

The message: by combining generative AI with functional genomics, it's now possible to design compact 200 bp regulatory elements, small enough for standard gene therapy delivery, that achieve cell-type-specific control exceeding what evolution has produced, opening a path toward therapies where synthetic switches activate genes only where needed.

Paper: nature.com/articles/s4158…
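The statistic that only 4.7% of generated sequences share even a 20 bp match with the training data is a k-mer overlap check. Below is a minimal Python sketch of one way to compute such a number. Whether the authors also counted matches on the reverse-complement strand is not stated, so indexing both strands here is an assumption, as are the function names and toy sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_with_shared_kmer(generated: list[str], training: list[str], k: int = 20) -> float:
    """Fraction of generated sequences sharing at least one k-mer with the training set.

    Training k-mers are indexed on both strands (an assumption of this sketch),
    since the same DNA can be read in either orientation.
    """
    index: set[str] = set()
    for seq in training:
        index |= kmers(seq, k) | kmers(revcomp(seq), k)
    hits = sum(1 for seq in generated if kmers(seq, k) & index)
    return hits / len(generated)


training = ["A" * 10 + "ACGT" * 5 + "T" * 10]
generated = ["GGGG" + "ACGT" * 5 + "CCCC", "G" * 40]
print(fraction_with_shared_kmer(generated, training))  # 0.5
```

With real data, `training` would hold the accessible-chromatin regions and `generated` the 200 bp designs; the set-based index keeps each lookup constant-time on average.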

Martin Mayta retweeted
Biology+AI Daily @BiologyAIDaily
De novo designed bifunctional proteins for targeted protein degradation
1. A new study by Mylemans et al. presents a novel approach to targeted protein degradation (TPD) using de novo designed bifunctional proteins. This method offers an alternative to traditional PROTACs by leveraging computational protein design to create stable and adaptable protein scaffolds that can bind to both a target protein and an E3 ubiquitin ligase, potentially expanding the range of targets accessible for degradation.
2. The researchers developed a helix-turn-helix scaffold that can present different binding sites. Using computational tools like ProteinMPNN and AlphaFold2, they incorporated high-affinity binding sites for the anti-apoptotic protein BCL-xL and a short linear motif (SLiM) for the E3 ligase KLHL20. The resulting proteins were shown to degrade BCL-xL in cells, leading to apoptosis.
3. A key innovation in this work is the rational design of a highly stable protein scaffold that can be customized with multiple binding sites. This scaffold, based on a coiled-coil peptide, was optimized for thermostability and structural integrity, allowing it to maintain its function in cellular environments.
4. The study demonstrates the potential of combining computational design with experimental validation to create protein-based degraders. The bifunctional proteins were tested in vitro and in cell assays, showing significant degradation of BCL-xL and induction of apoptosis in lung cancer cells, comparable to the effects of the small-molecule PROTAC DT2216.
5. This work highlights the versatility of de novo protein design in creating targeted therapies. By using computational methods to design and optimize protein-protein interactions, the researchers were able to address limitations of current TPD strategies, such as the limited number of usable E3 ligases and the challenges in designing high-affinity binders.
📜Paper: biorxiv.org/content/10.648…
#ProteinDesign #TargetedProteinDegradation #ComputationalBiology #DeNovoProteins #Biotechnology
Martin Mayta @MartinMayta2
wow!! a super-stable protein designed by AI?? good or bad?🤔☠️🦠
Jorge Bravo Abad @bravo_abad

AI-designed proteins that survive 150 °C and nanonewton forces

Proteins are usually fragile machines. Heat them, pull on them, or send them through a high-temperature sterilization step (like those used in hospitals), and most will unfold and aggregate, losing their function. Yet many natural systems, like muscle titin or spider silk, hint that if you organize β-sheet hydrogen bonds in the right way, you can get remarkable mechanical strength and thermal resilience.

Bin Zheng and coauthors take that idea and push it to the extreme. Starting from the titin I27 domain, they use an AI+MD pipeline (RFdiffusion for backbone generation, ProteinMPNN for sequence design, ESMFold/AlphaFold2 for structure prediction, and steered/annealing MD for screening) to systematically elongate the force-bearing β strands and maximize backbone hydrogen bonds in a shearing geometry. Across multiple design rounds, they grow the network from 4 to 33 backbone H-bonds, creating a “SuperMyo” series of proteins with unfolding forces above 1,000 pN, roughly 4× stronger than I27 under the same pulling conditions. Remarkably, these proteins not only refold after force, but also retain structure and function after exposure to 150 °C and repeated high-temperature sterilization cycles, and can be used as crosslinkers to make hydrogels that survive those treatments intact.

The message is powerful: by combining generative protein design with physics-based simulations, it’s now possible to turn a simple principle (pack as many shear-mode hydrogen bonds as possible into β sheets) into synthetic proteins and materials that rival or surpass nature’s own mechanostable systems, enabling protein-based hydrogels and biomaterials that remain functional under conditions that would normally destroy conventional proteins.

Paper: nature.com/articles/s4155…
