


Jitender Dublad

@DubladJitender
Scientist | Biochemist | Wnt signalling | Bioengineering | Science communicator | Founder and host of @reasonwscience podcast





Progress in AI modeling of proteins leaves major gaps that affect most proteins, especially when it comes to functional analysis. The opportunities to transcend them beckon.

AI models can now predict static protein structures with high accuracy. This achievement is rightly celebrated. It is equally important to recognize what remains unresolved and why those gaps matter so much for biology.

1. Modeling intrinsically disordered regions (IDRs) is a central limitation. Roughly 30–40% of amino acid residues in the human proteome fall into this category, and ~70% of proteins contain substantial disordered segments. These regions do not adopt a single stable structure; instead, they exist as dynamic ensembles that often become structured only upon binding or under specific cellular conditions. Current AI models, trained on static structures, do not predict these ensembles. Instead, they either assign low confidence or produce arbitrary conformations. This is not a minor edge case; it is a large and functionally critical fraction of proteome space, deeply involved in signaling, regulation, and disease.

2. A second key limitation concerns protein function. Biology ultimately depends on changes in conformation, interactions, and state. Many key biological processes arise from shifts between multiple conformations or from subtle perturbations induced by amino acid substitutions, post-translational modifications, or binding partners. Current models are optimized to predict a single, most likely structure. They are not designed to capture how that structure changes under perturbation, nor how the populations of states shift. As a result, predicting function (arguably the central goal) remains a weakness in many cases.

Outlook

These two challenges point to a deeper issue: proteins are not static objects but dynamic systems governed by energy landscapes. What is needed next is not just better structure prediction, but models capable of capturing ensembles, relative state populations, and the effects of perturbations on those distributions. This will likely require accurate and scalable measurements of proteins, along with the integration of generative models, explicit or learned energetics, and dynamic sampling into a unified framework. In this sense, the field is entering a new phase. Predicting "the structure" was a milestone. Understanding how proteins move, adapt, and function, especially in the large, disordered fraction of the proteome, remains the frontier.
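To make "relative state populations" and "the effects of perturbations on those distributions" concrete, here is a toy sketch using textbook statistical mechanics, not a method proposed in this post. It assumes a protein with a small number of discrete conformational states at thermal equilibrium, so each state's population follows a Boltzmann distribution over its free energy, and it models a perturbation (e.g. a mutation or a post-translational modification) as a free-energy change (ΔΔG) applied to one state. The two-state system, the energy values, and the helper names `boltzmann_populations` and `apply_perturbation` are hypothetical and for illustration only.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def boltzmann_populations(free_energies, temperature_k=298.15):
    """Equilibrium populations of discrete states with free energies in kcal/mol.

    Assumes the states are at thermal equilibrium (Boltzmann-weighted).
    """
    kt = R_KCAL * temperature_k
    g_min = min(free_energies)  # shift energies so the largest weight is 1 (numerical stability)
    weights = [math.exp(-(g - g_min) / kt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


def apply_perturbation(free_energies, state_index, ddg):
    """Return a copy of the energies with ddg (kcal/mol) added to one state."""
    perturbed = list(free_energies)
    perturbed[state_index] += ddg
    return perturbed


# Hypothetical two-state protein: state 0 ("inactive") and state 1 ("active"),
# with the active state 1.0 kcal/mol higher in free energy.
states = [0.0, 1.0]
before = boltzmann_populations(states)

# Hypothetical perturbation that stabilizes the active state by 1.5 kcal/mol.
after = boltzmann_populations(apply_perturbation(states, 1, -1.5))

for label, pops in (("before", before), ("after", after)):
    print(label, [round(p, 3) for p in pops])
```

In this toy case the perturbation moves the majority population from state 0 to state 1, yet the lowest-energy structure of each individual state is unchanged. A predictor that reports one most likely structure would miss exactly this kind of shift, which is the gap the post describes.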







