If a protein has no fixed structure, can we ever say a model 'predicts' it — or are we just modeling a distribution and calling it understanding?

Question

AlphaFold and its successors transformed how we think about protein structure, but they were trained on, and excel at, ordered proteins with a single dominant fold. Intrinsically disordered proteins (IDPs) and disordered regions — which make up a large fraction of the human proteome and are central to many "undruggable" disease targets — don't have one structure. They exist as a shifting ensemble of conformations, and their function often depends on that disorder.
This raises a hard question that cuts across machine learning, biophysics, and philosophy of science: what does it even mean to "predict" something that has no ground-truth answer?
Points to fight over:

Ensembles vs. structures. When a model outputs a single, crisp-looking structure for a disordered region (often with low pLDDT, the model's own per-residue confidence score), is that a useful approximation or an actively misleading artifact? Should benchmarks penalize confident wrongness more harshly than honest uncertainty?
What counts as validation? For ordered proteins we have crystal structures. For IDPs the "truth" is a probability distribution inferred indirectly from NMR, SAXS, smFRET — each with its own model assumptions baked in. If we validate an ML ensemble against an experimentally-derived ensemble that is itself a model, are we ever touching reality, or just comparing two inferences?
Functional relevance over structural accuracy. Maybe demanding accurate ensembles is the wrong goal entirely. Should we instead predict function (binding, phase separation, allostery) directly and treat the conformational ensemble as a latent nuisance variable we never need to nail down?
The drug-discovery stakes. "Undruggable" IDP targets (think transcription factors, certain oncoproteins, alpha-synuclein) are exactly where this matters. If our generative models hallucinate plausible-but-wrong transient pockets, we could waste years chasing binding sites that don't meaningfully exist. How much model confidence is enough to justify a medicinal chemistry campaign?

Provocation to seed debate: Is the entire framing of "structure prediction" a category error for the disordered proteome — and is the field clinging to it because structures are legible and fundable, while distributions are not?

Nimit Akhawat · Answer

AlphaFold succeeded because it solved a problem that was unusually well-posed: for many globular proteins, evolution selects a dominant free-energy minimum, and experimental methods like crystallography provide a convenient "ground truth." The model predicts one representative structure, and nature often cooperates by presenting one.

IDPs violate almost every assumption underlying that paradigm. Their biological state is an ensemble of rapidly interconverting conformations whose populations shift with concentration, post-translational modifications, binding partners, crowding, and time. There isn't a single correct answer waiting to be recovered.

That doesn't mean prediction becomes impossible. It means the target changes. A weather forecast isn't wrong because tomorrow has many possible atmospheric microstates. Quantum mechanics isn't meaningless because particles are described probabilistically. Likewise, an IDP predictor should be judged by whether it predicts the correct ensemble statistics, not by whether it guesses one arbitrary conformation.

The philosophical distinction matters because it changes what "understanding" means. A model that outputs one crisp structure for an intri…
