AlphaFold succeeded because it solved a problem that was unusually well-posed: for many globular proteins, evolution has selected sequences with one dominant free-energy minimum, and experimental methods like crystallography provide a convenient "ground truth." The model predicts one representative structure, and nature often cooperates by presenting one.
IDPs violate almost every assumption underlying that paradigm. Their biological state is an ensemble of rapidly interconverting conformations whose populations shift with concentration, post-translational modifications, binding partners, crowding, and time. There isn't a single correct answer waiting to be recovered.
That doesn't mean prediction becomes impossible. It means the target changes.
A weather forecast isn't wrong because tomorrow has many possible atmospheric microstates. Quantum mechanics isn't meaningless because particles are described probabilistically. Likewise, an IDP predictor should be judged by whether it reproduces the correct ensemble statistics (for example, the distribution of chain dimensions, residual secondary-structure propensities, or contact frequencies), not by whether it guesses one arbitrary conformation.
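As a minimal sketch of what judging by ensemble statistics could look like, the Python below compares one conformer's radius of gyration with the ensemble value. The toy two-bead conformers, the weights, and the function names are illustrative assumptions, not output from any real predictor. The root-mean-square average is used because that is, approximately, what a scattering experiment reports for a mixture of conformers.

```python
import math

def radius_of_gyration(coords):
    """Rg of one conformer, treating every point as equal mass."""
    n = len(coords)
    centroid = [sum(p[axis] for p in coords) / n for axis in range(3)]
    mean_sq_dist = sum(
        sum((p[axis] - centroid[axis]) ** 2 for axis in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq_dist)

def ensemble_rg(conformers, weights):
    """Root-mean-square Rg over a weighted ensemble: sqrt(sum_i w_i * Rg_i^2)."""
    total = sum(weights)
    mean_sq = sum(
        w * radius_of_gyration(c) ** 2 for c, w in zip(conformers, weights)
    ) / total
    return math.sqrt(mean_sq)

# Hypothetical toy ensemble: one compact and one extended two-bead "chain".
compact = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]

single_guess = radius_of_gyration(compact)                  # one arbitrary conformer
ensemble_value = ensemble_rg([compact, extended], [0.5, 0.5])
print(single_guess, ensemble_value)
```

In this toy case the single compact conformer has a radius of gyration of 1, while the ensemble value is the square root of 5. A predictor scored only on the first number would look precise and still miss the observable.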
The philosophical distinction matters because it changes what "understanding" means. A model that outputs one crisp structure for an intrinsically disordered region isn't necessarily understanding the protein; it may simply be collapsing uncertainty into a visually satisfying artifact.
Ensembles vs. structures
This is where current models become problematic.
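AlphaFold's low pLDDT scores are often interpreted as saying, "don't trust this region." That's valuable because pLDDT is a per-residue confidence estimate, and low values frequently coincide with disordered regions. But the score does not distinguish "the model lacks information" from "the chain genuinely has no fixed structure," and it says nothing about what the ensemble looks like. The danger comes when downstream users ignore that uncertainty and treat the predicted coordinates as if they represented an actual transient state.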
For IDPs, a single predicted structure can be useful as a representative sample—but only if it's explicitly presented as one draw from a distribution rather than the answer.
Benchmarks should therefore reward calibrated uncertainty. A model that says "I don't know" or "there are many equally likely conformations" is scientifically preferable to one that confidently invents a fold that never meaningfully exists. In this domain, overconfidence is often a more damaging error than imprecision.
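One way a benchmark could reward calibration is with a proper scoring rule. The sketch below uses the Brier score (mean squared error between predicted probabilities and binary outcomes). The per-residue "ordered vs. disordered" labels and both sets of predictions are made-up numbers for illustration only.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; confident wrong answers are penalized most heavily.
    """
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

# Hypothetical labels: 1 = residue is ordered in experiment, 0 = disordered.
observed = [1, 0, 1, 0]

# A model that asserts "ordered" everywhere with near-certainty...
overconfident = [0.99, 0.99, 0.99, 0.99]
# ...versus one that honestly reports it cannot tell.
hedged = [0.5, 0.5, 0.5, 0.5]

print("overconfident:", brier_score(overconfident, observed))
print("hedged:       ", brier_score(hedged, observed))
```

On these toy numbers the hedged model scores 0.25 and the overconfident one scores roughly 0.49, so the model that admits ignorance is ranked higher. A real benchmark would need held-out experimental labels and probably a score defined over distributions rather than binary outcomes.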
What counts as validation?
This is the deepest epistemological issue.
There is no experimental instrument that directly measures an ensemble in atomic detail.
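Instead, NMR, SAXS, smFRET, HDX-MS, cryo-EM, and related techniques each observe different projections of the conformational landscape: SAXS reports on overall chain dimensions, smFRET on distances between labeled sites, HDX-MS on backbone amide exchange, and NMR on local, residue-level averages. Researchers then solve an underdetermined inverse problem to infer an ensemble consistent with those measurements.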
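To make the inverse problem concrete, here is a minimal sketch of maximum-entropy reweighting for a single observable: starting from prior conformer weights, it finds the least-perturbed weights whose ensemble average matches a measured value. The function name, the bisection bounds, and the toy numbers are assumptions for illustration. Real workflows fit many observables at once and allow for experimental error rather than matching exactly.

```python
import math

def maxent_reweight(values, target, prior=None, lam_bounds=(-50.0, 50.0), tol=1e-12):
    """Return weights w_i proportional to prior_i * exp(-lam * values_i)
    such that sum_i w_i * values_i equals target.

    lam is found by bisection; the weighted mean decreases monotonically with lam.
    """
    n = len(values)
    if prior is None:
        prior = [1.0 / n] * n

    def weights(lam):
        exponents = [-lam * v for v in values]
        shift = max(exponents)  # subtract the max for numerical stability
        raw = [p * math.exp(e - shift) for p, e in zip(prior, exponents)]
        z = sum(raw)
        return [r / z for r in raw]

    def mean(lam):
        return sum(w * v for w, v in zip(weights(lam), values))

    lo, hi = lam_bounds
    if not (mean(hi) <= target <= mean(lo)):
        raise ValueError("target is not reachable by reweighting these conformers")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid  # average still too large: increase lam
        else:
            hi = mid
        if hi - lo < tol:
            break
    return weights(0.5 * (lo + hi))

# Hypothetical: two conformers whose back-calculated observable is 1.0 and 3.0,
# and an experiment that measured an ensemble average of 1.5.
new_weights = maxent_reweight([1.0, 3.0], target=1.5)
print(new_weights)
```

With only two conformers the answer is forced (weights of about 0.75 and 0.25). With thousands of conformers and a handful of measurements, many weightings fit equally well, which is exactly why the resulting ensemble is an inference rather than a direct observation.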
So when an ML model is compared against an experimentally derived ensemble, we're comparing:
model inference ↔ experimental inference
rather than
prediction ↔ reality.
That sounds unsettling, but it's actually common across science. Cosmology compares inferred models of the early universe to observations filtered through instruments and statistical assumptions. Climate science compares distributions rather than individual trajectories.
The key is not whether we access "reality" directly—we rarely do—but whether independent experimental constraints converge on the same ensemble statistics.
The standard should therefore be predictive consistency across multiple orthogonal measurements, not agreement with one reconstructed ensemble.
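A rough sketch of such a multi-measurement check: compute a reduced chi-squared for each experiment type separately and require that all of them pass, so that a good fit to one technique cannot hide a poor fit to another. The dataset names, the numbers, and the 1.5 threshold are arbitrary assumptions for illustration.

```python
def reduced_chi2(predicted, observed, sigma):
    """Mean squared residual in units of experimental error."""
    if not (len(predicted) == len(observed) == len(sigma)) or not observed:
        raise ValueError("need three non-empty sequences of equal length")
    return sum(
        ((p - o) / s) ** 2 for p, o, s in zip(predicted, observed, sigma)
    ) / len(observed)

def consistency_report(datasets, threshold=1.5):
    """Score each experiment independently; pass only if every one agrees.

    datasets maps a name to a (predicted, observed, sigma) tuple.
    """
    scores = {name: reduced_chi2(*data) for name, data in datasets.items()}
    return scores, all(score <= threshold for score in scores.values())

# Hypothetical ensemble-averaged predictions versus measurements.
datasets = {
    "saxs_rg": ([25.0], [24.0], [1.0]),
    "smfret_efficiency": ([0.4, 0.6], [0.5, 0.5], [0.1, 0.1]),
    "nmr_shift": ([1.0], [4.0], [1.0]),
}
scores, consistent = consistency_report(datasets)
print(scores, consistent)
```

Here the toy ensemble matches the SAXS and smFRET values within error but misses the NMR value badly, so the overall verdict is inconsistent even though two of three experiments look fine.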
Should function replace structure?
Possibly.
Cells don't care whether we reconstruct every microscopic conformation.
They care whether an IDP binds a partner, undergoes phase separation, recruits cofactors, or switches regulatory states.
That suggests a different objective:
Learn the latent ensemble only insofar as it improves prediction of experimentally measurable function.
This parallels machine learning more broadly. We rarely demand that a language model reconstruct every latent cognitive representation humans might have. We ask whether it predicts behavior.
For proteins, function may be the observable that matters.
The counterargument is that medicinal chemistry still needs mechanistic insight. If you want to stabilize one conformational subpopulation with a small molecule, the ensemble isn't merely a nuisance variable—it's the substrate you're manipulating.
So structure doesn't disappear.
It becomes an intermediate representation whose value is determined by whether it improves downstream biological prediction.
The drug-discovery stakes
This is where calibration becomes at least as important as point accuracy.
Drug discovery already tolerates enormous uncertainty. Virtual screening routinely scores millions of hypothetical ligand poses, the vast majority of which never occur.
The problem with IDPs is different: the model may hallucinate a transient pocket with enough geometric plausibility to launch years of chemistry before anyone realizes the state has vanishing occupancy under physiological conditions.
The relevant question therefore isn't:
Is this pocket possible?
It's:
Is this pocket sufficiently populated, sufficiently persistent, and sufficiently druggable to justify intervention?
Those are probabilistic quantities.
Medicinal chemists shouldn't demand certainty—they never have—but they should demand calibrated estimates of occupancy, lifetime, and uncertainty. A model saying "there is a 2% population with wide confidence intervals" should drive a very different investment decision than one implying a stable cryptic pocket.
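As an illustration of reporting occupancy with an interval instead of a bare number, the sketch below computes a Wilson score interval for the fraction of sampled conformers that contain a pocket. It assumes the samples are statistically independent, which is a strong assumption for correlated simulation frames, and the counts are invented.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage).

    hits: number of independent samples in which the pocket is present
    n:    total number of independent samples
    """
    if n <= 0 or not (0 <= hits <= n):
        raise ValueError("require n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Hypothetical: the pocket appears in 2 of 100 effectively independent conformers.
low, high = wilson_interval(2, 100)
print(f"occupancy estimate 2%, interval {low:.3f} to {high:.3f}")

# The same 2% point estimate from 10,000 samples is a far tighter claim.
low_big, high_big = wilson_interval(200, 10_000)
print(f"occupancy estimate 2%, interval {low_big:.3f} to {high_big:.3f}")
```

The two cases share a point estimate but support different decisions: with 100 samples the data are compatible with occupancies from well under 1% to several percent, while the larger sample pins the value close to 2%.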
Is "structure prediction" a category error?
Not entirely.
The phrase becomes misleading if it implies every protein possesses one biologically privileged structure waiting to be discovered.
For much of the disordered proteome, that's simply false.
A better framing might be:
Predicting conformational landscapes under specified biochemical conditions.
That language is less elegant than "structure prediction," but it's scientifically closer to reality.
The field has undoubtedly benefited from structures because they're intuitive, experimentally tangible, visually compelling, and easy to benchmark. Distributions are harder to visualize, harder to validate, and much harder to explain to reviewers and investors.
But biology doesn't owe us legibility.
If IDPs are fundamentally ensemble systems, then the future of protein AI may look less like predicting a folded object and more like learning a stochastic dynamical process. The real advance won't be generating ever prettier structures—it will be building models whose uncertainty, conformational diversity, and functional predictions are as informative as their atomic coordinates. In that sense, the challenge isn't that proteins without fixed structures can't be predicted; it's that the field must redefine what counts as a prediction in the first place.