Matter
Volume 4, Issue 10, 6 October 2021, Pages 3195-3216

Article
CryoFold: Determining protein structures and data-guided ensembles from cryo-EM density maps

https://doi.org/10.1016/j.matt.2021.09.004
Under an Elsevier user license
open archive

Highlights

  • Ensemble refinement of cryo-EM density using MD simulations with CryoFold

  • Bayesian inferencing offers multi-model interpretation of intermediates

  • Ensemble models have scored highly in EMDB-organized competitions

Progress and potential

Structural data represent an average over many snapshots of a dynamic molecule. In search of the most representative snapshots, we often forget the many that are not as prominent in the data. However, such rare instances of a molecule are relevant to biological functions and contribute to data uncertainty. Using a scheme of data-driven molecular dynamics simulations, termed CryoFold, we have modeled the ensemble of folded states underlying cryoelectron microscopy (cryo-EM), X-ray, and NMR data of soluble and membrane proteins and their multimeric complexes. An ensemble model offers, on the one hand, the most probable structural snapshot consistent with the best-resolved parts of the data and, on the other, brings to light the protein dynamics and conformational disorder underlying the fuzzier data segments, which are often simplified by the traditional single-model interpretation.
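The idea that a map is an average over snapshots can be illustrated with a toy calculation. The sketch below is not CryoFold code: it assumes a single atom on a 1D grid, a Gaussian density per snapshot, and two hypothetical conformations with populations of 0.9 and 0.1. All function names and parameters are illustrative assumptions.

```python
import math

def gaussian_density(center, grid, sigma=1.0):
    """Toy 1D density of one atom at `center`, sampled on `grid`."""
    return [math.exp(-0.5 * ((x - center) / sigma) ** 2) for x in grid]

def ensemble_density(centers, weights, grid, sigma=1.0):
    """Population-weighted average of the per-snapshot densities."""
    total = sum(weights)
    avg = [0.0] * len(grid)
    for center, weight in zip(centers, weights):
        snapshot = gaussian_density(center, grid, sigma)
        for i, value in enumerate(snapshot):
            avg[i] += (weight / total) * value
    return avg

# Hypothetical setup: grid from 0 to 20 (arbitrary length units), step 0.5.
grid = [0.5 * i for i in range(41)]
centers = [5.0, 15.0]   # dominant and rare conformations (assumed)
weights = [0.9, 0.1]    # assumed populations

avg_map = ensemble_density(centers, weights, grid)
major = avg_map[grid.index(5.0)]   # density at the dominant position
minor = avg_map[grid.index(15.0)]  # density at the rare position
print(f"major peak: {major:.3f}, minor peak: {minor:.3f}")
```

In this toy setting the rare conformation shows up only as a weak feature in the averaged map, and a single-model fit would place the atom at the dominant position and miss it.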

Summary

Cryoelectron microscopy requires molecular modeling for refinement of structures. Ensemble models arrive at low free-energy molecular structures, but are computationally expensive and limited to resolving only small proteins. We introduce CryoFold, a pipeline of molecular dynamics simulations that determines ensembles of protein structures by integrating density data of varying sparsity at 3–5 Å resolution with sequence information and coarse-grained topological knowledge of the protein folds. We present six examples, folding proteins between 72 and 2,000 residues, including large membrane and multi-domain systems, and results from two Electron Microscopy Data Bank (EMDB) competitions. Driven by data from a single state, CryoFold discovers ensembles of common low-energy models together with rare low-probability structures that capture the equilibrium distribution of proteins constrained by the density maps. Many of these conformations are experimentally validated and functionally relevant. We arrive at a set of best practices for data-guided protein folding that are controlled using a Python graphical user interface (GUI).

Keywords

molecular dynamics simulations
cryoelectron microscopy
integrative modeling
protein folding ensemble
computations
ensemble refinement
ATP synthase
CryoEM modeling

Material advancement progression

MAP3: Understanding

Data and code availability

The data and procedures are available in the supplemental information. The CryoFold source code and GUI are available at https://github.com/SingharoyLab/CryoFold_GUI and https://github.com/SingharoyLab/S2C2_Workshop_2021_MELD-MDFF-NNPs.


10 These authors contributed equally

11 Lead contact