  • Protocol

Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN

Abstract

Single-particle cryogenic electron microscopy (cryo-EM) has emerged as a powerful technique to visualize the structural landscape sampled by a protein complex. However, algorithmic and computational bottlenecks in analyzing heterogeneous cryo-EM datasets have prevented the full realization of this potential. CryoDRGN is a machine learning system for heterogeneous cryo-EM reconstruction of proteins and protein complexes from single-particle cryo-EM data. Central to this approach is a deep generative model for heterogeneous cryo-EM density maps, which we empirically find is effective in modeling both discrete and continuous forms of structural variability. Once trained, cryoDRGN is capable of generating an arbitrary number of 3D density maps, and thus interpreting the resulting ensemble is a challenge. Here, we showcase interactive and automated processing approaches for analyzing cryoDRGN results. Specifically, we detail a step-by-step protocol for the analysis of an existing assembling 50S ribosome dataset, including preparation of inputs, network training and visualization of the resulting ensemble of density maps. Additionally, we describe and implement methods to comprehensively analyze and interpret the distribution of volumes with the assistance of an associated atomic model. This protocol is appropriate for structural biologists familiar with processing single-particle cryo-EM datasets and with moderate experience navigating Python and Jupyter notebooks. It requires 3–4 days to complete. CryoDRGN is open source software that is freely available.

Fig. 1: The cryoDRGN workflow.
Fig. 2: Training and assessing convergence of cryoDRGN networks.
Fig. 3: Particle filtering.
Fig. 4: Analysis of a cryoDRGN model trained on high-resolution particle images.
Fig. 5: Atomic model-based analysis of cryoDRGN-generated structural ensemble.

Data availability

All final and intermediate results presented in this protocol are available at https://doi.org/10.5281/zenodo.5164127.

Code availability

The software and scripts used in these analyses are available at https://github.com/zhonge/cryodrgn (version 0.3.5) and https://github.com/lkinman/occupancy-analysis (version 0.1.2), as described in Materials. Updates to cryoDRGN will be posted at cryodrgn.csail.mit.edu. All code is available through the open source GPL-3.0 License.

References

  1. Lyumkis, D. Challenges and opportunities in cryo-EM single-particle analysis. J. Biol. Chem. 294, 5181–5197 (2019).

  2. Wu, M. & Lander, G. C. Present and emerging methodologies in cryo-EM single-particle analysis. Biophys. J. 119, 1281–1289 (2020).

  3. Serna, M. Hands on methods for high resolution cryo-electron microscopy structures of heterogeneous macromolecular complexes. Front. Mol. Biosci. 6, 33 (2019).

  4. Dashti, A. et al. Retrieving functional pathways of biomolecules from single-particle snapshots. Nat. Commun. 11, 4734 (2020).

  5. Dashti, A. et al. Trajectories of the ribosome as a Brownian nanomachine. Proc. Natl Acad. Sci. USA 111, 17492–17497 (2014).

  6. Haselbach, D. et al. Long-range allosteric regulation of the human 26S proteasome by 20S proteasome-targeting cancer drugs. Nat. Commun. 8, 15578 (2017).

  7. Gui, M. et al. Structures of radial spokes and associated complexes important for ciliary motility. Nat. Struct. Mol. Biol. 28, 29–37 (2021).

  8. Zhong, E., Bepler, T., Berger, B. & Davis, J. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).

  9. Punjani, A. & Fleet, D. J. 3D variability analysis: resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. J. Struct. Biol. 213, 107702 (2021).

  10. Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife https://doi.org/10.7554/eLife.42166 (2018).

  11. Grant, T., Rohou, A. & Grigorieff, N. cisTEM, user-friendly software for single-particle image processing. eLife https://doi.org/10.7554/eLife.35383 (2018).

  12. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).

  13. Nakane, T., Kimanius, D., Lindahl, E. & Scheres, S. H. Characterisation of molecular motions in cryo-EM single-particle data by multi-body refinement in RELION. eLife https://doi.org/10.7554/eLife.36861 (2018).

  14. Kingma, D. & Welling, M. Auto-encoding variational Bayes. 2nd International Conference on Learning Representations (2013).

  15. Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. Eighth International Conference on Learning Representations (2020).

  16. Davis, J. H. et al. Modular assembly of the bacterial large ribosomal subunit. Cell 167, 1610–1622.e1615 (2016).

  17. Rabuck-Gibbons, J. N., Lyumkis, D. & Williamson, J. R. Quantitative mining of compositional heterogeneity in cryo-EM datasets of ribosome assembly intermediates. Structure https://doi.org/10.1016/j.str.2021.12.005 (2022).

  18. von Loeffelholz, O. et al. Focused classification and refinement in high-resolution cryo-EM structural analysis of ribosome complexes. Curr. Opin. Struct. Biol. 46, 140–148 (2017).

  19. Zhong, E. D., Lerer, A., Davis, J. H. & Berger, B. CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images. IEEE/CVF International Conference on Computer Vision (2021).

  20. Punjani, A. & Fleet, D. J. 3D flexible refinement: structure and motion of flexible proteins from cryo-EM. Preprint at bioRxiv https://doi.org/10.1101/2021.04.22.440893 (2021).

  21. Ludtke, S. & Chen, M. Deep learning based mixed-dimensional GMM for characterizing variability in CryoEM. Nat. Methods 18, 930–936 (2021).

  22. Zhong, E. D., Lerer, A., Davis, J. H. & Berger, B. Exploring generative atomic models in cryo-EM reconstruction. Preprint at arXiv https://arxiv.org/abs/2107.01331v1 (2021).

  23. Rosenbaum, D. et al. Inferring a continuous distribution of atom coordinates from cryo-EM images using VAEs. Preprint at arXiv https://arxiv.org/abs/2106.14108v1 (2021).

  24. Sekne, Z., Ghanim, G. E., van Roon, A. M. & Nguyen, T. H. D. Structural basis of human telomerase recruitment by TPP1-POT1. Science 375, 1173–1176 (2022).

  25. Chaaban, S. & Carter, A. P. Structure of dynein-dynactin on microtubules shows tandem recruitment of cargo adaptors. Preprint at bioRxiv https://doi.org/10.1101/2022.03.17.482250 (2022).

  26. Schoppe, J. et al. Flexible open conformation of the AP-3 complex explains its role in cargo recruitment at the Golgi. J. Biol. Chem. 297, 101334 (2021).

  27. Punjani, A., Zhang, H. & Fleet, D. J. Non-uniform refinement: adaptive regularization improves single-particle cryo-EM reconstruction. Nat. Methods 17, 1214–1221 (2020).

  28. Zivanov, J., Nakane, T. & Scheres, S. H. W. A Bayesian approach to beam-induced motion correction in cryo-EM single-particle analysis. IUCrJ 6, 5–17 (2019).

  29. Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).

  30. Cheng, Y., Grigorieff, N., Penczek, P. A. & Walz, T. A primer to single-particle cryo-electron microscopy. Cell 161, 438–449 (2015).

  31. McInnes, L., Healy, J., Saul, N. & Grossberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).

  32. Davis, J. H. & Williamson, J. R. Structure and dynamics of bacterial ribosome biogenesis. Philos. Trans. R. Soc. B https://doi.org/10.1098/rstb.2016.0181 (2017).

  33. Trabuco, L. G., Villa, E., Schreiner, E., Harrison, C. B. & Schulten, K. Molecular dynamics flexible fitting: a practical guide to combine cryo-electron microscopy and X-ray crystallography. Methods 49, 174–180 (2009).

  34. Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).

Acknowledgements

We thank the MIT-IBM Satori team for GPU computing resources and support. This work was funded by the NSF GRFP Fellowship to E.D.Z., NIH grant R35-GM141861 to B.B., NSF-CAREER-2046778 and NIH grant R01-GM144542 to J.H.D. and a grant from the MIT J-Clinic for Machine Learning and Health to J.H.D. and B.B. Research in the Davis lab is supported by the Alfred P. Sloan Foundation, the James H. Ferry Fund and the Whitehead Family.

Author information

Contributions

Conceptualization: all authors. Funding acquisition: B.B. and J.H.D. Investigation: J.H.D., L.K., B.M.P. and E.D.Z. Software: L.K., B.M.P. and E.D.Z. Supervision: B.B. and J.H.D. Visualization: L.K. and B.M.P. Writing – original draft: L.K., B.M.P. and E.D.Z. Writing – review and editing: all authors.

Corresponding authors

Correspondence to Ellen D. Zhong, Bonnie Berger or Joseph H. Davis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Protocols thanks Daniel Edelberg, Dong Si and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Zhong, E. et al. Nat. Methods 18, 176–185 (2021): https://doi.org/10.1038/s41592-020-01049-4

Gui, M. et al. Nat. Struct. Mol. Biol. 28, 29–37 (2021): https://doi.org/10.1038/s41594-020-00530-0

Schoppe, J. et al. J. Biol. Chem. 297, 101334 (2021): https://doi.org/10.1016/j.jbc.2021.101334

Key data used in this protocol

Davis, J. H. et al. Cell 167, 1610–1622.e1615 (2016): https://doi.org/10.1016/j.cell.2016.11.020

Extended data

Extended Data Fig. 1 Assessing cryoDRGN input parsing.

Comparison of 10,000 cryoDRGN-parsed particles back-projected at D = 128 px (left) with the unsharpened map from cryoSPARC’s homogeneous refinement (right).

Extended Data Fig. 2 Assessing convergence of representative cryoDRGN density maps during network training.

a, Particle sets of interest A–J identified in epoch 49 by the ‘UMAP local maximum’ method are mapped to prior epochs’ UMAP embeddings. The on-data median latent value of each particle set is embedded into UMAP space and annotated for each epoch. Note that each annotated point maps to the same high-occupancy region of UMAP space following convergence. b, Corresponding volumes generated from each on-data median latent value at five-epoch intervals as shown in a. Note that the volumes’ gross morphology stabilizes by epochs 14–19, though some additional details in maxima I and J require 24–29 epochs of training. c, FSC plots correlating each local maximum volume at epoch j and at epoch j − 5.

Extended Data Fig. 3 Visualizing particle filtering.

a, Representative particles filtered by ind_keep.star, selected for further training, and corresponding 2D classification using default cryoSPARC parameters. b, Representative particles filtered by ind_bad.star, excluded from further training, and corresponding 2D classification using default cryoSPARC parameters. c, Three-way Venn diagram of ‘junk’ particles identified by one of the following methods: two classes from k = 6 Gaussian mixture model (GMM) latent-space classification (red, 35,421 particles); nine classes from k = 20 k-means latent-space classification (green, 29,080 particles); or latent encoding magnitude (z-norm) more than 0.5 standard deviations above the mean (blue, 30,879 particles). d, Corresponding cryoSPARC 2D-classification results using ‘junk’ particles identified through the GMM (top), k-means (middle) or z-norm (bottom) filtering approaches. e,f, UMAP embedding (e) or PCA projections of latent space (f) highlighting location of junk particles identified by GMM (red), k-means (green) or z-norm (blue) methods.
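
The z-norm criterion in panel c can be sketched in a few lines of Python. This is a minimal illustration rather than the protocol's own filtering code: the function name znorm_outliers, the toy latent vectors and the choice of the population standard deviation are all assumptions made for the example.

```python
import math
import statistics


def znorm_outliers(latents, n_std=0.5):
    """Split particle indices by latent-vector magnitude.

    Returns (keep, flagged). `flagged` holds the indices whose Euclidean
    norm exceeds mean + n_std * std of all norms; `keep` holds the rest.
    """
    norms = [math.sqrt(sum(x * x for x in z)) for z in latents]
    mean = statistics.fmean(norms)
    std = statistics.pstdev(norms)  # assumption: population standard deviation
    cutoff = mean + n_std * std
    keep = [i for i, n in enumerate(norms) if n <= cutoff]
    flagged = [i for i, n in enumerate(norms) if n > cutoff]
    return keep, flagged


# Hypothetical 2D latent encodings; the last one lies far from the origin.
toy_latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
keep, flagged = znorm_outliers(toy_latents)
print("keep:", keep, "flagged:", flagged)
```

With real data, latents would hold one row per particle from the trained model's latent encodings, and the flagged indices would be compared against the GMM and k-means selections before any particles are discarded.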

Extended Data Fig. 4 Training and assessing convergence of high-resolution training.

a, Representative plot of average total loss at each epoch. b, Median per-particle movement through latent space, characterized by vectors connecting each particle’s latent embedding in successive epochs. Resulting vector dot products (left), magnitude (center) and cosine distance (right) are shown. c, Identification of representative latent embeddings via the ‘UMAP local maximum’ method. The UMAP embedding of epoch 99 is binned into a 2D histogram, smoothed, annotated with local maxima and overlaid with the maxima. The on-data median UMAP location of each maximum and its neighboring eight bins is shown. Label order corresponds to decreasing particle count in each local maximum. d,e, Map–map correlation (d) and FSC (e) at Nyquist frequency calculated between representative volumes generated as defined in c at five-epoch intervals. Epochs for which the encoder network has not converged are noted with dotted lines.
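
Panel b summarizes how far, and how consistently, particles move through latent space between epochs. The sketch below shows one plausible reading of that calculation in plain Python. It assumes that the dot product and cosine distance compare a particle's displacement vectors from two successive epoch pairs, which the caption does not state explicitly; the function names and toy coordinates are hypothetical.

```python
import math
import statistics


def displacement(z_prev, z_next):
    """Vector connecting one particle's embedding in two successive epochs."""
    return [b - a for a, b in zip(z_prev, z_next)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def movement_summary(z_a, z_b, z_c):
    """Median movement statistics over three successive epochs a, b, c.

    Each argument is a list with one latent vector per particle.
    """
    magnitudes, dots, cos_dists = [], [], []
    for a, b, c in zip(z_a, z_b, z_c):
        v1 = displacement(a, b)
        v2 = displacement(b, c)
        n1 = math.sqrt(dot(v1, v1))
        n2 = math.sqrt(dot(v2, v2))
        magnitudes.append(n2)
        d = dot(v1, v2)
        dots.append(d)
        if n1 > 0 and n2 > 0:  # cosine distance is undefined for zero vectors
            cos_dists.append(1.0 - d / (n1 * n2))
    return {
        "median_magnitude": statistics.median(magnitudes),
        "median_dot": statistics.median(dots),
        "median_cosine_distance": (
            statistics.median(cos_dists) if cos_dists else float("nan")
        ),
    }


# Two hypothetical particles: one keeps moving in a straight line,
# the other turns by 90 degrees between epochs.
epoch_a = [[0.0, 0.0], [0.0, 0.0]]
epoch_b = [[1.0, 0.0], [0.0, 2.0]]
epoch_c = [[2.0, 0.0], [2.0, 2.0]]
summary = movement_summary(epoch_a, epoch_b, epoch_c)
print(summary)
```

Under this reading, small magnitudes together with stable direction statistics across later epochs would indicate that the encoder has settled.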

Extended Data Fig. 5 Assessing convergence of representative cryoDRGN density maps during high-resolution training.

a, Particle sets A–J identified by the ‘UMAP local maximum’ method (Box 1) mapped to prior epochs as illustrated in Extended Data Fig. 2. b, Corresponding volumes generated from labeled positions in a. Note that the volumes’ gross morphology stabilizes by epochs 19–29, though maximum I stabilizes as a 70S ribosome around epoch 39. c, FSC plots between volumes from each local maximum offset by five epochs of training, as in Extended Data Fig. 2. The map-to-map FSC stabilizes by epoch 39.

Extended Data Fig. 6 Assessing results of high-resolution training.

a, The UMAP representation of the latent space resulting from 50 epochs of high-resolution training, colored by indicated imaging parameters. b, Angular and translational pose distributions. c, PCA of the latent space, colored by the 20 k-means cluster centers automatically generated by cryodrgn analyze. Numbered black dots indicate the locations in latent space of each k-means cluster center volume.

Extended Data Fig. 7 Sampled points from latent space used in subunit occupancy analysis.

UMAP representation of the latent space resulting from 50 epochs of high-resolution training, with contours colored with darker blues as particle density increases. Sampled points correspond to the centers of 500 k-means clusters and are indicated with white circles.

Extended Data Fig. 8 Confusion matrix of published class labels and classes assigned by subunit occupancy analysis.

k-Means 500 cluster center maps were assigned to 15 classes by subunit occupancy analysis. Particles within a given k-means 500 cluster are assigned to the same subunit occupancy class as the center map. Published particle labels were drawn from ref. 16, and the fractional correspondence is plotted as a heatmap. Note that published classes A and F corresponded to 70S and 30S particles, respectively.
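
The label propagation and fractional correspondence described above can be sketched as follows. This is an illustrative example only: the function name, the placeholder labels and the choice to normalize within each published class are assumptions, since the caption does not say which axis of the heatmap sums to one.

```python
from collections import Counter, defaultdict


def fractional_confusion(published, cluster_ids, cluster_to_class):
    """Row-normalized confusion table.

    published:        published class label for each particle
    cluster_ids:      k-means cluster index for each particle
    cluster_to_class: occupancy class assigned to each cluster's center map
    Returns {published_label: {assigned_class: fraction_of_particles}}.
    """
    counts = defaultdict(Counter)
    for label, cluster in zip(published, cluster_ids):
        # Every particle inherits the class of its cluster's center map.
        counts[label][cluster_to_class[cluster]] += 1
    return {
        label: {cls: n / sum(row.values()) for cls, n in row.items()}
        for label, row in counts.items()
    }


# Placeholder labels, not the published class names.
toy_published = ["pub_X", "pub_X", "pub_Y", "pub_Y", "pub_Y", "pub_Y"]
toy_clusters = [0, 1, 1, 2, 2, 2]
toy_cluster_to_class = {0: "occ_1", 1: "occ_1", 2: "occ_2"}
table = fractional_confusion(toy_published, toy_clusters, toy_cluster_to_class)
print(table)
```

Each inner dictionary corresponds to one row of the heatmap; a row dominated by a single entry indicates that a published class maps cleanly onto one subunit occupancy class.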

Extended Data Fig. 9 Graph traversal through latent space for the B→D1→D2→D3→D4→E3→E5 assembly pathway.

Centroid volumes from the subunit occupancy classes were aligned and compared with the assembly intermediate structures identified in ref. 16 to determine approximate equivalences between published classes and subunit occupancy classes. The volumes corresponding to intermediates B, D1, D2, D3, D4, E3 and E5 were provided to cryodrgn graph_traversal as anchor points; the resulting path through latent space is shown. Non-anchor points are indicated with white circles, whereas anchor points and their corresponding class ID are shown with colored circles. Volumes resulting from the complete graph traversal are shown in Supplementary Video 3.
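
The caption describes cryodrgn graph_traversal only by its inputs (anchor points) and its output (a path through latent space). As a rough illustration of the general idea of connecting anchors through populated regions, the sketch below builds a nearest-neighbor graph over latent points and joins consecutive anchors with Dijkstra's shortest-path algorithm. It is a generic stand-in, not a description of cryoDRGN's implementation; the function names, the neighbor count k and the toy points are assumptions.

```python
import heapq
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph as {i: {j: distance}}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns the list of nodes from start to goal."""
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in best:
        raise ValueError("anchors are not connected; try a larger k")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def traverse(points, anchors, k=2):
    """Visit the anchors in order, following graph edges between them."""
    graph = knn_graph(points, k)
    path = [anchors[0]]
    for a, b in zip(anchors, anchors[1:]):
        path.extend(shortest_path(graph, a, b)[1:])
    return path


# Hypothetical 2D latent points arranged along a staircase.
toy_points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]
print(traverse(toy_points, anchors=[0, 3, 4], k=2))
```

Restricting the path to edges between neighboring points keeps intermediate samples in regions of latent space that are actually occupied by particles, which is the motivation for a graph-based traversal over a straight-line interpolation.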

Extended Data Fig. 10 Selection of particles corresponding to the C4 minor class.

Particles (1,149) in the C4 class were identified by subunit occupancy analysis and are highlighted in orange.

Supplementary information

Supplementary Information

Supplementary Protocols 1–6 and Supplementary Tables 1 and 2.

Supplementary Video 1

PC1 trajectory from high-resolution training. Density maps sampled along PC1 were automatically generated by the cryodrgn analyze command. Volumes are displayed at the same isosurface level and were generated at points spanning the 5th to 95th percentile of values along the PC1 axis.

Supplementary Video 2

PC2 trajectory from high-resolution training. Density maps sampled along PC2 were automatically generated by the cryodrgn analyze command. Volumes are displayed at the same isosurface level and were generated at points spanning the 5th to 95th percentile of values along the PC2 axis.
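
The sampling range used for Supplementary Videos 1 and 2 can be illustrated with a short Python sketch. It reads the range as the 5th to 95th percentile of the particles' coordinates along one principal component and places evenly spaced sample points between those bounds. The function names, the linear-interpolation percentile and the default of ten points are assumptions; cryodrgn analyze performs this step itself, and decoding each sampled value into a density map is outside the scope of the example.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sequence."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def pc_sample_points(pc_values, n_points=10, lo_q=5.0, hi_q=95.0):
    """Evenly spaced values between two percentiles of one PC coordinate."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    start = percentile(pc_values, lo_q)
    stop = percentile(pc_values, hi_q)
    step = (stop - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


# Hypothetical PC1 coordinates for 101 particles: 0, 1, ..., 100.
toy_pc1 = [float(v) for v in range(101)]
points = pc_sample_points(toy_pc1)
print(points)
```

Trimming the extremes in this way avoids generating volumes from sparsely populated tails of the distribution, where the decoder has seen few particles.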

Supplementary Video 3

Graph traversal showing the B→D1→D2→D3→D4→E3→E5 assembly pathway. Graph traversal pathway was generated using the cryodrgn graph_traversal command as described in the protocol. The path taken by the traversal through latent space is shown in Extended Data Fig. 9. All volumes are displayed at the same isosurface level.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kinman, L.F., Powell, B.M., Zhong, E.D. et al. Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN. Nat Protoc 18, 319–339 (2023). https://doi.org/10.1038/s41596-022-00763-x

  • DOI: https://doi.org/10.1038/s41596-022-00763-x
