Principles and characteristics of biological assemblies in experimentally determined protein structures

https://doi.org/10.1016/j.sbi.2019.03.006

Highlights

  • Biological assemblies are the most biologically relevant structures present in experimentally determined protein structures.

  • They can be defined by their stoichiometry, symmetry, and the protein-protein interfaces present in an assembly.

  • In crystal structures, these assemblies must be distinguished from crystal contacts.

  • Benchmarking is challenging since most assembly structures are known from crystals, leading to circular reasoning.

  • Recent successful methods for identifying assemblies depend on comparison of multiple crystal forms across protein families.

More than half of all structures in the PDB are assemblies of two or more proteins, including both homooligomers and heterooligomers. Structural information on these assemblies comes from X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy (cryo-EM). The correct assembly in an X-ray structure is often ambiguous, and computational methods have been developed to identify the most likely biologically relevant assembly based on physical properties of assemblies and sequence conservation in interfaces. Taking advantage of the large number of structures now available, some of the most recent methods have relied on similarity of interfaces and assemblies across structures of homologous proteins.

Introduction

Proteins function by their interactions with other molecules, including other proteins, nucleic acids, and small molecules. Many and perhaps most proteins function as homooligomers and heterooligomers, containing two or more copies of at least one subunit type. The structures of these protein complexes are referred to as biological assemblies. Structure information on biological assemblies is obtained by using X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other experimental methods. In this review, we discuss the principles and characteristics of biological assemblies. We first define several of these principles and characteristics and then examine recent developments relevant to each of these areas. We discuss recent computational approaches for identifying the correct assembly present in a crystal and how these methods should be benchmarked.

  • 1

    A biological assembly is a functionally relevant complex of proteins and perhaps other molecules. It can be defined structurally by its Cartesian coordinates and by its stoichiometry, molecular interfaces, and symmetry. The biological assembly of an experimental structure is defined as the largest functional form present in the experimental data. This assembly is usually one of the functional forms found in vivo, although in some cases it may not be since experimental conditions or protein constructs may affect the oligomeric state of a protein in experimental structure determination. For cryo-EM, the structure of the biological assembly is obtained more-or-less directly from the experiment [1]. For NMR, the experimental data may provide distance constraints between residues in different subunits and thus provide direct information on contacts needed to build an assembly. For crystal structures, the biological assembly is either the asymmetric unit, a subset of the asymmetric unit, or parts or the whole of multiple asymmetric units from a single unit cell or multiple neighboring unit cells. The biological assembly of a crystal structure is often ambiguous. The PDB contains author annotations of biological assemblies in the form of Cartesian coordinates, which for crystal structures differ from the asymmetric unit (ASU) about 40% of the time. Typically, crystals contain one assembly type, as defined by stoichiometry, interfaces, and symmetry that covers the entire crystal (e.g. a crystal cannot be a lattice of dimers of one type and monomers that do not form the dimer).

  • 2

    Stoichiometries of biological assemblies define the number of each subunit type (designated by Roman letters) in the assembly (usually denoted A, A2, A3, AB, A2B2, etc.) and may be even (same number of each subunit type: AB, A2B2C2, etc.) or uneven (different numbers of different subunit types: A2B, A3B2, etc.). For crystal structures, ordinarily the biological assembly should have at least one copy of every protein entity (sequence) in the asymmetric unit, although some entries are incorrectly annotated and missing some subunit types in the first biological assembly.

  • 3

    Interfaces between pairs of proteins in biological assemblies can be isologous (homodimeric and symmetric) or heterologous (homodimeric or heterodimeric, and asymmetric) or pseudoisologous (heterodimeric but approximately symmetric interfaces of homologous protein domains, as determined by sequence and/or structural alignment). Similar interfaces are typically present across multiple crystal forms of the same or related proteins when all or part of the assemblies are also conserved. Biologically relevant interfaces are typically larger and more conserved in sequence than interfaces induced by crystallization. However, some assemblies held together by large, primary interfaces may also contain some small, non-conserved interfaces that are induced by assembly formation of the primary interfaces (e.g. a cyclic tetramer may have one large interface that occurs four times but a small one along the diagonal).

  • 4

    Symmetry types of biological assemblies observed in the PDB include asymmetric (C1), cyclic (Cn, n ≥ 2), dihedral (Dn, n ≥ 2), icosahedral, helical, tetrahedral, and octahedral. Cn assemblies have a single n-fold axis of symmetry. C2 cycles contain an isologous interface. Proteins in C3 and larger cycles are connected by heterologous interfaces around the cycle. Dn assemblies contain two symmetry axes: they consist of C2 symmetric assemblies of two Cn cycles. The interfaces between the cycles are all isologous. Symmetry can be global – covering the whole assembly, or local – covering only a subset of subunits in the assembly. Symmetry can be exact – involving copies of the same subunits related by translations and/or crystallographic symmetry operators, or approximate – involving different monomers from the asymmetric unit of the crystal. Most assemblies are ‘closed,’ that is, they do not extend indefinitely within a crystal lattice; the exceptions are helical or filamentous assemblies found in some structures, such as actin.

  • 5

    Benchmarking of computational methods for determining the biological relevance of interfaces or assemblies should be based on a comparison of the Cartesian coordinates of the predicted interfaces and assemblies and the ‘true’ interfaces or assemblies. However, the most commonly used benchmarks list the stoichiometry and symmetry of the supposedly correct assemblies without defining the coordinates. There are numerous discrepancies between the symmetries and stoichiometries of the ‘true’ assemblies in these benchmarks and the biological assemblies deposited for these entries in the PDB. Even when the symmetry and stoichiometry of a predicted assembly agree with those of the deposited assembly, it is not guaranteed that the deposited assembly is correct (e.g. a crystal can contain more than one C2 homodimer that may be a plausible biological assembly).

  • 6

    Methods for distinguishing biologically relevant interfaces (i.e. present in biological assemblies) from crystal-induced interfaces are based on biophysical properties of the interfaces, sequence conservation of amino acids in the interfaces, and/or structural conservation of the interface across multiple crystal forms of the same or related proteins. An analysis of structural conservation across multiple PDB entries provides more information when crystallographic space groups are used to build all possible interfaces within the crystal, rather than depending only on those in the asymmetric unit or the deposited or predicted biological assemblies.

  • 7

    Methods for identifying the correct biological assembly within crystals are based on many of the same criteria that authors use (physical properties, symmetry, sequence, and structural conservation). They also depend on dividing a crystal into possible assemblies based on the principle of closed assemblies and that a crystal should have only one form of assembly that covers the whole crystal. They can be more successful if they consider all possible assemblies within a crystal, not just those that have been deposited as such in the PDB or predicted by other programs.

  • 8

    There are exceptions to every rule…

We discuss each of these areas in turn.
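As a small illustration of the stoichiometry notation in item 2, the following Python sketch parses a label such as A2B2 into per-subunit counts and classifies it as even or uneven. The function names are hypothetical and not taken from the article or from any published tool, and the sketch assumes that each subunit type is a single letter followed by an optional count, so longer labels for structures with very many entities are not handled.

```python
import re
from typing import Dict

# Assumption: one letter per subunit type, followed by an optional count.
_TOKEN = re.compile(r"([A-Za-z])(\d*)")


def parse_stoichiometry(label: str) -> Dict[str, int]:
    """Parse a label such as 'A2B2' into {'A': 2, 'B': 2}."""
    counts: Dict[str, int] = {}
    pos = 0
    for match in _TOKEN.finditer(label):
        if match.start() != pos:
            # A character was skipped, so the label is malformed.
            raise ValueError(f"unexpected character in {label!r}")
        pos = match.end()
        letter, number = match.groups()
        copies = int(number) if number else 1
        if copies < 1:
            raise ValueError(f"subunit count must be positive in {label!r}")
        counts[letter] = counts.get(letter, 0) + copies
    if pos != len(label) or not counts:
        raise ValueError(f"cannot parse stoichiometry {label!r}")
    return counts


def is_even(label: str) -> bool:
    """True when every subunit type occurs the same number of times."""
    return len(set(parse_stoichiometry(label).values())) == 1


def total_subunits(label: str) -> int:
    """Total number of chains in the assembly."""
    return sum(parse_stoichiometry(label).values())
```

With these definitions, `is_even("A2B2C2")` is true and `is_even("A3B2")` is false, matching the even and uneven examples given in item 2.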

Section snippets

Biological assemblies

Crystallographic data analysis provides the asymmetric unit (ASU), which can be used to build a model of the crystal lattice via symmetry operations. However, from the earliest days of protein crystallography, it was realized that biologically relevant assemblies may not necessarily coincide with the ASU. After the first structure of myoglobin was determined by Kendrew et al. [2], Perutz et al. determined the structure of hemoglobin, which they observed to be a pseudosymmetric tetramer of

Stoichiometries of biological assemblies

As described above, stoichiometries are usually represented as strings of letters and counts; for example, A4B4 is a heterooctamer of four copies of subunit type A and four copies of subunit type B. If there are more than 26 protein entities (different sequences), then lower-case letters are used. If there are more than 52 entities, then the upper-case letters are used again followed by more lower-case letters. Some ribosome structures have over 80 different protein types, and hence we use a-z,

Interfaces in biological assemblies

Monod, Wyman, and Changeux defined the properties of allosteric assemblies, including the types of interfaces found in oligomeric assemblies [13]. They defined two modes of association in terms of residue contacts: “heterologous associations: the domain of bonding is made up of two different binding sets; isologous associations: the domain of bonding involves two identical sets.” Isologous interfaces are exactly C2 symmetric if they come from two copies of the same monomer in the asymmetric

Symmetries of biological assemblies

There are numerous forms of symmetry in oligomeric protein assemblies [33]. The most common ones are cyclic (Cn) with one n-fold axis of symmetry and dihedral (Dn), with two perpendicular axes of symmetry (one Cn and one C2). Non-symmetric assemblies have C1 symmetry by definition. Other forms of symmetry in biological assemblies include tetrahedral, octahedral, and icosahedral (mostly viruses). Assemblies can also be pseudosymmetric, for instance when homologous proteins with similar folds
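The relationship between cyclic and dihedral symmetry described in this section can be checked numerically. The sketch below is an illustration only, not a method from the article: it builds the rotation operators of Cn (one n-fold axis, placed along z by assumption) and of Dn (the same axis plus a perpendicular twofold axis, placed along x by assumption), then confirms that they form closed sets of n and 2n operations. All function names are hypothetical.

```python
import math
from itertools import product


def rot_z(angle):
    """Rotation by `angle` radians about the z axis, as a tuple of rows."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


# Twofold rotation about the x axis, perpendicular to the n-fold z axis.
FLIP_X = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))


def matmul(a, b):
    """Product of two 3x3 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def key(m):
    """Rounded copy of a matrix, so floating-point noise does not matter."""
    return tuple(tuple(round(x, 6) for x in row) for row in m)


def cyclic_group(n):
    """The n rotations of Cn about a single n-fold axis."""
    return [rot_z(2.0 * math.pi * k / n) for k in range(n)]


def dihedral_group(n):
    """Dn: the Cn rotations plus each of them followed by the twofold flip."""
    cycle = cyclic_group(n)
    return cycle + [matmul(FLIP_X, r) for r in cycle]


def order(ops):
    """Number of distinct operations in a list."""
    return len({key(m) for m in ops})


def is_closed(ops):
    """True when the product of any two operations is also in the set."""
    keys = {key(m) for m in ops}
    return all(key(matmul(a, b)) in keys for a, b in product(ops, repeat=2))
```

For a homooligomer with one subunit per symmetry-related position, the number of operations equals the number of subunits, so `order(cyclic_group(3))` is 3 for a C3 trimer and `order(dihedral_group(3))` is 6 for a D3 hexamer.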

Benchmarking identification of biologically relevant interfaces and assemblies

Methods have been developed both to distinguish biological from non-biological interfaces within crystals [16,19,28,31••,44,45,46,47••,48,49,50,51] and to identify the entire biological assembly within a crystal [6,8,52,53••,54••,55••]. To benchmark these methods properly, the predicted assemblies must be compared to experimentally validated assemblies, not only in stoichiometry and symmetry but also in the presence of each interface type. Interfaces and assemblies can be compared either

Methods for identifying biologically relevant interfaces in crystal structures

As described above, many methods have been developed for analyzing which crystal interfaces may be biological and which ones are artifacts of crystallization. We review several of the more recent ones. One important feature of some of the recent work is more rigorous separation of training and testing sets. Many machine learning methods benefit from balanced training data, as we showed for predicting the phenotypes of missense mutations [60]. We also found that balanced accuracy (the average of

Methods for identifying biologically relevant assemblies in crystal structures

There are fewer methods for identifying the likely biological assembly in a protein crystal [6,8,52,53••,54••,55••] than there are methods for identifying biologically relevant interfaces. There are several algorithmic features that these methods should implement, although not all do. First, they should build the asymmetric units of the entire unit cell and then neighboring unit cells. If the unit cells are built properly (i.e. each ASU copy in the first unit cell is built by applying a symmetry

There are exceptions to every rule

There are several forms of biological assemblies that are not necessarily well handled by the methods described above, which have implemented rules that define the valid assemblies from crystals. These rules probably prevent more incorrect predictions than they forbid correct predictions that violate the rules. Nevertheless, the violations are often very interesting, and in some cases they are unexpected or undetected by authors. These situations include complexes that have uneven

Conclusions

The several methods that depend on comparing assemblies and interfaces in different crystals of homologous proteins will only improve as the number of structures in the PDB increases. Methods that depend on sequence conservation will also improve as the sequence databases increase in size, especially with the addition of more eukaryotic genomes. The rapid rise of cryo-electron microscopy and the vastly improved resolution of these structures provide new opportunities in the annotation of

Funding

This work was supported by the National Institutes of Health [R35 GM122517].

Conflict of interest statement

Nothing declared.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • •• of outstanding interest

References (75)

  • M.A. Schärer et al.

    CRK: an evolutionary approach for distinguishing biologically relevant interfaces from crystal contacts

    Proteins: Struct Funct Bioinform

    (2010)
  • Q. Xu et al.

    Identifying three-dimensional structures of autophosphorylation complexes in crystals of protein kinases

    Sci Signal

    (2015)
  • D.J. Scott et al.

    A novel ultra-stable, monomeric green fluorescent protein for direct volumetric imaging of whole organs using CLARITY

    Sci Rep

    (2018)
  • K. Baskaran et al.

    A PDB-wide, evolution-based assessment of protein-protein interfaces

    BMC Struct Biol

    (2014)
  • Q. Liu et al.

    Use B-factor related features for accurate classification between protein binding interfaces and crystal packing contacts

    BMC Bioinform

    (2014)
  • F.K. Schur et al.

    An atomic model of HIV-1 capsid-SP1 reveals structures regulating assembly and maturation

    Science

    (2016)
  • M.V. Shapovalov et al.

    BioAssemblyModeler (BAM): user-friendly homology modeling of protein homo- and heterooligomers

    PLoS One

    (2014)
  • M.F. Lensink et al.

    The challenge of modeling protein assemblies: the CASP12-CAPRI experiment

    Proteins: Struct Funct Bioinform

    (2018)
  • J.C. Kendrew et al.

    A three-dimensional model of the myoglobin molecule obtained by x-ray analysis

    Nature

    (1958)
  • M.F. Perutz et al.

    Structure of hæmoglobin: a three-dimensional Fourier synthesis at 5.5-Å. resolution, obtained by X-ray analysis

    Nature

    (1960)
  • H. Ponstingl et al.

    Automatic inference of protein quaternary structure from crystals

    J Appl Cryst

    (2003)
  • B. Weitzner et al.

    An unusually small dimer interface is observed in all available crystal structures of cytosolic sulfotransferases

    Proteins

    (2009)
  • C. Ambrogio et al.

    KRAS dimerization impacts MEK inhibitor sensitivity and oncogenic activity of mutant KRAS

    Cell

    (2018)
  • J.A. Marsh et al.

    Structural and evolutionary versatility in protein complexes with uneven stoichiometry

    Nat Commun

    (2015)
  • J. Monod et al.

    On the nature of allosteric transitions: a plausible model

    J Mol Biol

    (1965)
  • R.P. Bahadur et al.

    Dissecting subunit interfaces in homodimeric proteins

    Proteins: Struct Funct Genet

    (2003)
  • L. Lo Conte et al.

    The atomic structure of protein-protein recognition sites

    J Mol Biol

    (1999)
  • W.S. Valdar et al.

    Conservation helps to identify biologically relevant crystal contacts

    J Mol Biol

    (2001)
  • A.H. Elcock et al.

    Identification of protein oligomerization states by analysis of interface conservation

    Proc Natl Acad Sci U S A

    (2001)
  • M. Guharoy et al.

    Conservation and relative importance of residues across protein-protein interfaces

    Proc Natl Acad Sci U S A

    (2005)
  • Y.S. Choi et al.

    Evolutionary conservation in multiple faces of protein interaction

    Proteins: Struct Funct Bioinform

    (2009)
  • B. Ma et al.

    Protein–protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces

    Proc Natl Acad Sci U S A

    (2003)
  • A.J. Bordner et al.

    Statistical analysis and prediction of protein–protein interfaces

    Proteins: Struct Funct Bioinform

    (2005)
  • R.P. Saha et al.

    Interresidue contacts in proteins and protein−protein interfaces and their use in characterizing the homodimeric interface

    J Proteome Res

    (2005)
  • I. Halperin et al.

    Protein-protein interactions: coupling of structurally conserved residues and of hot spots across interfaces. Implications for docking

    Structure

    (2004)
  • J.M. Duarte et al.

    Protein interface classification by evolutionary analysis

    BMC Bioinform

    (2012)
  • D.W. Heinz et al.

    Rapid crystallization of T4 lysozyme by intermolecular disulfide cross-linking

    Protein Eng Des Sel

    (1994)