Protein structure comparison: implications for the nature of ‘fold space’, and structure and function prediction
Introduction
Proteins have complex three-dimensional shapes that, by eye, often bear striking similarity to one another over their entire lengths or over shorter regions. In parallel to what can be deduced from pure sequence relationships, structural similarities also suggest the possibility of evolutionary relationships between proteins. Indeed, because it is widely accepted that structure is better conserved than sequence (at least given our current ability to detect sequence relationships), the identification of structural relationships between proteins can provide important structural and functional information not available from sequence analysis alone. However, detecting geometric relationships between proteins is a far more uncertain process than the identification of pure sequence relationships, as the latter can be clearly defined in statistical terms. In contrast, there is considerable ambiguity in how to describe a geometric relationship between two proteins, resulting in the large number of approaches to this problem described in the literature.
One effective but qualitative approach is based on manual pattern recognition. Richardson's [1] classical review of structural motifs in proteins was a striking example that has evolved over the years into manually curated structure classification schemes, as epitomized by the SCOP [2] and CATH [3] databases. Implicit in SCOP and CATH is a hierarchical view whereby ‘structure space’ is divided into isolated, non-overlapping ‘islands’ that are denoted by categories such as folds. It is perhaps surprising that the concept of a fold has entered the vocabulary of structural biology in the complete absence of a clear quantitative measure of how such an entity should be described. The hierarchical view also assumes that protein structure space is discrete, in the sense that a protein assigned to one category cannot belong to any other.
Does the use of inherently rigid classification schemes limit our recognition of important relationships that exist between proteins that have been segregated into different categories? In principle, one could consider overlapping classifications, whereby each object is assigned to multiple classes; unfortunately, there are no overlapping classifications of protein structure space. Indeed, there is growing evidence that protein structure space is continuous, in the sense that there are meaningful structural relationships between proteins that are classified very differently. In this review, we discuss these alternative perspectives, and argue that both hierarchical and continuous views have ranges of validity. We suggest that the development of computational tools and algorithms that recognize both descriptions of structure space can enhance our ability to predict protein structure and function.
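The contrast between the two views can be made concrete with a small sketch. Everything in it is hypothetical and for illustration only: the protein names, the fold labels, the similarity scores and the threshold are invented, and no particular scoring function or database is implied. In the discrete view, the neighbors of a protein are those sharing its category; in the continuous view, they are all proteins whose similarity to it exceeds a threshold, regardless of category.

```python
# Hypothetical toy data: labels and scores are invented for illustration only.
fold_label = {
    "protA": "fold_1",
    "protB": "fold_1",
    "protC": "fold_2",
    "protD": "fold_2",
}

# Symmetric pairwise structural similarity on an arbitrary 0..1 scale (assumed).
similarity = {
    frozenset(("protA", "protB")): 0.9,
    frozenset(("protB", "protC")): 0.6,  # a cross-category relationship
    frozenset(("protC", "protD")): 0.8,
    frozenset(("protA", "protC")): 0.3,
    frozenset(("protA", "protD")): 0.1,
    frozenset(("protB", "protD")): 0.2,
}


def discrete_neighbors(query):
    """Discrete view: neighbors are the proteins that share the query's category."""
    return sorted(
        p for p in fold_label
        if p != query and fold_label[p] == fold_label[query]
    )


def continuous_neighbors(query, threshold):
    """Continuous view: neighbors are all proteins scoring at or above the threshold."""
    return sorted(
        p for p in fold_label
        if p != query
        and similarity.get(frozenset((query, p)), 0.0) >= threshold
    )


print(discrete_neighbors("protB"))         # neighbors within the same category
print(continuous_neighbors("protB", 0.5))  # may include proteins from other categories
```

With these invented scores, the continuous view picks up a relationship between `protB` and `protC` that the category labels alone would hide.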
Protein structure alignment
Structural alignment programs define scoring functions that measure the geometric similarity between proteins and use various algorithms to search for two substructures such that these functions are optimal. Most existing similarity measures can be classified into two main types depending on what they compare: the distances between corresponding pairs of atoms in the two structures (e.g. [4, 5, 6]); and the relative positions of the corresponding atoms of two proteins that have been
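As a minimal sketch of the two families of measures, the code below computes one simple representative of each: a distance-based score (the root-mean-square difference between corresponding intra-structure distances, often called dRMSD) and a position-based score (the coordinate RMSD between corresponding atoms). It assumes the atom correspondence is already known, and for the coordinate RMSD that the two structures have already been superimposed; it performs no alignment search and no optimal superposition. The function names are illustrative and are not taken from any of the cited programs.

```python
import math


def pairwise_distances(coords):
    """Return the full matrix of distances between all atoms of one structure."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def drmsd(a, b):
    """Distance-based measure: RMS difference between corresponding
    intra-structure distances. Needs no superposition."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("structures must have the same number (>= 2) of atoms")
    da, db = pairwise_distances(a), pairwise_distances(b)
    n = len(a)
    squared = [
        (da[i][j] - db[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.sqrt(sum(squared) / len(squared))


def coordinate_rmsd(a, b):
    """Position-based measure: RMS deviation between corresponding atoms.
    Assumes the structures are already superimposed (no rotation is fitted)."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must have the same non-zero number of atoms")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


# The same three-atom shape, once at the origin and once translated by (5, 5, 5).
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)]
print(drmsd(a, b))            # internal distances are identical
print(coordinate_rmsd(a, b))  # large, because b was not superimposed on a
```

The example illustrates a practical difference between the two types: the distance-based score is unchanged by rigid-body motion, whereas the position-based score is only meaningful after the structures have been superimposed.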
The nature of fold space
SCOP [2] and CATH [3] describe fold space in very similar ways. In SCOP's manual classification, the first two levels, ‘class’ and ‘fold’, are defined based purely on structure; the next level, ‘superfamily’, takes into account both structure and function, and the level below accounts for sequence as well, thus grouping proteins with clear evolutionary relationships. CATH combines manual classification with the automatic structural alignment program SSAP [6]: the topmost level, ‘class’, is
Does the description of fold space matter? Applications
The discrete and the continuous views of fold space have different advantages. The hierarchical classifications of proteins into evolutionarily related sequence families and superfamilies can be carried out in a relatively unambiguous fashion, and have the advantage that they are annotated and validated by experts in the field. Also, the sequence neighbors of every protein are well defined. The organization of this information into well-maintained databases is clearly extremely valuable. The
Conclusions
The increasing number of protein structures in the PDB and the availability of many fast programs that compare protein structures reveal many unsuspected similarities in protein structure space. Traditional discrete hierarchical classification schemes group proteins with clear evolutionary relationships. At the structural level, these classifications constitute an abstraction that groups structures into topologies and folds based on similarities that have been detected based, in part, on visual
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
We are grateful to Michael Levitt, Chris Tang, Mickey Kosloff and Burkhard Rost for many helpful discussions on the topics covered in this review. This work was supported in part by the Northeast Structural Genomics Consortium (NESG – GM074958). The thinking reflected in this review has evolved in part as a result of facing the challenges of NESG target selection.
References (43)
The anatomy and taxonomy of protein structure. Adv Protein Chem (1981)
Protein structure comparison by alignment of distance matrices. J Mol Biol (1993)
Protein structure alignment. J Mol Biol (1989)
Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core. Curr Biol (1993)
An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J Mol Biol (2000)
Evaluation of protein fold comparison servers. Proteins (2004)
An alternative view of protein fold space. Proteins (2000)
Crystal structure of TM1457 from Thermotoga maritima. J Struct Biol (2005)
Domain definition and target classification for CASP6. Proteins (2005)
Free modeling with Rosetta in CASP6. Proteins (2005)
TASSER: an automated method for the prediction of protein tertiary structures in CASP6. Proteins
The protein structure prediction problem could be solved using the current PDB library. Proc Natl Acad Sci USA
SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res
The CATH domain structure database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res
Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr
Use of non-crystallographic symmetry in protein structure refinement. Acta Crystallogr D Biol Crystallogr
Threading a database of protein cores. Proteins
Structure comparison and structure patterns. J Comput Biol
Approximate protein structural alignment in polynomial time. Proc Natl Acad Sci USA
Automatic classification of protein structure by using Gauss integrals. Proc Natl Acad Sci USA