Abstract
The structural genomics initiatives have significantly increased the number of three-dimensional structures available for proteins of unknown function. However, the extent to which structural information helps in understanding function is still a matter of debate. Here, the value of detecting structural relationships at different levels (typically fold and superfamily) for transferring functional annotations between proteins is reviewed. First, the functional diversity of proteins sharing the same fold is investigated. Although identifying a fold can in some cases provide clues to functional properties, the diversity of functions within a fold can be so great that this information is of very limited use for some particularly diverse folds (e.g. super-folds). Next, since structural data can help detect homology in the absence of sequence similarity, the functional diversity between proteins from the same superfamily (homologous proteins) is analysed. The evolutionary causes and mechanisms that have generated the observed functional diversity between related proteins are discussed, and helpful tools for the correlated analysis of structure, function and evolution are reviewed.
Copyright information
© 2017 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Dessailly, B.H., Dawson, N.L., Das, S., Orengo, C.A. (2017). Function Diversity Within Folds and Superfamilies. In: Rigden, D.J. (ed.) From Protein Structure to Function with Bioinformatics. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-1069-3_9
DOI: https://doi.org/10.1007/978-94-024-1069-3_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-1067-9
Online ISBN: 978-94-024-1069-3
eBook Packages: Biomedical and Life Sciences (R0)