
  • Review

Global perspectives on proteins: comparing genomes in terms of folds, pathways and beyond

Abstract

The sequencing of complete genomes provides us with a global view of all the proteins in an organism. Proteomic analysis can be done on a purely sequence-based level, with a focus on finding homologues and grouping them into families and clusters of orthologs. However, incorporating protein structure into this analysis provides a valuable simplification: it allows one to group together very distantly related sequences, thus condensing the proteome into a minimal number of ‘parts’. We describe issues related to surveying proteomes in terms of structural parts, including methods for fold assignment and formats for comparison (e.g. top-10 lists and whole-genome trees), and show how biases in the databases and in sampling can affect these surveys. We illustrate our main points through a case study on the distinctive protein properties evident in many thermophile genomes (e.g. more salt bridges). Finally, we discuss metabolic pathways as an even greater simplification of genomes. Compared with folds, pathways allow many more genes to be organized into coherent systems, yet they can be understood in many of the same terms.
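The two comparison formats named above, top-N fold lists and genome-to-genome distances from which a whole-genome tree can be built, can be illustrated with a short sketch. It assumes each genome has already been reduced to a list of fold labels, one per structurally assigned domain. The genome names, fold labels and counts are hypothetical. The distance used here is one minus the Jaccard similarity of the fold sets. That is one simple choice and not necessarily the measure used in this review.

```python
from collections import Counter
from itertools import combinations

# Hypothetical fold assignments: one fold label per structurally assigned domain.
genome_folds = {
    "genome_A": ["TIM barrel", "P-loop NTPase", "TIM barrel", "Rossmann fold", "ferredoxin-like"],
    "genome_B": ["P-loop NTPase", "P-loop NTPase", "TIM barrel", "flavodoxin-like"],
    "genome_C": ["Rossmann fold", "TIM barrel", "ferredoxin-like", "ferredoxin-like"],
}


def top_folds(folds, n=10):
    """Return the n most frequently occurring folds as (fold, count) pairs."""
    return Counter(folds).most_common(n)


def fold_set_distance(folds_x, folds_y):
    """Assumed metric: 1 - Jaccard similarity of the sets of folds present."""
    x, y = set(folds_x), set(folds_y)
    union = x | y
    if not union:
        return 0.0  # two genomes with no assigned folds are treated as identical
    return 1.0 - len(x & y) / len(union)


def distance_matrix(genomes):
    """Pairwise fold-set distances keyed by (name1, name2), with name1 < name2."""
    return {
        (a, b): fold_set_distance(genomes[a], genomes[b])
        for a, b in combinations(sorted(genomes), 2)
    }


if __name__ == "__main__":
    # Per-genome "top-N" lists (N=3 here for brevity).
    for name, folds in genome_folds.items():
        print(name, top_folds(folds, n=3))
    # Pairwise distances that a tree-building step could consume.
    for pair, d in distance_matrix(genome_folds).items():
        print(pair, round(d, 3))
```

A whole-genome tree could then be derived by applying a standard clustering method, such as neighbor joining or average linkage, to this matrix. That step is omitted here. Presence/absence sets ignore how often each fold occurs. A count-weighted distance would instead reflect the duplication levels visible in the top-N lists.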


Figures 1 to 5 (images not reproduced)



Author information


Correspondence to M B Gerstein.


About this article

Cite this article

Das, R., Junker, J., Greenbaum, D. et al. Global perspectives on proteins: comparing genomes in terms of folds, pathways and beyond. Pharmacogenomics J 1, 115–125 (2001). https://doi.org/10.1038/sj.tpj.6500021



  • DOI: https://doi.org/10.1038/sj.tpj.6500021
