Abstract
The sequencing of complete genomes provides us with a global view of all the proteins in an organism. Proteomic analysis can be done on a purely sequence-based level, with a focus on finding homologues and grouping them into families and clusters of orthologs. However, incorporating protein structure into this analysis provides a valuable simplification: it allows one to group very distantly related sequences together, thus condensing the proteome into a minimal number of ‘parts.’ We describe issues related to surveying proteomes in terms of structural parts, including methods for fold assignment and formats for comparisons (e.g., top-10 lists and whole-genome trees), and show how biases in the databases and in sampling can affect these surveys. We illustrate our main points through a case study on the unique protein properties evident in many thermophile genomes (e.g., more salt bridges). Finally, we discuss metabolic pathways as an even greater simplification of genomes. In comparison to folds, pathways allow the organization of many more genes into coherent systems, yet they can nevertheless be understood in many of the same terms.
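As a rough illustration of the comparison formats mentioned in the abstract, the sketch below ranks the most common folds in each genome (a "top-10 list") and computes pairwise distances from fold presence or absence, which is the kind of input a whole-genome tree could be built from. The genome names, fold labels, and counts are invented toy data, and the Jaccard distance is an assumption chosen for simplicity, not necessarily the measure used in the article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: genome -> fold label for each assigned protein domain.
# None of these names or counts come from the article.
fold_assignments = {
    "genome_A": ["fold1", "fold1", "fold2", "fold3", "fold1", "fold2"],
    "genome_B": ["fold1", "fold2", "fold2", "fold4"],
    "genome_C": ["fold3", "fold4", "fold4", "fold5"],
}


def top_folds(folds, n=10):
    """Return the n most frequent folds as (fold, count) pairs.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(folds)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def fold_distance(folds_a, folds_b):
    """Jaccard distance on fold presence/absence (an assumed, simple measure)."""
    a, b = set(folds_a), set(folds_b)
    union = a | b
    if not union:
        return 0.0  # two empty genomes are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(assignments):
    """Pairwise fold distances, keyed by alphabetically ordered genome pairs."""
    names = sorted(assignments)
    return {
        (x, y): fold_distance(assignments[x], assignments[y])
        for x, y in combinations(names, 2)
    }


if __name__ == "__main__":
    for genome, folds in fold_assignments.items():
        print(genome, top_folds(folds, n=3))
    for pair, dist in distance_matrix(fold_assignments).items():
        print(pair, round(dist, 2))
```

A distance matrix of this kind could then be passed to any standard tree-building method; presence/absence deliberately ignores how often each fold occurs, which the top-N lists capture separately.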
Cite this article
Das, R., Junker, J., Greenbaum, D. et al. Global perspectives on proteins: comparing genomes in terms of folds, pathways and beyond. Pharmacogenomics J 1, 115–125 (2001). https://doi.org/10.1038/sj.tpj.6500021
DOI: https://doi.org/10.1038/sj.tpj.6500021