  • Review Article
Discovering novel biology by in silico archaeology

Key Points

  • Archaea are prokaryotes that have evolved in parallel with bacteria. Since the distinct status of the archaea was discovered, extensive physiological and biochemical research has begun to reveal the molecular basis of their remarkable lifestyle and unique biology.

  • Analysis of the first completely sequenced archaeal genomes revealed mysterious 'genomescapes', encoding incomplete pathways and many genes for which no function could be assigned. Because the development of archaeal model organisms and efficient genetic systems was still in its infancy, other methods had to be explored to tackle this problem.

  • With over 20 sequenced archaeal genomes available, several conceptually different types of comparative genomics analysis have proved to be powerful tools for predicting gene function. These analyses can be used to improve the functional annotation of archaeal genomes and to provide leads for experimental analysis.

  • In this review, we discuss how these different types of 'genome context' analysis, often combined with subsequent experimental verification, have led to the functional identification of novel archaeal systems and of several of the missing links that persist in archaeal metabolic pathways.

  • In the near future, the integration of comparative and functional genomics data from large-scale experimental designs (systems biology) is expected to contribute greatly to a better understanding of the intriguing biology of the archaea.

Abstract

Archaea are prokaryotes that evolved in parallel with bacteria. Since the discovery of the distinct status of the Archaea, extensive physiological and biochemical research has been conducted to elucidate the molecular basis of their remarkable lifestyle and their unique biology. Here, we discuss how in-depth comparative genomics has been used to improve the annotation of archaeal genomes. Combined with experimental verification, bioinformatic analysis contributes to the ongoing discovery of novel metabolic conversions and control mechanisms, and as such to a better understanding of the intriguing biology of the Archaea.

Figure 1: Archaeal phylogeny.
Figure 2: Detection of novel functions in archaea using different types of genomic context.

Acknowledgements

We thank B. Snel (CMBI, NCMLS, Radboud University Nijmegen Medical Centre, The Netherlands) for construction of Figure 1b.

Author information

Corresponding authors

Correspondence to Thijs J. G. Ettema or John van der Oost.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

DATABASES

Entrez

Escherichia coli

HIV-1

Methanocaldococcus jannaschii

Methanosarcina acetivorans

Methanothermobacter thermautotrophicus

Nanoarchaeum equitans

Pyrococcus abyssi

Pyrococcus furiosus

Pyrococcus horikoshii

Sulfolobus solfataricus

Thermotoga maritima

FURTHER INFORMATION

Willem M. de Vos's laboratory

The COG database

Glossary

OPERON

A set of genes that share a common promoter element and are controlled as a single transcriptional unit, producing a polycistronic messenger. An operon consists of two or more structural genes, which usually encode proteins with a related function, for example, proteins that are part of the same metabolic pathway or protein complex.

PROTEASOME

A large, cylinder-shaped protease, consisting of several homologous subunits, that is found in all domains of life and has a crucial role in cellular protein turnover processes such as protein quality control, antigen processing, signal transduction, cell-cycle control, cell differentiation and apoptosis.

EXOSOME

A multi-subunit protein complex comprising RNases, RNA-binding proteins and helicases that coordinately mediate the processing and 3′–5′ degradation of a wide variety of RNA species.

NON-ORTHOLOGOUS GENE DISPLACEMENT

Occurrence in which a gene is replaced by another, non-orthologous gene that carries out the same function. This is often observed in archaeal genomes, which encode many diverged and non-canonical proteins and enzymes.

DIVERGON

Type of gene organization in which genes are divergently organized ('juxtapositioned'), thereby sharing cis-regulatory elements, enabling co-regulation. This type of gene organization is often observed with a transcriptional regulator and its target genes or operon.
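As a rough illustration of the divergon definition above, the following sketch scans an ordered gene list for adjacent pairs that are transcribed away from each other across a short shared intergenic region. The `Gene` record, the example coordinates and the 300-bp `max_gap` cut-off are all hypothetical choices made for this sketch, not values from the article, and a circular chromosome is treated as linear for simplicity.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def divergent_pairs(genes, max_gap=300):
    """Return adjacent gene pairs that are divergently oriented.

    A pair counts as divergent when the left gene lies on the "-" strand,
    the right gene lies on the "+" strand, and the intergenic gap is at
    most `max_gap` bases (an assumed cut-off).
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end
        if left.strand == "-" and right.strand == "+" and 0 <= gap <= max_gap:
            pairs.append((left.name, right.name))
    return pairs


# Hypothetical locus: a regulator next to a two-gene operon, plus a distant gene.
example_genes = [
    Gene("regR", 100, 700, "-"),
    Gene("opA", 850, 1900, "+"),
    Gene("opB", 1910, 2500, "+"),
    Gene("far", 5000, 6000, "-"),
]

print(divergent_pairs(example_genes))
```

A divergent pair found in one genome is only weak evidence. In the spirit of the definition, such a pair would become more convincing if the same arrangement were conserved across several genomes.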

CO-OCCURRENCE

Genes that are functionally related tend to be present or absent together ('co-occur') in different genomes. As such, this type of information can be used to predict functions for specific genes or proteins that have a similar phylogenetic distribution.
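The co-occurrence idea can be sketched by comparing presence/absence profiles of gene families across genomes. The example below is a minimal illustration that assumes the Jaccard index as the similarity measure and a threshold of 0.8. Both are choices made for this sketch, and the family and genome names are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two profiles, each a set of genome names."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


def co_occurring(query, profiles, threshold=0.8):
    """Return families whose profile resembles that of `query`.

    `profiles` maps a gene family name to the set of genomes encoding it;
    `threshold` is an assumed cut-off on the Jaccard similarity.
    """
    return sorted(
        name
        for name, profile in profiles.items()
        if name != query and jaccard(profiles[query], profile) >= threshold
    )


# Hypothetical data: gene family -> genomes in which it is present.
example_profiles = {
    "famA": {"g1", "g2", "g4"},
    "famB": {"g1", "g2", "g4"},
    "famC": {"g3", "g5"},
}

print(co_occurring("famA", example_profiles))
```

Families returned for an uncharacterized query are candidates for a shared function, which would still need experimental verification.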

PHYLOGENETIC PATTERN

Term describing the presence or absence of a certain gene across genomes of different species, reflecting the differential acquisition and loss of this gene along the various evolutionary lineages. Complementary phylogenetic patterns are the blueprint of non-orthologous gene displacements.
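Complementary phylogenetic patterns can be searched for in much the same way. The sketch below flags pairs of gene families that never occur together but jointly cover every genome considered, which is the signature expected for a non-orthologous gene displacement. The strict "exactly one per genome" rule and all names in the example are assumptions made for illustration; real data would probably need some tolerance for exceptions.

```python
from itertools import combinations


def is_complementary(profile_a, profile_b, genomes):
    """True if every genome in `genomes` encodes exactly one of the two families."""
    a = profile_a & genomes
    b = profile_b & genomes
    return bool(a) and bool(b) and not (a & b) and (a | b) == genomes


def displacement_candidates(profiles, genomes):
    """Return all pairs of families with complementary phylogenetic patterns."""
    return [
        (x, y)
        for x, y in combinations(sorted(profiles), 2)
        if is_complementary(profiles[x], profiles[y], genomes)
    ]


# Hypothetical data: two unrelated enzyme families that might fill the same role.
example_genomes = {"g1", "g2", "g3", "g4"}
example_patterns = {
    "enzX": {"g1", "g2"},
    "enzY": {"g3", "g4"},
    "enzZ": {"g1", "g3"},
}

print(displacement_candidates(example_patterns, example_genomes))
```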

About this article

Cite this article

Ettema, T., de Vos, W. & van der Oost, J. Discovering novel biology by in silico archaeology. Nat Rev Microbiol 3, 859–869 (2005). https://doi.org/10.1038/nrmicro1268
