
Application of the Character Compatibility Approach to Generalized Molecular Sequence Data: Branching Order of the Proteobacterial Subdivisions

Journal of Molecular Evolution

Abstract

The character compatibility approach, which involves finding the largest clique of mutually compatible characters in a dataset and discarding the remaining characters as homoplasic, in principle provides a powerful means of obtaining the correct topology in difficult-to-resolve cases. However, its usefulness for phylogeny determination from generalized molecular sequence data has not previously been examined. We have used this approach to determine the topology of 23 proteobacterial species (6 each of α-, β-, and γ-, 3 δ-, and 2 ε-proteobacteria) using sequence data for 10 conserved proteins (Hsp60, Hsp70, EF-Tu, EF-G, alanyl-tRNA synthetase, RecA, GyrA, GyrB, RpoB, and RpoC). From the sequence alignments of these proteins, we selected all sites at which only two amino acids were found, with each amino acid present in at least two species. Mutual compatibility among these binary-state sites was determined in two ways. In the first, all of these sites were combined into one large dataset (Set A; 957 characters) prior to compatibility analysis. In the second, compatibility analysis was carried out on the characters from each protein separately, and all compatible sites were then combined into a large dataset (Set B; 398 characters) for further study. The largest cliques obtained from Sets A and B consisted of 337 and 323 compatible characters, respectively. In these cliques, all proteobacterial subgroups were clearly distinguished, and the branching orders of most species were also resolved. The ε-proteobacteria branched earliest, whereas the β- and γ-subgroups emerged last; the relative placement of the α- and δ-subgroups, however, was not resolved. The topology of these species was also determined from 16S rRNA sequences and from a concatenated dataset of sequences for all 10 proteins by neighbor-joining, maximum likelihood, and maximum parsimony methods.
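The site-selection and compatibility steps described above can be sketched in Python. This is a minimal illustration, not the authors' software: the function names (`binary_sites`, `compatible`, `maximum_clique`, `largest_compatible_set`), the toy alignment, and the choice to skip columns containing alignment gaps are assumptions made for the example. Two binary characters are treated as compatible when at least one of the four possible state combinations is absent (the standard pairwise test for two-state characters), and one largest clique of the resulting compatibility graph is found with the Bron-Kerbosch algorithm with pivoting.

```python
from itertools import combinations


def binary_sites(alignment, min_count=2):
    """Select two-state sites where each state occurs in >= min_count taxa.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Returns (column, taxa_with_state) pairs. The state defining the taxon set
    is the alphabetically first residue; this is arbitrary and harmless,
    because compatibility does not depend on which state is chosen.
    Columns containing a gap ('-') are skipped (an assumption of this sketch).
    """
    taxa = sorted(alignment)
    sites = []
    for col in range(len(alignment[taxa[0]])):
        column = {t: alignment[t][col] for t in taxa}
        states = sorted(set(column.values()))
        if len(states) != 2 or "-" in states:
            continue
        group = frozenset(t for t in taxa if column[t] == states[0])
        if min(len(group), len(taxa) - len(group)) >= min_count:
            sites.append((col, group))
    return sites


def compatible(a, b, all_taxa):
    """Pairwise test: two binary characters are compatible iff at least
    one of the four state combinations is absent from the taxa."""
    not_a, not_b = all_taxa - a, all_taxa - b
    return not (a & b and a & not_b and not_a & b and not_a & not_b)


def maximum_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique.

    adj: list of neighbour sets, indexed by vertex number.
    """
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):  # cannot beat the current best
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(range(len(adj))), set())
    return best


def largest_compatible_set(alignment):
    """Return the alignment columns forming one largest compatible clique."""
    taxa = frozenset(alignment)
    sites = binary_sites(alignment)
    adj = [set() for _ in sites]
    for i, j in combinations(range(len(sites)), 2):
        if compatible(sites[i][1], sites[j][1], taxa):
            adj[i].add(j)
            adj[j].add(i)
    return sorted(sites[i][0] for i in maximum_clique(adj))


# Hypothetical toy alignment (not data from the study).
toy = {
    "A": "MKVSLAG",
    "B": "MKVTLAG",
    "C": "MRVSLA-",
    "D": "MRITLA-",
    "E": "MRITWAG",
    "F": "MRITWCG",
}
print(largest_compatible_set(toy))  # one of [1, 2, 4] or [2, 3, 4]
```

In the toy alignment, column 0 is invariant, column 5 has a singleton state, and column 6 contains gaps, so only columns 1 to 4 qualify; columns 1 and 3 conflict, giving two tied cliques of size 3. Finding a maximum clique is NP-hard in general, so the runtime of an exact search can grow quickly as the number of characters increases.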
In the protein trees, all proteobacterial subgroups were reliably resolved and branched in the following order: (ε(δ(α(β,γ)))). In the rRNA trees, however, the γ- and β-subgroups appeared polyphyletic, and many internal nodes were not resolved. These results indicate that character compatibility analysis of generalized molecular sequence data provides a powerful means for evolutionary studies. From molecular sequences, it should be possible to obtain very large datasets of compatible characters, which should prove very helpful in clarifying difficult-to-resolve phylogenetic relationships.
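Reading a branching order off a clique relies on a standard property of two-state characters: if the tree is rooted on one taxon, the side of each compatible character that excludes that taxon is a clade, and any two such clades are either nested or disjoint. The sketch below assumes this rooting and uses hypothetical characters over one placeholder taxon per subgroup, chosen only to reproduce the protein-tree order quoted above; they are not data from the study, and `rooted_clusters` and `to_newick` are illustrative names.

```python
from itertools import combinations


def rooted_clusters(characters, taxa, reference):
    """Map each binary character (a set of taxa sharing one state) to the
    side of its split that excludes `reference`; with the tree rooted on
    the branch leading to `reference`, that side is a clade."""
    taxa = frozenset(taxa)
    clusters = set()
    for ch in characters:
        side = frozenset(ch)
        if reference in side:
            side = taxa - side
        if len(side) > 1:  # singletons are just leaves
            clusters.add(side)
    return clusters


def to_newick(characters, taxa, reference):
    """Build a rooted (possibly multifurcating) tree from pairwise-compatible
    binary characters and return it as a Newick string."""
    taxa = frozenset(taxa)
    clusters = rooted_clusters(characters, taxa, reference)
    # Compatible characters must give clusters that are nested or disjoint.
    for c, d in combinations(clusters, 2):
        if c & d and not (c <= d or d <= c):
            raise ValueError("characters are not pairwise compatible")
    nodes = clusters | {taxa, taxa - {reference}} | {frozenset([t]) for t in taxa}

    def children(node):
        # The maximal proper sub-clusters of a node are its children.
        inside = [c for c in nodes if c < node]
        return [c for c in inside if not any(c < d for d in inside)]

    def render(node):
        if len(node) == 1:
            return next(iter(node))
        kids = sorted(children(node), key=lambda c: (len(c), sorted(c)))
        return "(" + ",".join(render(k) for k in kids) + ")"

    return render(taxa) + ";"


subgroups = ["alpha", "beta", "gamma", "delta", "epsilon"]
chars = [
    {"beta", "gamma"},
    {"epsilon", "delta", "alpha"},  # same split as above, other state
    {"alpha", "beta", "gamma"},
]
print(to_newick(chars, subgroups, reference="epsilon"))
# (epsilon,(delta,(alpha,(beta,gamma))));
```

Characters that conflict raise a `ValueError`. A relationship that no character in the clique resolves, such as the α/δ placement in the compatibility analysis, appears as a multifurcation, because no cluster separates the groups involved.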


Fig. 1
Fig. 2
Fig. 3



Acknowledgments

We thank Yan Li for writing the DUALSITE and HARMONY programs. The work in R.S.G.’s lab, including support for Yan Li, was funded by a grant from the Natural Sciences and Engineering Research Council of Canada.

Author information


Corresponding author

Correspondence to Radhey S. Gupta.

Additional information

[Reviewing Editor: Dr. Yves Van de Peer]


About this article

Cite this article

Gupta, R.S., Sneath, P.H.A. Application of the Character Compatibility Approach to Generalized Molecular Sequence Data: Branching Order of the Proteobacterial Subdivisions. J Mol Evol 64, 90–100 (2007). https://doi.org/10.1007/s00239-006-0082-2



  • DOI: https://doi.org/10.1007/s00239-006-0082-2
