Aligning amino acid sequences: Comparison of commonly used methods

Feng, D. F.; Johnson, M. S.; Doolittle, R. F.

doi:10.1007/BF02100085

Published: February 1985

Volume 21, pages 112–125, (1985)

Journal of Molecular Evolution

Summary

We examined two extensive families of protein sequences using four different alignment schemes that employ various degrees of “weighting” in order to determine which approach is most sensitive in establishing relationships. All alignments used a similarity approach based on a general algorithm devised by Needleman and Wunsch. The approaches included a simple program, UM (unitary matrix), whereby only identities are scored; a scheme in which the genetic code is used as a basis for weighting (GC); another that employs a matrix based on structural similarity of amino acids taken together with the genetic basis of mutation (SG); and a fourth that uses the empirical log-odds matrix (LOM) developed by Dayhoff on the basis of observed amino acid replacements. The two sequence families examined were (a) nine different globins and (b) nine different tyrosine kinase-like proteins. It was assumed a priori that all members of a family share common ancestry. In cases where two sequences were more than 30% identical, alignments by all four methods were almost always the same. In cases where the percentage identity was less than 20%, however, there were often significant differences in the alignments. On average, the Dayhoff LOM approach was the most effective in verifying distant relationships, as judged by an empirical “jumbling test.” This was not universally the case, however, and in some instances the simple UM was actually as good or better. Trees constructed on the basis of the various alignments differed with regard to their limb lengths, but had essentially the same branching orders. We suggest some reasons for the differing effectiveness of the four approaches in the two different sequence settings, and offer some rules of thumb for assessing the significance of sequence relationships.
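As an illustration of the ideas named in the summary, the following minimal Python sketch computes a Needleman-Wunsch-style global similarity score with a unitary matrix (identities only) and then applies a simple jumbling test, comparing the real score with scores obtained from shuffled copies of the same sequences. It is not the authors' program: the linear gap penalty, the number of shuffles, the choice to shuffle both sequences, and the function names `unitary_score`, `nw_score`, and `jumbling_test` are all assumptions made for this example. A weighted scheme such as GC, SG, or LOM could be plugged in by passing a different `score` function.

```python
import random
import statistics


def unitary_score(a, b):
    """UM-style scoring: 1 for an identity, 0 for any mismatch."""
    return 1 if a == b else 0


def nw_score(seq1, seq2, score=unitary_score, gap=-1):
    """Global similarity score by dynamic programming.

    Assumption: a linear gap penalty of `gap` per gapped position;
    the penalties used in the paper are not given in the summary.
    """
    # prev[j] holds the best score aligning the processed prefix of seq1
    # with the first j residues of seq2.
    prev = [j * gap for j in range(len(seq2) + 1)]
    for i, a in enumerate(seq1, start=1):
        curr = [i * gap]
        for j, b in enumerate(seq2, start=1):
            curr.append(max(
                prev[j - 1] + score(a, b),  # align a with b
                prev[j] + gap,              # gap in seq2
                curr[j - 1] + gap,          # gap in seq1
            ))
        prev = curr
    return prev[-1]


def jumbling_test(seq1, seq2, trials=100, seed=0, score=unitary_score, gap=-1):
    """Compare the real score with scores of shuffled sequences.

    Returns (real, mean, sd, z), where z is the number of standard
    deviations by which the real score exceeds the shuffled mean.
    Shuffling keeps length and composition but destroys residue order.
    """
    rng = random.Random(seed)
    real = nw_score(seq1, seq2, score=score, gap=gap)
    shuffled_scores = []
    for _ in range(trials):
        s1, s2 = list(seq1), list(seq2)
        rng.shuffle(s1)
        rng.shuffle(s2)
        shuffled_scores.append(nw_score(s1, s2, score=score, gap=gap))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        # Degenerate case: every shuffle scored the same.
        z = float("inf") if real > mean else 0.0
    else:
        z = (real - mean) / sd
    return real, mean, sd, z


if __name__ == "__main__":
    # Two short, made-up peptide fragments used only for demonstration.
    a = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMF"
    b = "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLV"
    real, mean, sd, z = jumbling_test(a, b, trials=50)
    print(f"real={real} shuffled mean={mean:.2f} sd={sd:.2f} z={z:.2f}")
```

A large positive `z` suggests that the similarity depends on residue order rather than on composition alone. What threshold counts as significant is a judgment the summary leaves to the paper's rules of thumb, so none is hard-coded here.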

References

Barker WC, Dayhoff MO (1982) Viral src gene products are related to the catalytic chain of mammalian cAMP-dependent protein kinase. Proc Natl Acad Sci USA 79:2836–2839
Dayhoff MO (1972) A model of evolutionary change in proteins. Detecting distant relationships: computer methods and results. In: Dayhoff MO (ed) Atlas of protein sequence and structure, vol 5. National Biomedical Research Foundation, Washington, DC, pp 89–110
Dayhoff MO (1978) A model of evolutionary change in proteins. Matrices for detecting distant relationships. In: Dayhoff MO (ed) Atlas of protein sequence and structure, vol 5, suppl 3. National Biomedical Research Foundation, Washington, DC, pp 345–358
Dayhoff MO, Barker WC, Hunt LT (1983) Establishing homologies in protein sequences. Methods Enzymol 91:524–545
Doolittle RF (1979) Protein evolution. In: Neurath H, Hill RL (eds) The proteins, vol IV. Academic Press, New York, pp 1–118
Doolittle RF (1981) Similar amino acid sequences: chance or common ancestry? Science 214:149–159
Fitch WM (1966) An improved method of testing for evolutionary homology. J Mol Biol 16:9–16
Fitch WM, Margoliash E (1967) Construction of phylogenetic trees. Science 155:279–284
Fitch WM, Smith TF (1982) Implications of minimal length trees. Syst Zool 31:68–75
Garlick RL, Riggs AF (1982) The amino acid sequence of a major polypeptide chain of earthworm hemoglobin. J Biol Chem 257:9005–9015
Goodman M, Moore GW, Barnabas J (1974) The phylogeny of human globin genes investigated by the maximum parsimony method. J Mol Evol 3:1–48
Gotoh O (1982) An improved algorithm for matching biological sequences. J Mol Biol 162:705–708
Haber JE, Koshland DE Jr (1970) An evaluation of the relatedness of proteins based on comparison of amino acid sequences. J Mol Biol 50:617–639
Hampe A, Laprevotte I, Galibert F (1982) Nucleotide sequences of feline retroviral oncogenes (v-fes) provide evidence for a family of tyrosine-specific protein kinase genes. Cell 30:775–785
Hampe A, Gobet M, Sherr CJ, Galibert F (1984) Nucleotide sequence of the feline retroviral oncogene v-fms shows unexpected homology with oncogenes encoding tyrosine-specific protein kinases. Proc Natl Acad Sci USA 81:85–89
Keim P, Heinrikson RL, Fitch WM (1981) An examination of the expected degree of sequence similarity that might arise in proteins that have converged to similar conformational states. J Mol Biol 151:179–197
Kernighan BW, Ritchie DM (1978) The C programming language. Prentice-Hall, Englewood Cliffs, New Jersey
Kitamura N, Kitamura A, Toyoshima K, Hirayama Y, Yoshida M (1982) Avian sarcoma virus Y73 genome sequence and structural similarity of its transforming gene product to that of Rous sarcoma virus. Nature 297:205–208
Liljeqvist G, Braunitzer G, Paléus S (1979) Die Sequenz des monomeren Hämoglobins III von Myxine glutinosa L: ein neuer Hämkomplex: E7 Glutamin, E11 Isoleucin. Hoppe Seylers Z Physiol Chem 360:125–135
Lorincz AT, Reed SI (1984) Primary structure homology between the product of yeast cell division control gene CDC28 and vertebrate oncogenes. Nature 307:183–185
McLachlan AD (1971) Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c551. J Mol Biol 61:409–424
McLachlan AD (1972) Repeating sequences and gene duplication in proteins. J Mol Biol 64:417–437
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
Ploegman JH, Drent G, Kalk KH, Hol WGJ, Heinrikson RL, Keim P, Weng L, Russell J (1978) The covalent and tertiary structure of bovine liver rhodanese. Nature 273:124–129
Rapp UR, Goldsborough MD, Mark GE, Bonner TI, Groffen J, Reynolds FH, Stephenson JR (1983) Structure and biological activity of v-raf, a unique oncogene transduced by a retrovirus. Proc Natl Acad Sci USA 80:4218–4222
Reddy EP, Smith MJ, Srinivasan A (1983) Nucleotide sequence of Abelson murine leukemia virus genome: structural similarity of its transforming gene product to other onc gene products with tyrosine-specific kinase activity. Proc Natl Acad Sci USA 80:3623–3627; correction: Proc Natl Acad Sci USA 80:7372
Schwartz DE, Tizard R, Gilbert W (1983) Nucleotide sequence of Rous sarcoma virus. Cell 32:853–869
Sellers PH (1974) Evolutionary distances. SIAM J Appl Math 26:787–793
Shibuya M, Hanafusa H (1982) Nucleotide sequence of Fujinami sarcoma virus: evolutionary relationship of its transforming gene with transforming genes of other sarcoma viruses. Cell 30:787–795
Shoji S, Parmelee DC, Wade RD, Kumar S, Ericsson LH, Walsh KA, Neurath H, Long GL, Demaille JG, Fischer EH, Titani K (1981) Complete amino acid sequence of the catalytic subunit of bovine cardiac muscle cyclic AMP-dependent protein kinase. Proc Natl Acad Sci USA 78:848–851
Smith TF, Waterman MS, Fitch WM (1981) Comparative biosequence metrics. J Mol Evol 18:38–46
Stephens RM, Rice NR, Hiebsch RR, Bose HR, Gilden RV (1983) Nucleotide sequence of v-rel: the oncogene of reticuloendotheliosis virus. Proc Natl Acad Sci USA 80:6229–6233
Suzuki T, Takagi T, Gotoh T (1982) Amino acid sequence of the smallest polypeptide chain containing heme of extracellular hemoglobin from the polychaete Tylorrhynchus heterochaetus. Biochim Biophys Acta 708:253–258
Takagi T, Tobita M, Shikama K (1983) Amino acid sequence of dimeric myoglobin from Cerithidea rhizophorarum. Biochim Biophys Acta 745:32–36
Van Beveren C, Galleshaw JA, Jonas V, Berns AJM, Doolittle RF, Donoghue DJ, Verma IM (1981) Nucleotide sequence and formation of the transforming gene of a mouse sarcoma virus. Nature 289:258–262
Waterman MS, Smith TF, Beyer WA (1976) Some biological sequence metrics. Adv Math 20:367–387

Author information

Authors and Affiliations

Department of Chemistry, University of California, San Diego, La Jolla, California 92093, USA
D. F. Feng, M. S. Johnson & R. F. Doolittle

About this article

Cite this article

Feng, D.F., Johnson, M.S. & Doolittle, R.F. Aligning amino acid sequences: Comparison of commonly used methods. J Mol Evol 21, 112–125 (1985). https://doi.org/10.1007/BF02100085

Received: 05 July 1984
Revised: 17 September 1984
Issue Date: February 1985
DOI: https://doi.org/10.1007/BF02100085
