Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment

Part of the book series: Methods in Molecular Biology (MIMB, volume 1851)

Abstract

Evolutionary domains are protein regions with observable sequence similarity to other known domains. Here we describe how to use common sequence and profile alignment algorithms (i.e., BLAST, HHsearch) to delineate putative domains in novel protein sequences, given a reference library of protein domains. In this case, we use our database of evolutionary domains (ECOD) as a reference, but other domain sequence libraries could be used (e.g., SCOP, CATH). We describe our domain partition algorithm along with specific notes on how to avoid domain indexing errors when working with multiple data sources and software algorithms with differing outputs.
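
As a rough illustration of the kind of partitioning the abstract describes, the sketch below greedily assigns query ranges to reference domains from a list of alignment hits. It is not the authors' algorithm: the `Hit` record, the `partition_domains` function, the domain identifiers, and the `max_overlap` threshold are all hypothetical names and values chosen for this example. Coordinates are treated as 1-based and inclusive, the convention BLAST uses for query start and end positions, which is one place where indexing errors commonly arise when results are mixed with 0-based tools.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    domain_id: str  # hypothetical reference-domain identifier
    start: int      # query start, 1-based inclusive
    end: int        # query end, 1-based inclusive
    score: float    # higher is better (e.g., bit score or probability)


def partition_domains(hits, query_length, max_overlap=10):
    """Greedily accept hits by descending score, skipping any hit that
    overlaps already-assigned residues by more than max_overlap."""
    # Index 0 is unused so list positions match 1-based residue positions.
    covered = [False] * (query_length + 1)
    assigned = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not (1 <= hit.start <= hit.end <= query_length):
            raise ValueError(f"hit out of range: {hit}")
        positions = range(hit.start, hit.end + 1)  # +1: end is inclusive
        overlap = sum(covered[i] for i in positions)
        if overlap > max_overlap:
            continue  # region already claimed by a better-scoring hit
        for i in positions:
            covered[i] = True
        assigned.append(hit)
    # Report domains in sequence order.
    return sorted(assigned, key=lambda h: h.start)


# Made-up hits for a 300-residue query (illustrative values only).
hits = [
    Hit("domA", 1, 120, 250.0),
    Hit("domB", 115, 300, 180.0),
    Hit("domC", 50, 200, 90.0),
]
result = partition_domains(hits, query_length=300)
for hit in result:
    print(hit.domain_id, hit.start, hit.end)
```

A small overlap tolerance is allowed here because alignment ends near domain boundaries are rarely exact; the value of 10 residues is an assumption for the example, not a recommendation from the protocol.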

Acknowledgments

This work was supported in part by the National Institutes of Health (GM094575 to NVG) and the Welch Foundation (I-1505 to NVG).

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Schaeffer, D., Grishin, N.V. (2019). Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment. In: Sikosek, T. (ed.) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_15

  • DOI: https://doi.org/10.1007/978-1-4939-8736-8_15

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8735-1

  • Online ISBN: 978-1-4939-8736-8

  • eBook Packages: Springer Protocols
