Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment

Schaeffer, Dustin; Grishin, Nick V.

doi:10.1007/978-1-4939-8736-8_15


Protocol
First Online: 27 September 2018


Part of the book series: Methods in Molecular Biology (MIMB, volume 1851)

Abstract

Evolutionary domains are protein regions with observable sequence similarity to other known domains. Here we describe how to use common sequence and profile alignment algorithms (i.e., BLAST, HHsearch) to delineate putative domains in novel protein sequences, given a reference library of protein domains. In this case, we use our database of evolutionary domains (ECOD) as a reference, but other domain sequence libraries could be used (e.g., SCOP, CATH). We describe our domain partition algorithm along with specific notes on how to avoid domain indexing errors when working with multiple data sources and software algorithms with differing outputs.
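The idea of partitioning a query into domains from alignment hits can be illustrated with a minimal sketch. This is not the authors' ECOD partition algorithm, which is not reproduced in this abstract; it is a simple greedy assignment under stated assumptions. The `Hit` record, the function names, and all thresholds (`max_evalue`, `min_new`, `min_new_frac`) are hypothetical and chosen only for illustration. Hits are assumed to have already been parsed from BLAST or HHsearch output into 1-based, inclusive query coordinates; the code converts to 0-based indices internally in one place only, which is one way to limit the indexing errors the abstract warns about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment hit against a reference domain (hypothetical record)."""
    ref_domain: str  # identifier of the reference domain (assumed format)
    q_start: int     # 1-based, inclusive start on the query
    q_end: int       # 1-based, inclusive end on the query
    evalue: float


def to_ranges(indices):
    """Collapse sorted 0-based indices into 1-based inclusive (start, end) ranges."""
    ranges = []
    start = prev = indices[0]
    for i in indices[1:]:
        if i != prev + 1:
            ranges.append((start + 1, prev + 1))
            start = i
        prev = i
    ranges.append((start + 1, prev + 1))
    return ranges


def partition_domains(seq_len, hits, max_evalue=1e-3, min_new=30, min_new_frac=0.8):
    """Greedily assign query residues to the most significant hits.

    Thresholds are illustrative assumptions, not published values.
    """
    assigned = [False] * seq_len  # 0-based residue bookkeeping
    domains = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            continue
        if not (1 <= hit.q_start <= hit.q_end <= seq_len):
            raise ValueError(f"hit {hit.ref_domain} lies outside the query")
        covered = range(hit.q_start - 1, hit.q_end)  # 1-based -> 0-based
        new = [i for i in covered if not assigned[i]]
        # Skip hits that mostly overlap residues already assigned.
        if len(new) < min_new or len(new) / len(covered) < min_new_frac:
            continue
        for i in new:
            assigned[i] = True
        domains.append((hit.ref_domain, to_ranges(new)))
    domains.sort(key=lambda d: d[1][0][0])  # order by first residue
    return domains


if __name__ == "__main__":
    example_hits = [
        Hit("refA", 1, 120, 1e-50),
        Hit("refB", 110, 250, 1e-20),
        Hit("refC", 5, 115, 1e-5),  # redundant with refA, so it is skipped
    ]
    for name, ranges in partition_domains(300, example_hits):
        print(name, ranges)
```

In this sketch a later, weaker hit is trimmed to the residues that remain unassigned, so overlapping hits yield non-overlapping putative domains. A real pipeline would also need to map these sequence indices back to the residue numbering of each data source.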



Acknowledgments

This work was supported in part by the National Institutes of Health (GM094575 to NVG) and the Welch Foundation (I-1505 to NVG).

Author information

Authors and Affiliations

Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
Dustin Schaeffer & Nick V. Grishin
Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
Nick V. Grishin


Editor information

Editors and Affiliations

GlaxoSmithKline, Cellzome – a GSK company, Meyerhofstrasse 1, Heidelberg, Baden-Württemberg, Germany
Tobias Sikosek


About this protocol

Cite this protocol

Schaeffer, D., Grishin, N.V. (2019). Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment. In: Sikosek, T. (eds) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_15


DOI: https://doi.org/10.1007/978-1-4939-8736-8_15
Published: 27 September 2018
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8735-1
Online ISBN: 978-1-4939-8736-8
eBook Packages: Springer Protocols
