Domain Architecture in Homolog Identification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4205)

Abstract

Homology identification is the first step for many genomic studies. Current methods, based on sequence comparison, can result in a substantial number of mis-assignments due to the alignment of homologous domains in otherwise unrelated sequences. Here we propose methods to detect homologs through explicit comparison of domain architecture. We developed several schemes for scoring the similarity of a pair of protein sequences by exploiting an analogy between comparing proteins using their domain content and comparing documents based on their word content. We evaluate the proposed methods using a benchmark of fifteen sequence families of known evolutionary history. The results of these studies demonstrate the effectiveness of comparing domain architectures using these similarity measures. We also demonstrate the importance both of weighting critical domains and of compensating for proteins with large numbers of domains.
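
The preview does not include the paper's actual scoring schemes, so the following is only a minimal sketch of the document-comparison analogy described in the abstract. It treats each protein as a bag of domains, weights each domain by an inverse-document-frequency term so that rare domains count more, and uses cosine normalization so that proteins with many domains do not dominate. The protein names, domain names, and the functions `idf_weights` and `weighted_cosine` are illustrative assumptions, not the authors' implementation or data.

```python
import math
from collections import Counter

def idf_weights(architectures):
    """Weight each domain by log(N / number of proteins containing it).

    Assumption: a standard inverse-document-frequency weight stands in
    for the paper's domain weighting, which the preview does not show.
    """
    n = len(architectures)
    doc_freq = Counter()
    for domains in architectures.values():
        doc_freq.update(set(domains))  # count each domain once per protein
    return {d: math.log(n / count) for d, count in doc_freq.items()}

def weighted_cosine(domains_a, domains_b, idf):
    """Cosine similarity between two IDF-weighted domain-count vectors."""
    vec_a = {d: c * idf.get(d, 0.0) for d, c in Counter(domains_a).items()}
    vec_b = {d: c * idf.get(d, 0.0) for d, c in Counter(domains_b).items()}
    dot = sum(w * vec_b.get(d, 0.0) for d, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no informative domains to compare
    return dot / (norm_a * norm_b)

# Hypothetical toy data: protein identifier -> list of domains (with repeats).
architectures = {
    "P1": ["kinase", "SH2", "SH3"],
    "P2": ["kinase", "SH2"],
    "P3": ["kinase", "EGF", "EGF", "EGF"],
    "P4": ["EGF", "EGF", "laminin_G"],
}
idf = idf_weights(architectures)

for a, b in [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]:
    score = weighted_cosine(architectures[a], architectures[b], idf)
    print(f"{a} vs {b}: {score:.3f}")
```

In this toy data set, P1 and P3 share only the common kinase domain, which receives a low weight, so they score lower than P1 and P2, which also share the rarer SH2 domain. The division by the vector norms is what keeps the long, repeat-rich architecture of P3 from inflating its score.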

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Song, N., Sedgewick, R.D., Durand, D. (2006). Domain Architecture in Homolog Identification. In: Bourque, G., El-Mabrouk, N. (eds) Comparative Genomics. RCG 2006. Lecture Notes in Computer Science, vol 4205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11864127_2

  • DOI: https://doi.org/10.1007/11864127_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44529-6

  • Online ISBN: 978-3-540-44530-2

  • eBook Packages: Computer Science, Computer Science (R0)
