Computational Approaches for Gene Identification

  • Protocol
Bioinformatics Methods and Protocols

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 132))

Abstract

Genetics is gaining increasing significance as the discovery of new genes continues to have a considerable impact on the medical sciences. The Human Genome Project, a multidisciplinary endeavor that aims to determine the identity of every base stored in the human genome, has been under way for some time now. The genome stores the blueprints for the synthesis of a variety of proteins, the macromolecules that enable an organism to be structurally and functionally viable. The blueprint, or program, for the synthesis of a single protein is called a gene: a unit of DNA sequence that is generally between 1 × 10³ and 1 × 10⁶ bp in length, depending on the complexity of the protein it encodes. A higher eukaryote contains as many as 30,000–40,000 genes, and it has been estimated that gene coding regions account for only 10–20% of the genome. The gene identification problem is to recognize these regions within an anonymous DNA sequence.
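To make the gene identification problem concrete, the following is a minimal sketch of its most naive form: scanning all six reading frames of an anonymous sequence for open reading frames (ORFs), i.e. stretches that begin with an ATG start codon and run to an in-frame stop codon (TAA, TAG or TGA). This is an illustrative baseline only, not the chapter's method. The function names `find_orfs` and `reverse_complement` and the default threshold of 100 codons are hypothetical choices. Because eukaryotic genes are interrupted by introns, a single ATG-to-stop stretch rarely corresponds to a complete gene, which is why practical gene finders combine many additional signals.

```python
"""Naive six-frame open-reading-frame (ORF) scan.

A toy illustration of the gene identification problem; it ignores
splicing, so it is only a baseline, not a eukaryotic gene finder.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Scan both strands in all three frames for ATG...stop stretches.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, on the strand that was scanned ('-' means the reverse
    complement), and include the stop codon. min_codons (hypothetical
    default) counts codons including the stop. Only the first ATG after
    the previous stop in a frame opens an ORF.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step codon by codon; stop before a partial trailing codon.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAAGG", min_codons=4)` reports the four-codon ORF `ATG AAA TTT TAA` in frame 2 of the forward strand. Real genomic DNA contains many short chance ORFs, so a length threshold alone gives poor specificity; this is one reason statistical coding measures are used in addition to ORF structure.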

Copyright information

© 2000 Humana Press Inc., Totowa, NJ

About this protocol

Cite this protocol

Singh, G.B. (2000). Computational Approaches for Gene Identification. In: Misener, S., Krawetz, S.A. (eds) Bioinformatics Methods and Protocols. Methods in Molecular Biology™, vol 132. Humana Press, Totowa, NJ. https://doi.org/10.1385/1-59259-192-2:351

  • DOI: https://doi.org/10.1385/1-59259-192-2:351

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-0-89603-732-8

  • Online ISBN: 978-1-59259-192-3

  • eBook Packages: Springer Protocols