Computational Approaches for Gene Identification

Singh, Gautam B.

doi:10.1385/1-59259-192-2:351

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 132)

Abstract

Genetics is gaining increasing significance as the discovery of new genes continues to have considerable impact in the field of medical sciences. The Human Genome Project, a multidisciplinary endeavor that aims at learning the identity of every single base stored in the human genome, has been ongoing for some time now. The genome stores the blueprints for the synthesis of a variety of proteins, the macromolecules that enable an organism to be structurally and functionally viable. The blueprint, or program, for the synthesis of a single protein is called a gene: a unit of the DNA sequence that is generally between 1 × 10³ and 1 × 10⁶ bp in length, depending on the complexity of the protein that it codes for. A higher-level eukaryote contains as many as 30,000–40,000 genes. It has been estimated that gene coding regions account for only 10–20% of the genome. The gene identification problem is to recognize these regions in an anonymous sequence of DNA.
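
As a rough illustration of the problem statement, and not the method described in this chapter, the simplest possible approach is to scan a sequence for open reading frames (ORFs): runs of codons that begin with ATG and end at the first in-frame stop codon. The sketch below assumes the standard genetic code, scans only the three forward-strand frames, and ignores introns, so it would miss most eukaryotic gene structures. The names find_orfs, STOP_CODONS, and min_codons are hypothetical and chosen only for this example.

```python
# Naive forward-strand ORF scan. Illustrative only: real gene finders must
# also handle the reverse strand, introns/splice sites, and statistical
# evidence for coding potential.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) half-open index pairs of ATG...stop ORFs.

    min_codons counts every codon in the ORF, including the start and
    stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first in-frame ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # ATG AAA TGA sits in frame 2 of this toy sequence.
    print(find_orfs("CCATGAAATGACC"))  # [(2, 11)]
```

On the toy sequence above, the only complete ORF is ATG AAA TGA at positions 2 to 11. The second ATG at position 7 never meets an in-frame stop and is therefore not reported.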

Author information

Authors and Affiliations

Oakland University, Rochester, MI
Gautam B. Singh

Editor information

Editors and Affiliations

Queen’s University, Kingston, Ontario, Canada
Stephen Misener
Wayne State University, Detroit, MI
Stephen A. Krawetz

About this protocol

Cite this protocol

Singh, G.B. (2000). Computational Approaches for Gene Identification. In: Misener, S., Krawetz, S.A. (eds) Bioinformatics Methods and Protocols. Methods in Molecular Biology™, vol 132. Humana Press, Totowa, NJ. https://doi.org/10.1385/1-59259-192-2:351

DOI: https://doi.org/10.1385/1-59259-192-2:351
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-0-89603-732-8
Online ISBN: 978-1-59259-192-3
eBook Packages: Springer Protocols
