Abstract
Genetics is gaining significance as the discovery of new genes continues to have considerable impact on the medical sciences. The Human Genome Project, a multidisciplinary endeavor that aims to learn the identity of every single base stored in the human genome, has been ongoing for some time now. The genome stores the blueprints for the synthesis of a variety of proteins, the macromolecules that enable an organism to be structurally and functionally viable. The blueprint, or program, for the synthesis of a single protein is called a gene, a unit of DNA sequence that is generally between 1 × 10^3 and 1 × 10^6 bp in length, depending on the complexity of the protein that it codes for. A higher-level eukaryote contains as many as 30,000-40,000 genes. It has been estimated that gene-coding regions account for only 10-20% of the genome. The gene identification problem is to recognize these regions in an anonymous sequence of DNA.
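To make the problem statement concrete, the following is a minimal sketch of the simplest possible coding-region signal: an open reading frame (ORF) scan on the forward strand. It is not the method of this chapter or of any program it reviews. The function name `find_orfs` and the `min_codons` threshold are hypothetical, chosen only for illustration. Because eukaryotic genes are interrupted by introns, a scan like this is at best a toy baseline, which is why real gene finders combine many more signals.

```python
# Toy illustration only: naive forward-strand ORF scan.
# Assumptions (not from the chapter): start codon ATG, standard stop codons,
# no introns, no reverse strand, no statistical scoring.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon, and frame is 0, 1, or 2. The length test counts the
    start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

In this toy sequence the only ORF begins at the ATG at index 2 and ends with the TAG stop codon. A candidate that lacks a stop codon before the end of the sequence is not reported.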
Copyright information
© 2000 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Singh, G.B. (2000). Computational Approaches for Gene Identification. In: Misener, S., Krawetz, S.A. (eds) Bioinformatics Methods and Protocols. Methods in Molecular Biology™, vol 132. Humana Press, Totowa, NJ. https://doi.org/10.1385/1-59259-192-2:351
DOI: https://doi.org/10.1385/1-59259-192-2:351
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-0-89603-732-8
Online ISBN: 978-1-59259-192-3
eBook Packages: Springer Protocols