Abstract
Motivation: Motif detection for DNA sequences has many important applications in biological studies, e.g., locating binding sites and regulatory signals, and designing genetic probes. In this paper, we propose a randomized algorithm, design an improved EM algorithm, and combine the two into a software tool.
Results: (1) We design a randomized algorithm for the consensus pattern problem. We show that, with high probability, our randomized algorithm finds a pattern in polynomial time with cost error at most ε × l for each string, where l is the length of the motif and ε can be any positive number given by the user. (2) We design an improved EM (Expectation Maximization) algorithm that outperforms the original EM algorithm. (3) We develop a software tool, MotifDetector, that uses our randomized algorithm to find good seeds and uses the improved EM algorithm to do local search. We compare MotifDetector with Buhler and Tompa’s PROJECTION, which is considered to be the best known software for motif detection. Simulations show that MotifDetector is slower than PROJECTION when the pattern length is relatively small, and outperforms PROJECTION when the pattern length becomes large.
Availability: Free from http://www.cs.cityu.edu.hk/~lwang/software/motif/index.html, subject to copyright restrictions.
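As a rough illustration of the objective behind the consensus pattern problem mentioned in the Results, the sketch below scores a candidate pattern against a set of sequences: for each sequence it finds the length-l window closest to the pattern in Hamming distance, and sums those distances. This is only an assumed, standard formulation of the cost; the function names (`hamming`, `best_occurrence`, `consensus_cost`) and the toy sequences are hypothetical and are not taken from the paper or from MotifDetector. It evaluates a given pattern and does not implement the randomized algorithm or the improved EM algorithm.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_occurrence(pattern, seq):
    """Return (distance, start) of the window of seq closest to pattern.

    Ties are broken in favour of the leftmost window.
    """
    l = len(pattern)
    if len(seq) < l:
        raise ValueError("sequence is shorter than the pattern")
    return min(
        (hamming(pattern, seq[i:i + l]), i)
        for i in range(len(seq) - l + 1)
    )


def consensus_cost(pattern, sequences):
    """Total cost of a pattern: sum of best per-sequence Hamming distances.

    Also returns the per-sequence (distance, start) pairs so that the
    error contributed by each individual string can be inspected.
    """
    per_string = [best_occurrence(pattern, s) for s in sequences]
    total = sum(d for d, _ in per_string)
    return total, per_string


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    seqs = ["ACGTACGT", "TTACGAAA", "GGGACTTT"]
    total, per_string = consensus_cost("ACGT", seqs)
    print(total, per_string)
```

Under this formulation, a per-string error bound of ε × l would correspond to how much each per-sequence distance may exceed its value under an optimal pattern; that reading is an interpretation of the abstract, not a statement from the paper.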
References
Bailey, T., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)
Blanchette, M.: Algorithms for phylogenetic footprinting. In: RECOMB 01: Proceedings of the Fifth Annual International Conference on Computational Molecular Biology, pp. 49–58 (2001)
Buhler, J., Tompa, M.: Finding motifs using random projections. Journal of Computational Biology 9, 225–242 (2002)
Dopazo, J., Rodríguez, A., Sáiz, J.C., Sobrino, F.: Design of primers for PCR amplification of highly variable genomes. CABIOS 9, 123–125 (1993)
Duret, L., Bucher, P.: Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7, 399–406 (1997)
Duret, L., Dorkeld, F., Gautier, C.: Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Research 21, 2315–2322 (1993)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Hertz, G., Stormo, G.: Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. In: Proc. 3rd Int’l Conf. Bioinformatics and Genome Research, pp. 201–216 (1995)
Keich, U., Pevzner, P.: Finding motifs in the twilight zone. Bioinformatics 18, 1374–1381 (2002)
Keich, U., Pevzner, P.: Subtle motifs: defining the limits of motif finding algorithms. Bioinformatics 18, 1382–1390 (2002)
Lanctot, K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. In: Proc. 10th ACM-SIAM Symp. on Discrete Algorithms, pp. 633–642 (1999); Also to appear in Information and Computation.
Lawrence, C., Reilly, A.: An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7, 41–51 (1990)
Li, M., Ma, B., Wang, L.: Finding Similar Regions in Many Sequences. J. Comput. Syst. Sci. 65, 73–96 (2002); special issue for Thirty-first Annual ACM Symposium on Theory of Computing
Li, M., Ma, B., Wang, L.: On the closest string and substring problems. JACM 49(2), 157–171 (2002)
Lucas, K., Busch, M., Mössinger, S., Thompson, J.A.: An improved microcomputer program for finding gene- or gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. CABIOS 7, 525–529 (1991)
Pevzner, P., Sze, S.: Combinatorial approaches to finding subtle signals in DNA sequences. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, pp. 269–278 (2000)
Proutski, V., Holme, E.C.: Primer Master: a new program for the design and analysis of PCR primers. CABIOS 12, 253–255 (1996)
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, L., Dong, L., Fan, H. (2004). Randomized Algorithms for Motif Detection. In: Fleischer, R., Trippen, G. (eds) Algorithms and Computation. ISAAC 2004. Lecture Notes in Computer Science, vol 3341. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30551-4_75
DOI: https://doi.org/10.1007/978-3-540-30551-4_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24131-7
Online ISBN: 978-3-540-30551-4
eBook Packages: Computer Science (R0)