Randomized Algorithms for Motif Detection

  • Conference paper
Algorithms and Computation (ISAAC 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3341)

Abstract

Motivation: Motif detection for DNA sequences has many important applications in biological studies, e.g., locating binding sites and regulatory signals and designing genetic probes. In this paper, we propose a randomized algorithm, design an improved EM algorithm, and combine the two in a software package.

Results: (1) We design a randomized algorithm for the consensus pattern problem. We show that, with high probability, the algorithm finds a pattern in polynomial time with cost error at most ε × l for each string, where l is the length of the motif and ε is any positive number chosen by the user. (2) We design an improved EM (Expectation Maximization) algorithm that outperforms the original EM algorithm. (3) We develop a software package, MotifDetector, that uses the randomized algorithm to find good seeds and the improved EM algorithm for local search. We compare MotifDetector with Buhler and Tompa’s PROJECTION, which is considered the best known software for motif detection. Simulations show that MotifDetector is slower than PROJECTION when the pattern length is relatively small and outperforms PROJECTION when the pattern length becomes large.

Availability: Free from http://www.cs.cityu.edu.hk/~lwang/software/motif/index.html, subject to copyright restrictions.
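
For readers unfamiliar with the consensus pattern problem named in the Results, the following is a minimal Python sketch of its objective: given a candidate pattern of length l, each input string contributes the Hamming distance between the pattern and its closest length-l window, and the costs are summed. The sketch also includes a naive random-sampling seed search (sample a few windows, take the column-wise majority, keep the cheapest candidate). This is an illustration under stated assumptions, not the algorithm of the paper and not MotifDetector: all function names and the parameters `r` and `trials` are hypothetical, and the sketch makes no claim about the ε × l guarantee.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window(pattern, seq):
    """Return (distance, start) of the length-l window of seq closest to pattern."""
    l = len(pattern)
    return min((hamming(pattern, seq[i:i + l]), i)
               for i in range(len(seq) - l + 1))


def consensus_cost(pattern, seqs):
    """Consensus pattern objective: sum of per-string best-window distances."""
    return sum(best_window(pattern, s)[0] for s in seqs)


def column_majority(windows):
    """Column-wise majority letter over equal-length windows."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*windows))


def random_seed_search(seqs, l, r=3, trials=200, rng=None):
    """Hypothetical seed search: repeatedly sample r random windows,
    form their majority string, and keep the candidate of lowest cost.
    Illustrative only; not the randomized algorithm of the paper."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(trials):
        windows = []
        for _ in range(r):
            s = rng.choice(seqs)
            i = rng.randrange(len(s) - l + 1)
            windows.append(s[i:i + l])
        cand = column_majority(windows)
        cost = consensus_cost(cand, seqs)
        if best is None or cost < best[0]:
            best = (cost, cand)
    return best
```

A call such as `random_seed_search(["TTACGTT", "ACCTT", "GGACG"], l=3)` returns a `(cost, pattern)` pair. In a pipeline of the kind the abstract describes, a seed like this would then be handed to a local search step such as EM for refinement.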

References

  1. Bailey, T., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)

  2. Blanchette, M.: Algorithms for phylogenetic footprinting. In: RECOMB 01: Proceedings of the Fifth Annual International Conference on Computational Molecular Biology, pp. 49–58 (2001)

  3. Buhler, J., Tompa, M.: Finding motifs using random projections. Journal of Computational Biology 9, 225–242 (2002)

  4. Dopazo, J., Rodríguez, A., Sáiz, J.C., Sobrino, F.: Design of primers for PCR amplification of highly variable genomes. CABIOS 9, 123–125 (1993)

  5. Duret, L., Bucher, P.: Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7, 399–406 (1997)

  6. Duret, L., Dorkeld, F., Gautier, C.: Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Research 21, 2315–2322 (1993)

  7. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)

  8. Hertz, G., Stormo, G.: Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. In: Proc. 3rd Int’l Conf. Bioinformatics and Genome Research, pp. 201–216 (1995)

  9. Keich, U., Pevzner, P.: Finding motifs in the twilight zone. Bioinformatics 18, 1374–1381 (2002a)

  10. Keich, U., Pevzner, P.: Subtle motifs: defining the limits of motif finding algorithms. Bioinformatics 18, 1382–1390 (2002)

  11. Lanctot, K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. In: Proc. 10th ACM-SIAM Symp. on Discrete Algorithms, pp. 633–642 (1999); Also to appear in Information and Computation.

  12. Lawrence, C., Reilly, A.: An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7, 41–51 (1990)

  13. Li, M., Ma, B., Wang, L.: Finding Similar Regions in Many Sequences. J. Comput. Syst. Sci. 65, 73–96 (2002); special issue for Thirty-first Annual ACM Symposium on Theory of Computing

  14. Li, M., Ma, B., Wang, L.: On the closest string and substring problems. JACM 49(2), 157–171 (2002)

  15. Lucas, K., Busch, M., Mössinger, S., Thompson, J.A.: An improved microcomputer program for finding gene- or gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. CABIOS 7, 525–529 (1991)

  16. Pevzner, P., Sze, S.: Combinatorial approaches to finding subtle signals in DNA sequences. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, pp. 269–278 (2000)

  17. Proutski, V., Holme, E.C.: Primer Master: a new program for the design and analysis of PCR primers. CABIOS 12, 253–255 (1996)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, L., Dong, L., Fan, H. (2004). Randomized Algorithms for Motif Detection. In: Fleischer, R., Trippen, G. (eds) Algorithms and Computation. ISAAC 2004. Lecture Notes in Computer Science, vol 3341. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30551-4_75

  • DOI: https://doi.org/10.1007/978-3-540-30551-4_75

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24131-7

  • Online ISBN: 978-3-540-30551-4

  • eBook Packages: Computer Science, Computer Science (R0)
