Copyright © 2002 Published by Elsevier Science (USA).
Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequence analysis
Received 20 January 2002;
Abstract
We study two fundamental problems concerning the search for interesting regions in sequences: (i) given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum, and (ii) given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. We present an O(n)-time algorithm for the first problem and an O(n log L)-time algorithm for the second. The algorithms have potential applications in several areas of biomolecular sequence analysis, including locating GC-rich regions in a genomic DNA sequence, post-processing sequence alignments, annotating multiple sequence alignments, and computing length-constrained ungapped local alignments. Our preliminary tests on both simulated and real data demonstrate that the algorithms are very efficient and can locate useful regions, such as GC-rich ones.
Author Keywords: Algorithm; Efficiency; Maximum consecutive subsequence; Length constraint; Biomolecular sequence analysis; Ungapped local alignment
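To make the two problem statements in the abstract concrete, the following is a minimal Python sketch. It is not the algorithms of the article, whose details do not appear on this page. Problem (i) is solved here with a standard swapped-in technique (prefix sums plus a monotonic deque), which also runs in O(n) time. Problem (ii) is solved by a simple O(nL) baseline rather than the O(n log L) method; it relies on the observation that any segment of length 2L or more can be split into two pieces of length at least L, one of which has an average no smaller than the whole, so only lengths from L to 2L−1 need to be examined. The function names and the half-open index convention `a[i:j]` are assumptions made for illustration.

```python
from collections import deque
from itertools import accumulate


def max_sum_at_most(a, U):
    """Return (best_sum, i, j): the slice a[i:j] with 1 <= j - i <= U and maximum sum.

    Illustrative O(n) sketch using prefix sums and a monotonic deque;
    not the algorithm described in the article.
    """
    if not a or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")
    prefix = [0, *accumulate(a)]  # prefix[k] == sum(a[:k])
    window = deque()              # candidate start indices, prefix values increasing
    best = None
    for j in range(1, len(prefix)):
        start = j - 1             # newest admissible start for end index j
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()          # dominated: larger prefix and older
        window.append(start)
        while window[0] < j - U:
            window.popleft()      # too far back: length would exceed U
        i = window[0]             # smallest prefix among admissible starts
        candidate = prefix[j] - prefix[i]
        if best is None or candidate > best[0]:
            best = (candidate, i, j)
    return best


def max_average_at_least(a, L):
    """Return (best_avg, i, j): the slice a[i:j] with j - i >= L and maximum average.

    Illustrative O(n * L) baseline that only tries lengths L .. 2L - 1;
    not the O(n log L) algorithm described in the article.
    """
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")
    prefix = [0, *accumulate(a)]
    best = None
    for i in range(n - L + 1):
        for j in range(i + L, min(i + 2 * L - 1, n) + 1):
            average = (prefix[j] - prefix[i]) / (j - i)
            if best is None or average > best[0]:
                best = (average, i, j)
    return best
```

For example, with `a = [1, -2, 3, 4, -1, 2]`, calling `max_sum_at_most(a, 2)` selects the slice `a[2:4]` (values 3 and 4), and `max_average_at_least(a, 2)` selects the same slice. In a GC-content setting, `a` could hold 1 for G or C and 0 for A or T, so that the average of a slice is its GC fraction; that encoding is an assumption of this sketch.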
Article Outline
- 1. Introduction
- 2. Applications to biomolecular sequence analysis
  - 2.1. Locating GC-rich regions
  - 2.2. Post-processing sequence alignments
  - 2.3. Annotating multiple sequence alignments
  - 2.4. Computing ungapped local alignments with length constraints
- 3. Maximum sum consecutive subsequence with length constraints
- 4. Maximum average consecutive subsequence with length constraints
- 5. Implementation and preliminary experiments
- 6. Concluding remarks
- Acknowledgements
- References














ai,…,aj
and length at most 2L−1.