Journal of Computer and System Sciences
Volume 65, Issue 1, August 2002, Pages 73-96
 
doi:10.1006/jcss.2002.1823
Copyright © 2002 Elsevier Science (USA). All rights reserved.

Regular Article

Finding Similar Regions in Many Sequences

Ming Li (a), Bin Ma (b) and Lusheng Wang (c)

(a) Department of Computer Science, University of California, Santa Barbara, California 93106
(b) Department of Computer Science, University of Western Ontario, London, Ontario N6A 5B7, Canada
(c) City University of Hong Kong, Kowloon, Hong Kong

Received 1 July 1999; 
revised 1 July 2001. 
Available online 7 November 2002.


Abstract

Algorithms for finding similar, or highly conserved, regions in a group of sequences are at the core of many molecular biology problems. Assume that we are given n DNA sequences s₁, …, sₙ. The Consensus Patterns problem, which has been widely studied in bioinformatics research, asks in its simplest form for a region of length L in each sᵢ and a median string s of length L such that the total Hamming distance from s to these regions is minimized. We show that the problem is NP-hard and give a polynomial time approximation scheme (PTAS) for it. We then present an efficient approximation algorithm for the Consensus Patterns problem under the original relative entropy measure. As an interesting application of our analysis, we further obtain a PTAS for a restricted (but still NP-hard) version of the important consensus alignment problem that allows at most a constant number of gaps, each of arbitrary length, in each sequence.
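The problem statement above can be made concrete with a small sketch. The Python below is illustrative only and is not the paper's PTAS or its approximation algorithm. The names `hamming`, `column_majority`, `consensus_patterns_brute_force` and `relative_entropy_score` are hypothetical helpers. The exhaustive search is exponential in n, which fits the NP-hardness result: it is usable only on toy inputs. The relative entropy formula is also an assumption. It uses the common motif-finding form, a per-column sum of f·log2(f/p) against an assumed background distribution p, and is not taken from the paper. The search relies on one observation. Once a region is fixed in every sequence, a column-wise majority string minimizes the total Hamming distance, so only the choice of regions has to be searched.

```python
from collections import Counter
from itertools import product
from math import log2


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def column_majority(regions: list[str]) -> str:
    """Column-wise majority string; ties go to the alphabetically smallest letter."""
    letters = []
    for column in zip(*regions):
        counts = Counter(column)
        top = max(counts.values())
        letters.append(min(c for c, k in counts.items() if k == top))
    return "".join(letters)


def consensus_patterns_brute_force(seqs: list[str], L: int):
    """Exact Consensus Patterns by exhaustive search (exponential in n).

    Returns (cost, median, starts), where starts[i] is the offset of the
    chosen length-L region in seqs[i]. For fixed regions a column-wise
    majority string minimizes the total Hamming distance, so only the
    region offsets are enumerated.
    """
    best = None
    offsets = [range(len(s) - L + 1) for s in seqs]
    for starts in product(*offsets):
        regions = [s[p:p + L] for s, p in zip(seqs, starts)]
        median = column_majority(regions)
        cost = sum(hamming(median, r) for r in regions)
        if best is None or cost < best[0]:
            best = (cost, median, starts)
    return best


def relative_entropy_score(regions: list[str], background: dict[str, float]) -> float:
    """Assumed form: sum over columns j and letters a of f_j(a) * log2(f_j(a) / p(a)).

    f_j(a) is the frequency of letter a in column j of the chosen regions and
    p(a) is an assumed background probability; letters absent from a column
    contribute 0.
    """
    n = len(regions)
    total = 0.0
    for column in zip(*regions):
        for letter, k in Counter(column).items():
            f = k / n
            total += f * log2(f / background[letter])
    return total


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GACGTT"]
    print(consensus_patterns_brute_force(seqs, 3))  # (0, 'ACG', (0, 2, 1))
    uniform = {c: 0.25 for c in "ACGT"}
    print(relative_entropy_score(["ACG", "ACG", "ACG"], uniform))  # 6.0
```

In the toy input, ACG is the only length-3 string that occurs in all three sequences, so the search returns cost 0. A fully conserved region scores 2 bits per column against a uniform DNA background.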

Author Keywords: approximation algorithms; consensus patterns; consensus alignment; computational biology


 