Journal of Computational Biology
Clustering Binary Fingerprint Vectors with Missing Values for DNA Array Data Analysis

To cite this paper:
Andres Figueroa, James Borneman, Tao Jiang. Journal of Computational Biology. October 1, 2004, 11(5): 887-901. doi:10.1089/cmb.2004.11.887.



Andres Figueroa
Department of Computer Science, University of California, Riverside.
James Borneman
Department of Plant Pathology, University of California, Riverside.
Tao Jiang
Department of Computer Science, University of California, Riverside.

Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version and present an efficient greedy algorithm based on Minimum Clique Partition on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our preliminary experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
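To make the problem formulation concrete, here is a minimal sketch of clustering binary fingerprints with missing values. It is not the authors' algorithm: it is a simple first-fit greedy heuristic that does not guarantee a minimum number of clusters. It assumes fingerprints are equal-length strings over `0`, `1`, and `N` (a hypothetical encoding, with `N` marking an unresolved value), and that two fingerprints are compatible when they agree at every position where both are resolved. The function names are illustrative only.

```python
from typing import List, Tuple

MISSING = "N"  # assumed symbol for an unresolved (missing) value


def compatible(a: str, b: str) -> bool:
    """True if a and b agree wherever both are resolved (equal lengths assumed)."""
    return all(x == y or MISSING in (x, y) for x, y in zip(a, b))


def merge(a: str, b: str) -> str:
    """Combine two compatible fingerprints, filling a's missing values from b."""
    return "".join(y if x == MISSING else x for x, y in zip(a, b))


def greedy_cluster(fingerprints: List[str]) -> List[Tuple[str, List[int]]]:
    """First-fit heuristic: put each fingerprint in the first compatible cluster.

    Each cluster keeps a consensus fingerprint. A fingerprint compatible with
    the consensus is compatible with every member, so each cluster is a clique
    in the compatibility graph and shares one resolution of its missing values.
    """
    clusters: List[list] = []  # each entry: [consensus, member indices]
    for i, fp in enumerate(fingerprints):
        for cluster in clusters:
            if compatible(cluster[0], fp):
                cluster[0] = merge(cluster[0], fp)
                cluster[1].append(i)
                break
        else:
            # No existing cluster fits, so start a new one.
            clusters.append([fp, [i]])
    return [(consensus, members) for consensus, members in clusters]


if __name__ == "__main__":
    data = ["10N1", "1N01", "0110", "N110", "1001"]
    for consensus, members in greedy_cluster(data):
        print(consensus, members)
```

Because compatibility is not transitive, the result depends on input order; the paper's approach instead exploits properties of the compatibility graph to find maximum cliques and some special maximal cliques.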


This paper was cited by:

Bacteria and bacterial rRNA genes associated with the development of colitis in IL‐10 −/− Mice
Jingxiao Ye, Jimmy W. Lee, Laura L. Presley, Elizabeth Bent, Bo Wei, Jonathan Braun, Neal L. Schiller, Daniel S. Straus, James Borneman
Inflammatory Bowel Diseases. Sep 2008, Vol. 14, No. 8: 1041-1050
Clustering Binary Oligonucleotide Fingerprint Vectors for DNA Clone Classification Analysis
Zhipeng Cai, Maysam Heydari, Guohui Lin
Journal of Combinatorial Optimization. 2005, Vol. 9, No. 2: 199