doi:10.1016/j.compbiolchem.2006.09.002
Copyright © 2006 Elsevier Ltd All rights reserved.
Link test—A statistical method for finding prostate cancer biomarkers
aCollege of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE 68182, USA
bDepartment of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
cDepartment of Pediatrics, University of Nebraska Medical Center, Omaha, NE 68198, USA
Received 10 March 2006;
revised 29 September 2006;
accepted 29 September 2006.
Available online 14 September 2007.
Abstract
We present a new method, link-test, for selecting prostate cancer biomarkers from SELDI mass spectrometry and microarray data sets. Biomarkers selected by link-test are supported by data at both the mRNA and protein levels and are therefore more robust. Link-test determines the significance of the association between a microarray marker and a specific mass spectrum marker by constructing background mass spectra distributions estimated from all human protein sequences in the SWISS-PROT database. The data set consists of both microarray and mass spectrometry data from prostate cancer patients and healthy controls. Link-test reports a list of statistically justified prostate cancer biomarkers. Cross-validation results show high prediction accuracy using the identified biomarker panel. We also employ a text-mining approach with the OMIM database to validate the cancer biomarkers. This study with link-test represents one of the first cross-platform studies of cancer biomarkers.
Keywords: Microarray; Mass spectrometry; Biomarker; Prostate cancer; Text mining
Fig. 1. Flowchart of biomarker extraction and its application in disease prognosis. Microarray and mass spectrometry data are first pre-processed independently, and differentially expressed candidate biomarkers are extracted for each type of data. Link tests are then applied to the microarray markers and mass spectrum markers to identify significant biomarkers for building a classifier. Unknown samples can then be classified using the trained classifier.
Fig. 2. Distributions of θ(mδ) and the frequency E(mδ). (a) A segment of the E(mδ) distribution shows periodic behavior (δ = 0.01). (b) Periodic distribution of θ(mδ) (δ = 0.01), where the value 10 on the y axis denotes +∞. (c) Periodic distribution of θ(mδ) (δ = 1), where the trend line of −ln θ(mδ) increases as the molecular weight m increases.
Fig. 3. Distribution of the P-value, δ = 0.01. (a) The P-value depends on the protein length and the mass marker. (b) Distribution of the P-value when the protein length is fixed at 2500 residues. (c) The P-value decreases with protein length at fixed mass (mass = 2000 Da).
Fig. 4. Flowchart of text mining OMIM records to find prostate-cancer-related genes. (1) Search for seed records using NET by identifying the keywords; (2) construct the record graph from the OMIM database; (3) query the record graph using the seed records with Dijkstra's algorithm; (4) generate distributions of the minimum distances and Bayesian scores for all records in OMIM.
Fig. 5. Distribution of node degrees in the OMIM record graph. The histogram shows the distribution of node degrees in the record graph. The curve shows the cumulative distribution (%) of the node degrees.
Fig. 6. Distribution of the minimum distances of OMIM records in the major subgraph. The minimum distances from the seed records to all records in the major subgraph were calculated using Dijkstra's algorithm.
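The minimum-distance query described in Figs. 4 and 6 can be illustrated with a small multi-source variant of Dijkstra's algorithm. This is only a sketch under stated assumptions: the adjacency-dictionary layout, the edge weights, the record names, and the function name `min_distances_from_seeds` are all hypothetical, since this page does not say how the OMIM record graph is stored or how its edges are weighted.

```python
import heapq


def min_distances_from_seeds(graph, seeds):
    """Return the shortest distance from any seed to each reachable node.

    graph: dict mapping node -> {neighbor: non-negative edge weight}
           (hypothetical layout, not taken from the paper).
    seeds: iterable of starting nodes (the "seed records").
    """
    # Every seed starts at distance 0 (multi-source Dijkstra).
    dist = {s: 0.0 for s in seeds if s in graph}
    heap = [(0.0, s) for s in dist]
    heapq.heapify(heap)

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy record graph with made-up records and weights.
toy_graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
    "E": {},  # isolated record, outside the major subgraph
}
distances = min_distances_from_seeds(toy_graph, ["A"])
```

Records that cannot be reached from any seed, such as `E` in the toy graph, are simply absent from the result, which mirrors restricting the distance distribution to the major subgraph.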
Table 1.
Significant biomarkers found by link-test (a)
(a) α_overall = 0.1, α_individual = 6.74e−3, the average number of mass markers K ≈ 15.
(b) Also found in another microarray marker.
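The relation between the overall and individual significance levels in footnote a is not spelled out on this page. One reading, offered here purely as an assumption, is a Bonferroni-style correction in which the overall level is divided by the average number of mass markers K. The short check below shows that the reported values are consistent with that reading; the function name `bonferroni_threshold` is illustrative.

```python
def bonferroni_threshold(alpha_overall, num_tests):
    """Per-test significance level under a Bonferroni correction (assumed here)."""
    return alpha_overall / num_tests


alpha_overall = 0.1                # reported in footnote a
alpha_individual_reported = 6.74e-3  # reported in footnote a

# K implied by the reported values if the correction is Bonferroni.
implied_k = alpha_overall / alpha_individual_reported

# Threshold that exactly K = 15 would give, for comparison.
threshold_at_15 = bonferroni_threshold(alpha_overall, 15)

print(f"implied K = {implied_k:.2f}")
print(f"threshold at K = 15: {threshold_at_15:.2e}")
```

Under this assumption the implied K is slightly below 15, which agrees with K being reported as an approximate average rather than an integer count.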