A compression algorithm for pre-simulated Monte Carlo p-value functions: Application to the ontological analysis of microarray studies
Background
Monte Carlo (MC) simulation (Metropolis and Ulam, 1949) is widely used for estimating p-values when the distribution of the test statistic is unknown or cannot be computed exactly. In essence, the MC method artificially recreates the underlying chance process to generate a series of simulated test statistics under the null distribution, and defines the p-value for an observed statistic as the proportion of simulations greater than or equal to that value. Formally, given an
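As a minimal sketch of the definition above (the proportion of simulated statistics greater than or equal to the observed value), the following Python snippet builds an upper-tail MC p-value function from a list of simulations. The name `mc_p_value_function` and the toy null distribution are assumptions made for illustration and do not come from the paper.

```python
import bisect
import random


def mc_p_value_function(simulated):
    """Return p(t) = fraction of simulated statistics >= t (upper tail)."""
    xs = sorted(simulated)
    n = len(xs)

    def p(t):
        # bisect_left counts the simulations strictly below t,
        # so n minus that count is the number of simulations >= t.
        return (n - bisect.bisect_left(xs, t)) / n

    return p


# Toy null distribution (an assumption for illustration only):
# the maximum of 5 independent standard normal draws.
random.seed(0)
sims = [max(random.gauss(0.0, 1.0) for _ in range(5)) for _ in range(10_000)]

p = mc_p_value_function(sims)
print(p(2.5))  # estimated upper-tail p-value for an observed statistic of 2.5
```

Evaluating the function requires the full sorted list of simulations, which is what makes pre-simulated p-value functions expensive to store and ship.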
Proposed algorithm
Given a set X of simulated test statistics, we seek a non-increasing function that approximates the Monte Carlo p-value function for the upper tail subject to the following competing constraints: first, the representation of this function should be storage-efficient, i.e. the amount of data needed for its evaluation should be small compared to the size of X; second, it is desirable to have a way to control the approximation error. Here, we require that the relative
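The two constraints above (small storage, controlled relative error) can be illustrated with a simple greedy knot selection. This is only a sketch under stated assumptions and is not the algorithm proposed in the paper, whose details are not given in this excerpt. The names `compress`, `reconstruct`, and the parameter `eps` are hypothetical. The sketch keeps a subset of (value, p-value) pairs and reconstructs with a step function, chosen so that the reconstructed value q(t) satisfies q(t) <= p(t) <= (1 + eps) * q(t).

```python
import bisect
import random


def compress(simulated, eps):
    """Greedily keep knots (x, p) so that a step reconstruction has
    relative error at most eps (illustrative, not the paper's method)."""
    xs = sorted(simulated)
    n = len(xs)
    # Distinct values with their exact upper-tail p-values.
    distinct = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            distinct.append((x, (n - i) / n))

    knots = []
    top = distinct[0][1]  # largest exact p-value in the current segment
    for k, (x, p) in enumerate(distinct):
        is_last = k == len(distinct) - 1
        # Close the segment here if extending it would break the bound.
        if is_last or top > (1.0 + eps) * distinct[k + 1][1]:
            knots.append((x, p))
            if not is_last:
                top = distinct[k + 1][1]
    return knots


def reconstruct(knots):
    """Step function: p-value of the first stored knot with x >= t."""
    keys = [x for x, _ in knots]

    def q(t):
        j = bisect.bisect_left(keys, t)
        return knots[j][1] if j < len(knots) else 0.0

    return q


random.seed(1)
sims = [random.gauss(0.0, 1.0) for _ in range(5000)]
knots = compress(sims, eps=0.05)
q = reconstruct(knots)
print(len(sims), len(knots))  # original size vs. number of stored knots
```

Because the largest simulated value is always kept as a knot, the reconstruction is exactly zero beyond the range of the simulations. Other reconstruction functions (for example, interpolation between knots) are possible and would change the error behaviour.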
Results
In this section, we first explore the performance of the presented algorithm for different simulation sizes and requested error bounds. We next exemplify the effect of varying the function used for reconstructing the MC p-values from the compressed data. Finally, we illustrate the utility of the algorithm by applying it to the ontological analysis of microarray data.
Discussion
We have presented an algorithm designed to facilitate the use of Monte Carlo p-values in cases when it is feasible and desirable to pre-compute the simulations but the sheer size of the simulated data makes it cumbersome to distribute. This problem arises in specialized data analysis applications, including, as shown, certain problems in bioinformatics.
The benefits of the proposed algorithm can be summarized as follows: first, the reconstruction error is explicitly controllable at design time via the
Conclusion
In conclusion, we propose a new algorithm for obtaining size-reduced representations of MC p-value functions, originally motivated by the ontological analysis of microarray data. Hence, this work contributes to the further development of pattern analysis applications in bioinformatics.
References (12)
A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer. Cell (2003)
Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. (2000)
Significance analysis of functional categories in gene expression studies: A structured permutation approach. Bioinformatics (2005)
Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. B (1995)
Identifying subtle interrelated changes in functional gene categories using continuous measures of gene expression. Bioinformatics (2005)
Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics (2005)