Computer Science and Information Systems 2009 Volume 6, Issue 2, Pages: 165-190
https://doi.org/10.2298/CSIS0902165H

Microarray missing values imputation methods: Critical analysis review

Hourani Mou'ath (Faculty of Information Technology, Al Ahliyya Amman University, Al Saro St, Amman, Jordan)
El Emary Ibrahiem M.M. (Faculty of Engineering, Al Ahliyya Amman University, Al Saro St, Amman, Jordan)

Gene expression data often contain missing expression values. Because many algorithms for gene expression data analysis, including clustering, require a complete matrix of gene array values, choosing the most effective missing value estimation method is necessary. In this paper, the most commonly used imputation methods from the literature are critically reviewed and analyzed to explain the proper use and weaknesses of each published method and to record observations on it. From this analysis, we conclude that the Local Least Square (LLS) and Support Vector Regression (SVR) algorithms achieve the best performance. SVR can be considered a complement to LLS, especially when applied to noisy data. However, both algorithms have deficiencies in choosing the value of the Number of Selected Genes (K) and the appropriate kernel function. To overcome these drawbacks, a new method is needed that chooses these parameters automatically and has an acceptable computational complexity.

Keywords: Missing Completely at Random (MCAR), Missing At Random (MAR), Sequential K-Nearest Neighbors (SKNN), Gene Ontology (GO), Singular Value Decomposition (SVD), Least Squares Imputation (LSI), Local Least Square Imputation (LLSI), Bayesian Principal Component Analysis(