Open Access

Do We Really Need Imputation in AutoML Predictive Modeling?

Authors:
George Paterakis (Computer Science Department, University of Crete, Heraklion, Greece; ORCID 0009-0005-8856-8809)
Stefanos Fafalios (JADBio Gnosis DA S.A., Heraklion, Greece; ORCID 0009-0007-6722-0373)
Paulos Charonyktakis (JADBio Gnosis DA S.A., Heraklion, Greece; ORCID 0000-0002-6899-4262)
Vassilis Christophides (ENSEA, ETIS, Cergy, France; ORCID 0000-0002-2076-1881)
Ioannis Tsamardinos (Computer Science Department, University of Crete, Heraklion, Greece; ORCID 0000-0002-2492-959X)

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 6, Article No. 147, pp. 1–64. https://doi.org/10.1145/3643643

Published: 12 April 2024

Abstract

Many real-world datasets contain missing values, whereas most Machine Learning (ML) algorithms assume complete datasets. For this reason, several imputation algorithms have been proposed to predict and fill in the missing values. Given the advances in predictive modeling algorithms tuned in an Automated Machine Learning (AutoML) setting, a question that naturally arises is to what extent sophisticated imputation algorithms (e.g., Neural Network based) are really needed, or whether we can obtain decent performance using simple methods like Mean/Mode (MM). In this article, we experimentally compare six state-of-the-art representatives of different imputation algorithmic families from an AutoML predictive modeling perspective, including a feature selection step and combined algorithm and hyper-parameter selection. We used a commercial AutoML tool for our experiments, in which we included the selected imputation methods. Experiments ran on 25 binary classification real-world incomplete datasets with missing values and 10 binary classification complete datasets in which synthetic missing values are introduced according to different missingness mechanisms, at varying missing frequencies. The main conclusion drawn from our experiments is that the best method on average is the Denoise AutoEncoder on real-world datasets and MissForest on simulated datasets, followed closely by MM. In addition, binary indicator variables encoding missingness patterns actually improve predictive performance, on average. Last, although there are cases where Neural-Network-based imputation significantly improves predictive performance, this comes at a great computational cost and requires measuring all feature values to impute new samples.

1 INTRODUCTION

Real-world data often contain missing values, stemming from faulty sensors, non-responders in questionnaires, incomplete data entry, or other reasons. For example, on the OpenML portal, as of March 2022, 364 of the 3,487 active datasets contain missing values. Unfortunately, most Machine Learning (ML) algorithms demand complete datasets on which to operate.¹ To address this problem, a plethora of imputation algorithms, ranging from simple to very advanced, have been developed to predict the missing values and allow the remaining algorithms in the analysis pipeline to run to completion.

The problem of imputation has been under study for decades [28, 47, 62]. Initially, it was studied in the context of estimating the coefficients of linear models, which we call the estimation perspective. In contrast, we study imputation from a predictive modeling perspective where the goal is to create an accurate model to predict a specific outcome of interest (target variable) in new samples. There are important differences in approaching the subject under these two perspectives. Under the estimation perspective, (a) some methods would impute the missing values in the training data but would not create an imputation model that is able to impute test data [15, 77]. Hence, these methods cannot be applied to predictive modeling. In addition, (b) standard guidelines [67] suggest using the outcome in imputing feature values, e.g., to differentiate imputation values in cases vs. controls. This technique is not applicable in predictive modeling where the outcome is unknown in test samples. Finally, (c) a useful metric of imputation efficacy under the estimation perspective is the imputation accuracy [29, 34], i.e., the accuracy of predicting the missing values. Imputation accuracy is important for estimation purposes but may not be indicative of the impact of imputation on predictive performance.

Under the predictive modeling perspective, several interesting questions arise as follows:

—	Are advanced predictive modeling algorithms in need of imputation beyond the simple Mean/Mode (MM) technique? A non-linear algorithm could potentially learn a rule of the sort “if a feature value equals its mean (i.e., it is missing), then do not use it but instead rely on other observed feature values for prediction.” Hence, it is questionable whether imputation would provide an advantage to such an algorithm.
—	Is the need for sophisticated imputation further reduced in an Automated Machine Learning (AutoML) context, where the most appropriate combination of algorithm and hyper-parameter values is selected (combined algorithm and hyper-parameter selection (CASH) optimization) [68]?
—	Do Binary Indicator (BI) variables (1 if the value of a feature is missing and 0 otherwise) encoding the missingness patterns provide additional information to a classifier to learn a predictive model?
—	How does the feature selection step interact with imputation? Feature selection aims to reduce the number of features that enter the model without sacrificing predictive performance and leads to more interpretable models by providing insights regarding the underlying data generation. It remains open how the benefits of feature selection are impacted when we impute the missing values.
—	What is the tradeoff between the computational overhead of imputation and the improvement in predictive performance? Imputation algorithms impute all the missing values, independently of whether they contribute to the predictions of the model. In other words, imputation is unsupervised and not guided by the outcome to predict. Hence, they potentially perform a significant amount of unnecessary computations.
—	If imputation algorithms indeed improve performance, then are there any characteristics of the datasets (called meta-features) that allow us to predict the value of imputation prior to their analysis and decide whether imputation is worth the computational overhead?
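To make the MM baseline and the BI variables from the questions above concrete, the following is a minimal sketch in plain Python. It is not the implementation used in the study: the `None` sentinel, the function names `fit_mm` and `transform_mm_bi`, and the toy data are all assumptions made for illustration. It shows the two points that matter under the predictive modeling perspective: fill values are learned on the training data only, and the same values and indicator columns are then applied to new samples.

```python
from collections import Counter
from statistics import mean

MISSING = None  # sentinel for a missing cell (an assumption of this sketch)

def fit_mm(rows, categorical):
    """Learn one fill value per column from the training rows only."""
    fills = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        if j in categorical:
            fills.append(Counter(observed).most_common(1)[0][0])  # mode
        else:
            fills.append(mean(observed))  # mean
    return fills

def transform_mm_bi(rows, fills, bi_columns):
    """Impute with the learned fill values and append one 0/1 indicator
    for every column listed in bi_columns."""
    out = []
    for r in rows:
        imputed = [fills[j] if v is MISSING else v for j, v in enumerate(r)]
        indicators = [1 if r[j] is MISSING else 0 for j in bi_columns]
        out.append(imputed + indicators)
    return out

# Toy data: column 0 is numerical, column 1 is categorical.
train = [[1.0, "a"], [3.0, None], [None, "a"], [5.0, "b"]]
test = [[None, None], [2.0, "b"]]

fills = fit_mm(train, categorical={1})
# Add an indicator only for columns that have missing values in training.
bi_columns = [j for j in range(2) if any(r[j] is MISSING for r in train)]

train_out = transform_mm_bi(train, fills, bi_columns)
test_out = transform_mm_bi(test, fills, bi_columns)
print(test_out)
```

With the indicators appended, a downstream classifier can distinguish an observed value that happens to equal the mean from an imputed one, which is the information the third question refers to.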

To the best of our knowledge, this is the first empirical study that answers all the above research questions via an experimental evaluation over 25 binary classification real-world datasets, as well as 10 complete datasets in which synthetic missing values are introduced according to different missingness mechanisms, at varying missing frequencies. The MM imputation is used as a baseline and is compared against state-of-the-art representatives of different imputation algorithmic families, namely Discriminative, such as MissForest [66], and Generative, such as SoftImpute [44] and probabilistic principal component analysis (PPCA) [70] exploiting matrix factorization, or Generative Adversarial Imputation Nets (GAIN) [83] and Denoise AutoEncoder (DAE) [21] based on Neural Networks. The imputation algorithms are integrated into the Just Add Data Bio (JADBio) AutoML platform [73], which performs CASH and includes a feature selection step.

In summary, the results show that the single best-performing algorithms are DAE on the real-world datasets and MissForest on the simulated datasets. For five of the six imputation algorithms studied, the inclusion of BI variables is beneficial, on average. MM, when BI variables are included and CASH is taking place, is a close competitor and places as the second-best algorithm. Advanced imputation methods do offer a significant advantage but only in a few datasets. In contrast, they require the measurements of all feature values to impute new samples, which in some way invalidates the feature selection step and leads to models of high dimensionality. In addition, they require orders of magnitude more computational time. Meta-level analysis has indicated that only one feature is correlated with the relative performance of the algorithms; unfortunately, the correlation is not statistically significant when corrected for multiple testing. More datasets and new meta-features are needed to extract patterns of when sophisticated imputation should be used over the simple MM.

Overall, in an AutoML setting where optimization is taking place and BI variables are included, MM is a reasonable option; other algorithms should be used only if feature selection is not required and computational time is of little importance relative to improving predictive performance.

The article is organized as follows. Section 2 introduces missing data mechanisms and a taxonomy of imputation families. In Section 3, we present the experimental environment, the selected datasets for evaluation, and the metrics and hyper-parameters tuned. Section 4 describes the missing data generation procedure. The experimental results for real-world data with missing data and simulated missing data are presented in Sections 5 and 6, respectively. In Section 7, we discuss the results of the meta-level analysis on real-world datasets. Related work is discussed in Section 8, followed by the contributions and lessons learned in Section 9. Finally, Section 10 presents the conclusions and limitations of the study. The detailed information about the datasets, missing value simulation setup, and experimental results are provided in Appendices A, B, and C, respectively.

2 BACKGROUND AND CONTEXT

2.1 Missingness Mechanisms

The concept of a missing mechanism [62] formalizes the generation process of missing data. In this respect, the BI are modeled as random variables and assigned a distribution. There are three types of underlying mechanisms that generate missing data, namely, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). For formal definitions of these mechanisms, readers are referred to Reference [40]. Intuitively, MCAR implies that the probability of a value missing is independent of the actual value, the other observed quantities, and any latent variables. MAR implies that the missingness only depends on the observed data (so it can be predicted). MNAR refers to the case that the missing values are related to both the observed and unobserved variables, including the missing value itself. When missingness is MNAR, it is in principle and in general not possible to impute the missing values in a way that follows the unknown underlying data distribution.²

An illustrative example is given in Figure 1, which is adapted from Reference [46]. The missingness mechanisms can be described using a causal graph. Let us assume A and B are observed random variables and O a latent variable. Each variable is depicted as a node of the graph. Assume that A and B have a direct connection to O, which is the variable (node) of interest. The \(R_o\) node is a mask variable that denotes the missingness inserted into O, which causes \(O^*\). \(O^*\) is a surrogate of O but with missing values inserted in the positions specified by \(R_o\). As seen in Figure 1(a), MCAR missing values do not depend on any of the variables A, B, O. In contrast, missingness depends on B for the MAR mechanism, and on O itself for the MNAR mechanism, as seen in Figure 1(b) and (c).

Fig. 1. Panel (a) illustrates the MCAR missingness mechanism. Panel (b) denotes the MAR missingness and panel (c) the MNAR missingness. A, B are random variables, while O is the observed variable of interest. \(R_o\) is a variable that represents the missingness in O in the form of a mask variable. \(O^*\) is the result of applying the \(R_o\) mask to the O variable.
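The three mechanisms can be illustrated with a small simulation. The sketch below uses NumPy and is only an illustration of the definitions, not the simulation protocol of this study: the variable names mirror Figure 1 (B, O, \(R_o\), \(O^*\)), while the logistic form of the missingness probability and all coefficients are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
B = rng.normal(size=n)             # fully observed variable
O = 0.5 * B + rng.normal(size=n)   # variable that will receive missing values

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# R_o = True marks a missing entry of O. Coefficients are arbitrary.
R_mcar = rng.random(n) < 0.25                      # independent of B and O
R_mar = rng.random(n) < sigmoid(-1.5 + 2.0 * B)    # depends only on observed B
R_mnar = rng.random(n) < sigmoid(-1.5 + 2.0 * O)   # depends on O itself

def apply_mask(values, mask):
    """Return O*: a copy of `values` with NaN where the mask is True."""
    out = values.copy()
    out[mask] = np.nan
    return out

for name, mask in [("MCAR", R_mcar), ("MAR", R_mar), ("MNAR", R_mnar)]:
    O_star = apply_mask(O, mask)
    shift = np.nanmean(O_star) - O.mean()  # bias of the observed-case mean
    print(f"{name}: missing={mask.mean():.2f}, observed-mean shift={shift:+.3f}")
```

Under MCAR the observed part of O remains a random subsample, so its mean stays close to the full-data mean. Under MAR the missingness is predictable from B, and under MNAR it is driven by the very values that are removed.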

2.2 An Imputation Family Taxonomy

There are numerous imputation algorithms and approaches in the literature, and we do not attempt a full review. Readers are encouraged to explore comprehensive surveys available in the field for a more in-depth understanding [2, 4, 12, 13, 24]. Imputation approaches can be partitioned into various distinct families/groups of methods. A taxonomy is attempted in Figure 2. First, imputed values can be decided based on only the feature with the missing value (Univariate imputation) or several features (multivariate imputation). The former methods include Mean/Median/Max imputation for continuous data and Mode imputation for categorical data. Multivariate methods can be partitioned into Iterative and Distance-Based, also known as Hot-Deck methods [6]. Distance-based methods employ a distance or a similarity metric for samples to find neighbors or cluster them. A commonly used algorithm in this category is the K-nearest neighbors imputation (KNNi) [71], which imputes values based on the neighbors of the sample with missing values. K-means-based methods cluster the samples before imputation [37].

Fig. 2. Taxonomy of imputation families: rectangular nodes represent families, while oval nodes represent algorithms of that family.

Iterative methods start with a simple initial guess (e.g., using MM imputation) and, in each iteration, try to improve the imputed values. We further split iterative methods into Discriminative and Generative. Discriminative methods build a predictive model per feature with missing values, given the other features in the dataset. This model is used to predict the missing values of the corresponding feature, in each iteration. The Discriminative family can either utilize a (generalized) linear model or a non-linear model. Linear discriminative methods include Multivariate Imputation by Chained Equations (MICE) [76]. Non-linear discriminative methods include the MissForest algorithm [66] employing Random Forests, and Datawig [9] that can impute continuous, categorical, and text data by employing different loss functions according to the missing features’ datatype.
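The iterative discriminative scheme described above can be sketched in a few lines of NumPy. This is a deliberately simplified illustration, not MICE or MissForest: it assumes numerical features only, uses ordinary least squares as the per-feature model, and imputes deterministic predictions rather than draws. The function name `iterative_linear_impute` and the toy data are hypothetical.

```python
import numpy as np

def iterative_linear_impute(X, n_iter=10):
    """Chained per-feature least-squares imputation (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # initial guess: column means
    for _ in range(n_iter):
        for j in np.flatnonzero(mask.any(axis=0)):     # features with missing values
            miss = mask[:, j]
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(X)), others])  # intercept + other features
            # Fit on rows where feature j is observed, predict where it is missing.
            coef, *_ = np.linalg.lstsq(design[~miss], filled[~miss, j], rcond=None)
            filled[miss, j] = design[miss] @ coef
    return filled

# Toy data: column 1 is a noisy linear function of column 0, column 2 is pure noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=300)
truth = np.column_stack([x0, 2.0 * x0 + 1.0 + 0.1 * rng.normal(size=300),
                         rng.normal(size=300)])
X = truth.copy()
holes = rng.random(300) < 0.2
X[holes, 1] = np.nan
X[rng.random(300) < 0.2, 2] = np.nan

imputed = iterative_linear_impute(X)
rmse_iter = np.sqrt(np.mean((imputed[holes, 1] - truth[holes, 1]) ** 2))
rmse_mean = np.sqrt(np.mean((np.nanmean(X[:, 1]) - truth[holes, 1]) ** 2))
print(f"RMSE on column 1: iterative={rmse_iter:.3f}, mean={rmse_mean:.3f}")
```

Replacing the least-squares fit with a Random Forest per feature gives the general shape of a MissForest-style method, at the cost of training one forest per incomplete feature in every iteration.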

Generative methods try to model the joint distribution of the data and use the generative model to impute values. They can be split into two categories, methods that employ matrix factorization and methods that use neural networks. The matrix-factorization family includes low-rank matrix decomposition methods: First, missing values are imputed with an initial guess, and the matrix is decomposed (factorized) and used to predict the missing values. Imputation is improved in each cycle via expectation-maximization steps. Examples of this family include the PPCA [70], SVDImpute [71], bPCA [10], and SoftImpute [44]. Such algorithms scale better w.r.t. the number of features than MICE or MissForest, which train a different model for each feature with missing values in each iteration. Recently, neural networks have also been tried as generative models. These algorithms are essentially non-linear alternatives to matrix factorization. These methods start with an initial guess and then train a neural network that learns the joint distribution. This family includes methods based on AutoEncoders (AE), such as DAE [21, 35, 41] and Variational Autoencoder (VAE) [17, 45, 61]. It also includes generative adversarial networks (GAIN) [64, 83]. Finally, HoloClean, a data cleaning tool, implements an attention-based neural network for imputation, named Aimnet [82]. A detailed comparison of imputation methods is provided in Section 8. In the next subsection, we will explain the rationale for our choice to include in our empirical study a subset of the aforementioned imputation methods.
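The matrix-factorization loop described above (initial guess, decomposition, re-imputation, repeat) can be sketched as follows. This is a simplified SoftImpute-style illustration under stated assumptions, not the implementation evaluated in the study: the shrinkage parameter `lam`, the fixed iteration count, and the synthetic rank-2 data are choices made for the example.

```python
import numpy as np

def soft_impute(X, lam=0.1, n_iter=200):
    """Iterative soft-thresholded SVD imputation (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)   # initial guess: column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                    # shrink the singular values
        low_rank = (U * s) @ Vt                         # low-rank reconstruction
        filled = np.where(mask, low_rank, X)            # observed entries stay fixed
    return filled

# Synthetic rank-2 matrix with roughly 20% of entries removed at random.
rng = np.random.default_rng(0)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 8))
mask = rng.random(truth.shape) < 0.2
X = truth.copy()
X[mask] = np.nan

filled = soft_impute(X)
mean_filled = np.where(mask, np.nanmean(X, axis=0), X)
err_soft = np.sqrt(np.mean((filled[mask] - truth[mask]) ** 2))
err_mean = np.sqrt(np.mean((mean_filled[mask] - truth[mask]) ** 2))
print(f"RMSE on missing entries: soft-impute={err_soft:.3f}, mean={err_mean:.3f}")
```

Each iteration costs one SVD of the whole matrix rather than one model per incomplete feature, which is the reason such methods scale better with the number of features.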

2.3 Description of the Selected Imputation Methods

In this section, we present the main characteristics of the imputation methods given in Table 1 that we included in our testbed. In the analysis of their computational complexity, n denotes the number of samples, m the number of features, \(\#comp\) the number of principal components, \(\#sing\) the number of singular values, and \(\#trees\) the number of trees.

Table 1.

Algorithm	Model Family	Base Model	Learning Procedure	Categorical Handling	Approx. Complexity
MM	Univariate	—	Non-iterative	Native	O(\(n \cdot m\))
SOFT	Generative	SVD	Iterative	One-hot-encoding	O(\(k \cdot n \cdot m \cdot \#sing\))
PPCA	Generative	PCA	Iterative	One-hot-encoding	O(\(k \cdot n \cdot m \cdot \#comp\))
MF	Discriminative	RF	Iterative	Native	O(\(k \cdot m^2 \cdot n \cdot \log (n) \cdot \#\text{trees}\))
GAIN	Generative	GAN	Iterative	Ordinal-Encoding	—
DAE	Generative	AE	Iterative	One-hot-encoding	—

Abbreviations: n is the number of samples, m is the number of features, k is the number of iterations, \(\#trees\) stands for the number of trees in the forest(hp), \(\#comp\) stands for the number of principal components employed in the matrix factorization, and \(\#sing\) for the number of singular values of the SVD.

Algorithm	Hyper-parameter	Value
Mean/Mode	—	—
MissForest	n-trees	250
	maxDepth	20, 30
	maxLeafNodes	30
SoftImpute	variance-explained	50%, 70%, 90%
PPCA	variance-explained	50%, 70%, 90%
DAE	dropout	0.25, 0.4, 0.5
	batch-size	64
	\(\theta\)	5, 7, 10
	epochs	500
GAIN	alpha	0.1, 1, 10
	hint-rate	0.5, 0.9
	batch-size	64
	epochs	10,000
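The size of the imputation search space implied by the hyper-parameter table above can be enumerated with a short Python sketch. It assumes that every method is tuned over the full Cartesian product of its listed values, which the table itself does not state; single-valued settings (n-trees, maxLeafNodes, batch-size, epochs) are left out because they do not multiply the number of configurations. The dictionary layout and the helper `configurations` are hypothetical.

```python
from itertools import product

# Tuned values copied from the hyper-parameter table; fixed settings are omitted.
grids = {
    "MM": {},
    "MissForest": {"maxDepth": [20, 30]},
    "SoftImpute": {"variance-explained": [0.5, 0.7, 0.9]},
    "PPCA": {"variance-explained": [0.5, 0.7, 0.9]},
    "DAE": {"dropout": [0.25, 0.4, 0.5], "theta": [5, 7, 10]},
    "GAIN": {"alpha": [0.1, 1, 10], "hint-rate": [0.5, 0.9]},
}

def configurations(grid):
    """Return every combination of values in the grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

counts = {name: len(configurations(grid)) for name, grid in grids.items()}
total = sum(counts.values())
for name, count in counts.items():
    print(f"{name}: {count} configuration(s)")
print(f"total: {total}")
```

Under this assumption, DAE contributes nine configurations, MM one, and the six methods together 24.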

Base Imp.Method	p-value	q-value
MF	0.036	0.182
PPCA	0.064	0.182
GAIN	0.091	0.182
MM	0.148	0.222
SOFT	0.31	0.375
DAE	0.552	0.552
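The q-values in the table above are consistent with a Benjamini-Hochberg false discovery rate adjustment of the listed p-values. The text here does not name the correction procedure, so treating it as Benjamini-Hochberg is an assumption. The sketch below implements that adjustment in plain Python; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min                # enforce monotonicity of the q-values
    return q

# p-values as printed in the table (rounded to two or three decimals).
p = {"MF": 0.036, "PPCA": 0.064, "GAIN": 0.091, "MM": 0.148, "SOFT": 0.31, "DAE": 0.552}
q = dict(zip(p, benjamini_hochberg(list(p.values()))))
for name in p:
    print(f"{name}: p={p[name]:.3f}, q={q[name]:.3f}")
```

On the rounded p-values, this gives the table's q-values for five of the six methods. SOFT comes out at 0.372 rather than 0.375, which may reflect rounding of the printed p-value.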

Name	Category	Description
inst_to_attr	General	Samples to features ratio
Minority Class %	General	% of minority class
nr_attr	General	Number of features
nr_inst	General	Number of samples
n_num	General	Number of numerical features
n_cat	General	Number of categorical features
% NA	Missing	% of missing values in data
% samples /w NA	Missing	% of samples with missing values
% features /w NA	Missing	% of features with missing values
% NA/Feat. /w (NA 1+%)	Missing	Mean % of missing values per feature with more than 1% missing
# components 50%	Clustering	Number of components that explain 50% of data variance
# components 70%	Clustering	Number of components that explain 70% of data variance
# components 90%	Clustering	Number of components that explain 90% of data variance
Slh(k=2)	Clustering	Mean Silhouette Coefficient of all samples when using 2 clusters
Slh (k=3)	Clustering	Mean Silhouette Coefficient of all samples when using 3 clusters
Slh (k=4)	Clustering	Mean Silhouette Coefficient of all samples when using 4 clusters
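Several of the General and Missing meta-features listed above follow directly from the missingness mask of a dataset. The following NumPy sketch shows one way to compute them. It assumes a numerical matrix with NaN marking missing cells and reads the 1% threshold as strict, as in the description of the last Missing meta-feature; the function name `missingness_meta_features` and the toy matrix are hypothetical, and the clustering meta-features are not covered.

```python
import numpy as np

def missingness_meta_features(X):
    """Compute a subset of the meta-features from a matrix with NaN for missing cells."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    per_feature = mask.mean(axis=0) * 100       # % missing in each feature
    heavy = per_feature[per_feature > 1.0]      # features with more than 1% missing
    return {
        "inst_to_attr": X.shape[0] / X.shape[1],
        "% NA": mask.mean() * 100,
        "% samples /w NA": mask.any(axis=1).mean() * 100,
        "% features /w NA": mask.any(axis=0).mean() * 100,
        "% NA/Feat. /w (NA 1+%)": heavy.mean() if heavy.size else 0.0,
    }

# Toy matrix: 4 samples, 3 features, 3 missing cells.
X = [[1.0, np.nan, 3.0],
     [4.0, 5.0, np.nan],
     [7.0, np.nan, 9.0],
     [1.0, 2.0, 3.0]]
meta = missingness_meta_features(X)
for name, value in meta.items():
    print(f"{name}: {value:.2f}")
```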

Study	#Datasets	Mechanism	% Missing values	#Imp. methods	NNs	BI	System	FS	Tuning	#Models	Eval	Metric	Meta
[38]	6B	Nat	7–84%	3N, 2C, 2M.	No	No	Adhoc	No	Imp+Pred	7	R-TT (70-30)	ACC-F1	No
[81]	13B	Nat	0.6–33.6%	2N,1C,4M	No	No	Adhoc	No	No	5	TT (80-20)	AUC-F1	No
[57]	2B	MC,MR	0–40% \(^{**}\)	7C	No	No	Adhoc	No	No	3	TT(66.6-33.3)	ACC	No
[8]	5B,5R	MC-MN	10–50%	8M	No	No	Adhoc	No	Imp	4	R-TT(50-50)	ACC-R2	No
[30]	31B, 21R, 17M	MC,MR,MN	1–50%\(^{*}\)	6M	Yes	No	Adhoc	No	Imp	2	CV(3-5)	RMSE-F1	No
[55]	10B, 3R	Nat	—	7M	No	Yes	Adhoc	No	Pred	3	NCV	ACC	No
[19]	23B,M	MC	7%	6M	No	No	AutoML	Yes	—	10	R-TT(75-25)	ACC	No
[49]	5B	Nat	—	4N, 2C, 1M	No	No	AutoML	—	Imp+Pred	Ensemble	CV(5)	B-ACC	No
Ours	35B	N,MC,MR	1–72%	6M	Yes	Yes	AutoML	Yes	Imp+Pred	4	TT(50-50)	ACC-F1-AUC	Yes

Dataset	ID	Samples	Features	#Numerical	#Categorical	Missing %	Imbalance ratio	#Feat with miss>1%	%Missing/Feature
analcatdata_reviewer	1,008	379	7	0	7	51.56	0.43	7	51.56
audiology	999	226	69	0	69	2.03	0.25	6	23.23
anneal	989	898	38	16	22	64.98	0.24	29	85.15
autoHorse	840	205	25	17	8	1.11	0.40	4	6.46
braziltourism	957	412	8	7	1	2.91	0.23	2	10.68
bridges	328	107	11	4	7	6.03	0.41	7	9.35
cjs	1,024	2,796	34	32	2	71.64	0.24	28	86.97
colic	27	368	22	7	15	23.80	0.37	19	27.50
colleges_aaup	897	1,161	15	13	2	1.47	0.30	6	3.68
cylinder-bands	6,332	540	39	24	15	4.74	0.42	23	7.93
dresses-sales	23,381	500	12	1	11	13.92	0.42	5	33.04
eucalyptus	990	736	19	14	5	3.21	0.29	6	9.95
hepatitis	55	155	19	6	13	5.67	0.21	11	9.56
hungarian	231	294	13	12	1	20.46	0.36	5	52.93
kdd_el_nino-small	839	782	8	8	0	7.45	0.35	4	14.90
mushroom	24	8,124	22	0	22	1.39	0.48	1	30.53
pbcseq	802	1,945	17	13	4	3.43	0.50	6	9.71
primary-tumor	1,003	339	17	0	17	3.90	0.25	2	32.74
profb	470	672	9	5	4	19.84	0.33	2	89.29
schizo	466	340	14	12	2	17.52	0.48	11	22.30
sick	38	3,772	29	7	22	5.54	0.06	7	22.96
soybean	1,023	683	35	0	35	9.78	0.13	32	10.68
stress	42,167	199	12	8	4	8.29	0.20	7	14.22
vote	56	435	16	0	16	5.63	0.39	16	5.63
water-treatment	940	527	36	36	0	2.86	0.15	22	4.53

Dataset	ID	#Samples	#Features	#Numerical	#Categorical	Minority Class %
Australian	40,981	690	14	8	6	0.44
boston	853	506	13	12	1	0.41
churn	40,701	5,000	20	16	4	0.14
compas-two-years	42,193	5,278	13	7	6	0.47
image	40,592	2,000	135	135	0	0.21
page-blocks	1,021	5,473	10	10	0	0.1
parkinsons	1,488	195	22	22	0	0.25
segment	958	2,310	19	19	0	0.14
stock	841	950	9	9	0	0.49
zoo	965	101	16	1	15	0.41

Dataset	ID	Samples	Features	#Numerical	#Categorical	Missing %	Minority Class %	#Feat miss>1%	%Missing/Feature	Type
adult	179	48,842	14	6	8	0.95	0.24	3	4.41	Binary
albert	41,147	425,240	78	78	0	13.64	0.50	43	24.73	Binary
analcatdata_reviewer	1,008	379	7	0	7	51.56	0.43	7	51.56	Binary
anneal	989	898	38	16	22	64.98	0.24	29	85.15	Binary
aps_failure	41,138	76,000	170	170	0	8.35	0.02	160	8.83	Binary
ASP-POTASSCO-classification	41,705	1,294	142	139	3	9.94	0.02	138	10.23	MultiClass
ASP-POTASSCO-regression	41,704	14,234	142	138	4	9.94	0.00	138	10.23	Regression
audiology	999	226	69	0	69	2.03	0.25	6	23.23	Binary
autoHorse	840	205	25	17	8	1.11	0.40	4	6.46	Binary
braziltourism	957	412	8	7	1	2.91	0.23	2	10.68	Binary
bridges	328	107	11	4	7	6.03	0.41	7	9.35	Binary
Census-Income-KDD	42,750	199,523	41	13	28	5.08	0.06	7	29.72	Binary
cjs	1,024	2,796	34	32	2	71.64	0.24	28	86.97	Binary
Code_Smells_Data_Class	43,079	86,467	66	66	0	49.99	0.00	62	53.20	Regression
colic	27	368	22	7	15	23.80	0.37	19	27.50	Binary
colleges	42,727	7,063	47	31	16	31.42	0.00	30	49.19	Regression
colleges_aaup	897	1,161	15	13	2	1.47	0.30	6	3.68	Binary
colleges_usnews	930	1,302	33	32	1	18.22	0.47	25	23.96	Binary
cylinder-bands	6,332	540	39	24	15	4.74	0.42	23	7.93	Binary
Domainome	41,533	1,623	9,838	9,838	0	82.17	0.35	9,688	83.44	Binary
dresses-sales	23,381	500	12	1	11	13.92	0.42	5	33.04	Binary
echoMonths	222	130	9	7	2	8.29	0.00	6	12.31	Regression
eucalyptus	990	736	19	14	5	3.21	0.29	6	9.95	Binary
fishcatch	232	158	7	7	0	7.87	0.00	1	55.06	Regression
fps-in-video-games	42,737	425,833	44	33	11	6.94	0.00	12	25.44	Regression
hepatitis	55	155	19	6	13	5.67	0.21	11	9.56	Binary
house_prices_nominal	42,563	1,460	79	36	43	6.04	0.00	16	29.74	Regression
hungarian	231	294	13	12	1	20.46	0.36	5	52.93	Binary
ipums_la_97-small	993	7,019	60	34	26	11.42	0.04	18	38.06	MultiClass
ipums_la_98-small	381	7,485	60	34	26	11.59	0.01	17	40.91	MultiClass
ipums_la_99-small	378	8,844	60	34	26	9.71	0.02	18	32.36	MultiClass
jungle_chess_2pcs_endgame_rat_panther	41,002	5,880	46	18	28	1.30	0.23	6	10.00	MultiClass
KDD98	42,343	82,318	477	358	119	11.30	0.12	87	61.98	Binary
KDDCup09-Upselling	1,112	50,000	15,000	13,391	1,609	3.35	0.07	608	82.59	Binary
KDDCup09_churn	42,759	50,000	230	192	38	69.78	0.07	205	78.28	Binary
kdd_coil_1	567	316	11	8	3	1.61	0.00	3	4.85	Regression
kdd_el_nino-small	839	782	8	8	0	7.45	0.35	4	14.90	Binary
kick	41,162	72,983	32	17	15	6.39	0.12	5	40.51	Binary
lymphoma_2classes	1,101	45	4,026	4,026	0	3.28	0.49	2,116	6.25	Binary
meta	566	528	21	18	3	4.55	0.00	3	31.82	Regression
MiceProtein	40,966	1,080	81	77	4	1.60	0.10	8	14.69	MultiClass
Midwest_Survey_nominal	42,532	2,778	27	1	26	1.95	0.03	5	10.51	MultiClass
mlr_ranger_rng	42,458	278,863	14	8	6	3.56	0.00	1	49.69	Regression
mlr_svm_rng	42,456	540,576	13	7	6	9.38	0.00	2	60.95	Regression
Moneyball	41,021	1,232	14	11	3	20.87	0.00	4	73.05	Regression
mushroom	24	8,124	22	0	22	1.39	0.48	1	30.53	Binary
NewFuelCar	41,506	36,203	17	17	0	1.46	0.00	1	24.78	Regression
okcupid-stem	42,734	50,789	19	3	16	15.97	0.10	12	25.28	MultiClass
pbc	524	418	18	17	1	16.47	0.00	12	24.66	Regression
pbcseq	802	1,945	17	13	4	3.43	0.50	6	9.71	Binary
porto-seguro	42,206	595,212	37	25	12	3.84	0.04	5	28.21	Binary
primary-tumor	1,003	339	17	0	17	3.90	0.25	2	32.74	Binary
profb	470	672	9	5	4	19.84	0.33	2	89.29	Binary
rl	41,160	31,406	22	22	0	10.45	0.10	8	28.71	Binary
road-safety	42,803	363,243	66	61	5	9.10	0.05	41	14.62	MultiClass
SAT11-HAND-runtime-regression	41,980	4,440	116	113	3	5.27	0.00	10	61.15	Regression
schizo	466	340	14	12	2	17.52	0.48	11	22.30	Binary
sick	38	3,772	29	7	22	5.54	0.06	7	22.96	Binary
soybean	1,023	683	35	0	35	9.78	0.13	32	10.68	Binary
speeddating	40,536	8,378	122	61	61	2.87	0.00	109	3.17	Binary
stress	42,167	199	12	8	4	8.29	0.20	7	14.22	Binary
us_crime	315	1,994	127	126	1	15.48	0.00	24	81.91	Regression
vote	56	435	16	0	16	5.63	0.39	16	5.63	Binary
water-treatment	940	527	36	36	0	2.86	0.15	22	4.53	Binary

Dataset	Missing Feature (target)	#Selected Features
aps_failure	cn_006	25
colleges_aaup	Average_salary-full_professors	5
colleges_usnews	Out-of-state_tuition	18
dresses-sales	V3	7
eucalyptus	PMCno	6
hepatitis	ALBUMIN	6
hungarian	thalach	3
mushroom	stalk-root	16
pbcseq	presence_of_asictes	7
speeddating	attractive	24

—	Including BI in the dataset improves the predictive performance of the machine learning pipeline for most algorithms (see Section 5.1). The inclusion of BIs does increase the dimensionality, and hence the difficulty, of the machine learning task. However, it does encode the information about which values are missing; this allows a classifier to learn which values to trust or not. Results indicate that encoding this information turns out to be more beneficial than harmful.
—	BI+DAE is found to be the single best imputation method in real-world data with native missing values, followed by BI+MM, which is the standard in AutoML tools. As seen in Section 5.2, both methods have the same number of wins (when comparing only BI-extended methods) across datasets, with BI+DAE having higher mean AUC. The worst performance is exhibited by matrix-factorization (linear dimensionality reduction) methods such as PPCA. These methods do scale well with the number of features and may be more suitable for high-dimensional, low-sample datasets.
—	BI+MM exhibits the best tradeoff between efficiency and effectiveness. As expected (see Sections 5.3 and 6.3 and Appendix C.2.4), BI+MM is the fastest method to train and also is more effective in the majority of the comparisons. MF, due to its iterative nature, is the slowest among all closely followed by GAIN. GAIN’s main bottleneck is the number of epochs required to train the network. The authors’ suggestion was 10,000 epochs, which is 20 times more than the 500 epochs suggested by the authors of the DAE method.
—	Based on the results of Section 5.4, we suggest that practitioners optimize their models over the BI+MM and BI+DAE algorithms. Together, BI+MM and BI+DAE score over 99% of the maximum AUC in real-world data, as shown in Section 5.4. Specifically, BI+MM alone scores 98.68% of the maximum AUC. Adding BI+DAE to the pipeline leads to 99.69% of the maximum AUC. However, this comes at the cost of increasing the configuration space by 10\(\times\), as DAE has nine tuning configurations compared to one of BI+MM. Also, to reach 100% of the optimal performance, we have to train 24 times more configurations than by simply using BI+MM.
—	BI+MF is the best method in datasets with simulated missing values. As shown in Section 6.1, in both MCAR and MAR simulations, BI+MF is on average the best. In contrast, BI+MF is the third best with real-world data, falling behind BI+DAE. Despite our best efforts to realistically simulate missing values, there may still be differences between real-world missing-data generative mechanisms and our simulations. First, we simulated MCAR and MAR missing values. Real-world missing values may be MNAR. Second, the missingness probability for MAR data is determined by a generalized linear model (logistic regression model). Real-world missing values may follow non-linear models. The majority of the literature employs similar simulations for comparing imputation algorithms. However, as indicated by this study, results with simulated missingness may not generalize to real-world datasets. New simulation methodologies need to be proposed to this end.
—	An increase in missingness leads to a deterioration in predictive performance. As shown in Section 6.1, increasing missingness causes a drop in the AutoML tool’s capability of predicting the outcome. Missingness at 10% leads to a 0.024 AUC drop compared to the complete dataset. Similarly, 25% missingness leads to a 0.05 AUC drop, while at 50% we observe an average drop of up to 0.144, as seen in Tables 12(a) and (b).
—	The set containing BI+MM and BI+MF reaches 99% of maximum AUC for simulated data, as shown in Section 6.2. BI+MM scores 98.7% of the maximum AUC for MCAR data and 98.99% for MAR data. To surpass 99% of the maximum AUC, the addition of BI+MF is needed. This addition allows the tool to reach 99.62% and 99.43% on MCAR and MAR data, respectively. However, BI+MF has to be tuned, leading to a total 3\(\times\) increase in pipeline complexity.
—	A meta-learning methodology to correlate meta-features with performance is presented in Section 7. It could allow scientists to select the appropriate sophisticated methods based on meta-features, saving training time and improving overall performance. In addition, it could provide insight into the design choice of an algorithm that leads to better or worse performance on a given dataset. Unfortunately, no statistically significant results were found. This means that either there are no correlations present with the selected meta-features, or these correlations are not strong enough to be found significant with the given sample size of 25 datasets.
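The "percentage of maximum AUC" reached by a set of imputation methods, as used in the list above, can be illustrated with a small Python sketch. The exact definition used in Sections 5.4 and 6.2 is not reproduced here, so the formula below is an assumption: for each dataset, the best AUC within the chosen set is divided by the best AUC over all methods, and the ratios are averaged. The numbers are the BI-extended "Overall" AUCs of three datasets from the results table in this article, so the output is not expected to match the percentages reported over all 25 datasets.

```python
# BI-extended "Overall" AUCs for three datasets, copied from the results table.
auc = {
    "anneal":    {"BI+MM": 0.996, "BI+MF": 0.969, "BI+GAIN": 0.991,
                  "BI+SOFT": 0.991, "BI+PPCA": 0.996, "BI+DAE": 0.982},
    "colic":     {"BI+MM": 0.829, "BI+MF": 0.881, "BI+GAIN": 0.862,
                  "BI+SOFT": 0.865, "BI+PPCA": 0.858, "BI+DAE": 0.830},
    "hepatitis": {"BI+MM": 0.848, "BI+MF": 0.858, "BI+GAIN": 0.866,
                  "BI+SOFT": 0.864, "BI+PPCA": 0.799, "BI+DAE": 0.869},
}

def pct_of_max_auc(auc, methods):
    """Average, over datasets, of (best AUC within `methods`) / (best AUC overall), in %."""
    ratios = []
    for scores in auc.values():
        best_in_set = max(scores[m] for m in methods)
        best_overall = max(scores.values())
        ratios.append(best_in_set / best_overall)
    return 100.0 * sum(ratios) / len(ratios)

all_methods = list(auc["anneal"])
pct_mm = pct_of_max_auc(auc, ["BI+MM"])
pct_mm_dae = pct_of_max_auc(auc, ["BI+MM", "BI+DAE"])
pct_all = pct_of_max_auc(auc, all_methods)
print(f"BI+MM: {pct_mm:.2f}%  BI+MM and BI+DAE: {pct_mm_dae:.2f}%  all: {pct_all:.2f}%")
```

Adding a method to the set can only raise this score, so the practical question is how many extra configurations each additional percentage point costs.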

Dataset	MM	BI+MM	MF	BI+MF	GAIN	BI+GAIN	SOFT	BI+SOFT	PPCA	BI+PPCA	DAE	BI+DAE
analcatdata_reviewer-FS	0.585	0.585	0.5	0.597	0.561	0.595	0.602	0.602	0.558	0.558	0.585	0.585
analcatdata_reviewer-NOFS	0.661	0.668	0.599	0.646	0.602	0.61	0.597	0.659	0.606	0.656	0.661	0.668
analcatdata_reviewer-Overall	0.661	0.668	0.605	0.643	0.607	0.63	0.597	0.659	0.606	0.656	0.661	0.668
anneal-FS	0.883	0.988	0.761	0.97	0.916	0.973	0.987	0.967	0.995	0.986	0.975	0.968
anneal-NOFS	0.881	0.996	0.938	0.97	0.931	0.983	0.972	0.991	0.996	0.996	0.982	0.982
anneal-Overall	0.883	0.996	0.943	0.969	0.896	0.991	0.972	0.991	0.996	0.996	0.975	0.982
audiology-FS	0.986	0.986	0.98	0.98	0.98	0.974	0.98	0.98	0.98	0.98	0.98	0.98
audiology-NOFS	0.998	0.998	0.992	0.993	0.994	0.991	0.992	0.998	0.995	0.989	0.993	0.995
audiology-Overall	0.998	0.998	0.98	0.981	0.993	0.992	0.992	0.98	0.995	0.989	0.993	0.98
autoHorse-FS	0.967	0.967	0.966	0.966	0.966	0.966	0.966	0.966	0.901	0.996	0.966	0.966
autoHorse-NOFS	0.99	0.989	0.983	0.988	0.982	0.988	0.981	0.99	0.989	0.993	0.976	0.976
autoHorse-Overall	0.99	0.989	0.966	0.966	0.987	0.988	0.981	0.99	0.989	0.993	0.966	0.966
braziltourism-FS	0.634	0.634	0.64	0.64	0.643	0.634	0.64	0.64	0.725	0.725	0.643	0.643
braziltourism-NOFS	0.616	0.727	0.716	0.725	0.721	0.709	0.709	0.725	0.669	0.668	0.731	0.715
braziltourism-Overall	0.616	0.727	0.643	0.64	0.632	0.64	0.709	0.64	0.669	0.668	0.731	0.715
bridges-FS	0.844	0.844	0.853	0.857	0.891	0.849	0.891	0.891	0.892	0.885	0.88	0.847
bridges-NOFS	0.909	0.911	0.902	0.901	0.882	0.916	0.902	0.889	0.901	0.916	0.915	0.912
bridges-Overall	0.909	0.911	0.902	0.909	0.905	0.909	0.902	0.889	0.901	0.916	0.915	0.912
cjs-FS	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.987	0.997	1.0	1.0
cjs-NOFS	1.0	1.0	0.994	1.0	0.985	0.998	0.996	0.991	0.987	0.99	0.999	0.999
cjs-Overall	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.987	0.997	1.0	1.0
colic-FS	0.829	0.829	0.837	0.855	0.845	0.839	0.836	0.836	0.839	0.839	0.83	0.83
colic-NOFS	0.839	0.838	0.853	0.863	0.845	0.872	0.842	0.865	0.848	0.858	0.852	0.856
colic-Overall	0.829	0.829	0.849	0.881	0.846	0.862	0.836	0.865	0.839	0.858	0.83	0.83
colleges_aaup-FS	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.996	0.996	0.999	0.999
colleges_aaup-NOFS	0.999	0.999	0.997	0.999	0.997	0.997	0.998	0.997	0.998	0.998	0.998	0.997
colleges_aaup-Overall	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.999	0.998	0.998	0.999	0.999
cylinder-bands-FS	0.808	0.785	0.81	0.788	0.808	0.797	0.808	0.798	0.723	0.723	0.832	0.826
cylinder-bands-NOFS	0.82	0.819	0.826	0.832	0.826	0.828	0.815	0.821	0.807	0.824	0.848	0.858
cylinder-bands-Overall	0.82	0.819	0.832	0.831	0.813	0.816	0.815	0.821	0.807	0.824	0.848	0.858
dresses-sales-FS	0.562	0.562	0.564	0.561	0.56	0.552	0.565	0.565	0.5	0.549	0.562	0.562
dresses-sales-NOFS	0.619	0.605	0.6	0.597	0.569	0.583	0.597	0.601	0.545	0.539	0.631	0.62
dresses-sales-Overall	0.619	0.562	0.564	0.561	0.567	0.552	0.565	0.565	0.545	0.539	0.631	0.562
eucalyptus-FS	0.833	0.833	0.821	0.82	0.823	0.835	0.842	0.817	0.75	0.816	0.807	0.834
eucalyptus-NOFS	0.777	0.777	0.778	0.778	0.778	0.777	0.779	0.844	0.82	0.824	0.849	0.855
eucalyptus-Overall	0.833	0.833	0.778	0.778	0.832	0.836	0.842	0.817	0.82	0.824	0.849	0.855
hepatitis-FS	0.748	0.748	0.834	0.83	0.803	0.807	0.673	0.673	0.799	0.799	0.826	0.826
hepatitis-NOFS	0.826	0.848	0.866	0.866	0.844	0.85	0.869	0.864	0.876	0.866	0.852	0.869
hepatitis-Overall	0.826	0.848	0.867	0.858	0.858	0.866	0.869	0.864	0.799	0.799	0.852	0.869
hungarian-FS	0.899	0.899	0.883	0.884	0.893	0.867	0.871	0.871	0.897	0.897	0.875	0.875
hungarian-NOFS	0.918	0.915	0.901	0.901	0.917	0.915	0.881	0.895	0.895	0.898	0.914	0.912
hungarian-Overall	0.918	0.915	0.887	0.888	0.913	0.918	0.871	0.871	0.895	0.897	0.914	0.912
kdd_el_nino-small-FS	0.983	0.983	0.98	0.981	0.988	0.987	0.983	0.981	0.95	0.926	0.983	0.986
kdd_el_nino-small-NOFS	0.987	0.989	0.984	0.985	0.988	0.988	0.985	0.988	0.98	0.985	0.986	0.987
kdd_el_nino-small-Overall	0.987	0.989	0.982	0.986	0.988	0.987	0.985	0.988	0.98	0.985	0.986	0.987
mushroom-FS	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
mushroom-NOFS	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
mushroom-Overall	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
pbcseq-FS	0.849	0.849	0.851	0.852	0.84	0.849	0.836	0.831	0.849	0.849	0.846	0.843
pbcseq-NOFS	0.849	0.85	0.857	0.844	0.856	0.848	0.846	0.845	0.841	0.838	0.848	0.842
pbcseq-Overall	0.849	0.85	0.851	0.85	0.851	0.849	0.846	0.845	0.841	0.838	0.848	0.842
primary-tumor-FS	0.875	0.887	0.855	0.875	0.846	0.874	0.829	0.882	0.829	0.864	0.786	0.887
primary-tumor-NOFS	0.88	0.892	0.875	0.875	0.871	0.88	0.877	0.889	0.861	0.87	0.88	0.892
primary-tumor-Overall	0.88	0.892	0.867	0.892	0.886	0.87	0.877	0.889	0.861	0.87	0.88	0.892
profb-FS	0.642	0.642	0.642	0.642	0.642	0.642	0.642	0.642	0.631	0.631	0.642	0.642
profb-NOFS	0.695	0.696	0.691	0.693	0.695	0.692	0.695	0.696	0.579	0.58	0.693	0.695
profb-Overall	0.695	0.696	0.692	0.694	0.692	0.686	0.695	0.696	0.631	0.631	0.693	0.695
schizo-FS	0.526	0.526	0.556	0.557	0.534	0.515	0.556	0.556	0.543	0.543	0.623	0.623
schizo-NOFS	0.719	0.684	0.764	0.765	0.717	0.74	0.76	0.76	0.56	0.559	0.794	0.805
schizo-Overall	0.719	0.684	0.781	0.772	0.736	0.751	0.76	0.76	0.543	0.543	0.794	0.805
sick-FS	0.984	0.984	0.992	0.993	0.993	0.992	0.977	0.969	0.991	0.991	0.992	0.992
sick-NOFS	0.99	0.989	0.994	0.994	0.992	0.995	0.986	0.988	0.992	0.989	0.993	0.992
sick-Overall	0.99	0.989	0.994	0.994	0.992	0.993	0.986	0.988	0.992	0.989	0.993	0.992
soybean-FS	0.981	0.983	0.983	0.992	0.991	0.981	0.985	0.985	0.973	0.973	0.991	0.991
soybean-NOFS	0.991	0.994	0.989	0.986	0.989	0.992	0.987	0.991	0.991	0.985	0.989	0.993
soybean-Overall	0.981	0.994	0.985	0.991	0.99	0.991	0.987	0.991	0.991	0.985	0.989	0.993
stress-FS	0.916	0.916	0.932	0.932	0.932	0.932	0.932	0.932	0.933	0.933	0.932	0.932
stress-NOFS	0.902	0.904	0.899	0.903	0.906	0.904	0.902	0.901	0.948	0.946	0.909	0.909
stress-Overall	0.916	0.916	0.932	0.932	0.932	0.932	0.932	0.932	0.933	0.933	0.909	0.932
vote-FS	0.983	0.985	0.992	0.995	0.986	0.99	0.991	0.991	0.989	0.991	0.978	0.986
vote-NOFS	0.992	0.991	0.994	0.991	0.994	0.99	0.995	0.991	0.995	0.992	0.992	0.992
vote-Overall	0.992	0.991	0.992	0.991	0.995	0.992	0.991	0.991	0.989	0.991	0.992	0.992
water-treatment-FS	0.916	0.988	0.986	0.987	0.958	0.988	0.943	0.943	0.5	0.5	0.962	0.979
water-treatment-NOFS	0.988	0.988	0.988	0.987	0.988	0.988	0.954	0.986	0.788	0.774	0.98	0.981
water-treatment-Overall	0.988	0.988	0.987	0.987	0.988	0.988	0.954	0.986	0.788	0.774	0.98	0.981
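
A table in this layout can be summarized per method by averaging the difference between each BI+X column and its X counterpart. The sketch below does this for three "Overall" rows copied verbatim from the table above; the helper names `parse_table` and `mean_indicator_gain` are hypothetical, and a three-row subset is far too small to stand in for the full comparison reported in the article.

```python
# Three "Overall" rows copied from the table above (illustrative subset only).
ROWS = """
Dataset MM BI+MM MF BI+MF GAIN BI+GAIN SOFT BI+SOFT PPCA BI+PPCA DAE BI+DAE
anneal-Overall 0.883 0.996 0.943 0.969 0.896 0.991 0.972 0.991 0.996 0.996 0.975 0.982
braziltourism-Overall 0.616 0.727 0.643 0.64 0.632 0.64 0.709 0.64 0.669 0.668 0.731 0.715
schizo-Overall 0.719 0.684 0.781 0.772 0.736 0.751 0.76 0.76 0.543 0.543 0.794 0.805
"""


def parse_table(text):
    """Parse whitespace-separated rows into {dataset: {column: score}}."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, body = lines[0][1:], lines[1:]
    return {row[0]: dict(zip(header, map(float, row[1:]))) for row in body}


def mean_indicator_gain(table):
    """For each base method X, average (BI+X score - X score) over all datasets."""
    first_row = next(iter(table.values()))
    methods = [name for name in first_row if not name.startswith("BI+")]
    gains = {}
    for method in methods:
        diffs = [scores["BI+" + method] - scores[method] for scores in table.values()]
        gains[method] = sum(diffs) / len(diffs)
    return gains


table = parse_table(ROWS)
for method, gain in mean_indicator_gain(table).items():
    print(f"{method:5s} mean change from adding BI: {gain:+.3f}")
```

Rows that contain a missing-result marker instead of a number would need to be filtered out before parsing, since `float` rejects them.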

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.073	0.0	0.171	0.08	0.0	0.0
Australian-Train	0.033	0.0	0.174	0.118	0.0	0.0
boston-Test	0.246	0.0	0.636	0.491	0.0	0.302
boston-Train	0.224	0.252	0.651	0.521	0.0	0.336
churn-Test	0.292	0.0	0.474	0.33	0.0	0.348
churn-Train	0.259	0.429	0.46	0.322	0.0	0.337
compas-two-years-Test	0.06	0.0	0.235	0.209	0.0	0.0
compas-two-years-Train	0.058	0.139	0.316	0.225	0.0	0.0
image-Test	0.711	0.0	—	0.798	0.0	0.827
image-Train	0.46	0.938	—	0.775	0.0	0.829
page-blocks-Test	0.299	0.0	0.815	0.15	0.0	0.4
page-blocks-Train	0.319	0.309	0.796	0.168	0.0	0.431
parkinsons-Test	0.194	0.04	0.697	0.578	0.0	0.489
parkinsons-Train	0.221	0.592	0.815	0.6	0.0	0.628
segment-Test	0.364	0.0	0.665	0.483	0.053	0.48
segment-Train	0.433	0.41	0.729	0.539	0.053	0.517
stock-Test	0.514	0.0	0.931	0.699	0.0	0.546
stock-Train	0.559	0.656	0.951	0.701	0.0	0.588
zoo-Test	0.0	0.0	0.687	0.0	0.0	0.0
zoo-Train	0.0	0.0	0.0	0.0	0.0	0.0

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.653	0.594	0.745	0.682	0.659	0.36
Australian-Train	0.679	0.743	0.825	0.635	0.605	0.364
boston-Test	0.955	0.818	1.0	1.0	1.0	0.5
boston-Train	0.773	0.364	0.955	0.955	0.955	0.682
churn-Test	0.77	0.642	0.795	0.742	0.689	0.708
churn-Train	0.746	0.747	0.769	0.716	0.673	0.715
compas-two-years-Test	0.858	0.586	0.955	0.766	0.682	0.835
compas-two-years-Train	0.832	0.94	0.953	0.75	0.668	0.821
zoo-Test	0.837	0.572	0.861	0.806	0.689	0.73
zoo-Train	0.774	0.874	0.909	0.863	0.734	0.743

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.043	0.0	0.125	0.097	0.0	0.01
Australian-Train	0.034	0.0	0.101	0.069	0.0	0.001
boston-Test	0.14	0.0	0.585	0.413	0.0	0.201
boston-Train	0.047	0.159	0.604	0.382	0.0	0.175
churn-Test	0.237	0.0	0.378	0.211	0.0	0.283
churn-Train	0.243	0.376	0.391	0.213	0.0	0.286
compas-two-years-Test	0.003	0.0	0.152	0.129	0.0	0.0
compas-two-years-Train	0.007	0.031	0.162	0.153	0.0	0.0
image-Test	0.256	0.0	—	0.712	0.0	0.757
image-Train	0.181	0.88	—	0.644	0.0	0.765
page-blocks-Test	0.183	0.0	0.614	0.228	0.0	0.279
page-blocks-Train	0.227	0.292	0.768	0.236	0.0	0.399
parkinsons-Test	0.01	0.005	0.723	0.561	0.0	0.466
parkinsons-Train	0.067	0.54	0.764	0.353	0.0	0.497
segment-Test	0.256	0.0	0.696	0.452	0.053	0.473
segment-Train	0.243	0.412	0.677	0.446	0.053	0.46
stock-Test	0.423	0.0	0.902	0.562	0.0	0.504
stock-Train	0.443	0.496	0.901	0.554	0.0	0.516
zoo-Test	0.059	0.0	0.0	0.107	0.0	0.0
zoo-Train	0.0	0.0	0.421	0.407	0.0	0.0

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.637	0.593	0.754	0.632	0.624	0.622
Australian-Train	0.646	0.635	0.745	0.623	0.616	0.592
boston-Test	0.857	0.814	0.9	0.9	0.9	0.457
boston-Train	0.869	0.443	0.951	0.951	0.951	0.541
churn-Test	0.719	0.617	0.764	0.71	0.703	0.679
churn-Train	0.721	0.704	0.776	0.712	0.706	0.678
compas-two-years-Test	0.831	0.55	0.925	0.683	0.682	0.8
compas-two-years-Train	0.831	0.758	0.922	0.687	0.686	0.814
zoo-Test	0.824	0.555	0.897	0.795	0.661	0.564
zoo-Train	0.759	0.799	0.893	0.842	0.741	0.582

Dataset	MM	BI+MM	MF	BI+MF	GAIN	BI+GAIN	SOFT	BI+SOFT	PPCA	BI+PPCA	DAE	BI+DAE
analcatdata_reviewer-FS	0.603	0.603	0.603	0.603	0.603	0.603	0.608	0.608	0.603	0.603	0.603	0.603
analcatdata_reviewer-NOFS	0.635	0.656	0.609	0.653	0.606	0.636	0.624	0.653	0.632	0.645	0.635	0.656
analcatdata_reviewer-Overall	0.635	0.656	0.614	0.635	0.612	0.607	0.624	0.653	0.632	0.645	0.635	0.656
anneal-FS	0.905	0.988	0.88	0.99	0.928	0.991	0.971	0.987	0.981	0.976	0.966	0.99
anneal-NOFS	0.903	0.99	0.93	0.987	0.942	0.977	0.961	0.99	0.985	0.993	0.975	0.983
anneal-Overall	0.905	0.99	0.933	0.987	0.924	0.99	0.961	0.99	0.985	0.993	0.966	0.983
audiology-FS	0.931	0.931	0.926	0.926	0.926	0.909	0.926	0.926	0.926	0.926	0.926	0.926
audiology-NOFS	0.966	0.966	0.931	0.945	0.931	0.931	0.933	0.966	0.949	0.918	0.931	0.966
audiology-Overall	0.966	0.966	0.926	0.893	0.926	0.926	0.933	0.926	0.949	0.918	0.931	0.926
autoHorse-FS	0.976	0.976	0.968	0.968	0.968	0.968	0.968	0.968	0.889	0.984	0.968	0.968
autoHorse-NOFS	0.984	0.984	0.976	0.984	0.976	0.984	0.976	0.976	0.976	0.976	0.976	0.976
autoHorse-Overall	0.984	0.984	0.968	0.968	0.976	0.984	0.976	0.976	0.976	0.976	0.968	0.968
braziltourism-FS	0.893	0.893	0.885	0.885	0.893	0.891	0.885	0.885	0.895	0.895	0.893	0.893
braziltourism-NOFS	0.872	0.877	0.883	0.876	0.882	0.875	0.879	0.881	0.881	0.88	0.878	0.876
braziltourism-Overall	0.872	0.877	0.885	0.885	0.885	0.885	0.879	0.885	0.881	0.88	0.878	0.876
bridges-FS	0.778	0.778	0.8	0.8	0.837	0.809	0.829	0.829	0.809	0.816	0.818	0.783
bridges-NOFS	0.8	0.824	0.829	0.8	0.792	0.83	0.83	0.808	0.816	0.821	0.83	0.8
bridges-Overall	0.8	0.824	0.826	0.815	0.83	0.808	0.83	0.808	0.816	0.821	0.83	0.8
cjs-FS	1.0	1.0	1.0	1.0	0.999	0.997	1.0	1.0	0.931	0.949	0.988	1.0
cjs-NOFS	1.0	1.0	0.984	0.981	0.973	0.974	0.974	0.956	0.931	0.943	0.976	0.981
cjs-Overall	1.0	1.0	1.0	1.0	0.999	0.999	1.0	1.0	0.931	0.949	0.988	1.0
colic-FS	0.906	0.906	0.891	0.895	0.886	0.899	0.876	0.876	0.877	0.877	0.896	0.896
colic-NOFS	0.886	0.892	0.877	0.897	0.902	0.89	0.893	0.909	0.878	0.889	0.881	0.873
colic-Overall	0.906	0.906	0.898	0.902	0.902	0.898	0.876	0.909	0.877	0.889	0.896	0.896
colleges_aaup-FS	0.994	0.988	0.993	0.988	0.988	0.988	0.993	0.993	0.985	0.985	0.994	0.993
colleges_aaup-NOFS	0.987	0.987	0.985	0.987	0.985	0.983	0.987	0.985	0.989	0.989	0.989	0.983
colleges_aaup-Overall	0.994	0.988	0.993	0.988	0.988	0.988	0.993	0.993	0.989	0.989	0.994	0.993
cylinder-bands-FS	0.796	0.81	0.8	0.81	0.797	0.813	0.793	0.819	0.81	0.81	0.832	0.827
cylinder-bands-NOFS	0.816	0.805	0.827	0.826	0.814	0.808	0.806	0.802	0.799	0.825	0.844	0.852
cylinder-bands-Overall	0.816	0.805	0.829	0.832	0.81	0.8	0.806	0.802	0.799	0.825	0.844	0.852
dresses-sales-FS	0.592	0.592	0.592	0.592	0.595	0.592	0.592	0.592	0.592	0.592	0.592	0.592
dresses-sales-NOFS	0.597	0.601	0.592	0.598	0.594	0.592	0.605	0.601	0.593	0.596	0.602	0.602
dresses-sales-Overall	0.597	0.592	0.592	0.592	0.592	0.592	0.592	0.592	0.593	0.596	0.602	0.592
eucalyptus-FS	0.672	0.672	0.644	0.644	0.664	0.677	0.68	0.659	0.603	0.634	0.649	0.654
eucalyptus-NOFS	0.638	0.638	0.638	0.638	0.638	0.638	0.64	0.678	0.65	0.652	0.697	0.691
eucalyptus-Overall	0.672	0.672	0.638	0.638	0.664	0.676	0.68	0.659	0.65	0.652	0.697	0.691
hepatitis-FS	0.912	0.912	0.902	0.932	0.932	0.917	0.912	0.912	0.896	0.896	0.932	0.932
hepatitis-NOFS	0.91	0.91	0.916	0.928	0.91	0.904	0.919	0.924	0.932	0.925	0.912	0.913
hepatitis-Overall	0.91	0.91	0.931	0.912	0.913	0.917	0.919	0.924	0.896	0.896	0.912	0.913
hungarian-FS	0.81	0.81	0.79	0.789	0.803	0.752	0.78	0.78	0.783	0.783	0.77	0.77
hungarian-NOFS	0.826	0.824	0.804	0.804	0.814	0.826	0.817	0.807	0.8	0.81	0.81	0.814
hungarian-Overall	0.826	0.824	0.79	0.79	0.806	0.821	0.78	0.78	0.8	0.783	0.81	0.814
kdd_el_nino-small-FS	0.897	0.897	0.908	0.91	0.917	0.913	0.919	0.897	0.83	0.777	0.918	0.932
kdd_el_nino-small-NOFS	0.925	0.928	0.919	0.915	0.924	0.927	0.9	0.923	0.898	0.916	0.922	0.921
kdd_el_nino-small-Overall	0.925	0.928	0.911	0.918	0.919	0.918	0.9	0.923	0.898	0.916	0.922	0.921
mushroom-FS	1.0	1.0	0.999	1.0	1.0	1.0	0.999	0.999	0.996	0.997	1.0	1.0
mushroom-NOFS	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.999	0.998	1.0	1.0
mushroom-Overall	1.0	1.0	0.999	1.0	1.0	1.0	0.999	0.999	0.996	0.998	1.0	1.0
pbcseq-FS	0.776	0.776	0.788	0.796	0.77	0.779	0.763	0.772	0.779	0.779	0.786	0.767
pbcseq-NOFS	0.775	0.779	0.787	0.779	0.785	0.792	0.779	0.78	0.773	0.774	0.785	0.775
pbcseq-Overall	0.775	0.779	0.781	0.783	0.785	0.777	0.779	0.78	0.773	0.774	0.785	0.775
primary-tumor-FS	0.753	0.703	0.688	0.692	0.705	0.711	0.652	0.704	0.652	0.673	0.585	0.703
primary-tumor-NOFS	0.735	0.712	0.717	0.717	0.744	0.703	0.736	0.703	0.736	0.702	0.735	0.712
primary-tumor-Overall	0.735	0.712	0.714	0.738	0.744	0.699	0.736	0.703	0.736	0.702	0.735	0.712
profb-FS	0.548	0.548	0.548	0.548	0.548	0.548	0.548	0.548	0.552	0.552	0.548	0.548
profb-NOFS	0.573	0.573	0.562	0.576	0.562	0.571	0.573	0.573	0.514	0.51	0.572	0.573
profb-Overall	0.573	0.573	0.569	0.576	0.568	0.56	0.573	0.573	0.552	0.552	0.572	0.573
schizo-FS	0.653	0.653	0.653	0.653	0.645	0.65	0.648	0.648	0.658	0.658	0.648	0.648
schizo-NOFS	0.684	0.645	0.735	0.729	0.694	0.701	0.728	0.722	0.653	0.653	0.75	0.763
schizo-Overall	0.684	0.645	0.719	0.722	0.696	0.71	0.728	0.722	0.658	0.658	0.75	0.763
sick-FS	0.837	0.837	0.838	0.836	0.86	0.851	0.841	0.823	0.843	0.843	0.849	0.849
sick-NOFS	0.839	0.839	0.847	0.849	0.842	0.851	0.855	0.851	0.843	0.839	0.845	0.848
sick-Overall	0.839	0.839	0.837	0.854	0.842	0.849	0.855	0.851	0.843	0.839	0.845	0.848
soybean-FS	0.807	0.839	0.889	0.901	0.918	0.833	0.843	0.843	0.833	0.833	0.897	0.905
soybean-NOFS	0.913	0.898	0.884	0.867	0.894	0.882	0.875	0.897	0.911	0.871	0.882	0.899
soybean-Overall	0.807	0.898	0.86	0.889	0.887	0.884	0.875	0.897	0.911	0.871	0.882	0.899
stress-FS	0.792	0.792	0.844	0.844	0.844	0.844	0.844	0.844	0.85	0.85	0.844	0.844
stress-NOFS	0.75	0.75	0.769	0.756	0.756	0.75	0.756	0.75	0.878	0.837	0.75	0.75
stress-Overall	0.792	0.792	0.844	0.844	0.844	0.844	0.844	0.844	0.85	0.85	0.75	0.844
vote-FS	0.948	0.948	0.948	0.948	0.948	0.949	0.948	0.948	0.949	0.948	0.929	0.948
vote-NOFS	0.943	0.954	0.948	0.949	0.949	0.953	0.959	0.955	0.959	0.954	0.948	0.959
vote-Overall	0.943	0.954	0.943	0.955	0.955	0.965	0.948	0.948	0.949	0.948	0.948	0.959
water-treatment-FS	0.8	0.987	0.975	0.975	0.821	0.987	0.732	0.732	0.263	0.263	0.961	0.94
water-treatment-NOFS	0.987	0.987	0.987	0.975	0.987	0.987	0.785	0.95	0.571	0.5	0.951	0.918
water-treatment-Overall	0.987	0.987	0.975	0.975	0.987	0.987	0.785	0.95	0.571	0.5	0.951	0.918

Dataset	MM	BI+MM	MF	BI+MF	GAIN	BI+GAIN	SOFT	BI+SOFT	PPCA	BI+PPCA	DAE	BI+DAE
analcatdata_reviewer-FS	0.589	0.589	0.568	0.611	0.6	0.6	0.6	0.6	0.568	0.568	0.589	0.589
analcatdata_reviewer-NOFS	0.632	0.647	0.621	0.637	0.626	0.6	0.6	0.647	0.595	0.637	0.632	0.647
analcatdata_reviewer-Overall	0.632	0.647	0.621	0.642	0.6	0.637	0.6	0.647	0.595	0.637	0.632	0.647
anneal-FS	0.849	0.982	0.795	0.984	0.884	0.987	0.955	0.98	0.971	0.964	0.947	0.984
anneal-NOFS	0.844	0.984	0.895	0.98	0.906	0.964	0.94	0.984	0.978	0.989	0.962	0.973
anneal-Overall	0.849	0.984	0.895	0.98	0.88	0.984	0.94	0.984	0.978	0.989	0.947	0.973
audiology-FS	0.965	0.965	0.965	0.965	0.965	0.956	0.965	0.965	0.965	0.965	0.965	0.965
audiology-NOFS	0.982	0.982	0.965	0.973	0.965	0.965	0.965	0.982	0.973	0.956	0.965	0.982
audiology-Overall	0.982	0.982	0.965	0.947	0.965	0.965	0.965	0.965	0.973	0.956	0.965	0.965
autoHorse-FS	0.971	0.971	0.961	0.961	0.961	0.961	0.961	0.961	0.874	0.981	0.961	0.961
autoHorse-NOFS	0.981	0.981	0.971	0.981	0.971	0.981	0.971	0.971	0.971	0.971	0.971	0.971
autoHorse-Overall	0.981	0.981	0.961	0.961	0.971	0.981	0.971	0.971	0.971	0.971	0.961	0.961
braziltourism-FS	0.816	0.816	0.806	0.806	0.816	0.811	0.806	0.806	0.82	0.82	0.816	0.816
braziltourism-NOFS	0.777	0.791	0.801	0.791	0.801	0.791	0.796	0.796	0.796	0.791	0.796	0.786
braziltourism-Overall	0.777	0.791	0.806	0.806	0.806	0.806	0.796	0.806	0.796	0.791	0.796	0.786
bridges-FS	0.796	0.796	0.833	0.815	0.87	0.833	0.87	0.87	0.833	0.833	0.852	0.815
bridges-NOFS	0.833	0.833	0.87	0.833	0.815	0.852	0.833	0.815	0.833	0.87	0.833	0.833
bridges-Overall	0.833	0.833	0.852	0.815	0.833	0.833	0.833	0.815	0.833	0.87	0.833	0.833
cjs-FS	1.0	1.0	1.0	1.0	0.999	0.999	1.0	1.0	0.967	0.976	0.994	1.0
cjs-NOFS	1.0	1.0	0.992	0.991	0.987	0.988	0.987	0.979	0.967	0.973	0.989	0.991
cjs-Overall	1.0	1.0	1.0	1.0	0.999	0.999	1.0	1.0	0.967	0.976	0.994	1.0
colic-FS	0.875	0.875	0.853	0.859	0.848	0.864	0.837	0.837	0.837	0.837	0.864	0.864
colic-NOFS	0.848	0.853	0.837	0.859	0.87	0.853	0.859	0.88	0.837	0.853	0.837	0.837
colic-Overall	0.875	0.875	0.864	0.87	0.87	0.864	0.837	0.88	0.837	0.853	0.864	0.864
colleges_aaup-FS	0.991	0.983	0.99	0.983	0.983	0.983	0.99	0.99	0.979	0.979	0.991	0.99
colleges_aaup-NOFS	0.981	0.981	0.979	0.981	0.979	0.976	0.981	0.979	0.985	0.985	0.985	0.976
colleges_aaup-Overall	0.991	0.983	0.99	0.983	0.983	0.983	0.99	0.99	0.985	0.985	0.991	0.99
cylinder-bands-FS	0.752	0.741	0.748	0.741	0.756	0.741	0.756	0.756	0.73	0.73	0.785	0.778
cylinder-bands-NOFS	0.781	0.77	0.793	0.796	0.781	0.774	0.767	0.763	0.759	0.781	0.807	0.807
cylinder-bands-Overall	0.781	0.77	0.804	0.8	0.774	0.767	0.767	0.763	0.759	0.781	0.807	0.807
dresses-sales-FS	0.632	0.632	0.632	0.632	0.632	0.632	0.632	0.632	0.58	0.596	0.632	0.632
dresses-sales-NOFS	0.644	0.648	0.632	0.624	0.628	0.608	0.644	0.648	0.624	0.612	0.644	0.648
dresses-sales-Overall	0.644	0.632	0.632	0.632	0.632	0.632	0.632	0.632	0.624	0.612	0.644	0.632
eucalyptus-FS	0.791	0.791	0.774	0.774	0.772	0.783	0.783	0.769	0.712	0.761	0.75	0.774
eucalyptus-NOFS	0.769	0.769	0.769	0.769	0.769	0.769	0.758	0.785	0.791	0.791	0.785	0.788
eucalyptus-Overall	0.791	0.791	0.769	0.769	0.777	0.78	0.783	0.769	0.791	0.791	0.785	0.788
hepatitis-FS	0.846	0.846	0.833	0.885	0.885	0.859	0.846	0.846	0.821	0.821	0.885	0.885
hepatitis-NOFS	0.846	0.846	0.859	0.885	0.846	0.833	0.859	0.872	0.885	0.872	0.846	0.859
hepatitis-Overall	0.846	0.846	0.885	0.859	0.859	0.859	0.859	0.872	0.821	0.821	0.846	0.859
hungarian-FS	0.844	0.844	0.83	0.837	0.837	0.823	0.85	0.85	0.844	0.844	0.83	0.83
hungarian-NOFS	0.857	0.857	0.857	0.857	0.857	0.864	0.857	0.864	0.844	0.85	0.85	0.857
hungarian-Overall	0.857	0.857	0.837	0.837	0.857	0.857	0.85	0.85	0.844	0.844	0.85	0.857
kdd_el_nino-small-FS	0.928	0.928	0.934	0.939	0.944	0.936	0.944	0.931	0.88	0.849	0.944	0.954
kdd_el_nino-small-NOFS	0.946	0.949	0.944	0.939	0.946	0.949	0.931	0.946	0.926	0.941	0.944	0.944
kdd_el_nino-small-Overall	0.946	0.949	0.936	0.941	0.944	0.941	0.931	0.946	0.926	0.941	0.944	0.944
mushroom-FS	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.996	0.998	1.0	1.0
mushroom-NOFS	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.998	1.0	1.0
mushroom-Overall	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.996	0.998	1.0	1.0
pbcseq-FS	0.773	0.773	0.779	0.783	0.764	0.78	0.758	0.757	0.782	0.782	0.772	0.769
pbcseq-NOFS	0.776	0.772	0.777	0.776	0.782	0.777	0.776	0.776	0.766	0.764	0.782	0.775
pbcseq-Overall	0.776	0.772	0.781	0.779	0.781	0.781	0.776	0.776	0.766	0.764	0.782	0.775
primary-tumor-FS	0.865	0.871	0.835	0.841	0.853	0.871	0.841	0.876	0.841	0.859	0.812	0.871
primary-tumor-NOFS	0.859	0.876	0.847	0.853	0.871	0.841	0.865	0.871	0.865	0.853	0.859	0.876
primary-tumor-Overall	0.859	0.876	0.847	0.841	0.871	0.859	0.865	0.871	0.865	0.853	0.859	0.876
profb-FS	0.673	0.67	0.67	0.673	0.673	0.673	0.673	0.673	0.676	0.676	0.673	0.673
profb-NOFS	0.705	0.705	0.708	0.702	0.711	0.711	0.705	0.705	0.667	0.67	0.705	0.705
profb-Overall	0.705	0.705	0.705	0.708	0.705	0.702	0.705	0.705	0.676	0.676	0.705	0.705
schizo-FS	0.535	0.535	0.594	0.594	0.576	0.547	0.588	0.588	0.553	0.559	0.659	0.659
schizo-NOFS	0.718	0.706	0.741	0.776	0.735	0.759	0.724	0.741	0.606	0.594	0.776	0.788
schizo-Overall	0.718	0.706	0.753	0.753	0.753	0.765	0.724	0.741	0.553	0.559	0.776	0.788
sick-FS	0.979	0.979	0.981	0.981	0.983	0.981	0.979	0.977	0.98	0.98	0.981	0.981
sick-NOFS	0.979	0.979	0.982	0.981	0.98	0.98	0.981	0.981	0.98	0.98	0.98	0.981
sick-Overall	0.979	0.979	0.98	0.98	0.98	0.98	0.981	0.981	0.98	0.98	0.98	0.981
soybean-FS	0.944	0.956	0.971	0.974	0.98	0.956	0.962	0.962	0.959	0.959	0.974	0.974
soybean-NOFS	0.977	0.971	0.968	0.965	0.971	0.968	0.965	0.974	0.977	0.968	0.968	0.974
soybean-Overall	0.944	0.971	0.962	0.971	0.968	0.971	0.965	0.974	0.977	0.968	0.968	0.974
stress-FS	0.91	0.91	0.93	0.93	0.93	0.93	0.93	0.93	0.94	0.94	0.93	0.93
stress-NOFS	0.9	0.9	0.91	0.9	0.9	0.9	0.89	0.9	0.95	0.93	0.9	0.9
stress-Overall	0.91	0.91	0.93	0.93	0.93	0.93	0.93	0.93	0.94	0.94	0.9	0.93
vote-FS	0.959	0.959	0.959	0.959	0.959	0.959	0.959	0.959	0.959	0.959	0.945	0.959
vote-NOFS	0.954	0.963	0.959	0.959	0.959	0.963	0.968	0.963	0.968	0.963	0.959	0.968
vote-Overall	0.954	0.963	0.954	0.963	0.963	0.972	0.959	0.959	0.959	0.959	0.959	0.968
water-treatment-FS	0.943	0.996	0.992	0.992	0.947	0.996	0.92	0.92	0.848	0.848	0.989	0.981
water-treatment-NOFS	0.996	0.996	0.996	0.992	0.996	0.996	0.936	0.985	0.879	0.864	0.985	0.973
water-treatment-Overall	0.996	0.996	0.992	0.992	0.996	0.996	0.936	0.985	0.879	0.864	0.985	0.973

Dataset	MM	BI+MM	MF	BI+MF	GAIN	BI+GAIN	SOFT	BI+SOFT	PPCA	BI+PPCA	DAE	BI+DAE
Australian-FS	0.902	0.905	0.922	0.914	0.913	0.919	0.885	0.915	0.89	0.897	0.906	0.906
Australian-NOFS	0.916	0.911	0.917	0.915	0.911	0.904	0.886	0.896	0.892	0.896	0.915	0.91
Australian-Overall	0.902	0.911	0.922	0.914	0.909	0.908	0.885	0.896	0.892	0.896	0.906	0.906
boston-FS	0.933	0.933	0.926	0.926	0.924	0.924	0.912	0.917	0.89	0.89	0.926	0.926
boston-NOFS	0.931	0.928	0.94	0.935	0.927	0.931	0.933	0.908	0.913	0.921	0.928	0.926
boston-Overall	0.933	0.933	0.938	0.941	0.922	0.921	0.912	0.908	0.913	0.921	0.926	0.926
churn-FS	0.914	0.914	0.92	0.916	0.907	0.916	0.887	0.883	0.884	0.885	0.906	0.906
churn-NOFS	0.909	0.912	0.914	0.916	0.91	0.916	0.904	0.91	0.886	0.886	0.906	0.909
churn-Overall	0.909	0.912	0.917	0.916	0.908	0.919	0.904	0.91	0.884	0.886	0.906	0.909
compas-two-years-FS	0.704	0.704	0.698	0.692	0.699	0.705	0.693	0.693	0.688	0.688	0.697	0.697
compas-two-years-NOFS	0.702	0.698	0.703	0.701	0.701	0.698	0.702	0.702	0.703	0.699	0.692	0.692
compas-two-years-Overall	0.702	0.704	0.703	0.696	0.7	0.699	0.693	0.702	0.688	0.688	0.692	0.697
image-FS	0.87	0.845	—	—	0.883	0.875	0.85	0.85	0.845	0.849	0.877	0.878
image-NOFS	0.885	0.884	—	—	0.881	0.878	0.877	0.877	0.863	0.859	0.892	0.887
image-Overall	0.885	0.884	—	—	0.882	0.883	0.877	0.877	0.863	0.859	0.892	0.887
page-blocks-FS	0.99	0.989	0.99	0.989	0.988	0.99	0.983	0.985	0.969	0.969	0.988	0.988
page-blocks-NOFS	0.988	0.988	0.99	0.99	0.989	0.988	0.983	0.982	0.973	0.974	0.987	0.987
page-blocks-Overall	0.988	0.989	0.99	0.99	0.987	0.989	0.983	0.982	0.973	0.974	0.988	0.987
parkinsons-FS	0.85	0.846	0.871	0.871	0.849	0.849	0.85	0.845	0.848	0.861	0.868	0.866
parkinsons-NOFS	0.896	0.896	0.935	0.923	0.896	0.895	0.898	0.892	0.899	0.907	0.919	0.914
parkinsons-Overall	0.896	0.896	0.932	0.925	0.907	0.894	0.898	0.892	0.899	0.907	0.919	0.914
segment-FS	0.999	0.999	1.0	1.0	0.999	0.999	0.996	0.997	0.978	0.977	0.999	0.999
segment-NOFS	0.999	1.0	1.0	1.0	0.999	1.0	0.998	0.999	0.985	0.986	1.0	0.999
segment-Overall	0.999	1.0	1.0	1.0	1.0	1.0	0.998	0.999	0.985	0.986	1.0	0.999
stock-FS	0.99	0.99	0.993	0.993	0.984	0.987	0.975	0.975	0.954	0.954	0.99	0.99
stock-NOFS	0.989	0.99	0.992	0.993	0.99	0.989	0.979	0.98	0.969	0.97	0.991	0.991
stock-Overall	0.99	0.99	0.993	0.993	0.986	0.986	0.979	0.98	0.969	0.97	0.99	0.99
zoo-FS	0.993	0.993	0.895	0.895	0.929	0.929	0.986	0.986	0.898	0.895	0.995	0.995
zoo-NOFS	0.979	0.979	0.992	0.998	1.0	1.0	0.973	0.989	0.897	0.994	0.903	1.0
zoo-Overall	0.979	0.993	0.994	1.0	0.929	0.989	0.986	0.989	0.897	0.994	0.903	1.0

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.02	0.0	0.046	0.039	0.0	0.013
Australian-Train	0.0	0.004	0.032	0.027	0.0	0.01
boston-Test	0.079	0.0	0.403	0.213	0.0	0.192
boston-Train	0.089	0.047	0.385	0.176	0.0	0.186
churn-Test	0.059	0.0	0.25	0.077	0.0	0.02
churn-Train	0.057	0.191	0.243	0.075	0.0	0.019
compas-two-years-Test	0.002	0.0	0.083	0.052	0.0	0.042
compas-two-years-Train	0.008	0.059	0.075	0.051	0.0	0.04
image-Test	0.009	0.0	—	0.455	0.0	0.574
image-Train	0.011	0.24	—	0.393	0.0	0.573
page-blocks-Test	0.086	0.0	0.411	0.171	0.0	0.154
page-blocks-Train	0.174	0.113	0.605	0.154	0.0	0.21
parkinsons-Test	0.0	0.0	0.566	0.305	0.0	0.323
parkinsons-Train	0.003	0.228	0.595	0.167	0.0	0.407
segment-Test	0.189	0.0	0.621	0.22	0.053	0.301
segment-Train	0.187	0.188	0.631	0.21	0.053	0.309
stock-Test	0.194	0.0	0.714	0.29	0.0	0.38
stock-Train	0.164	0.224	0.751	0.289	0.0	0.401
zoo-Test	0.0	0.089	0.0	0.0	0.0	0.0
zoo-Train	0.0	0.0	0.0	0.002	0.0	0.0

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.004	0.0	0.131	0.064	0.0	0.0
Australian-Train	0.072	0.0	0.246	0.142	0.0	0.0
boston-Test	0.051	0.0	0.554	0.445	0.0	0.171
boston-Train	0.035	0.199	0.717	0.382	0.0	0.278
churn-Test	0.226	0.0	0.493	0.324	0.0	0.344
churn-Train	0.226	0.415	0.47	0.321	0.0	0.336
compas-two-years-Test	0.0	0.0	0.219	0.193	0.0	0.0
compas-two-years-Train	0.0	0.142	0.247	0.207	0.0	0.0
image-Test	0.312	0.0	—	0.751	0.0	0.816
image-Train	0.03	0.943	—	0.706	0.0	0.816
page-blocks-Test	0.031	0.0	0.698	0.133	0.0	0.212
page-blocks-Train	0.084	0.286	0.684	0.134	0.0	0.22
parkinsons-Test	0.158	0.005	0.726	0.632	0.0	0.536
parkinsons-Train	0.065	0.633	0.724	0.491	0.0	0.564
segment-Test	0.281	0.0	0.638	0.476	0.053	0.418
segment-Train	0.268	0.437	0.722	0.447	0.053	0.395
stock-Test	0.476	0.0	0.925	0.668	0.0	0.479
stock-Train	0.452	0.573	0.927	0.608	0.0	0.481
zoo-Test	0.0	0.476	0.929	0.381	0.0	0.0
zoo-Train	0.0	0.0	0.48	0.681	0.0	0.0

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.61	0.625	0.786	0.703	0.65	0.54
Australian-Train	0.612	0.695	0.76	0.622	0.594	0.518
boston-Test	1.0	0.941	0.941	1.0	1.0	0.471
boston-Train	0.941	0.529	0.941	0.941	0.941	0.353
churn-Test	0.741	0.641	0.791	0.756	0.719	0.695
churn-Train	0.755	0.778	0.809	0.758	0.727	0.731
compas-two-years-Test	0.854	0.608	0.955	0.73	0.704	0.808
compas-two-years-Train	0.815	0.914	0.941	0.724	0.707	0.798
zoo-Test	0.789	0.568	0.888	0.801	0.722	0.471
zoo-Train	0.823	0.805	0.929	0.823	0.72	0.477

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.007	0.0	0.123	0.053	0.0	0.01
Australian-Train	0.007	0.0	0.142	0.076	0.0	0.002
boston-Test	0.041	0.0	0.641	0.288	0.0	0.152
boston-Train	0.01	0.057	0.54	0.301	0.0	0.168
churn-Test	0.118	0.0	0.405	0.19	0.0	0.289
churn-Train	0.115	0.38	0.411	0.19	0.0	0.296
compas-two-years-Test	0.0	0.0	0.169	0.121	0.0	0.0
compas-two-years-Train	0.0	0.164	0.157	0.102	0.0	0.0
image-Test	0.0	0.0	—	0.621	0.0	0.732
image-Train	0.0	0.867	—	0.54	0.0	0.73
page-blocks-Test	0.026	0.0	0.628	0.174	0.0	0.159
page-blocks-Train	0.013	0.11	0.678	0.162	0.0	0.16
parkinsons-Test	0.016	0.0	0.66	0.536	0.0	0.501
parkinsons-Train	0.005	0.513	0.671	0.301	0.0	0.482
segment-Test	0.0	0.0	0.686	0.346	0.053	0.309
segment-Train	0.0	0.298	0.675	0.344	0.053	0.313
stock-Test	0.028	0.0	0.809	0.346	0.0	0.366
stock-Train	0.021	0.326	0.841	0.346	0.0	0.388
zoo-Test	0.0	0.0	0.358	0.564	0.0	0.003
zoo-Train	0.0	0.0	0.261	0.123	0.0	0.0

Dataset	GAIN	SOFT	MF	DAE	MM	PPCA
Australian-Test	0.0	0.0	0.029	0.027	0.0	0.003
Australian-Train	0.0	0.015	0.034	0.036	0.0	0.001
boston-Test	0.0	0.0	0.468	0.093	0.0	0.104
boston-Train	0.0	0.021	0.484	0.096	0.0	0.162
churn-Test	0.019	0.0	0.26	0.064	0.0	0.051
churn-Train	0.017	0.14	0.258	0.064	0.0	0.048
compas-two-years-Test	0.0	0.0	0.115	0.072	0.0	0.027
compas-two-years-Train	0.0	0.029	0.104	0.067	0.0	0.032
image-Test	0.0	0.0	—	0.364	0.0	0.515
image-Train	0.0	0.176	—	0.292	0.0	0.514
page-blocks-Test	0.0	0.0	0.437	0.076	0.0	0.116
page-blocks-Train	0.0	0.103	0.449	0.079	0.0	0.11
parkinsons-Test	0.018	0.0	0.6	0.228	0.0	0.304
parkinsons-Train	0.017	0.19	0.608	0.186	0.0	0.342
segment-Test	0.0	0.0	0.59	0.144	0.053	0.098
segment-Train	0.0	0.069	0.593	0.135	0.053	0.097
stock-Test	0.0	0.0	0.751	0.159	0.0	0.118
stock-Train	0.0	0.113	0.725	0.136	0.0	0.098
zoo-Test	0.0	0.032	0.191	0.22	0.0	0.102
zoo-Train	0.0	0.0	0.127	0.0	0.0	0.0
