Abstract
This paper applies graph-based causal inference procedures to recover information from missing data. We establish conditions that permit and prohibit recoverability. Where theoretical impediments to recoverability arise, we develop graph-based procedures that use auxiliary variables and external data to overcome them. We demonstrate the perils of model-blind recovery procedures, both in determining whether a query is recoverable and in choosing an estimation procedure when recoverability holds.
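To make the contrast between model-blind and graph-guided recovery concrete, the following is a minimal sketch under stated assumptions, not the paper's own procedure. It assumes a hypothetical three-variable model in which a fully observed X causes both Y and the missingness indicator R_y (with R_y = 0 meaning Y is observed). All variable names and probabilities are illustrative. Exact fractions are used so that no sampling noise enters the comparison.

```python
from fractions import Fraction as F

# Hypothetical model (an assumption, not taken from the paper):
# X -> Y and X -> R_y, with X fully observed.
# R_y = 0 means Y is observed; R_y = 1 means Y is missing.
p_x = {0: F(1, 2), 1: F(1, 2)}               # P(X = x)
p_y1_given_x = {0: F(1, 5), 1: F(4, 5)}      # P(Y = 1 | X = x)
p_obs_given_x = {0: F(9, 10), 1: F(3, 10)}   # P(R_y = 0 | X = x)

# Ground truth, available only because the model is fully specified here.
true_p_y1 = sum(p_x[x] * p_y1_given_x[x] for x in p_x)

# Model-blind estimand: the large-sample limit of listwise deletion,
# P(Y = 1 | R_y = 0).
p_obs = sum(p_x[x] * p_obs_given_x[x] for x in p_x)
complete_case_p_y1 = (
    sum(p_x[x] * p_obs_given_x[x] * p_y1_given_x[x] for x in p_x) / p_obs
)


def p_y1_given_x_observed(x):
    """P(Y = 1 | X = x, R_y = 0), computable from the observed cases alone."""
    joint = p_x[x] * p_obs_given_x[x] * p_y1_given_x[x]  # P(Y=1, X=x, R_y=0)
    return joint / (p_x[x] * p_obs_given_x[x])           # / P(X=x, R_y=0)


# Graph-guided estimand: Y is independent of R_y given X in this graph, so
# P(Y = 1) = sum_x P(Y = 1 | X = x, R_y = 0) * P(X = x).
recovered_p_y1 = sum(p_y1_given_x_observed(x) * p_x[x] for x in p_x)

print("truth:", true_p_y1)
print("listwise deletion:", complete_case_p_y1)
print("graph-guided:", recovered_p_y1)
```

In this toy model listwise deletion converges to 7/20 while the true value of P(Y = 1) is 1/2; the graph-guided expression returns 1/2 exactly. The factorization relies on the assumed conditional independence of Y and R_y given X, which is read off the hypothetical graph rather than from the data.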
Notes
1. The presence of a non-recoverable factor in a summand does not always imply the non-recoverability of the summand. See Example 3 in [18].
References
Allison, P.D.: Missing Data Series: Quantitative Applications in the Social Sciences (2002)
Collins, L.M., Schafer, J.L., Kam, C.-M.: A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol. Methods 6(4), 330 (2001)
Daniel, R.M., Kenward, M.G., Cousens, S.N., De Stavola, B.L.: Using causal diagrams to guide analysis in missing data problems. Stat. Methods Med. Res. 21(3), 243–256 (2012)
Darwiche, A.: Modeling and Reasoning with Bayesian Networks. Cambridge University Press, New York (2009)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B. (Methodol.) 39(1), 1–38 (1977)
Enders, C.K.: Applied Missing Data Analysis. Guilford Publications, New York (2010)
Garcia, F.M.: Definition and diagnosis of problematic attrition in randomized controlled experiments. Working paper, April 2013. http://ssrn.com/abstract=2267120
Graham, J.W.: Missing Data: Analysis and Design. Statistics for Social and Behavioral Sciences. Springer, New York (2012)
Heitjan, D.F., Rubin, D.B.: Ignorability and coarse data. Ann. Stat. 19(4), 2244–2253 (1991)
Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. Cambridge University Press, New York (2009)
Lauritzen, S.L.: The EM algorithm for graphical association models with missing data. Comput. Stat. Data Anal. 19(2), 191–201 (1995)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (2002)
Marlin, B.M., Zemel, R.S.: Collaborative prediction and ranking with non-random missing data. In: Proceedings of the Third ACM Conference on Recommender Systems, pp. 5–12. ACM (2009)
Marlin, B.M., Zemel, R.S., Roweis, S., Slaney, M.: Collaborative filtering and the missing at random assumption. In: UAI (2007)
Marlin, B.M., Zemel, R.S., Roweis, S.T., Slaney, M.: Recommender systems: missing data and statistical model estimation. In: IJCAI (2011)
Mohan, K., Pearl, J.: On the testability of models with missing data. In: Proceedings of AISTATS (2014)
Mohan, K., Pearl, J., Tian, J.: Graphical models for inference with missing data. Adv. Neural Inf. Process. Syst. 26, 1277–1285 (2013)
Mohan, K., Pearl, J.: Graphical models for recovering probabilistic and causal queries from missing data. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 1520–1528 (2014)
Pearl, J.: Causality: Models, Reasoning and Inference. Cambridge University Press, New York (2009)
Pearl, J., Mohan, K.: Recoverability and testability of missing data: Introduction and summary of results. Technical report R-417, UCLA (2013). http://ftp.cs.ucla.edu/pub/stat_ser/r417.pdf
Robins, J.M., Rotnitzky, A.: Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N.P., Dietz, K., Farewell, V.T. (eds.) AIDS Epidemiology, pp. 297–331. Springer, New York (1992)
Robins, J.M., Rotnitzky, A., Zhao, L.P.: Estimation of regression coefficients when some regressors are not always observed. J. Am. Stat. Assoc. 89(427), 846–866 (1994)
Rothman, K.J., Greenland, S., Lash, T.L.: Modern Epidemiology. Lippincott Williams & Wilkins, Philadelphia (2008)
Rubin, D.B.: Inference and missing data. Biometrika 63, 581–592 (1976)
Shadish, W.R.: Revisiting field experimentation: field notes for the future. Psychol. Methods 7(1), 3 (2002)
Shpitser, I., Mohan, K., Pearl, J.: Missing data as a causal and probabilistic problem. In: Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (2015)
Thoemmes, F., Mohan, K.: Graphical representation of missing data problems. Struct. Equ. Model. Multi. J. 37(1), 1–13 (2015)
Thoemmes, F., Rose, N.: Selection of auxiliary variables in missing data problems: Not all auxiliary variables are created equal. Technical report R-002, Cornell University (2013)
Thoemmes, F., Mohan, K.: Graphical representation of missing data problems. Struct. Equ. Model. Multi. J. 22(4), 1–13 (2015)
Twisk, J., de Vente, W.: Attrition in longitudinal studies: how to deal with missing data. J. Clin. Epidemiol. 55(4), 329–337 (2002)
Van der Laan, M.J., Robins, J.M.: Locally efficient estimation with current status data and time-dependent covariates. J. Am. Stat. Assoc. 93(442), 693–701 (1998)
Van der Laan, M.J., Robins, J.M.: Unified Methods for Censored Longitudinal Data and Causality. Springer, New York (2003)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mohan, K., Pearl, J. (2015). Missing Data from a Causal Perspective. In: Suzuki, J., Ueno, M. (eds) Advanced Methodologies for Bayesian Networks. AMBN 2015. Lecture Notes in Computer Science, vol 9505. Springer, Cham. https://doi.org/10.1007/978-3-319-28379-1_13
DOI: https://doi.org/10.1007/978-3-319-28379-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28378-4
Online ISBN: 978-3-319-28379-1
eBook Packages: Computer Science, Computer Science (R0)