ABSTRACT

The original motivation for multiple imputation was to handle survey nonresponse problems for organizations, which can then release completed data for public use and ensure the validity of results for a wide variety of analyses. We first summarize some arguments for this strategy from the existing literature. We then provide a brief overview of the design-based inferential framework for probability survey data, covering the sampling design and weights, weighted estimation, variance estimation, and design effects. For imputing data from large, complex surveys, we offer a few methodological recommendations: (1) incorporate the survey sampling design (e.g., survey weights, sampling strata and clusters, and other design-related information such as geographical characteristics); (2) assume that data are missing at random; (3) use the inclusive imputation strategy; and (4) use the fully conditional specification approach. The chapter presents several real-life examples of successfully imputing large surveys, including NHIS income variable imputation, NHANES missing data imputation, NAMCS race variable imputation, cancer patient survey missing data imputation, and imputation for missing variables in linked files between survey data and administrative databases. From the data production perspective, we discuss several issues related to imputed-data editing, processing, documentation, and release.