Abstract
Missing data are an important practical problem in many applications of statistics, including the social and behavioral sciences. The older and simpler strategy is to use ad hoc methods (e.g. available-case or complete-case analysis), which can bias the resulting estimates and also distort features of the data such as variability and symmetry.
A better strategy is to use principled methods such as Multiple Imputation (MI) or Maximum Likelihood. MI refers to a procedure in which each missing datum is imputed (filled in) with more than one value, which reflects the uncertainty about which value to impute. MI is generally accepted and can be used with virtually any kind of data. Moreover, software is available to perform the analyses. The most complex step in MI is to specify a model from which imputations are drawn. Building an imputation model is standard when the data are missing at random, in the sense that the probability of being missing depends only on the observed data. However, the imputation model is easily misspecified, especially with multivariate missing data. Thus, it is essential to develop techniques that protect the imputation model from misspecification. Such protection enhances the quality of the imputations and of the statistical inferences performed on the data. The first part of this dissertation provides new imputation methods for handling multivariate missing data with a general pattern of missingness. We develop a new methodology that incorporates the double robustness property into MCMC-based algorithms for the imputation of missing values. When the data are missing not at random (MNAR), the incomplete variables are part of the nonresponse model. This makes the imputation task even more difficult because the imputation model depends on unobserved data. One general strategy is to incorporate a model for the missingness into the imputation model. An important issue then is the need to make extra assumptions about the process that leads to missingness. Unfortunately, these assumptions cannot be fully verified from the observed data, and thus conducting a sensitivity analysis is advisable. We propose a novel imputation method, called the Random Indicator (RI) method, for data that are MNAR.
The RI method makes assumptions about the missingness mechanism and the distribution of the observed part of the incomplete variable. That is, it assumes a logistic transformation for the mechanism that creates missing data, and a normal distribution for the observed part of the incomplete variable. Compared with the existing methodology, the RI method makes minimal assumptions and creates imputations automatically, without user intervention, for data that are MNAR.
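To make the MNAR setting concrete, the following minimal sketch simulates a normally distributed variable whose probability of being missing follows a logistic function of its own value, and compares the mean of the observed cases with the mean of the full data. It is only an illustration of the kind of mechanism described above, not the RI method itself; the function names and the parameter values (`intercept`, `slope`, sample size, seed) are assumptions chosen for the example.

```python
import math
import random


def simulate_mnar(n=20000, mean=0.0, sd=1.0, intercept=0.0, slope=1.5, seed=1):
    """Draw y ~ Normal(mean, sd) and delete values with a logistic MNAR mechanism.

    P(y is missing | y) = 1 / (1 + exp(-(intercept + slope * y))),
    so with slope > 0 larger values of y are more likely to be missing.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        y = rng.gauss(mean, sd)
        p_missing = 1.0 / (1.0 + math.exp(-(intercept + slope * y)))
        full.append(y)
        if rng.random() >= p_missing:  # the value is kept (observed)
            observed.append(y)
    return full, observed


def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


full, observed = simulate_mnar()
print("mean of full data:     ", round(average(full), 3))
print("mean of observed cases:", round(average(observed), 3))
print("fraction observed:     ", round(len(observed) / len(full), 3))
```

Because missingness here depends on the unobserved value itself, the complete-case mean is pulled below the full-data mean, and nothing in the observed data alone reveals by how much. This is why MNAR imputation needs an explicit assumption about the missingness mechanism.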