Restrictive Imputation of Incomplete Survey Data

Vink, G.

(2015) Utrecht University Repository

(Dissertation)

Supervisor(s): van Buuren, Stef; Pannekoek, Jeroen

Abstract

This dissertation focuses on finding plausible imputations when some restriction is imposed on the imputation model. In these restrictive situations, current imputation methodology does not lead to satisfactory imputations. The restrictions and the resulting missing data problems are real-life situations that are frequently encountered across different domains of statistics, such as social statistics, social sciences, geology and medical sciences. More specifically, imputation strategies that yield plausible imputations are considered for the following restrictive problems.

First, in social statistics highly skewed semicontinuous (or zero-inflated) data are frequently encountered. When imputing these data, the non-negative mixture of continuous values and the point mass (often at zero) needs to be considered in such a way that imputations fall within the plausible range of values. Current imputation approaches use multi-step procedures that depend on data transformations to conform the incomplete data to the imputation model. A single-step imputation solution that does not require data transformations and leads to valid inference and plausible imputations is discussed.

Second, in many domains of statistics, multilevel (or clustered) data are often encountered. With multilevel data, groups of respondents share common characteristics and can be clustered into classes. This class structure, often summarized in the intraclass correlation coefficient, needs to be taken into account when imputing such data. An imputation approach that provides a straightforward solution for obtaining plausible imputations while taking the multilevel structure of the data into account is discussed.

Third, applied researchers frequently use squared terms in their analysis models. It is known that the imputation model should embrace all relations of scientific interest. When generating plausible imputations, the relation between the original variable and its squared counterpart needs to be preserved. After all, a squared value that has no relation to its square root can never be deemed plausible. An imputation technique for obtaining plausible imputations when the imputation model contains squared terms is proposed.

Fourth, in many domains of statistics, compositional data structures are encountered. Compositional data can be defined as a set of parts that obey a certain edit restriction, such that the parts have to sum to a certain total. Imputing compositional data is challenging because imputations must obey the restrictions in the data while remaining strictly non-negative. An imputation approach that can handle intricately nested compositional data and provides plausible imputations that adhere to the compositional structure is proposed.

Finally, when evaluating imputation approaches, simulation studies are often used. Data are usually sampled from some theoretical distribution that serves as the population. If this is not possible, design-based simulation studies are performed, where data are usually sampled from some ‘true’ dataset of sufficient size. Both simulation approaches introduce sampling variance, which is not of specific interest when evaluating imputations. I demonstrate a simplification of the conventional pooling rules for multiple imputation in situations where sampling variance is not of interest. These pooling rules are also applicable in situations where the size of the population is restricted and essentially all units in the population have been observed.

Keywords: Multiple Imputation, Survey Data, Restrictions, Incomplete Data

ISBN: 978-90-393-6300-3

Publisher: Utrecht University
