Simulation Study when Data are MAR
In this section, we present three simulation studies that compare our privacy-preserving methods with the standard methods for handling missing data, under both univariate and general missing data patterns. As in Section 3.2, we take the linear regression model (3.1) as the “analysis model”. The simulation results are based on 1000 Monte Carlo (MC) data sets, and each study generates its MC data sets in a different way.

In the first study, we explore approaches for handling a continuous variable with missing values under a univariate missing data pattern, i.e., only $X_1$ has missing values. Data $X_1, \ldots, X_p$ and $Y$ are generated for $n = 200$ and $n = 1000$ individuals, with $p = 2$ in this study. For each individual, $X_2$ is first generated from a uniform distribution $U(-1, 1)$. Given $X_2$, variable $X_1$ is sampled from a normal distribution with mean $\mu_{X_1} = X_2$ and variance $\sigma^2_{X_1} = 1$. The outcome $Y$ is generated from $Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \epsilon$, where $\epsilon \sim N(0, 1)$ and $\theta_j = 1$ for $j = 0, 1, 2$. Variable $X_1$ is missing with probability $\{1 + \exp(1.6 - Y - X_2)\}^{-1}$, which results in approximately 42% of individuals having missing values.

To illustrate that our privacy-preserving methods also apply to a binary variable with missing values, we slightly modify the preceding data-generating mechanism. Given $X_2$, instead of sampling $X_1$ from a normal distribution, we generate $X_1$ from a Bernoulli distribution with probability $\{1 + \exp(-1 - X_2)\}^{-1}$. The rest of the procedure is unchanged.
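
The data-generating mechanism of the first study can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used to produce the reported results: the function name `simulate_mar_dataset`, its arguments, and the keys of the returned dictionary are hypothetical, and the error term is taken to be standard normal as described above. Because the missingness probability depends only on $Y$ and $X_2$, both of which are fully observed, the mechanism is MAR.

```python
import numpy as np


def simulate_mar_dataset(n, rng, binary_x1=False, theta=(1.0, 1.0, 1.0)):
    """Generate one data set with X1 missing at random (illustrative sketch).

    The function name, arguments, and returned keys are hypothetical;
    only the distributions follow the text.
    """
    # X2 ~ U(-1, 1)
    x2 = rng.uniform(-1.0, 1.0, size=n)

    if binary_x1:
        # Binary variant: X1 | X2 ~ Bernoulli({1 + exp(-1 - X2)}^{-1})
        p_x1 = 1.0 / (1.0 + np.exp(-1.0 - x2))
        x1 = rng.binomial(1, p_x1).astype(float)
    else:
        # Continuous variant: X1 | X2 ~ N(X2, 1)
        x1 = rng.normal(loc=x2, scale=1.0)

    # Y = theta0 + theta1 * X1 + theta2 * X2 + eps, with eps ~ N(0, 1)
    eps = rng.normal(loc=0.0, scale=1.0, size=n)
    y = theta[0] + theta[1] * x1 + theta[2] * x2 + eps

    # X1 is missing with probability {1 + exp(1.6 - Y - X2)}^{-1}
    p_miss = 1.0 / (1.0 + np.exp(1.6 - y - x2))
    missing = rng.random(n) < p_miss

    # Mask the missing X1 values with NaN; X2 and Y stay fully observed
    x1_obs = np.where(missing, np.nan, x1)
    return {"x1": x1_obs, "x2": x2, "y": y, "missing": missing}


# Example: one continuous-X1 data set with n = 1000 (the seed is arbitrary)
rng = np.random.default_rng(2024)
data = simulate_mar_dataset(1000, rng)
print(f"proportion of missing X1: {data['missing'].mean():.3f}")
```

Calling the function 1000 times with the same generator would give the MC replicates for one sample size. Passing `binary_x1=True` switches to the Bernoulli variant while leaving the outcome and missingness models unchanged.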
