Simulation Study. A simulation study was performed to compare the proposed adjusted degree of distinguishability with the classical one, and to develop a table for interpreting the adjusted degree of distinguishability. To generate 2 × 2 contingency tables, we used the method presented by Xxxxxx and Xxxx [8], which is based on the bivariate standard normal distribution. In the first step, two independent and identically distributed random variables (X1 and X2) are generated. Equations (4.1) and (4.2) are then used to generate two random variables (X and Y) from a bivariate normal distribution with a given correlation (ρ).
(4.1) X = aX1 + bX2
(4.2) Y = bX1 + aX2
where a = √(1 + ρ) + √(1 − ρ) and b = √(1 + ρ) − √(1 − ρ). Then, the X and Y variables are categorized into two equal intervals and crossed to have
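The generation scheme in (4.1) and (4.2) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes X1 and X2 are standard normal and that "two equal intervals" means cutting each variable at zero (its median). The function names `simulate_2x2` and `implied_correlation` are hypothetical. With a and b as printed, X and Y each have variance a² + b² = 4 and covariance 2ab = 4ρ, so their correlation is ρ; cutting at zero is unaffected by this scale.

```python
import math
import random


def implied_correlation(rho):
    """Correlation of X and Y under (4.1)-(4.2): 2ab / (a^2 + b^2)."""
    a = math.sqrt(1 + rho) + math.sqrt(1 - rho)
    b = math.sqrt(1 + rho) - math.sqrt(1 - rho)
    return 2 * a * b / (a * a + b * b)


def simulate_2x2(rho, n, seed=0):
    """Return a 2 x 2 table of counts from n correlated normal pairs.

    Assumption: each variable is dichotomized at zero (its median).
    """
    rng = random.Random(seed)
    a = math.sqrt(1 + rho) + math.sqrt(1 - rho)
    b = math.sqrt(1 + rho) - math.sqrt(1 - rho)
    table = [[0, 0], [0, 0]]
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        x = a * x1 + b * x2  # equation (4.1)
        y = b * x1 + a * x2  # equation (4.2)
        # Row index: category of X; column index: category of Y.
        table[int(x > 0)][int(y > 0)] += 1
    return table


if __name__ == "__main__":
    print(simulate_2x2(rho=0.5, n=1000, seed=1))
```

For positive ρ the diagonal cells should dominate, since pairs with the same sign become more likely as the correlation grows.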
Simulation Study. By means of a simulation study, we first evaluated the type I error rate of the score statistic Ŝ1, Xxxxxxx’x chi-square χ2, the likelihood ratio with equal weights (LR), and Terwilliger’s likelihood ratio with weights equal to the pj’s (TLR). For the score statistic we used the chi-square distribution with one degree of freedom to approximate the distribution under the null hypothesis. For the LR and TLR statistics we used the 50:50 mixture of two chi-square distributions with zero and one degree of freedom. We generated 10,000 samples of 200 case chromosomes and 200 control chromosomes from the multinomial distributions with probabilities p1, ..., pm for m equal to 4, 5, 8, 10, 15 and 20 haplotypes. Similar to the simulation described by Xxxxxxxxxxx, the frequency of the most common haplotype, p1, was set to 0.5, whereas the remaining haplotypes were equally frequent (0.5/(m − 1)). The results are shown in the left columns of table 6.1. For all m, the type I error rates of the score statistic Ŝ1 were maintained at the nominal error rate. For m < 10, the type I error rates of Xxxxxxx’x chi
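The sampling design and the reference distributions described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the test statistics themselves (score, LR, TLR) are not implemented, and the p-value helpers rely on the identity P(χ²₁ > x) = erfc(√(x/2)) and on the fact that a chi-square with zero degrees of freedom is a point mass at zero.

```python
import math
import random


def haplotype_probs(m, p1=0.5):
    """Most common haplotype has frequency p1; the other m - 1 share the rest equally."""
    return [p1] + [(1 - p1) / (m - 1)] * (m - 1)


def sample_counts(probs, n, rng):
    """Draw n chromosomes from a multinomial distribution and return haplotype counts."""
    counts = [0] * len(probs)
    for h in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[h] += 1
    return counts


def chi2_1_pvalue(stat):
    """Upper tail probability of a chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def mixture_pvalue(stat):
    """P-value under a 50:50 mixture of chi-squares with 0 and 1 degrees of freedom."""
    if stat <= 0:
        return 1.0
    return 0.5 * chi2_1_pvalue(stat)


if __name__ == "__main__":
    rng = random.Random(2024)
    probs = haplotype_probs(m=5)
    cases = sample_counts(probs, 200, rng)     # 200 case chromosomes
    controls = sample_counts(probs, 200, rng)  # 200 control chromosomes
    print(cases, controls)
```

Because cases and controls are drawn from the same probabilities, each simulated sample is generated under the null hypothesis; the type I error rate is then the fraction of samples whose p-value falls below the nominal level.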
Simulation Study. For each of the simulation scenarios, 5000 simulations were performed in R. Datasets from the first two simulation studies were simulated to resemble actual motivating data described in Section 3.2, with sample size N = 672. Independent predictor variables were generated to mimic age (years), smoking status (yes/no), race (1 = white / 2 = black), and SA status (yes/no), and the outcome variable was generated to resemble the cytokine MCP1 (µg/mL) based on a lognormal regression against those predictors. Age was simulated as a normal random variable with mean 26.6 and standard deviation 6.4, then rounded to the nearest whole number (this permits the formation of x-homogeneous pools when average pool size is small). Smoking status, race, and SA status were simulated as Bernoulli random variables with probabilities 0.47, 0.28, and 0.46, respectively. The outcome, MCP1, was generated under a lognormal distribution such that E[log(MCP1)|X] = −2.48 + 0.017(Age) + 0.007(Smoking Status) − 0.388(Race) + 0.132(SA) and Var[log(MCP1)|X] = 1.
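A single simulated dataset of this kind could be generated roughly as follows. The source states that the simulations were run in R; this Python sketch is only an illustration of the stated data-generating model. It assumes that race enters the linear predictor as a 0/1 Bernoulli indicator (the text does not say which level is coded 1), and the names `linear_predictor` and `simulate_dataset` are hypothetical.

```python
import math
import random


def linear_predictor(age, smoking, race, sa):
    """E[log(MCP1) | X] with the coefficients given in the text."""
    return -2.48 + 0.017 * age + 0.007 * smoking - 0.388 * race + 0.132 * sa


def simulate_dataset(n=672, seed=0):
    """Simulate n subjects under the lognormal regression model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        age = round(rng.gauss(26.6, 6.4))     # rounded to whole years
        smoking = int(rng.random() < 0.47)    # Bernoulli(0.47)
        race = int(rng.random() < 0.28)       # Bernoulli(0.28); 0/1 coding is an assumption
        sa = int(rng.random() < 0.46)         # Bernoulli(0.46)
        # Var[log(MCP1) | X] = 1, so the error has standard deviation 1.
        log_mcp1 = linear_predictor(age, smoking, race, sa) + rng.gauss(0.0, 1.0)
        data.append({"age": age, "smoking": smoking, "race": race,
                     "sa": sa, "mcp1": math.exp(log_mcp1)})
    return data


if __name__ == "__main__":
    sample = simulate_dataset()
    print(sample[0])
```

Repeating the call with 5000 different seeds would mirror the number of simulations per scenario mentioned in the text.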
Simulation Study. I conducted four sets of real-data based simulation studies to demonstrate advantages of IPBT over existing methods.
Simulation Study. I conducted two sets of data-based simulation studies to illustrate the consistency of gene panels and its applications to DE gene detection.
Simulation Study. I conducted two sets of real-data based simulation studies to demonstrate the advantages of our new approaches.
1) In the first set of simulations, I use 566 normal solid tissue microarray datasets obtained with the Affymetrix GeneChip U133A from the global gene expression map to show the correlation between the SD estimates and the true SDs. All subsequent simulations are generated with the parameters obtained from these 566 normal samples. I also show by simulation that GDM is a good indicator for dividing groups. 2) In the second set of simulations, I show that the false discovery rates (FDR) and receiver operating characteristic (ROC) curves of our new approaches are almost as good as those of IPBT and that the new approaches outperform all other competing methods. I also show that our new approaches can be more robust than IPBT when the historical data are not of high quality.
Simulation Study. To study the performance of the cχ2 and normal distributions as approximations of the null distribution of the score statistic, we performed a simulation study. For the sake of simplicity we used the data structure of our example of 33 families (see below). We generated 100,000 data sets of independently binomially distributed outcomes and 100,000 data sets of independently normally distributed outcomes. The score statistics were calculated using correlation structure (2.1) based on the coefficients of relationship. We also studied the performance of the distributions in a very small set of nine families. In table 2.1, the actual p-values corresponding to nominal p-values of 0.05, 0.01, 0.001 and 0.0001 are given. The results were in favour of the cχ2 distribution for both binomially and normally distributed outcomes. Even for the set of nine families, the cχ2 distribution performed very well.
TABLE 2.1: Type I error rate when using the cχ2 distribution and the normal distribution as approximations for the distribution of Q under the null hypothesis. The estimates are based on 100,000 simulations.

                            33 families           9 families
                 nominal    cχ2       normal      cχ2       normal
Binomial (DM2)   0.05       0.0547    0.0606      0.0550    0.0649
                 0.01       0.0143    0.0194      0.0137    0.0239
                 0.001      0.0020    0.0041      0.0017    0.0070
                 0.0001     0.0004    0.0011      0.0002    0.0019
Normal (BMI)     0.05       0.0538*   0.0615*     0.0566    0.0651
                 0.01       0.0125*   0.0196*     0.0151    0.0233
                 0.001      0.0016*   0.0047*     0.0027    0.0069
                 0.0001     0.0002*   0.0011*     0.0004    0.0023

To illustrate the score statistic, we used data from 79 patients with type 2 diabetes mellitus (DM2), their first-degree relatives and spouses (Xxxxxxxxx et al., 2003). These families were derived from the GRIP population (Genetic Research in Isolated Populations), an isolated village in the Southwest of the Netherlands. The GRIP population is described in detail elsewhere (Xxxxxxxxx et al., 2003; Xxxxxxx et al., 2002; xxx Xxxxx et al., 2001). Probands are patients with DM2 treated by physicians participating in GRIP. Among the relatives are patients not related to ascertainment, namely patients of other physicians and subjects who did not know that they had DM2. In a combined linkage and association study, a genome scan was carried out on these data and Xxxxxxxxx et al. (2003) found a borderline association between marker D3S3681 and DM2 (LOD score of 1.20, P = 0.01). For DM2 we analysed 33 families informative for linkage. One of these families was a combination of two nuclear families. Three families had ...
Simulation Study. In the simulation, we set the numbers of subjects, raters and time points to I = 100, J = 30 and Ti = 5 for all i = 1, . . . , I. We first demonstrate our approach in Section 3.1 based on one simulated dataset for each setup. We also present the averages of the parameter estimates based on 1000 Monte Carlo replicates. In Section 3.2, we compare our approach with approaches that do not account for the rater’s effect. The GLMM fitting is implemented with PROC GLIMMIX in SAS 9.4 [35].
Simulation Study. In this section we study the gain obtained in the prediction of direct estimators of the FGT poverty measures (for α = 0), in the sense of mean squared error of small area predictors, when considering three different approaches to computing the matrix W in (4.5). The first is the classical approach, where W is a typicality matrix, whereas the second and third approaches implement the techniques described in Sections 5.2 and 5.3, respectively. We start by describing the data to be used in the model described in Section 4.2. They consist of official data from the Spanish Survey of Income and Living Conditions for the year 2006 for D = 51 Spanish provinces (the small areas). The response variable is the direct estimator of the FGT poverty measure (for α = 0), that is, the proportion of poor people in the area. The auxiliary covariates are the intercept and the following proportions (in the area): Spanish people; people aged 16 to 24, 25 to 49, 50 to 64, and 65 or older; people with no studies up to primary studies; graduates; employees; unemployed people; and inactive people. We have selected from the Instituto Nacional xx Xxxxxística website (xxxx://xxx.xxx.xx) the most relevant socioeconomic variables related to poverty, namely the unemployment rate and the share of illiterate population over 16 years old. These variables have been measured in the D = 51 provinces from 1991 to 2005 (J = 15 years). Therefore, in practice we have two matrices, X1 and X2, of size 51 × 15. To compute the matrix W with the multivariate approach of Section 5.2, we only consider the information contained in the J-th columns of X1 and X2, which leads to a matrix of size 51 × 2. We call WM the proximity matrix computed with the methodology described in Section 5.2.
To compute the matrix W using the functional approach of Section 5.3, we have obtained two semi-metrics, D^(2)_{1,q} and D^(2)_{2,q}, one for each data set (see Ferraty and Xxxx (2006)), for q = 4 functional principal components, since q = 4 is enough to capture most of the observed variability. Finally, in order to obtain a square matrix of joint distances from the previous two, we have used the related metric scaling technique, introduced by Xxxxxxx and Fortiana (1998), which provides a joint metric from different metrics on the same individuals, taking into account the possibly redundant information that would be introduced by simply adding the distance matrices. We call WF the...
Simulation Study. In this section we describe some simulation experiments carried out with the following purposes: (a) to check whether taking into account the spatial correlation between small areas in the model improves the precision of small area estimators; (b) to study the small-sample behavior of the different MSE estimators introduced in this chapter, for different values of the spatial correlation parameter ρ and for different patterns of sampling variances ψd; (c) to analyze the robustness of the proposed bootstrap procedures to non-normality of the random effects and errors. The experiments are based on a real population, the map of the D = 287 municipalities (small areas) of Tuscany. We considered a model with p = 2, that is, one explanatory variable and a constant, with a D × 2 design matrix X = [1D x], where 1D is a column vector of ones of size D and x = (x1, . . . , xD)′ contains the values of the explanatory variable. These values xd were generated from a uniform distribution in the interval (0, 1). The true model coefficients were β = (1, 2)′, the random effects variance σ2 = 1 and the spatial correlation parameter ρ ∈ {0.25, 0.5, 0.75}. The matrix of sampling variances ψ = diag(ψ1, . . . , ψD) was taken as ψd = 0.7 for 1 ≤ d ≤ 60; ψd = 0.6 for 61 ≤ d ≤ 120; ψd = 0.5 for 121 ≤ d ≤ 180; ψd = 0.4 for 181 ≤ d ≤ 240 and finally ψd = 0.3 for 241 ≤ d ≤ 287 (see Xxxxx et al.
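The fixed inputs of this design (the sampling variances ψd and the D × 2 design matrix) could be set up along the following lines. This is only a sketch of the quantities stated in the text; the spatial model for the random effects is not implemented here, and the function names are hypothetical.

```python
import random

D = 287            # number of small areas (municipalities)
BETA = (1.0, 2.0)  # true model coefficients
RHO_VALUES = (0.25, 0.5, 0.75)  # spatial correlation values considered


def sampling_variances(num_areas=D):
    """Return psi_1, ..., psi_D following the block pattern in the text."""
    psi = []
    for d in range(1, num_areas + 1):
        if d <= 60:
            psi.append(0.7)
        elif d <= 120:
            psi.append(0.6)
        elif d <= 180:
            psi.append(0.5)
        elif d <= 240:
            psi.append(0.4)
        else:
            psi.append(0.3)
    return psi


def design_matrix(num_areas=D, seed=0):
    """Return X = [1_D x] as a list of rows, with x_d drawn uniformly on (0, 1)."""
    rng = random.Random(seed)
    return [[1.0, rng.random()] for _ in range(num_areas)]


if __name__ == "__main__":
    psi = sampling_variances()
    X = design_matrix()
    print(len(psi), X[0])
```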