Simulation Study. A simulation study is performed to compare the proposed adjusted degree of distinguishability with the classical one. We also aim to develop a table for interpreting the adjusted degree of distinguishability. To generate 2 × 2 contingency tables, we used the method presented by Xxxxxx and Xxxx [8], based on the bivariate standard normal distribution. In the first step, two independent and identically distributed random variables (X1 and X2) are generated. Equations (4.1) and (4.2) are then used to obtain two random variables (X and Y) from a bivariate normal distribution with a given correlation (ρ).
(4.1) X = aX1 + bX2
(4.2) Y = bX1 + aX2

where a = (√(1 + ρ) + √(1 − ρ))/2 and b = (√(1 + ρ) − √(1 − ρ))/2, so that X and Y are standard normal with correlation ρ. Then, the X and Y variables are each categorized into two equal intervals and crossed to obtain a 2 × 2 contingency table.
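To make the generation scheme concrete, here is a minimal sketch in Python (NumPy), assuming the two equal intervals are obtained by splitting each standard normal variable at zero; the function name and the split point are illustrative choices, not taken from [8]:

```python
import numpy as np

def generate_2x2_table(n, rho, seed=None):
    """Generate a 2x2 contingency table from correlated bivariate normals.

    X and Y are built from two i.i.d. standard normals X1, X2 via
    X = a*X1 + b*X2 and Y = b*X1 + a*X2, which yields standard normal
    margins with correlation rho (since a^2 + b^2 = 1 and 2ab = rho).
    """
    rng = np.random.default_rng(seed)
    a = (np.sqrt(1 + rho) + np.sqrt(1 - rho)) / 2
    b = (np.sqrt(1 + rho) - np.sqrt(1 - rho)) / 2
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    x = a * x1 + b * x2
    y = b * x1 + a * x2
    # Dichotomize at zero (assumed split point) and cross-tabulate.
    cx = (x > 0).astype(int)
    cy = (y > 0).astype(int)
    table = np.zeros((2, 2), dtype=int)
    np.add.at(table, (cx, cy), 1)
    return table

print(generate_2x2_table(1000, rho=0.5, seed=42))
```

As ρ grows, the concordant cells of the table dominate, which is the behaviour the degree of distinguishability is meant to capture.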
Simulation Study. By means of a simulation study, we first evaluated the type I error rate of the score statistic Ŝ1, Xxxxxxx'x chi-square χ², the likelihood ratio with equal weights (LR), and Terwilliger's likelihood ratio with weights equal to the pj's (TLR). For the score statistic we used the chi-square distribution with one degree of freedom to approximate the distribution under the null hypothesis. For the LR and TLR statistics we used the 50:50 mixture of two chi-squares with zero and one degree of freedom. We generated 10,000 samples of 200 case chromosomes and 200 control chromosomes from multinomial distributions with probabilities p1, ..., pm for m equal to 4, 5, 8, 10, 15 and 20 haplotypes. Similar to the simulation described by Xxxxxxxxxxx, the frequency of the most common haplotype, p1, was set to 0.5, whereas the remaining haplotypes were equally frequent (0.5/(m − 1)). The results are shown in the left columns of Table 6.1. For all m, the type I error rates of the score statistic Ŝ1 were maintained at the nominal error rate. For m < 10, the type I error rates of Xxxxxxx'x chi-square ≈
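The score statistic and the (T)LR statistics are specific to this paper, but the chi-square arm of the type I error experiment can be sketched generically. The following Python snippet (an illustration, not the authors' code) draws case and control haplotype counts from the same multinomial under the null and records the rejection rate of the chi-square test on the resulting m × 2 table:

```python
import numpy as np
from scipy.stats import chi2_contingency

def type1_error_chisq(m, n_cases=200, n_controls=200, n_sim=10_000,
                      alpha=0.05, seed=0):
    """Empirical type I error of the chi-square test on an m x 2 table.

    Under H0, case and control counts share the same multinomial:
    p1 = 0.5, remaining m - 1 haplotypes equally frequent.
    """
    rng = np.random.default_rng(seed)
    p = np.r_[0.5, np.full(m - 1, 0.5 / (m - 1))]
    rejections = 0
    for _ in range(n_sim):
        cases = rng.multinomial(n_cases, p)
        controls = rng.multinomial(n_controls, p)
        table = np.column_stack([cases, controls])
        # Drop haplotypes unobserved in both groups to keep the test valid.
        table = table[table.sum(axis=1) > 0]
        _, pval, _, _ = chi2_contingency(table)
        rejections += pval < alpha
    return rejections / n_sim

for m in (4, 5, 8, 10, 15, 20):
    print(m, type1_error_chisq(m, n_sim=2000))
```

For large m the per-haplotype expected counts become small, which is exactly the regime where the chi-square approximation is expected to degrade.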
Simulation Study. For each of the simulation scenarios, 5000 simulations were performed in R. Datasets from the first two simulation studies were simulated to resemble the actual motivating data described in Section 3.2, with sample size N = 672. Independent predictor variables were generated to mimic age (years), smoking status (yes/no), race (1 = white / 2 = black), and SA status (yes/no), and the outcome variable was generated to resemble the cytokine MCP1 (µg/mL) based on a lognormal regression against those predictors. Age was simulated as a normal random variable with mean 26.6 and standard deviation 6.4, then rounded to the nearest whole number (this permits the formation of x-homogeneous pools when the average pool size is small). Smoking status, race, and SA status were simulated as Bernoulli random variables with probabilities 0.47, 0.28, and 0.46, respectively. The outcome, MCP1, was generated under a lognormal distribution such that E[log(MCP1)|X] = −2.48 + 0.017(Age) + 0.007(Smoking Status) − 0.388(Race) + 0.132(SA) and Var[log(MCP1)|X] = 1.19. In the first study, we assess each of the proposed analytical strategies when applied to x-homogeneous pools (n = 336) mimicking data from the CPP substudy, and in the next study we compare estimate precision from the various pooling strategies applied to the same generated datasets, comparing k-means clustering to random pooling and selection when x-homogeneous pools cannot be formed (n = 112). The last two simulation studies were developed to assess the performance of the analytical methods in additional scenarios. First, we generate a dataset such that application of all proposed methods (excluding the Naive Model) is feasible and theoretically justified. Specifically, pools were formed x-homogeneously on the covariates (to justify analysis under the Approximate Model) with a maximum pool size of 2 (to enable application of the Convolution Method). In the first two simulation studies, the nature of the simulated data precluded formation of pools with both of these characteristics. The final simulation demonstrates a scenario in which the Approximate Model fails and the Convolution Method falters, to caution against analysis via the former when pools are not x-homogeneous, and via the latter (even for pools of maximum size 2) when the convolution integral may be poorly behaved.
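A minimal sketch of this data-generating mechanism (in Python rather than the R used for the study; variable names are illustrative, and race is coded as a 0/1 indicator for black here, which relative to the 1/2 coding above only shifts the intercept):

```python
import numpy as np

def simulate_dataset(n=672, seed=None):
    """Simulate one dataset mimicking the motivating data (Section 3.2).

    Coefficients and covariate distributions are those stated in the text.
    """
    rng = np.random.default_rng(seed)
    age = np.round(rng.normal(26.6, 6.4, n))   # rounded to whole years
    smoke = rng.binomial(1, 0.47, n)           # smoking status (yes/no)
    race = rng.binomial(1, 0.28, n)            # 0 = white, 1 = black
    sa = rng.binomial(1, 0.46, n)              # SA status (yes/no)
    mean_log = (-2.48 + 0.017 * age + 0.007 * smoke
                - 0.388 * race + 0.132 * sa)
    # Var[log(MCP1) | X] = 1.19, so the log-scale SD is sqrt(1.19).
    mcp1 = np.exp(rng.normal(mean_log, np.sqrt(1.19)))
    return age, smoke, race, sa, mcp1
```

Rounding age to whole years is what makes exact ties on the covariates, and hence x-homogeneous pools, achievable when pools are small.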
Simulation Study. I conducted four sets of real-data-based simulation studies to demonstrate the advantages of IPBT over existing methods. 1) In the first simulation study, I used 566 normal solid tissue microarray datasets obtained by Affymetrix GeneChip U133A from the global gene expression map to show a general trend between mean value and SD for genes in microarrays. All the following simulations are generated with the parameters obtained from these 566 normal samples. I also show different SD estimates from different methods versus the truth to illustrate the over-shrinkage phenomenon and how IPBT can avoid it. 2) In the second set of simulations, I show the false discovery rates (FDR) and receiver operating characteristic (ROC) curves for IPBT and competing methods. I also show the consistency of IPBT and other existing methods on independent datasets. 3) In the last simulation, I show that IPBT remains robust even if the historical data contain some noise.
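As a rough illustration of the over-shrinkage phenomenon, the sketch below (IPBT itself is not reproduced; the shrinkage estimator is a generic moderated-variance stand-in, and the mean-SD trend is an assumption) shows how genes with large true SDs are pulled well below the truth when gene-wise variances are shrunk toward a common value:

```python
import numpy as np

def shrinkage_demo(n_genes=5000, n_samples=5, d0=10, seed=1):
    """Illustrate over-shrinkage of gene-wise SD estimates.

    True SDs follow an assumed mean-dependent trend; sample variances
    are shrunk toward their common average with prior weight d0.
    """
    rng = np.random.default_rng(seed)
    mu = rng.uniform(4, 12, n_genes)        # log-scale expression means
    true_sd = 0.1 + 2.0 / mu                # assumed mean-SD trend
    data = rng.normal(mu[:, None], true_sd[:, None], (n_genes, n_samples))
    s2 = data.var(axis=1, ddof=1)
    d = n_samples - 1
    s2_shrunk = (d0 * s2.mean() + d * s2) / (d0 + d)
    # Genes with large true SD are pulled far below truth (over-shrinkage).
    big = true_sd > np.quantile(true_sd, 0.9)
    print("mean true SD (top decile):", true_sd[big].mean())
    print("mean sample SD:          ", np.sqrt(s2[big]).mean())
    print("mean shrunken SD:        ", np.sqrt(s2_shrunk[big]).mean())

shrinkage_demo()
```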
Simulation Study. I conducted two sets of data-based simulation studies to illustrate the consistency of gene panels and their application to DE gene detection.
Simulation Study. I conducted two sets of real-data-based simulation studies to demonstrate the advantages of our new approaches. 1) In the first set of simulations, I used 566 normal solid tissue microarray datasets obtained by Affymetrix GeneChip U133A from the global gene expression map to show the correlation between SD estimates and the true SDs. All the following simulations are generated with the parameters obtained from these 566 normal samples. I also show by simulation that GDM is a good indicator for group dividing. 2) In the second set of simulations, I show that the false discovery rates (FDR) and receiver operating characteristic (ROC) curves for our new approaches are almost as good as those of IPBT and outperform all other competing methods. I also show that our new approaches can be more robust than IPBT when the historical data are not of high quality.
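For the FDR/ROC comparisons, a generic sketch of how such curves are traced from simulated expression data (two-sample t-tests stand in for the actual methods being compared; all parameter values are illustrative):

```python
import numpy as np
from scipy import stats

def roc_points(n_genes=2000, n_de=200, n=5, delta=1.0, seed=2):
    """Empirical ROC for two-sample t-tests on simulated expression data.

    A fraction of genes is differentially expressed by `delta`; the ROC
    traces true/false positive rates as the p-value cutoff varies.
    """
    rng = np.random.default_rng(seed)
    de = np.zeros(n_genes, dtype=bool)
    de[:n_de] = True
    group1 = rng.normal(0, 1, (n_genes, n))
    group2 = rng.normal(de[:, None] * delta, 1, (n_genes, n))
    pvals = stats.ttest_ind(group1, group2, axis=1).pvalue
    order = np.argsort(pvals)               # rank genes by significance
    tpr = np.cumsum(de[order]) / n_de
    fpr = np.cumsum(~de[order]) / (n_genes - n_de)
    return fpr, tpr

fpr, tpr = roc_points()
print("TPR at ~5% FPR:", tpr[np.searchsorted(fpr, 0.05)])
```

Replacing the t-statistic with each method's test statistic and re-ranking the genes yields one ROC curve per method on the same simulated data.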
Simulation Study. In order to study the performance of the cχ² and normal distributions as approximations of the null distribution of the score statistic, we performed a simulation study. For the sake of simplicity we used the data structure of our example of 33 families (see below). We generated 100,000 data sets of independently binomially distributed outcomes and 100,000 data sets of independently normally distributed outcomes. The score statistics were calculated using correlation structure (2.1) based on the coefficients of relationship. We also studied the performance of the distributions in a very small set of nine families. In Table 2.1, the actual p-values corresponding to nominal p-values of 0.05, 0.01, 0.001 and 0.0001 are given. The results were in favour of the cχ² distribution for both binomially and normally distributed outcomes. Even for the set of nine families, the cχ² distribution performed very well.
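The score statistic of the paper depends on the family structure, but the calibration exercise can be sketched generically. Below, the null statistic is taken to be a weighted sum of independent one-degree-of-freedom chi-squares (the usual null form of such score tests; the weights here are arbitrary stand-ins), and the actual p-values under the moment-matched cχ² and normal references are tabulated at the nominal levels of Table 2.1:

```python
import numpy as np
from scipy import stats

def calibration(weights=(3.0, 1.5, 1.0, 0.5), n_sim=100_000, seed=3):
    """Actual p-values at nominal levels for cX^2 vs normal references.

    The null statistic is a weighted sum of independent 1-df chi-squares.
    The cX^2 reference is fitted by moment matching; the normal reference
    uses the exact mean and SD of the statistic.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(weights)
    t = (w * rng.chisquare(1, (n_sim, w.size))).sum(axis=1)
    mean, var = w.sum(), 2 * (w ** 2).sum()
    c, df = var / (2 * mean), 2 * mean ** 2 / var  # moment-matched cX^2
    for nominal in (0.05, 0.01, 0.001, 0.0001):
        q_cchi = c * stats.chi2.ppf(1 - nominal, df)
        q_norm = stats.norm.ppf(1 - nominal, loc=mean, scale=np.sqrt(var))
        print(nominal, (t > q_cchi).mean(), (t > q_norm).mean())

calibration()
```

The cχ² reference respects the skewness of the null distribution, which is why it stays close to the nominal levels far into the tail while the normal reference does not.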
Simulation Study. In the simulation, we set the numbers of subjects, raters and time points to I = 100, J = 30 and Ti = 5 for each i = 1, . . . , I. We will first demonstrate our approach in Section 3.1 based on one simulated dataset for each setup. We will also present the averages of parameter estimates based on 1000 Monte Carlo replicates. In Section 3.2, we will compare our approach with approaches that do not account for the rater's effect. The GLMM fitting is implemented by PROC GLIMMIX in SAS 9.4 [35].
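A sketch of a data-generating step consistent with this design (binary ratings are assumed for illustration, with subject and rater random intercepts; the fixed-effect and variance values below are placeholders, not the paper's parameters, and the GLMM would then be fitted, e.g. in PROC GLIMMIX as noted above):

```python
import numpy as np

def simulate_rater_data(I=100, J=30, T=5, beta0=-0.5, beta1=0.3,
                        sd_subject=1.0, sd_rater=0.5, seed=4):
    """Simulate binary ratings with subject and rater random intercepts.

    Each of I subjects is rated by all J raters at T time points; the
    logistic linear predictor carries a time trend plus both random
    effects. All parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    b_subject = rng.normal(0, sd_subject, I)
    b_rater = rng.normal(0, sd_rater, J)
    i, j, t = np.meshgrid(np.arange(I), np.arange(J), np.arange(T),
                          indexing="ij")
    eta = beta0 + beta1 * t + b_subject[i] + b_rater[j]
    y = rng.binomial(1, 1 / (1 + np.exp(-eta)))
    # Return one row per (subject, rater, time) observation.
    return i.ravel(), j.ravel(), t.ravel(), y.ravel()
```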
Simulation Study. In this Section we study the gain obtained in the prediction of direct estimators of the FGT poverty measures (for α = 0), in the sense of mean squared error of small area predictors, when considering three different approaches to the computation of the matrix W in (4.5). The first one is the classical approach, where W is a typicality matrix, whereas the second and third approaches consist in implementing the techniques described in Sections 5.2 and 5.3, respectively. We start by describing the data to be used in the model described in Section 4.2. They consist of official data from the Spanish Survey of Income and Living Conditions corresponding to year 2006 for D = 51 Spanish provinces (the small areas). The response variable is the direct estimator of the FGT poverty measure (for α = 0), that is, the proportion of poor in the area. The auxiliary covariates are the intercept and the following proportions (in the area): Spanish people, people of ages from 16 to 24, from 25 to 49, from 50 to 64, equal to or greater than 65, people with no studies up to primary studies, graduate people, employees, unemployed people, and inactive people. We have selected from the Instituto Nacional xx Xxxxxística website (xxxx://xxx.xxx.xx) the most relevant socioeconomic variables related with poverty, namely the unemployment rate and the share of illiterate population over 16 years old. These variables have been measured in the D = 51 provinces from 1991 to 2005 (J = 15 years). Therefore, in practice we have two matrices, X1 and X2, of size 51 × 15.

In order to compute the matrix W with the multivariate approach of Section 5.2, we only consider the information contained in the J-th columns of X1 and X2, which leads to a matrix of size 51 × 2. We call WM the proximity matrix computed with the methodology described in Section 5.2. To compute the matrix W using the functional approach of Section 5.3, we have obtained two semi-metrics D^{(2)}_{1,q} and D^{(2)}_{2,q}, one for each data set (see Ferraty and Xxxx (2006)), for q = 4 functional principal components, since q = 4 is enough to capture most of the observed variability. Finally, in order to obtain a square matrix of joint distances from the previous two, we have used the related metric scaling technique, introduced by Xxxxxxx and Fortiana (1998), which provides a joint metric from different metrics on the same individuals, taking into account the possible redundant information that can be added simply by adding distance matrices. We call WF the...
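As an illustration of the multivariate construction of WM (the precise proximity transform of Section 5.2 is not reproduced here; the inverse-distance, row-standardized form below is an assumption, as are the synthetic covariates in the usage example):

```python
import numpy as np

def proximity_matrix(x1_last, x2_last):
    """Compute a proximity matrix W from the last-year covariates.

    x1_last, x2_last: length-D vectors (the J-th columns of X1 and X2,
    e.g. unemployment rate and illiteracy share in each province).
    Distances are Euclidean on standardized covariates; proximities use
    an assumed inverse-distance transform, row-normalized.
    """
    z = np.column_stack([x1_last, x2_last])
    z = (z - z.mean(axis=0)) / z.std(axis=0)    # standardize both covariates
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    w = 1.0 / (1.0 + d)                  # assumed proximity transform
    np.fill_diagonal(w, 0.0)             # no self-neighbour weight
    w /= w.sum(axis=1, keepdims=True)    # row-standardize
    return w

# Example with D = 51 provinces and synthetic covariate values:
rng = np.random.default_rng(5)
W_M = proximity_matrix(rng.uniform(5, 25, 51), rng.uniform(0, 10, 51))
```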