Comparing Analytical Strategies. The goal of this first simulation study is to assess each of the discussed analytical strategies for x-homogeneous pools when applied to data resembling the CPP substudy. An analytical method is deemed appropriate if it provides accurate estimates of both the regression coefficients and their standard errors. Pools were formed using the x-homogeneous clustering strategy described in Section 3.8.1, with pool sizes ranging from 1 to 6. The analytical strategies under consideration were standard least squares regression on log-transformed pooled outcomes (Naive Model), WLS on the log-transformed pools with the inverse of pool size as a predictor variable (Approximate Model), and the likelihood-based MCEM strategy under lognormal regression (MCEM Model). For comparison, we also provide regression results from the full data and from a random sample of size n = 336. Because many of the pools in this simulation consisted of more than 2 specimens, direct optimization of the likelihood under the Convolution approach was not viable for this first simulation.

Table 3.2 displays the mean bias and empirical standard deviation (SD) of the regression coefficient estimates. The ratio of the mean estimated standard error to the empirical standard deviation ($\widehat{SE}/SD$) is also provided, where a value of 1 is ideal. 95% confidence interval (CI) coverage is based on the estimated standard errors and a t-reference distribution with $n - 5$ degrees of freedom. Based on these simulation results, the Naive Model provides biased estimates, which can result in severe CI undercoverage. This is particularly noticeable for $\hat{\beta}_3$, which has only 81% CI coverage. The remaining methods provide approximately unbiased estimates of the regression coefficients (Mean Bias ≈ 0) and of their standard errors ($\widehat{SE}/SD$ ≈ 1), along with close to 95% CI coverage.
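The summary measures reported in Table 3.2 can be sketched in a few lines of Python. This is only an illustration of the definitions given above, not the code used in the study: the function name `summarize_estimates` and its arguments are hypothetical, and the t critical value is passed in by the caller (for example, the 0.975 quantile of a t distribution with n − 5 degrees of freedom, which could be obtained from `scipy.stats.t.ppf`).

```python
import statistics


def summarize_estimates(estimates, std_errors, true_value, t_crit):
    """Summarize one regression coefficient across simulation replicates.

    All names here are illustrative assumptions, not taken from the study.

    estimates  : point estimates, one per simulated dataset
    std_errors : the matching estimated standard errors
    true_value : coefficient value used to generate the data
    t_crit     : two-sided critical value, e.g. the 0.975 quantile of a
                 t distribution with n - 5 degrees of freedom
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two replicates of equal length")

    # Mean bias: average estimate minus the true coefficient.
    mean_bias = statistics.fmean(estimates) - true_value

    # Empirical SD: sample standard deviation of the estimates themselves.
    empirical_sd = statistics.stdev(estimates)

    # Ratio of mean estimated SE to empirical SD; a value near 1 is ideal.
    se_over_sd = statistics.fmean(std_errors) / empirical_sd

    # Share of replicates whose interval est +/- t_crit * SE covers the truth.
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - t_crit * se <= true_value <= est + t_crit * se
    )

    return {
        "mean_bias": mean_bias,
        "empirical_sd": empirical_sd,
        "se_over_sd": se_over_sd,
        "ci_coverage": covered / len(estimates),
    }
```

Applied once per coefficient and per analytical strategy, a helper of this kind would yield one row of such a table. A biased estimator shows up as a nonzero `mean_bias` together with `ci_coverage` well below 0.95, which is the pattern described for the Naive Model.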
Thus, both the Approximate and MCEM Models provide valid results when pools are x-homogeneous. Although the main purpose of this simulation is to test the validity of the proposed analytical methods, it is also worth noting that estimates from these x-homogeneous pools analyzed under the Approximate and MCEM Models are noticeably more precise than those from a random sample, and only slightly less efficient than estimates from the full dataset. The MCEM method appears to provide marginally more precise estimates