Proposed Two-Groups Models. Many approaches have been proposed for estimating the locfdr by fitting the two-groups model. These include fully parametric, nonparametric, Bayesian and empirical Bayes, and semi-parametric approaches. With any of these approaches, fitting the model requires knowledge of p0, f0(z), and either f1(z) or f(z). The marginal distribution of all the zj’s, f(z), is estimated using the data for all the proteins in the experiment. The sub-density f0(z) is typically estimated using only the central part of the distribution of the zj’s, in the neighborhood of the zero point, the rationale being that this central part consists mainly of null proteins. In microarray gene expression analyses, ▇▇▇▇▇▇▇ et al. (2002) [3] estimate f(z) by fitting a mixture of beta distributions, with the two-groups model specified in terms of p-values. ▇▇▇▇▇ (2002) [37] estimates f(z) by maximum likelihood fits of high-order polynomials and of a natural spline basis with 7 degrees of freedom. In ▇▇▇▇▇▇▇ et al. (2002) [3], the distribution of the null genes, f0(z), is simply the beta(1,1) component of the mixture of beta distributions, since under the null hypothesis, for a well-defined test statistic, the p-values follow a uniform distribution on [0,1], which is equivalent to a beta(1,1) distribution. ▇▇▇▇▇ (2002) [37] estimates f0(z) with an empirical null distribution, obtained by both central matching and maximum likelihood, where the estimation uses a fixed-size window around the peak of the empirical distribution of the zj’s that corresponds to the zero point. Pan et al. (2003) [94] used a normal mixture model for the estimation of both f(z) and f0(z), with the number of mixture components chosen by a procedure based on the likelihood ratio test. We propose to investigate the performance of several distributional choices for both f0(z) and f(z), including the normal, skew-normal (sN) and skew-t (sT) distributions, and finite mixtures of them.
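To make the roles of p0, f0(z), f1(z) and f(z) concrete, the following minimal Python sketch evaluates the local false discovery rate under the standard two-groups decomposition f(z) = p0 f0(z) + (1 - p0) f1(z), with locfdr(z) = p0 f0(z) / f(z). The function name local_fdr, the value p0 = 0.9, and the two normal densities are illustrative assumptions, not estimates or choices taken from any of the cited methods.

```python
from statistics import NormalDist


def local_fdr(z, p0, f0, f1):
    """Return p0 * f0(z) / f(z) under the two-groups model.

    The marginal density is f(z) = p0 * f0(z) + (1 - p0) * f1(z).
    """
    null_part = p0 * f0(z)
    marginal = null_part + (1.0 - p0) * f1(z)
    return null_part / marginal


# Illustrative assumptions only (hypothetical values, not fitted to data):
P0 = 0.9                                 # assumed proportion of null proteins
null_density = NormalDist(0.0, 1.0).pdf  # assumed f0: theoretical N(0, 1) null
alt_density = NormalDist(3.0, 1.0).pdf   # assumed f1: hypothetical non-null density

if __name__ == "__main__":
    for z in (0.0, 1.5, 3.0):
        value = local_fdr(z, P0, null_density, alt_density)
        print(f"z = {z:3.1f}  locfdr = {value:.3f}")
```

In practice none of the three inputs is known. The approaches surveyed above differ mainly in how they replace p0, f0(z) and f(z) (or f1(z)) with estimates before forming the same ratio.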
Finite mixtures of distributions have found wide recognition in modeling heterogeneous data and as approximations to complicated probability densities presenting multimodality, skewness and heavy tails. Comprehensive surveys of the application of mixture models are available in B¨▇▇▇▇▇▇ (2000) [21], McLachlan and Peel (2000) [86], and, from a Bayesian perspective, in Frühwirth-▇▇▇▇▇▇▇▇▇ (2006) [47]. Furthermore, we propose to investigate the utility of the Generalized Hyperbolic (GH) distribution as a means of dealing with the excess kurtosis that is sometimes observed in the cen...