Local False Discovery Rate Sample Clauses

Local False Discovery Rate. We calculate a local false discovery rate (local fdr) for every protein in each of the three data sets, and identify the set of proteins deemed significant under two cut-off points. The first cut-off declares significant all proteins with local fdr ≤ 0.1. The second cut-off is the value at which the second derivative of the smoothed, monotone local fdr curve is largest. In these calculations we omitted the two-groups model that pairs a truncated normal mixture with a Student’s t mixture, since this particular combination of distributional components produced considerably worse results than the other combinations. Table 3.8 gives the number and proportion of proteins declared significant under each of the cut-offs.
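The two cut-offs can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `locfdr_cutoffs`, the moving-average smoother, its window width, and the example values are all assumptions, since the excerpt does not say how the curve was smoothed. The curve is taken to be the sorted local fdr values plotted against rank.

```python
import numpy as np

def locfdr_cutoffs(locfdr, fixed=0.1, window=5):
    """Return significance masks for a fixed cut-off and an elbow cut-off.

    Hypothetical helper: the smoother and window width are assumptions.
    """
    locfdr = np.asarray(locfdr, dtype=float)
    # Sorting gives a non-decreasing curve of local fdr against rank.
    curve = np.sort(locfdr)
    # A moving average of a non-decreasing sequence stays non-decreasing.
    kernel = np.ones(window) / window
    smooth = np.convolve(curve, kernel, mode="valid")
    # Discrete second derivative; entry i is centred on smooth[i + 1].
    second = np.diff(smooth, n=2)
    elbow_value = smooth[np.argmax(second) + 1]
    return {
        "fixed": locfdr <= fixed,            # cut-off 1: local fdr <= 0.1
        "elbow_value": float(elbow_value),   # cut-off 2: value at max curvature
        "elbow": locfdr <= elbow_value,
    }

# Made-up local fdr values: 10 clearly small ones and 30 large ones.
example = np.concatenate([np.linspace(0.001, 0.02, 10),
                          np.linspace(0.5, 1.0, 30)])
result = locfdr_cutoffs(example)
print(result["fixed"].sum(), result["elbow_value"], result["elbow"].sum())
```

On real data the elbow location depends on the smoother, so the window width would need to be chosen with care.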
Local False Discovery Rate. The idea of using the mixture distribution in (3.10) is also closely related to FDR, in particular to the local false discovery rate of Efron et al. (2001)[41] and ▇▇▇▇▇ and ▇▇▇▇▇▇▇▇▇▇ (2002)[40]. By definition, the local false discovery rate, locfdr, is the posterior probability that a protein belongs to the null population $f_0(z)$, given the mixture model in (3.9), and is given by

\[
\mathrm{locfdr}_j(z^*) = \Pr(\text{protein } j \text{ is null} \mid z_j = z^*) = \frac{p_0 f_0(z^*)}{f(z^*)} = \frac{f^+(z^*)}{f(z^*)} \tag{3.11}
\]
\[
= \frac{f^+(z^*)}{f^+(z^*) + p_1 f_1(z^*)} \tag{3.12}
\]

where the sub-density $f^+(\cdot) = p_0 f_0(\cdot)$ corresponds to the distribution of the null proteins.

Why a local false discovery rate? In proteomics data analyses, the above Bayesian definition of locfdr has several advantages over the frequentist FDR. Firstly, it can be implemented at the level of the test statistic, when computing a p-value is either cumbersome or not feasible. Secondly, since it depends only on the marginal distribution of the z values, independence of the $z_j$'s is not required. Assumptions about the distribution of the z values under H1 are also not required. In essence, the FDR estimates the proportion of false positives that a practitioner can expect if the experiment is repeated an infinite number of times, and as such is a less reliable estimate of the false discoveries in any given experiment. The q-value approach of Storey and ▇▇▇▇▇▇▇▇▇▇ (2003)[113] is an improvement in this sense, since it assigns to each protein its own measure of significance. However, the q-value is not a true estimate of the probability that an individual protein, say protein A, is a false positive, since it is computed using all the proteins that are more significant than protein A. Clearly a protein whose p-value is near a chosen cutoff, for example 0.05, does not have the same probability of being differentially expressed as a protein whose p-value is close to zero.
This ‘averaging’ behavior of the q-value tends to yield inflated probabilities for a protein to be a false positive. The local false discovery rate, on the other hand, gives an estimate of the false discovery rate attached to each protein. The estimated local false discovery rate for the jth protein provides a measure of belief in that protein’s significance that depends only on the value of zj, and not on its inclusion in a larger set of possible values, Z ≤ zj. Therefore the locfdr is preferable in situations where the primary interest is in identifying proteins that show some evidence of differenti...
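The definition in (3.11) and (3.12) can be illustrated numerically. The sketch below assumes, purely for illustration, a standard normal null density, a unit-variance normal alternative centred at -3, and a null proportion p0 = 0.9; none of these values come from the source, and the function names are hypothetical.

```python
import math

def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution at z."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))

def local_fdr(z, p0=0.9, alt_mean=-3.0):
    """locfdr(z) = f+(z) / (f+(z) + p1 * f1(z)), with f+ = p0 * f0.

    Assumed components: f0 = N(0, 1), f1 = N(alt_mean, 1), p1 = 1 - p0.
    """
    f_plus = p0 * normal_pdf(z)                         # null sub-density
    alt_part = (1.0 - p0) * normal_pdf(z, mean=alt_mean)
    return f_plus / (f_plus + alt_part)

# A borderline statistic and an extreme one receive very different values.
for z in (-1.645, -4.0):
    print(z, local_fdr(z))
```

Each value depends only on the statistic passed in, which is the per-protein property the passage describes.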