
Agreement Among Statistical Significance Tests for Information Retrieval Evaluation at Varying Sample Sizes
Statistical Significance Tests • May 1st, 2009

Research has shown that little practical difference exists between the randomization, Student's paired t, and bootstrap tests of statistical significance for TREC ad-hoc retrieval experiments with 50 topics. We compared these three tests on runs evaluated over topic sets as small as 10 topics. We found that the tests disagree increasingly as the number of topics decreases. At smaller numbers of topics, the randomization test tended to produce smaller p-values than the t-test for p-values less than 0.1. The bootstrap exhibited a systematic bias towards p-values strictly less than those of the t-test, with this bias increasing as the number of topics decreased. We recommend the use of the randomization test, although the t-test appears to be suitable even when the number of topics is small.
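To make the comparison concrete, the following is a minimal Python sketch of the three tests applied to per-topic score differences between two runs. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the per-topic scores are synthetic, the randomization test is the usual paired sign-flip variant estimated by Monte Carlo, and the bootstrap is the shift-method variant (the abstract does not say which bootstrap variant was used). The add-one correction in the sampled p-values is likewise a choice made here so that an estimate is never exactly zero.

```python
import math
import random


def two_sided_t_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule so that
    the sketch needs only the standard library.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def t_test_p(diffs):
    """Two-sided paired t-test on per-topic differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    return two_sided_t_p(t, n - 1)


def randomization_p(diffs, trials=10000, rng=None):
    """Two-sided paired randomization test: flip the sign of each
    difference at random and compare the mean with the observed mean."""
    rng = rng or random.Random(0)
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(trials):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


def bootstrap_shift_p(diffs, trials=10000, rng=None):
    """Two-sided bootstrap test (shift method): resample topics with
    replacement, then shift the resampled means to be centred on zero."""
    rng = rng or random.Random(0)
    n = len(diffs)
    observed = sum(diffs) / n
    hits = 0
    for _ in range(trials):
        resampled = sum(rng.choices(diffs, k=n)) / n
        if abs(resampled - observed) >= abs(observed):
            hits += 1
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # Synthetic per-topic scores for two hypothetical runs over 10 topics.
    rng = random.Random(42)
    run_a = [rng.uniform(0.1, 0.6) for _ in range(10)]
    run_b = [a + rng.gauss(0.03, 0.05) for a in run_a]
    diffs = [b - a for a, b in zip(run_a, run_b)]
    print("t-test       :", t_test_p(diffs))
    print("randomization:", randomization_p(diffs))
    print("bootstrap    :", bootstrap_shift_p(diffs))
```

Running the three functions on the same differences while shrinking the list from 50 topics towards 10 is one way to observe how far the p-values drift apart, which is the kind of comparison the abstract describes.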
