Assessing agreement on classification tasks: the kappa statistic
Kappa Statistic • November 7th, 2007
Currently, computational linguists and cognitive scientists working in the area of discourse and dialogue argue that their subjective judgments are reliable using several different statistics, none of which are easily interpretable or comparable to each other. Meanwhile, researchers in content analysis have already experienced the same difficulties and come up with a solution in the kappa statistic. We discuss what is wrong with reliability measures as they are currently used for discourse and dialogue work in computational linguistics and cognitive science, and argue that we would be better off as a field adopting techniques from content analysis.