Nonparametric testing for agreement among several judges

Marozzi, Marco

This paper addresses the problem of whether the rankings of a set of objects given by several judges show agreement or are more or less independent. The most familiar measure of concordance is the Kendall W coefficient. The classical tests for concordance are the Friedman and F tests. Legendre (2005) showed via simulation that the Friedman test is too conservative and less powerful than its permutation version, but his study was very limited. In this paper, Legendre's study is substantially extended. It is shown that the Friedman test is too conservative and less powerful than both the F test and the permutation test for concordance, both of which always have correct size and very similar power. The F test should be preferred because it is computationally much easier.
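
As an illustration of the three procedures compared above, the following is a minimal sketch rather than the paper's own implementation. It assumes m judges each rank n objects without ties and uses the usual textbook forms: W = 12S / (m^2 (n^3 - n)), the Friedman statistic m(n - 1)W referred to a chi-square distribution with n - 1 degrees of freedom, and the statistic F = (m - 1)W / (1 - W) referred to an F distribution with n - 1 - 2/m and (m - 1)(n - 1 - 2/m) degrees of freedom. The function names, the number of permutations and the seed are hypothetical choices made for the example.

```python
import random


def kendall_w(ranks):
    """Kendall's W for m judges ranking n objects (assumes no ties).

    ranks: list of m lists, each a permutation of 1..n.
    """
    m, n = len(ranks), len(ranks[0])
    # Sum of the ranks received by each object across judges.
    totals = [sum(judge[i] for judge in ranks) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def friedman_statistic(w, m, n):
    """Friedman chi-square statistic and its degrees of freedom."""
    return m * (n - 1) * w, n - 1


def f_statistic(w, m, n):
    """F statistic for concordance and its two degrees of freedom."""
    df1 = n - 1 - 2 / m
    df2 = (m - 1) * df1
    # Perfect agreement (W = 1) makes the statistic unbounded.
    stat = float("inf") if w >= 1 else (m - 1) * w / (1 - w)
    return stat, df1, df2


def permutation_pvalue(ranks, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for W.

    Each judge's ranks are shuffled independently, which mimics the
    null hypothesis that the judges rank the objects independently.
    """
    rng = random.Random(seed)
    w_obs = kendall_w(ranks)
    count = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(judge, len(judge)) for judge in ranks]
        # Small tolerance guards against floating-point noise.
        if kendall_w(shuffled) >= w_obs - 1e-12:
            count += 1
    # The observed arrangement is counted as one of the permutations.
    return (count + 1) / (n_perm + 1)


# Hypothetical data: 3 judges ranking 4 objects.
example = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
w = kendall_w(example)
chi2_stat, chi2_df = friedman_statistic(w, m=3, n=4)
f_stat, f_df1, f_df2 = f_statistic(w, m=3, n=4)
p_perm = permutation_pvalue(example)
```

The sketch stops at the test statistics and degrees of freedom for the Friedman and F tests; upper-tail p-values could then be obtained from any statistics library, for example with `scipy.stats.chi2.sf` and `scipy.stats.f.sf` if SciPy is available. The permutation test needs `n_perm` recomputations of W, whereas the F test needs only one, which is the sense in which the F test is computationally easier.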