Rank and Order

When doing statistical analysis, it doesn’t take long before you start to question the assumptions of the tests you’re running and look for alternatives for special cases that just don’t seem to fit the typical approaches.  For example, the t-test is the workhorse of hypothesis testing: in the one-sample case it tests for a difference from an assumed mean (usually zero), and in the two-sample case it tests whether the means of the two samples differ.  A primary assumption of the t-test, however, is that the measurements in your sample or samples are normally distributed.  It’s also not unusual to have a measurement that is qualitative but can be ordered from least to most (ordinal rather than simply categorical).  Consider pest damage to crops: the plants can be ordered from least to most damaged, but there is no numerical measure of the damage.

While the t-test is generally robust to deviations from normality for large sample sizes, smaller sample sizes can still be problematic, and sample size doesn’t provide any help with the crop damage example.  For cases like these, we can turn to Nonparametric Statistics, which provides a collection of statistical tests with milder assumptions about the data, in particular no assumption of normality.  These distribution-free methods are typically based on the ranking or ordering of the samples, and for most normal-theory methods there is a nonparametric alternative.

Sticking with the t-test, suppose we have two treatments for our crops that are supposed to control pests.  Unfortunately, we don’t have a good way to quantify the degree of damage, but we can look at the crops and easily put them in order from least to most damaged.  Here’s a peek at our data:

m = 7 crops treated with A
n = 9 crops treated with B

Order  Treatment        Order  Treatment
  1        B               9       A
  2        B              10       B
  3        B              11       B
  4        B              12       A
  5        A              13       A
  6        B              14       A
  7        B              15       A
  8        B              16       A

For this example we’ll use the Wilcoxon Rank-Sum Test, which follows these steps:

  1. Combine all the observations and rank them from smallest to largest.  (We’ve got that in our table, above.)
  2. Find the sum of the ranks for one of the treatments to give the W statistic.
    Using treatment A, we get W = 5 + 9 + 12 + 13 + 14 + 15 + 16 = 84.
    We could alternatively use treatment B to get W = 52; it doesn’t matter which we choose.
  3. Since the sample size is small, we can get an exact p-value by:
    finding all possible permutations of the data in which m ranks are assigned to treatment A and the remaining n ranks are assigned to treatment B; computing W for treatment A in each permutation; and then determining what fraction of the permutations give a W at least as extreme as our original value for treatment A.  If the sample size is large enough that enumerating every permutation is not feasible, we can instead sample a large number of random permutations (>1000 is usually enough) and determine what fraction exceeded our original W.
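The exhaustive version of step 3 is short enough to sketch directly.  Here is one way to do it in Python with just the standard library, using the fact that the permutation distribution of W is symmetric about its mean to count both tails:

```python
from itertools import combinations

# Ranks of the m = 7 crops treated with A, from the table above.
a_ranks = [5, 9, 12, 13, 14, 15, 16]
m, n = 7, 9
N = m + n                      # 16 crops in total

w_obs = sum(a_ranks)           # observed rank sum for A: W = 84

# Enumerate every way of assigning m of the N ranks to treatment A
# (C(16, 7) = 11440 possibilities) and compute W for each.
w_all = [sum(c) for c in combinations(range(1, N + 1), m)]

# Two-sided exact p-value: the fraction of assignments whose W is at
# least as far from the mean, m * (N + 1) / 2 = 59.5, as the one we saw.
mu = m * (N + 1) / 2
dev = abs(w_obs - mu)
p = sum(abs(w - mu) >= dev for w in w_all) / len(w_all)
print(round(p, 6))
```

For a Monte Carlo version, replace the full enumeration with `random.sample` draws of m ranks and take the fraction of draws at least as extreme.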

R provides wilcox.test for just this purpose:

> wilcox.test(x=c(5,9,12,13,14,15,16),
+ y=c(1,2,3,4,6,7,8,10,11),
+ exact = TRUE)

Wilcoxon rank sum test

data:  c(5, 9, 12, 13, 14, 15, 16) and c(1, 2, 3, 4, 6, 7, 8, 10, 11)
W = 56, p-value = 0.007867
alternative hypothesis: true location shift is not equal to 0

(Note that the exact = TRUE parameter tells R to do the full permutation test.  Note also that R reports W = 56 rather than our rank sum of 84: wilcox.test uses the Mann-Whitney form of the statistic, the sum of ranks for the first sample minus m(m+1)/2 = 28.)

“And that’s it!  You are done!”  (That’s what my instructor would regularly say after describing a new topic.  He was the most enthusiastic instructor I’ve ever had.)

And there is so much more!

The Kruskal-Wallis test is a nonparametric alternative to one-way ANOVA.

The Jonckheere-Terpstra test is similar to Kruskal-Wallis but tests against ordered alternatives.  For example, increasing doses of a pain reliever would be expected to show a trend in their effect.

The Wilcoxon Signed-Rank test is a paired comparison test, like a paired t-test.

The Friedman test is useful for a randomized complete block design where we might typically use a repeated measures ANOVA.
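To give a taste of how similar these rank-based statistics are under the hood, here is a minimal sketch of the Kruskal-Wallis H statistic in Python (standard library only, hypothetical data, and no correction for tied ranks):

```python
from itertools import chain

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic.  Assumes all values are distinct,
    so no tie correction is applied."""
    pooled = sorted(chain.from_iterable(groups))
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # ranks 1..N
    n = len(pooled)
    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1),
    # where R_i is the rank sum of group i.
    return 12.0 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

# Hypothetical yields under three treatments, completely separated:
print(round(kruskal_wallis_h([12, 15, 18], [22, 25, 28], [31, 34, 37]), 6))
```

As with the rank-sum test, the data only enters through the ranks: completely separated groups give the largest possible H, while groups with equal rank sums give H = 0.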

And the list goes on and on.  A couple of good references are Introduction to Modern Nonparametric Statistics by James Higgins and Nonparametric Statistical Methods by Myles Hollander, Douglas Wolfe, and Eric Chicken.

Of course, there’s no free lunch.  Converting continuous values to ranks loses some information.  As a result, for data that actually is normally distributed, nonparametric tests tend to have lower power than their parametric counterparts.  In other words, you may need more samples to detect an effect with nonparametric methods.  Nonetheless, these methods fill the gap when strong assumptions such as normality cannot be satisfied.
