Mann-Whitney U Test


Menu location: Analysis_Nonparametric_Mann-Whitney.


This is a method for the comparison of two independent random samples (x and y):

The Mann-Whitney U statistic is defined as:

$$U = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - \sum_{i=n_1+1}^{n_1+n_2} R_i$$

- where samples of size n1 and n2 are pooled and Ri are the ranks.


U can be resolved as the number of times observations in one sample precede observations in the other sample in the ranking.
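The counting interpretation can be checked numerically. The following is a minimal Python sketch, not StatsDirect's own code: it assumes U is obtained from the pooled ranks of the second sample as n1*n2 + n2(n2+1)/2 minus that sample's rank sum, and it compares this with a direct count of the (x, y) pairs in which y precedes x, scoring a tie as one half. The function names and the data are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def u_from_ranks(x, y):
    """U from the rank sum of the second sample in the pooled ranking."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_y = sum(ranks[n1:])
    return n1 * n2 + n2 * (n2 + 1) / 2 - rank_sum_y


def u_from_pairs(x, y):
    """U by counting: 1 for each pair with y before x, 0.5 for each tied pair."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(x, y))


# Hypothetical data with one tie between the samples.
x = [1.1, 3.4, 5.0, 7.2]
y = [2.0, 3.4, 4.1]
print(u_from_ranks(x, y), u_from_pairs(x, y))  # 7.5 7.5
```

Both routes give the same value, which is why U can be read as a count of precedences rather than only as a function of rank sums.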


Wilcoxon rank sum, Kendall's S and the Mann-Whitney U test are exactly equivalent tests. In the presence of ties the Mann-Whitney test is also equivalent to a chi-square test for trend.


In most circumstances a two sided test is required; here the alternative hypothesis is that x values tend to be distributed differently to y values. For a lower side test the alternative hypothesis is that x values tend to be smaller than y values. For an upper side test the alternative hypothesis is that x values tend to be larger than y values.


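Assumptions of the Mann-Whitney test:

- random samples from populations
- independence within samples and mutual independence between samples
- measurement scale is at least ordinal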


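A confidence interval for the difference between two measures of location is provided with the sample medians. The assumptions of this method are slightly different from the assumptions of the Mann-Whitney test:

- random samples from populations
- independence within samples and mutual independence between samples
- the two population distribution functions are identical apart from a possible difference in location parameters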


The theta statistic [U/(n1*n2)] is provided as an additional effect size reflecting the distance between the two underlying frequency distributions from which the samples are drawn (Newcombe, 2006a). It is equivalent to the area under the receiver operating characteristic (ROC) curve.


Technical Validation

StatsDirect uses the sampling distribution of U to give exact probabilities. These calculations may take an appreciable time to complete when many data are tied.
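To illustrate what an exact sampling distribution of U involves, here is a minimal Python sketch of the classical recurrence for the case with no ties. It is an illustration under stated assumptions, not StatsDirect's algorithm: tied data need a different (permutation based) treatment, and doubling the smaller tail for the two sided P value is just one common convention. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(n1, n2, u):
    """Number of orderings of n1 x's and n2 y's (no ties) that give statistic u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The largest pooled value is either an x (it exceeds all n2 y's, adding n2)
    # or a y (it adds nothing to the count of y-before-x pairs).
    return count_arrangements(n1 - 1, n2, u - n2) + count_arrangements(n1, n2 - 1, u)


def exact_p_values(n1, n2, u):
    """Lower tail P(U <= u), upper tail P(U >= u) and a doubled two sided P."""
    total = comb(n1 + n2, n1)
    counts = [count_arrangements(n1, n2, k) for k in range(n1 * n2 + 1)]
    lower = sum(counts[: u + 1]) / total
    upper = sum(counts[u:]) / total
    two_sided = min(1.0, 2 * min(lower, upper))
    return lower, upper, two_sided


# With n1 = n2 = 2 there are 6 equally likely orderings under the null hypothesis.
print([count_arrangements(2, 2, k) for k in range(5)])  # [1, 1, 2, 1, 1]
print(exact_p_values(3, 3, 0))
```

The number of cached states grows with n1 * n2 * (n1 * n2), which gives some feel for why exact calculations become slow as samples grow.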


Confidence intervals are constructed for the difference between the means or medians (any measure of location in fact). The level of confidence used will be as close as is theoretically possible to the one you specify. StatsDirect approaches the selected confidence level from the conservative side via the Hodges-Lehmann estimator (Monahan, 1984).
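The textbook construction behind this kind of interval can be sketched in a few lines of Python. The sketch assumes the usual approach: form all n1*n2 pairwise differences x - y, take their median as the Hodges-Lehmann estimate, and use the K-th smallest and K-th largest differences as the limits. Here K is supplied by the caller; in practice it comes from a quantile of the null distribution of U, which is why only certain confidence levels are attainable. The function name and data are hypothetical, and the code is not StatsDirect's implementation.

```python
from statistics import median


def shift_estimate_and_interval(x, y, k):
    """Median of all pairwise differences x - y, plus the k-th smallest
    and k-th largest differences as lower and upper limits."""
    diffs = sorted(a - b for a in x for b in y)
    if not 1 <= k <= len(diffs):
        raise ValueError("k must lie between 1 and n1 * n2")
    return median(diffs), diffs[k - 1], diffs[-k]


# Hypothetical data: 3 x 3 = 9 pairwise differences.
estimate, lower, upper = shift_estimate_and_interval([5, 7, 9], [1, 2, 6], k=2)
print(estimate, lower, upper)  # 4 1 7
```

Because K can only take integer values, the achieved confidence level moves in steps, which is consistent with a nominal 95% request being reported as a slightly different level.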


When samples are large (either sample > 80 or both samples > 30) a normal approximation is used for the hypothesis test and for the confidence interval. Note that StatsDirect uses more accurate P value calculations than some other statistical software, so you may notice a difference in results (Conover, 1999; Dineen and Blakesley, 1973; Harding, 1983; Neumann, 1988).
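The sample size rule and a standard large sample approximation can be sketched in Python as follows. This is an illustration only: the variance formula is the usual textbook one with a correction for ties, no continuity correction is applied, and nothing here is claimed to match StatsDirect's internal calculation. The function names are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def use_normal_approximation(n1, n2):
    """Rule quoted in the text: either sample > 80, or both samples > 30."""
    return n1 > 80 or n2 > 80 or (n1 > 30 and n2 > 30)


def normal_approx_two_sided_p(u, n1, n2, pooled):
    """z score and two sided P for U, with the usual tie-corrected variance.

    pooled is the list of all n1 + n2 observations, used only to count ties.
    """
    n = n1 + n2
    mean = n1 * n2 / 2
    # Each group of t tied values reduces the variance by a term in t^3 - t.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean) / sqrt(variance)
    return z, erfc(abs(z) / sqrt(2))  # erfc(|z| / sqrt(2)) = 2 * (1 - Phi(|z|))


print(use_normal_approximation(12, 36))  # False
print(use_normal_approximation(31, 31))  # True
```

For samples of 12 and 36 the rule returns False, so an exact probability rather than a normal approximation would be expected for data of that size.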


A confidence interval for theta is constructed using Newcombe's fifth method (Newcombe, 2006b). Note that the confidence interval for theta is directly comparable with the P value for the Mann-Whitney test whereas the confidence interval for the difference between the two measures of location (medians) is not.



Example

From Conover (1999, p. 218).

Test workbook (Nonparametric worksheet: Farm Boys, Town Boys).


The following data represent fitness scores from two groups of boys of the same age, those from homes in the town and those from farm homes.


Farm Boys    Town Boys
14.8         12.7
7.3          14.2
5.6          12.6
6.3          2.1
9.0          17.7
4.2          11.8
10.6         16.9
12.5         7.9
12.9         16.0
16.1         10.6
11.4         5.6
2.7          5.6


To analyse these data in StatsDirect you must first enter them in two separate workbook columns. Alternatively, open the test workbook using the file open function of the file menu. Then select Mann-Whitney from the Nonparametric section of the analysis menu. Select the columns marked "Farm Boys" and "Town Boys" when prompted for data.


For this example:


Mann-Whitney U test


Observations (x) in Farm Boys = 12 median = 9.8 rank sum = 321

Observations (y) in Town Boys = 36 median = 7.75


U = 243 U' = 189

Theta = 0.437500 (95% CI: 0.271772 to 0.621223)


Exact probability (adjusted for ties):

Lower side P = 0.2645 (H1: x tends to be less than y)

Upper side P = 0.7355 (H1: x tends to be greater than y)

Two sided P = 0.529 (H1: x tends to be distributed differently to y)


95.1% confidence interval for difference between medians or means:

K = 134 median difference = 0.8

CI = -2.3 to 4.4


Here we have assumed that these groups are independent and that they represent at least hypothetical random samples of the sub-populations they represent. In this analysis, we are clearly unable to reject the null hypothesis that one group does NOT tend to yield different fitness scores to the other (two sided P = 0.529). This lack of statistical evidence of a difference is reflected in the confidence interval for the difference between the measures of location (medians or means), in that the interval spans zero. Note that the quoted 95.1% confidence interval is as close as you can get to 95% because of the very nature of the mathematics involved in nonparametric methods like this.
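The summary figures in the output above are linked by simple arithmetic, which the following Python sketch checks using only the reported values. It assumes the standard identities U = rank sum of x - n1(n1+1)/2 and U + U' = n1*n2. The observation that the printed theta equals 189/432 is an arithmetic reading of the output, not a statement about how StatsDirect chooses the numerator.

```python
# Values copied from the output listing.
n1, n2 = 12, 36
rank_sum_x = 321

# U from the rank sum of the first sample (standard identity, assumed here).
u = rank_sum_x - n1 * (n1 + 1) // 2
# U and U' always add up to the number of (x, y) pairs.
u_prime = n1 * n2 - u
theta = u_prime / (n1 * n2)

print(u, u_prime)  # 243 189
print(theta)       # 0.4375
```

These agree with the reported U = 243, U' = 189 and Theta = 0.437500.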

