10011. Testing for Differences Between Two Groups:
Nonparametric Tests

Reading: SPSS Base 8.0 User's Guide: Chapter 29, Nonparametric Tests
Homework:
Download: bank.sav

  1. Overview
  2. Independent Samples: The Data
  3. Independent Samples: Testing the Assumptions
  4. Independent Samples: Running the Mann-Whitney U
  5. Related Samples: The Data
  6. Related Samples: Testing the Assumptions
  7. Related Samples: Running the Wilcoxon Test for Paired Data
  8. References

1. Overview

This set of notes looks at nonparametric tests of differences between two groups. Nonparametric tests should be used when the dependent variable is ordinal or when the t test assumptions have not been met. The decision tree presented in the previous set of notes is reproduced here.

Table 1. The Decision Tree

Score        Scale of      Score               Measure                   SPSS Statistics Path
Dependency   Measurement   Distribution

Independent  Interval or   Symmetric,          t Test                    Compare Means
Scores       Ratio         Homogeneous                                   - Independent Samples t test
                                                                         -- Equal Variances Assumed
                           Symmetric,          Welch's t Test            Compare Means
                           Nonhomogeneous                                - Independent Samples t test
                                                                         -- Equal Variances Not Assumed
                           Skewed in           Mann-Whitney U            Nonparametric Tests
                           Different           (Wilcoxon Rank Sum Test)  - 2 Independent Samples
                           Directions                                    -- Test Type: Mann-Whitney U
             Ordinal       (not an issue)      Mann-Whitney U            (same as above)

Related      Interval or   Symmetric           Paired Samples t Test     Compare Means
Scores       Ratio         Difference Scores                             - Paired-Samples t test
                           Nonsymmetric        Wilcoxon Test             Nonparametric Tests
                           Difference Scores   for Paired Data           - 2 Related Samples
                                                                         -- Test Type: Wilcoxon
             Ordinal       (not an issue)      Wilcoxon Test             (same as above)

Notes on the score distribution assumptions:
Notes on the score distribution assumptions:

   (a) Kurtosis is not viewed as being a major threat to the
       t test.  If the two populations are symmetric, and if the
       variances are equal, then the t test may be used.  

   (b) If the two populations are symmetric, and the variances
       are not equal, then use Welch's t test.

   (c) Skewness is not a problem if the skewness is in 
       the same direction.  If the variances are equal then
       use a t test.

   (d) If skewness is in the same direction and the variances
       are unequal, then if the sample sizes are equal use 
       Welch's t test.

   (e) In most instances in social science combined sample sizes
       of 40 or more would be considered "moderately large."

   (f) See Myers and Well (1991, p. 69) for additional discussion
       of these points.



2. Independent Samples: The Data

The data for this example is a survey of bank employees. The Bank data file, bank.sav, contains information about 474 employees hired by a Midwestern bank between 1969 and 1971. The bank was engaged in Equal Employment Opportunity (EEO) litigation. The datafile is supplied by SPSS, Inc. as part of the base package.

We want to test the hypothesis that starting salaries for males and females are equal. We will be using two of the variables in the bank.sav file, gender and starting salary. Later in these notes we will also look at current salary and employment category. These variables are defined in Table 2.

Table 2. Selected Variables in bank.sav

Variable Name   Variable Label / Value Label
id              Employee code
salbeg          Beginning salary
sex             Gender of employee
                  0 = Males
                  1 = Females
salnow          Current salary
jobcat          Employment category
                  1 = Clerical
                  2 = Office trainee
                  3 = Security officer
                  4 = College trainee
                  5 = Exempt employee
                  6 = MBA trainee
                  7 = Technical



3. Independent Samples: Testing the Assumptions

Table 3. T Test Assumptions for Independent Groups
1. Are the observations in the two groups independent or related?
2. What is the scale of measurement for the dependent variable?
3. What are the shapes of the distributions in the two groups?
4. Are the distributions in the two groups homogeneous?

Assumption 1 (independence). The observations in the two groups are independent because there are different participants in the two gender conditions.

Assumption 2 (scale of measurement). The scale of measurement for beginning salary and current salary is ratio.

Assumption 3 (normality). This assumption is that beginning salary is normally distributed for both males and females. We can use the Explore procedure to look at the stem-and-leaf plots, the skewness and kurtosis statistics, and the normality tests.

The stem-and-leaf plots in Table 4 indicate that both distributions are positively skewed. The t test is reasonably robust if the distributions are about equally skewed and if the numbers of cases in the two conditions are about the same. This bank has more male employees (258) than female employees (216). The skewness and kurtosis statistics give us additional information about the magnitude of the skewness in each distribution.

Table 4. Stem-and-Leaf Plots for Males and Females
Beginning salary Stem-and-Leaf Plot

SEX= Males
Frequency Stem & Leaf

  2.00  3 . &
  4.00  4 . 6&
 35.00  5 . 14444677778&
120.00  6 . 0000000000000000333333333333333466666699& 
 15.00  7 . 22588
 17.00  8 . 14477&
  5.00  9 . 34&
 13.00 10 . 2599&
  7.00 11 . 04&
 14.00 12 . 0799&
  3.00 13 . 2
 23.00 Extremes (>=13500)

Stem width: 1000
Each leaf: 3 case(s)
& denotes fractional leaves.

Beginning salary Stem-and-Leaf Plot

SEX= Females
Frequency Stem & Leaf

 9.00 3 . 6999
46.00 4 . 0000000000022333333344
58.00 4 . 5555555555555666888888888999
39.00 5 . 111111222444444444&
19.00 5 . 55577777&
12.00 6 . 00133
12.00 6 . 66699&
 8.00 7 . 2222
 4.00 7 . 55
 9.00 Extremes (>=7800)

Stem width: 1000
Each leaf: 2 case(s)
& denotes fractional leaves.

Descriptive statistics are shown in Table 5. Both distributions are positively skewed. The male salaries range from $3,600 to $31,992 per year. The skewness statistic for the males, 2.390, is about 15.7 standard error units greater than zero (2.390 / .152). The female salaries range from $3,600 to $12,000. The skewness statistic for the females, 1.767, is about 10.6 standard error units greater than zero (1.767 / .166). The male salaries are more strongly skewed than the female salaries.

Table 5. Selected Descriptives for each Gender

                  Sex of employee                  Statistic   Std. Error
Beginning salary  Males            Mean              8120.56       226.91
                                   Median            6300.00
                                   Std. Deviation    3644.71
                                   Minimum              3600
                                   Maximum             31992
                                   Skewness            2.390         .152
                                   Kurtosis            8.488         .302
                  Females          Mean              5236.79        79.90
                                   Median            4950.00
                                   Std. Deviation    1174.24
                                   Minimum              3600
                                   Maximum             12000
                                   Skewness            1.767         .166
                                   Kurtosis            5.352         .330
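The "standard error units" above are simply each skewness statistic divided by its standard error. As a rough check, here is a short Python sketch. It assumes the usual large-sample formulas for the standard errors of skewness and kurtosis, which depend only on the sample size; with n = 258 and n = 216 they reproduce the standard errors printed in Table 5. The function names are our own, chosen for illustration.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of sample skewness; depends only on the sample size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample (excess) kurtosis; depends only on n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def se_units(statistic: float, std_error: float) -> float:
    """Number of standard errors by which a statistic exceeds zero."""
    return statistic / std_error


# Sample sizes, skewness, and printed standard errors from Table 5
for label, n, skew, se_printed in [("Males", 258, 2.390, 0.152),
                                   ("Females", 216, 1.767, 0.166)]:
    print(f"{label}: SE(skewness) = {se_skewness(n):.3f}, "
          f"SE(kurtosis) = {se_kurtosis(n):.3f}, "
          f"skewness / SE = {se_units(skew, se_printed):.1f}")
```

The computed standard errors agree with Table 5 to three decimals, and the ratios come out at about 15.7 for the males and 10.6 for the females.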

The normality tests in Table 6 indicate that neither distribution is normal, KS for males(258) = .259, p < .0005, KS for females(216) = .148, p < .0005.

Table 6. Tests of Normality

                                    Kolmogorov-Smirnov(a)
                  Sex of employee   Statistic    df   Sig.
Beginning salary  Males                  .259   258   .000
                  Females                .148   216   .000
a Lilliefors Significance Correction

Assumption 4 (homogeneity). The boxplots in Figure 1 give a graphic representation of the homogeneity problem.

Figure 1. Boxplots for each Gender

The variances are not homogeneous, Levene(1, 472) = 105.969, p < .0005, see Table 7.

Table 7. Test of Homogeneity of Variance

                  Levene Statistic   df1   df2   Sig.
Beginning salary           105.969     1   472   .000
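Levene's statistic is a one-way ANOVA F computed on the absolute deviations of each score from its own group mean, which is why it is reported with df1 = k - 1 = 1 and df2 = N - k = 472 here. The Python sketch below assumes this classic mean-centered form (we have not confirmed which variant this SPSS version uses) and runs it on a tiny made-up example rather than the bank data. The function name is hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def levene_mean_centered(*groups: Sequence[float]) -> float:
    """Levene's W: one-way ANOVA F on |score - group mean| (mean-centered form)."""
    devs: List[List[float]] = []
    for g in groups:
        m = mean(g)
        devs.append([abs(x - m) for x in g])         # absolute deviations
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = sum(sum(d) for d in devs) / n_total       # grand mean of deviations
    dev_means = [mean(d) for d in devs]
    ss_between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, dev_means))
    ss_within = sum(sum((z - m) ** 2 for z in d) for d, m in zip(devs, dev_means))
    df1, df2 = k - 1, n_total - k                     # (1, 472) for the bank data
    return (ss_between / df1) / (ss_within / df2)


# Made-up scores: the second group is twice as spread out as the first
print(levene_mean_centered([1, 2, 3], [2, 4, 6]))
```

Groups with the same spread give W = 0 no matter how far apart their means are; only differences in spread raise the statistic.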

There are major problems with homogeneity and with unequal degrees of skewness in these data. The Mann-Whitney U (also known as the Wilcoxon rank sum test for independent samples) is more appropriate for these data than the t test. In this situation people will sometimes report both the t test and the Mann-Whitney U test. The Mann-Whitney U tests whether the two groups are "equivalent in location," that is, whether scores in one group tend to be higher than scores in the other group.



4. Independent Samples: Running the Mann-Whitney U

To run the Mann-Whitney U test click

Statistics
    Nonparametric Tests
         2 Independent Samples

Then select Mann-Whitney U as the test type.

The other test types in this dialog box answer different questions. The Kolmogorov-Smirnov Z and the Wald-Wolfowitz runs tests are sensitive to both the location and the shapes of the distributions. The Moses extreme reactions test looks at extreme scores of a treatment group relative to a control group.

Move salbeg to the Test Variable List: window and sex to the Grouping Variable: window. Open the Define Groups dialog box and enter 0 (males) as the value for Group 1: and 1 (females) as the value for Group 2:. Click OK to run the analysis.

Information about the ranks is given in Table 8 and the Mann-Whitney U statistic information is given in Table 9.

The Mann-Whitney U ranks all the cases from the lowest to the highest score. The "Mean Rank" is the mean of those ranks for each group and the "Sum of Ranks" is the sum of those ranks for each group. U1 is defined as the number of times that a score from group 1 is lower in rank than a score from group 2. U2 is defined as the number of times that a score from group 2 is lower in rank than a score from group 1. U is defined as the smaller of U1 and U2. The computational formulas for U1 and U2 are as follows:

U1 = n1n2 + (n1(n1 + 1))/2 - R1

U2 = n1n2 + (n2(n2 + 1))/2 - R2

where

n1 = number of observations in group 1
n2 = number of observations in group 2
R1 = sum of ranks assigned to group 1
R2 = sum of ranks assigned to group 2
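To see that the rank-sum formulas really count "lower in rank" pairs, the Python sketch below applies them to a few made-up scores (not the bank data) and compares the result with a brute-force count over every pair of scores. It assumes the usual conventions that tied scores receive the average of the ranks they occupy and that a tied pair counts as one half. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from lowest (1) to highest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group1: Sequence[float], group2: Sequence[float]) -> Tuple[float, float]:
    """Return (U1, U2) from the rank-sum formulas in the text."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


def count_lower(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Count pairs in which the group-1 score is below the group-2 score (ties = 1/2)."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0
               for a in group1 for b in group2)


g1, g2 = [3, 5, 8], [4, 9, 10, 12]   # made-up scores
u1, u2 = mann_whitney_u(g1, g2)
print(u1, u2, count_lower(g1, g2), count_lower(g2, g1))  # 10.0 2.0 10.0 2.0
```

Here U1 = 10 and U2 = 2 match the pair counts, and U1 + U2 = n1n2 = 12; U is the smaller value, 2.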

In this example U1 is the smaller.

U1 = n1n2 + (n1(n1 + 1))/2 - R1
   = (258)(216) + (258(258 + 1))/2 - 81,285
   = 55,728 + 33,411 - 81,285
   = 89,139 - 81,285
   = 7,854

U2 = n1n2 + (n2(n2 + 1))/2 - R2
   = (258)(216) + (216(216 + 1))/2 - 31,290
   = 55,728 + 23,436 - 31,290
   = 79,164 - 31,290
   = 47,874

As a check, U1 + U2 = 7,854 + 47,874 = 55,728 = n1n2.
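The U formulas can also be checked against the values in Table 8 with a few lines of Python. The sketch verifies two identities that catch arithmetic slips: the two rank sums must add to N(N + 1)/2, and U1 + U2 must equal n1n2. The function name is hypothetical; the inputs are the sample sizes and rank sums reported in Table 8.

```python
from typing import Tuple


def u_from_rank_sums(n1: int, n2: int, r1: float, r2: float) -> Tuple[float, float]:
    """U1 and U2 from group sizes and rank sums (formulas in the text)."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


n1, n2 = 258, 216            # males, females (Table 8)
r1, r2 = 81285.0, 31290.0    # sums of ranks (Table 8)
big_n = n1 + n2

# Check 1: the ranks 1..N always sum to N(N + 1)/2
assert r1 + r2 == big_n * (big_n + 1) / 2
u1, u2 = u_from_rank_sums(n1, n2, r1, r2)
# Check 2: U1 + U2 always equals n1 * n2
assert u1 + u2 == n1 * n2
print(u1, u2, min(u1, u2))  # 7854.0 47874.0 7854.0
```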

The Mann-Whitney U looks at the locations of one set of scores relative to the locations of the other set of scores. If U is not significant, there is no evidence that the ranks of one set of scores differ systematically from the ranks of the other set.

When the sample sizes for both groups are larger than 20, the sampling distribution of U approaches a normal curve. In that case the Z score based on the U distribution can be reported. In this example Z = -13.496. If the two distributions are identical in location, the expected value of Z is 0. An absolute Z of 1.96 or more indicates that the locations of the distributions differ at the .05 level (two-tailed).

Table 8. Ranks

                  Sex of employee     N   Mean Rank   Sum of Ranks
Beginning salary  Males             258      315.06       81285.00
                  Females           216      144.86       31290.00
                  Total             474

Table 9. Test Statistics(a)

                         Beginning salary
Mann-Whitney U                   7854.000
Wilcoxon W                      31290.000
Z                                 -13.496
Asymp. Sig. (2-tailed)               .000
a Grouping Variable: Sex of employee
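The large-sample Z is (U - n1n2/2) divided by the standard deviation of U, sqrt(n1n2(n1 + n2 + 1)/12). The Python sketch below applies that formula to the bank results. It omits the adjustment for tied ranks; beginning salaries contain many ties, and SPSS corrects the variance for them, so the sketch gives about -13.47 rather than the -13.496 in Table 9. The function names are hypothetical.

```python
import math


def mann_whitney_z(u: float, n1: int, n2: int) -> float:
    """Normal-approximation z for U (no correction for tied ranks)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def two_tailed_p(z: float) -> float:
    """Two-tailed p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


z = mann_whitney_z(7854, 258, 216)   # U, n1, n2 from Tables 8 and 9
print(round(z, 2), two_tailed_p(z) < 0.0005)  # -13.47 True
```

When U equals its expected value n1n2/2 the sketch returns Z = 0, and |Z| = 1.96 gives a two-tailed p of about .05.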

Note. If the number of tied ranks is "excessive," the Mann-Whitney U may not be appropriate (Hinkle, Wiersma, & Jurs, 1994). Those authors do not define "excessive."

Additional Analyses

The bank might argue that salaries are higher for men than for women because there are more men than women in higher level positions in the bank. Hypothesis 2 is an equal pay for equal work hypothesis: male and female clerical workers receive equal starting salaries.

To test this hypothesis we first need to select the clerical workers in the file bank.sav. Clerical workers are coded as "1" in the variable jobcat. To select only the clerical workers click

Data
   Select Cases

In the Select Cases dialog box click the If condition is satisfied radio button, then click the If... button to specify the condition to be met. Move jobcat to the window on the top right, type "= 1" after it, and click Continue to return to the Select Cases dialog box. Make sure the Filtered radio button under Unselected Cases Are, at the bottom of the Select Cases dialog box, is checked. Filtered cases are left out of analyses but remain in the data file. You can turn the filter off by selecting the All cases radio button at the top of the Select Cases dialog box.

Then we use Explore to test the assumptions of the t test. Finding that the assumptions are not met, we run the Mann-Whitney U.

Curious? Run it for yourself.



5. Related Samples: The Data

Hypothesis 3 - Current salaries, salnow, will be higher than beginning salaries, salbeg.

Although this hypothesis may not be very exciting, it gives us the opportunity to use the Wilcoxon test for pairs of variables.



6. Related Samples: Testing the Assumptions

Table 10. T Test Assumptions for Dependent Groups
1. Are the observations in the two groups independent or related?
2. What is the scale of measurement for the difference score?
3. What is the shape of the distribution of the difference scores?

Assumption 1 (independence). The observations in the two groups are dependent because the same participants contributed to both scores. This is a repeated measures design.

Assumption 2 (scale of measurement). The scale of measurement for the difference score is ratio; a salary difference of zero is a true zero point.

Assumption 3 (normality).

NOTE: Check to make sure that the filter is not still on from the previous problem. If it is, a "Filter On" message appears at the bottom right of the SPSS Data Editor window. To turn it off, go into the Select Cases dialog box and check the All cases radio button.

The shape assumption is that the difference scores are normally distributed. First compute the difference score as

COMPUTE saldiff = salnow - salbeg .

A positive saldiff score indicates that the current salary is greater than the beginning salary. We can use the Explore procedure to look at the stem-and-leaf plot, the skewness and kurtosis statistics, and the normality tests. Selected descriptive statistics for saldiff are shown in Table 11. As expected, the difference score is also highly positively skewed: the skewness statistic, 2.182, is about 19.5 standard error units greater than zero (2.182 / .112).

Table 11. Selected Descriptives for the Salary Difference Score

                            Statistic   Std. Error
SALDIFF  Mean               6961.3924     198.6928
         Median             5700.0000
         Variance        18712960.780
         Std. Deviation     4325.8480
         Skewness               2.182         .112
         Kurtosis               5.764         .224



7. Related Samples: Running the Wilcoxon Test for Paired Data

To run the Wilcoxon test for paired data click

Statistics
    Nonparametric Tests
         2 Related Samples
   

Then select Wilcoxon as the Test Type.

Check SPSS Help for descriptions of the other available tests.

Add the salbeg - salnow variable pair to the Test Pair(s) List: and click OK.

Information about the ranks is given in Table 12 and the Wilcoxon signed ranks test statistic is given in Table 13.

The computation of the value of the Wilcoxon test involves a) computing the difference scores, b) ranking the absolute values of the difference scores, and then c) finding the Mean Rank for all the cases with negative difference scores and the Mean Rank for all cases with positive difference scores.
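Steps a) through c) can be sketched in Python as follows. The function names and the small before/after scores are made up for illustration. Zero differences are treated as ties and dropped before ranking, and tied absolute differences get the average of their ranks; both are common conventions that we assume here.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from lowest (1) to highest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def signed_rank_summary(before: Sequence[float], after: Sequence[float]) -> Dict[str, object]:
    """Summarize Wilcoxon signed ranks for the differences after - before."""
    diffs = [a - b for b, a in zip(before, after)]        # a) difference scores
    nonzero = [d for d in diffs if d != 0]                 # zero differences are ties
    ranks = average_ranks([abs(d) for d in nonzero])      # b) rank |differences|
    summary: Dict[str, object] = {"ties": len(diffs) - len(nonzero)}
    for label, keep in (("negative", lambda d: d < 0), ("positive", lambda d: d > 0)):
        rs = [r for d, r in zip(nonzero, ranks) if keep(d)]
        # c) number, sum, and mean of the ranks for each sign
        summary[label] = {"n": len(rs), "sum": float(sum(rs)),
                          "mean": sum(rs) / len(rs) if rs else 0.0}
    return summary


print(signed_rank_summary([10, 12, 15, 20, 18], [14, 11, 15, 26, 21]))
```

With 474 positive differences and no zeros, the positive ranks are 1 through 474 (or averages of them when absolute differences tie), so their sum is always 474(475)/2 = 112,575 and their mean is 237.50, whatever the actual salary gains were.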

In this example the total number of cases was 474. The current salary was greater than the beginning salary for every case, so there were no negative difference scores and the Mean Rank for negative differences was 0. There were no ties (no case had equal current and beginning salaries). All 474 difference scores were positive, and the Mean Rank for those 474 cases was 237.50, the mean of the ranks 1 through 474.

Do the results of this test support the hypothesis of no difference between current and beginning salary?

 

Table 12. Ranks

                                                        N   Mean Rank   Sum of Ranks
Current salary - Beginning salary  Negative Ranks    0(a)         .00            .00
                                   Positive Ranks  474(b)      237.50      112575.00
                                   Ties              0(c)
                                   Total              474
a Current salary < Beginning salary
b Current salary > Beginning salary
c Beginning salary = Current salary

Table 13. Test Statistics(b)

                        Current salary - Beginning salary
Z                                              -18.865(a)
Asymp. Sig. (2-tailed)                               .000
a Based on negative ranks.
b Wilcoxon Signed Ranks Test
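The Z in Table 13 comes from the normal approximation for the signed-ranks statistic T, here the sum of the negative ranks (0). Under the null hypothesis T has mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24. This Python sketch omits the adjustment SPSS makes for tied absolute differences; with T = 0 and n = 474 it matches Table 13 to three decimals. The function name is hypothetical.

```python
import math


def wilcoxon_z(t: float, n: int) -> float:
    """Normal-approximation z for the signed-ranks statistic T (no tie correction)."""
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean_t) / sd_t


# T = sum of negative ranks (0) and n = 474 nonzero differences, from Table 12
print(round(wilcoxon_z(0.0, 474), 3))  # -18.865
```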



8. References

Hinkle, D. E., Wiersma, W., & Jurs, S. G. (1994). Applied statistics for the behavioral sciences (3rd ed.). Boston: Houghton Mifflin.

Misanin, J. R., & Hinderliter, C. F. (1991). Fundamentals of statistics for psychology students. New York: HarperCollins.

Myers, J. L., & Well, A. D. (1991). Research design and statistical analysis. New York: HarperCollins.

Norusis, M. J. (1990). SPSS introductory statistics student guide. Chicago, IL: SPSS Inc.



© Lee A. Becker, 1997, 1998 - revised 07/07/99