Crosstabs: Measures for Ordinal Data

Reading: SPSS Base 9.0 User's Guide: Chapter 14, Crosstabs
                SPSS Base 9.0 User's Guide: Chapter 6, File Handling and File Transformations -Weight Cases (pp. 121-122)

Download: films.sav

  1. Overview
  2. Define the Weight Variable
  3. Select the Crosstabs Options
  4. The Crosstabs Table
  5. Symmetric Measures for Ordinal Data
     A. Linear-by-Linear Association
     B. Gamma
     C. Tau c
     D. Tau b
     E. Spearman Correlation                
  6. Directional Measures for Ordinal Data
     A. Somers' D  

         Crosstabs: Measures for Nominal Data
         Crosstabs: Kappa                       

1. Overview

This set of notes looks at measures for ordinal and interval data. The data are from a hypothetical survey of movie ratings by teen-agers. Teens are asked to rate horror films from one to three stars (1=*, 2=**, and 3=***) and also to rate the amount of violence in each film, where 1 is "low violence" and 2 is "yeachhhh, high violence". Both of these scales can be considered to be at least ordinal level of measurement. The responses from 54 teen-agers are included in the data file films.sav. The variables in films.sav are shown in Table 1.

Table 1. The variables in films.sav
Variable Name   Variable Label         Value Labels
-------------   --------------------   -----------------------
Violence        Amount of Violence     1 = 'low violence'
                                       2 = 'high violence'
Rating          Movie Rating           1 = '*' (one star)
                                       2 = '**' (two stars)
                                       3 = '***' (three stars)
Count           Observed Cell Counts

This data file makes use of a weight variable, called count, to specify the number of cases for each cell in the 2 (violence descriptions) x 3 (movie ratings) contingency table. The values in films.sav are shown in Table 2.

Table 2. The values in films.sav
violence   rating   count
--------   ------   -----
    1         1       10
    1         2        5
    1         3        2
    2         1        9
    2         2       12
    2         3       16

The count value of 10 for the first row indicates that there are 10 cases in cell 11 (low violence, one star). There are 5 cases in cell 12 (low violence, two stars), 2 cases in cell 13 (low violence, three stars), 9 cases in cell 21 (high violence, one star), and so forth. By adding all the values for count you find the total number of cases, N = 54.

Please note that normally you would enter the data for each case. If the weight variable were not used, then the data file would need to have 54 cases. The films.sav file uses only 6 cases to represent the entire set of data points. The weight variable is useful if you wish to reanalyze a published set of data. As in this example, you could enter the cell frequencies as a weight variable rather than entering all the individual cases.
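As a rough illustration of what the weight variable stands for, the following Python sketch expands the six weighted rows of Table 2 into one record per case. The names `counts` and `expand_cases` are made up for this example and are not part of SPSS.

```python
# Cell counts from Table 2: (violence, rating) -> count.
counts = {
    (1, 1): 10, (1, 2): 5, (1, 3): 2,
    (2, 1): 9, (2, 2): 12, (2, 3): 16,
}


def expand_cases(weighted):
    """Return one (violence, rating) tuple per case implied by the weights."""
    cases = []
    for (violence, rating), count in weighted.items():
        cases.extend([(violence, rating)] * count)
    return cases


cases = expand_cases(counts)
n_total = len(cases)
print(n_total)  # 54, the same N obtained by summing the count column
```

Six weighted rows therefore carry the same information as a 54-case file.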

Prior to running any statistics on these data you need to tell SPSS that the variable "count" is to be used as a weighting variable.



2. Define the Weight Variable

Select

Data
    Weight Cases

In the dialog box select the Weight cases by radio button and then move the count variable to the Frequency Variable: window. Then press OK to run the command.



3. Select the Crosstabs Options

The crosstabs dialog box is opened by clicking

Analyze
      Descriptive Statistics
            Crosstabs...

In this discussion the variable violence is designated as the row variable and the variable rating is designated as the column variable. The only cell option selected is the observed counts. The statistics selected are all the ordinal statistics (gamma, Somers' d, Kendall's tau-b, Kendall's tau-c), correlations, chi-square, and Eta.



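4. The Crosstabs Table

Table 3. Amount of Violence by Movie Rating (observed counts, rebuilt from Table 2)
                           Movie Rating
Amount of Violence       *      **     ***    Total
------------------    -----   -----   -----   -----
low violence            10       5       2      17
high violence            9      12      16      37
Total                   19      17      18      54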



5. Symmetric Measures for Ordinal Data

The output from crosstabs is organized into tables of statistics that are symmetric or directional. Directional measures yield different values depending upon which variable is considered to be the dependent variable. The only directional measure for ordinal data is Somers' d. Symmetric measures provide the same result no matter which variable is considered to be the dependent variable. Symmetric measures for ordinal data include the linear-by-linear association, gamma, Kendall's tau-b, Kendall's tau-c, and Spearman's rank order correlation.

The symmetric statistics for ordinal measures are summarized in Table 5.

Table 5. Symmetric Measures

                                            Asymp. Std.   Approx.   Approx.
                                    Value   Error(a)      T(b)      Sig.
Ordinal by Ordinal  Kendall's tau-b   .349     .112        3.005     .003
                    Kendall's tau-c   .374     .125        3.005     .003
                    Gamma             .611     .166        3.005     .003
             Spearman Correlation     .370     .118        2.876     .006(c)
N of Valid Cases                        54


a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
c Based on normal approximation.

 A. Linear-by-Linear Association

The linear-by-linear association statistic is also called the Mantel-Haenszel statistic. It is appropriate if both measures are at least ordinal. The statistic is reported when you ask for the chi-square statistics within crosstabs; see Table 5.

Table 5. Chi-Square Tests

The formula for the linear-by-linear association involves the Pearson product-moment correlation coefficient, r, and the total number of cases, N. The formula is

linear-by-linear association = r^2 * (N - 1).

The computation of the linear-by-linear association for the films data is shown in Table 5.

Table 5. Calculation of the linear-by-linear association index
 linear-by-linear association = r^2 * (N - 1)
                              = (.370)^2 * (54 - 1)
                              = .1369 * 53
                              = 7.256

The Pearson product-moment correlation can be computed within crosstabs by checking the correlations box.

The discrepancy in the second and third decimal places between this calculation and the statistic reported in the Chi-Square Tests table is probably due to rounding. Crosstabs reports the correlation value to three decimal places. SPSS probably used several more decimal places when making its computation. For example, a correlation of .370288 will yield a linear-by-linear association value of 7.26700.
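To see the rounding effect without SPSS, the sketch below computes an unrounded Pearson r directly from the weighted cell counts and then applies r^2 * (N - 1). The function name `weighted_pearson` is a hypothetical helper written for this example; it is not how SPSS computes the statistic internally.

```python
import math

# Cell counts from Table 2: (violence, rating) -> count.
counts = {
    (1, 1): 10, (1, 2): 5, (1, 3): 2,
    (2, 1): 9, (2, 2): 12, (2, 3): 16,
}


def weighted_pearson(weighted):
    """Pearson r for (x, y) values, each repeated `count` times."""
    n = sum(weighted.values())
    mean_x = sum(c * x for (x, _), c in weighted.items()) / n
    mean_y = sum(c * y for (_, y), c in weighted.items()) / n
    sxy = sum(c * (x - mean_x) * (y - mean_y) for (x, y), c in weighted.items())
    sxx = sum(c * (x - mean_x) ** 2 for (x, _), c in weighted.items())
    syy = sum(c * (y - mean_y) ** 2 for (_, y), c in weighted.items())
    return sxy / math.sqrt(sxx * syy), n


r, n = weighted_pearson(counts)
linear_by_linear = r ** 2 * (n - 1)
print(f"r = {r:.3f}, linear-by-linear = {linear_by_linear:.3f}")
```

Using the unrounded r gives a value close to 7.267 rather than 7.256, which supports the rounding explanation.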



B. Gamma, g

Gamma is an ordinal statistic which is computed by using the ordinal operations of "greater than" ("GT" or ">"), "less than" ("LT" or "<"), and "equal to" ("EQ" or "="). Using these ordinal operations each pair of data can be classified as either tied (T), concordant (P), or discordant (Q). The formula for gamma is

g = (P - Q) / (P + Q)

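where P is the total number of concordant comparisons, and Q is the total number of discordant comparisons. Let's look at the films.sav data to see how this works. The observed counts from the crosstabs table (Table 3) are reproduced below.

                           Movie Rating
Amount of Violence       *      **     ***    Total
------------------    -----   -----   -----   -----
low violence            10       5       2      17
high violence            9      12      16      37
Total                   19      17      18      54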

Let's label this set of data in terms of row (violence ratings) and columns (movie ratings). There are two rows and three columns (2 x 3) creating a 6-cell design. Each cell can be specified by its row and column position. For example, the low violence (row 1), two-star rating (column 2) cell can be specified as R1C2.

If we compare a data point from one cell (i) with a data point from another cell (j) we can compare the row and column indices across the two cells as shown in Table 6.

Table 6. Possible Ordinal Comparisons
               ordinal
   Cell i    comparison        Cell j  
     Ri      >?   <?   =?        Rj
     Ci      >?   <?   =?        Cj

When we compare the row indices of the two cells we ask the question "is the row index for cell i greater than (>), less than (<) or equal to (=) the row index for cell j?" When we compare the column indices of the two cells we ask the question "is the column index for cell i greater than (>), less than (<), or equal to (=) the column index for cell j?"

Let's compare a data point in cell R1C1 (low violence, one-star) with a data point in cell R2C2 (high violence, two-stars).

Table 7. Concordant Comparison (P)
               ordinal
   Cell i    comparison    Cell j  

     R1          <           R2
     C1          <           C2
  10 * 12 = 120 concordant comparisons  

The ordinal comparison is consistent: the "one cell" is "less than" the "other cell" for both the row comparison and the column comparison. Conceptually, the movie which was rated as more violent was also rated as having more stars. When the ordinal comparison for both rows and columns is consistent, as in this comparison, the data pair is said to be "concordant" (P).

How many concordant pairs of data are there for the comparison between cell R1C1 and R2C2? There are 10 people in cell R1C1 and 12 people in cell R2C2. If we take the first person in cell R1C1 and compare that person to each of the 12 persons in cell R2C2 there are 12 concordant pairs (12 comparisons can be made). If we take the second person in cell R1C1 and compare that person to each of the 12 persons in cell R2C2 there are another 12 concordant comparisons. Since there are 10 people in cell R1C1, if we compare each of those 10 people to each of the 12 people in cell R2C2 there are 10 x 12 or 120 concordant comparisons.

Let's compare a data point in cell R1C2 (low violence, two-stars) with a data point in cell R2C1 (high violence, one-star).

Table 7. Discordant Comparison (Q)
               ordinal
   Cell i    comparison    Cell j  

     R1          <           R2
     C2          >           C1
  5 * 9 = 45 Discordant comparisons  

The ordinal comparison is inconsistent: for the row comparison "one cell" is "less than" the "other cell", but for the column comparison "one cell" is "greater than" the "other cell." For these comparisons the movie which was rated as more violent was rated as having fewer stars. When the ordinal comparisons for rows and columns are inconsistent, as in this comparison, the data pair is said to be "discordant" (Q).

How many discordant pairs of data are there for the comparison between cell R1C2 and R2C1? There are 45 (5 x 9) discordant pairs.

The third logical possibility is that the data pairs are tied on one or more of the row or column comparisons. Let's compare a data point in cell R1C1 (low violence, one-star) with a data point in cell R1C3 (low violence, three-stars).

Table 8. Tied Comparison (T)
               ordinal
   Cell i    comparison    Cell j  

     R1          =           R1
     C1          <           C3
  10 * 2 = 20 Tied comparisons  

The ordinal comparison is tied on the row comparison, so all these data pairs are said to be tied (T). How many tied comparisons are there when the people in cell R1C1 are compared with the people in cell R1C3? There are 20 (10 x 2) tied pairs.

Let's look at each possible data pair and decide whether that pair is concordant, discordant or tied.

Table 9. Summary of Ordinal Comparisons for the Movie Rating Data
Cell pairs         Data pairs are P, Q or T(ied)?
---------------  ------------------------------------------------ 

--comparisons that are tied on a row variable (violence)

R1C1 with R1C2   Tr tied on a row variable (10 *  5 =  50 pairs)
R1C1 with R1C3   Tr tied on a row variable (10 *  2 =  20 pairs)
R1C2 with R1C3   Tr tied on a row variable ( 5 *  2 =  10 pairs)
R2C1 with R2C2   Tr tied on a row variable ( 9 * 12 = 108 pairs)
R2C1 with R2C3   Tr tied on a row variable ( 9 * 16 = 144 pairs)
R2C2 with R2C3   Tr tied on a row variable (12 * 16 = 192 pairs)
                 -----------------------------------------------
                 Total pairs tied on a row variable = 524 pairs

--comparisons that are tied on a column variable (film ratings)

R1C1 with R2C1   Tc tied on a col variable (10 *  9 =  90 pairs)
R1C2 with R2C2   Tc tied on a col variable ( 5 * 12 =  60 pairs)
R1C3 with R2C3   Tc tied on a col variable ( 2 * 16 =  32 pairs)
                 -----------------------------------------------
                 Total pairs tied on a col variable = 182 pairs

--P (concordant pairs)

R1C1 with R2C2   P concordant pairs        (10 * 12 = 120 pairs)
R1C1 with R2C3   P concordant pairs        (10 * 16 = 160 pairs)
R1C2 with R2C3   P concordant pairs        ( 5 * 16 =  80 pairs)
                 -----------------------------------------------
                          Total P concordant pairs  = 360 pairs

--Q (discordant pairs)

R2C1 with R1C2   Q discordant pairs        ( 9 *  5 =  45 pairs)
R2C1 with R1C3   Q discordant pairs        ( 9 *  2 =  18 pairs)
R2C2 with R1C3   Q discordant pairs        (12 *  2 =  24 pairs)
                 -----------------------------------------------
                          Total Q discordant pairs  =  87 pairs

Given this ordinal data we can compute gamma as follows:

g  =  (P - Q)/(P + Q)
  =  (360 - 87)/(360 + 87)
  =  273/447
  =  .611

Gamma is the difference between the number of concordant and discordant pairs (P - Q) divided by the total number of untied pairs (P + Q). It takes on a positive value if the number of concordant pairs (P) is larger than the number of discordant pairs (Q), a negative value if the number of discordant pairs is greater than the number of concordant pairs, and zero if the number of concordant pairs equals the number of discordant pairs.

These data suggest that movie ratings are positively related to the violent content of those movies, g (N = 54) = .61, p = .003.

Note that gamma ignores tied cases. There are three other statistics available in CROSSTABS which are based on concordant (P) and discordant (Q) data pairs. They are: Kendall's tau-b; Kendall's tau-c; and Somers' d. Each of them deals with tied data pairs (T) in a different way.
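The cell-by-cell bookkeeping in Table 9 can be checked with a short Python sketch. It classifies every pair of distinct cells as row-tied, column-tied, concordant, or discordant and multiplies the two cell counts to get the number of data pairs. The function `classify_pairs` and its dictionary keys are hypothetical names chosen for this illustration.

```python
from itertools import combinations

# Cell counts from Table 2: (row = violence, column = rating) -> count.
counts = {
    (1, 1): 10, (1, 2): 5, (1, 3): 2,
    (2, 1): 9, (2, 2): 12, (2, 3): 16,
}


def classify_pairs(weighted):
    """Tally data pairs between distinct cells as P, Q, or tied."""
    totals = {"P": 0, "Q": 0, "row_ties": 0, "col_ties": 0}
    for ((r1, c1), n1), ((r2, c2), n2) in combinations(weighted.items(), 2):
        pairs = n1 * n2
        if r1 == r2:
            totals["row_ties"] += pairs      # same violence level
        elif c1 == c2:
            totals["col_ties"] += pairs      # same movie rating
        elif (r1 - r2) * (c1 - c2) > 0:
            totals["P"] += pairs             # both comparisons agree
        else:
            totals["Q"] += pairs             # comparisons disagree
    return totals


totals = classify_pairs(counts)
gamma = (totals["P"] - totals["Q"]) / (totals["P"] + totals["Q"])
print(totals, round(gamma, 3))
```

Pairs of cases that fall in the same cell are tied on both variables and are not counted here, which matches Table 9.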



C. Tau c

Kendall's tau-c is conceptually similar to gamma, but it makes an adjustment for the number of rows and columns and uses the total number of cases rather than just the total number of concordant and discordant pairs as in gamma.

tau c = 2m(P-Q) / ((N*N)(m-1))

where P and Q are the numbers of concordant and discordant pairs, m is the number of rows or columns, whichever is smaller, and N is the total number of cases.

For our movie rating example, tau c is --
tau c =((2*2)(360-87)) / ((54*54)(2-1)) 
      =(4*273)/(2916*1) 
      = 1092/2916 
      = .374
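The same arithmetic can be written as a small Python function, shown here only as a sketch of the formula above. The name `kendall_tau_c` and its parameter names are assumptions made for this example.

```python
def kendall_tau_c(p, q, n, n_rows, n_cols):
    """tau c = 2m(P - Q) / (N^2 (m - 1)), with m = min(rows, columns)."""
    m = min(n_rows, n_cols)
    return 2 * m * (p - q) / (n ** 2 * (m - 1))


# P = 360, Q = 87, N = 54 in a 2 x 3 table.
tau_c = kendall_tau_c(360, 87, 54, n_rows=2, n_cols=3)
print(round(tau_c, 3))  # 0.374
```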



D. Tau b

Kendall's Tau b is also conceptually similar to gamma, but it makes a correction for tied pairs on both the dependent variable, Ty, and the independent variable, Tx. Its formula is

tau b = (P - Q) / SQRT((P + Q + Tx)(P + Q + Ty))

where Tx is the number of pairs tied on X but not Y, and Ty is the number of pairs tied on Y but not X.

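In our movie example the movie rating would normally be considered the dependent variable (Y) and the violence rating the independent variable (X). Tx is then the 524 pairs tied only on violence (the row ties) and Ty is the 182 pairs tied only on movie rating (the column ties). Then tau b would be --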

tau b = (360-87)/SQRT((360+87+524)(360+87+182))
      = 273/SQRT((971)(629)) 
      = 273/SQRT(610759) 
      = 273/781.5107 
      = .349
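As a quick check on this calculation, here is a minimal Python sketch of the tau b formula. The function name `kendall_tau_b` is hypothetical, and the tie counts are the row and column totals from Table 9.

```python
import math


def kendall_tau_b(p, q, tx, ty):
    """tau b = (P - Q) / sqrt((P + Q + Tx)(P + Q + Ty))."""
    return (p - q) / math.sqrt((p + q + tx) * (p + q + ty))


# Tx = 524 pairs tied on violence only, Ty = 182 pairs tied on rating only.
tau_b = kendall_tau_b(360, 87, tx=524, ty=182)
print(round(tau_b, 3))  # 0.349
```

Because Tx and Ty enter the denominator symmetrically, swapping them leaves the result unchanged, which is why tau b is a symmetric measure.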



E. Spearman Correlation, rs

The Spearman rank order correlation is a correlational measure that is used when both variables are ordinal. The traditional formula for calculating the Spearman rank-order correlation is

rs = 1 - (6 * SUM(d^2)) / (n(n^2 - 1))

where d is the difference between paired ranks, SUM(d^2) is the sum of the squared differences, and n is the number of paired ranks. To use this formula you would first rank the data for each variable and then find the differences in the ranks, d, for each case.

CAUTION: Please note that the formula given above is inappropriate when there are tied ranks. Our example data has many ties. The rank order correlation computed by that formula is .513, whereas the correct value (given by SPSS) is .370. So how should the Spearman rank-order correlation be computed if there are ties? The Spearman correlation is a special case of the Pearson product-moment correlation. If you compute a Pearson product-moment correlation on the ranked data the result will be the correct value of the Spearman rank order correlation.

How to compute ranks when there are ties

The violence variable has only two levels (1 and 2) with 17 and 37 cases respectively. All 17 cases at level 1 are tied with each other. How can you rank them from 1 to 17? You can't, so you assign each of them the average rank. The mean of the ranks from 1 to 17 is 9.0.

mean rank, 1 to 17 = (1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17) / 17
                              = 153 / 17
                              = 9.0

The 37 cases at violence level 2 would have occupied the ranks from 18 through 54. The mean of those ranks is 36.0, so every case in violence level 2 is assigned a rank of 36.

mean rank, 18 through 54 = (18+19+20+21+22+23+24+25+26+27+28+29+30+31+32+33+34+35+36+
                            37+38+39+40+41+42+43+44+45+46+47+48+49+50+51+52+53+54) / 37
                         = 1332 / 37
                         = 36.0

The movie rating variable has three levels (1, 2, and 3) with 19, 17, and 18 cases respectively. The 19 cases at movie rating level 1 would have occupied the ranks from 1 through 19. The mean of those ranks is 10.0. The 17 cases at movie rating level 2 would have occupied the ranks from 20 through 36. The mean of those ranks is 28.0. The 18 cases at movie rating level 3 would have occupied the ranks from 37 through 54. The mean of those ranks is 45.5.
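The average-rank rule described above can be sketched in Python. The function `mean_ranks` is a hypothetical helper written for this example: it takes the number of cases at each level and returns the mean of the ranks those tied cases would have occupied.

```python
def mean_ranks(level_counts):
    """Map each level to the mean of the ranks its tied cases would occupy."""
    ranks = {}
    first = 1
    for level in sorted(level_counts):
        last = first + level_counts[level] - 1
        ranks[level] = (first + last) / 2   # mean of first..last
        first = last + 1
    return ranks


violence_ranks = mean_ranks({1: 17, 2: 37})
rating_ranks = mean_ranks({1: 19, 2: 17, 3: 18})
print(violence_ranks)  # {1: 9.0, 2: 36.0}
print(rating_ranks)    # {1: 10.0, 2: 28.0, 3: 45.5}
```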

Using the Pearson product-moment correlation on the ranked data

SPSS will compute the correct rank-order correlation value. If you want to see for yourself how it works you could compute the ranks for the violence and ratings variables and then compute a Pearson product-moment correlation on the ranked data. Go to

Transform
    Rank Cases

Move both rating and violence to the Variables: window and press OK (make sure the weighting variable is turned on before you rank the data). Two new variables will be created in the Data Editor window, rrating, the ranks for the rating variable, and rviolence, the ranks for the violence variable. Check out the ranks for the two variables. As discussed above, the ranks for rating levels 1, 2, and 3 should be 10.0, 28.0, and 45.5. The ranks for violence levels 1 and 2 should be 9.0 and 36.0, respectively. Now run a Pearson product-moment correlation on the ranked variables rrating and rviolence.

Analyze
    Correlate
        Bivariate

Move rrating and rviolence to the Variables: window and click OK. The correlation is .370, the value of the Spearman rank-order correlation.

In these data the values of the Spearman rank order correlation and the Pearson product-moment correlation are the same to three decimal places. This would not normally be the case.
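The rank-then-correlate procedure, and the contrast with the traditional d^2 formula mentioned in the CAUTION, can be reproduced outside SPSS. The Python sketch below uses the mean ranks given in the text; the helper name `weighted_pearson` and the variable names are assumptions made for this illustration.

```python
import math

# Cell counts from Table 2: (violence, rating) -> count.
counts = {
    (1, 1): 10, (1, 2): 5, (1, 3): 2,
    (2, 1): 9, (2, 2): 12, (2, 3): 16,
}
# Mean ranks for tied cases, as worked out in the text.
violence_rank = {1: 9.0, 2: 36.0}
rating_rank = {1: 10.0, 2: 28.0, 3: 45.5}

# One (rank_x, rank_y, count) triple per cell.
ranked = [(violence_rank[v], rating_rank[r], c) for (v, r), c in counts.items()]
n = sum(c for _, _, c in ranked)


def weighted_pearson(triples, n_cases):
    """Pearson r for (x, y) pairs, each repeated `count` times."""
    mean_x = sum(c * x for x, _, c in triples) / n_cases
    mean_y = sum(c * y for _, y, c in triples) / n_cases
    sxy = sum(c * (x - mean_x) * (y - mean_y) for x, y, c in triples)
    sxx = sum(c * (x - mean_x) ** 2 for x, _, c in triples)
    syy = sum(c * (y - mean_y) ** 2 for _, y, c in triples)
    return sxy / math.sqrt(sxx * syy)


spearman = weighted_pearson(ranked, n)

# Traditional formula, which is not appropriate with tied ranks.
sum_d2 = sum(c * (x - y) ** 2 for x, y, c in ranked)
naive = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"Pearson on ranks: {spearman:.3f}")   # 0.370
print(f"d^2 formula:      {naive:.3f}")      # 0.513
```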



 6. Directional Measures for Ordinal Data

Crosstabs computes one directional measure for ordinal data, Somers' d.

A. Somers' d 

The crosstabs output for Somers' d is shown in the next table.

Gamma, tau-b, tau-c, and Spearman's rank order correlation make no distinction between the independent and the dependent variable. Somers' d assumes that you can identify one of the variables as the dependent variable. The formula only includes ties on the dependent variable (Ty).

Somers' d = (P - Q) / (P + Q + Ty)

For the sake of discussion let's assume that the movie rating was the dependent variable in the example from the gamma discussion. Ty would be the number of pairs tied on movie ratings (column ties), but not on violence (row ties).

Somers' d = (360-87)/(360+87+182) 
          = 273/629 
          = .434

On the crosstabs output this corresponds to the Somers' d with "Movie Rating Dependent."
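The directional nature of Somers' d can be illustrated with a short Python sketch. The function `somers_d` is a hypothetical helper. The second call, which treats violence as the dependent variable, is an extra illustration that applies the same formula with the row ties (524) in place of the column ties; it is not worked out in the notes above.

```python
def somers_d(p, q, ties_on_dependent):
    """Somers' d = (P - Q) / (P + Q + Ty), Ty = pairs tied only on the dependent variable."""
    return (p - q) / (p + q + ties_on_dependent)


# Movie rating dependent: Ty = 182 pairs tied on rating only (column ties).
d_rating_dependent = somers_d(360, 87, 182)

# Violence dependent (illustrative): ties on violence only = 524 (row ties).
d_violence_dependent = somers_d(360, 87, 524)

print(round(d_rating_dependent, 3))    # 0.434
print(round(d_violence_dependent, 3))  # 0.281
```

The two values differ because only the ties on the variable treated as dependent enter the denominator.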



© Lee A. Becker, 1997-1999 - revised 09/27/99