Reading: SPSS Base 8.0 User's Guide: Chapter 13, Crosstabs
Download: ctptsd40.sav (Download Tips)
ctptsd80.sav
There is a wide variety of statistics available in the CROSSTABS procedure. Some of the CROSSTABS statistics are appropriate for nominal (categorical) scales, some for ordinal scales, and some for interval scales. To use the output from CROSSTABS wisely, you should be able to recognize the level of measurement of the scales that you are analyzing and which type of data is appropriate for each statistic. You should review the Scales of Measurement notes.
This set of notes organizes the CROSSTABS statistics into those that are appropriate for nominal level measures (Pearson Chi-Square, Likelihood Ratio, Phi, Cramer's V, Contingency Coefficient, Lambda, Goodman & Kruskal Tau, Uncertainty Coefficient, and Kappa), those that are appropriate for ordinal level measures (Mantel-Haenszel, Gamma, Tau c, Tau b, Somers' D, and Spearman Correlation), and those that are appropriate for interval level measures (Pearson's r and Eta).
Most of you are familiar with the Chi-Square procedure, but I expect that most of you are unfamiliar with many of the other measures available within CROSSTABS. For that reason these notes spend some time describing the statistics themselves, how they are computed, and how they can be interpreted.
Suppose that you wish to test the effectiveness of two different clinical interventions for Post Traumatic Stress Disorder (PTSD). The participants are selected for the research because they all meet the DSM-IV diagnostic criteria for PTSD prior to treatment. Participants are randomly assigned to one of the two treatments. The incidence of PTSD is assessed after treatment. There are 40 participants in the study. The data are stored as ctptsd40.sav. The variables in ctptsd40.sav are given in Table 1.
| Variable Name | Variable Label / Value Label |
|---|---|
| tx | Treatment Condition / 1 'Intervention #1' 2 'Intervention #2' |
| ptsd | PTSD Diagnosis after Treatment / 0 'Did not meet PTSD criteria' 1 'Did meet PTSD criteria' |
The crosstabs dialog box is opened by clicking Statistics > Summarize > Crosstabs.
Bar charts can be displayed by checking the Display clustered bar charts option.
Crosstabs tables can be suppressed by checking the Suppress tables option.
The statistics options, the display of cell information options, and table formatting options can be accessed by clicking the buttons at the bottom of the dialog box.
Crosstabs will create a 2-dimensional table. Move the variable you wish to display as the row variable to Row(s): and move the variable you wish to display as the column variable to Column(s):.
You can use a third variable to layer the crosstabs output. For example, suppose that the primary 2 x 2 crosstabs looks at whether participants help or do not help another person based on having just found 5 cents or 1 dollar. The third variable might look at the crosstabs within another variable such as gender. If the gender variable were added to the Layer window, then one 2 x 2 (help by amount of money found) crosstabs table would be generated for males and another 2 x 2 (help by amount of money found) crosstabs table would be generated for females.
For this discussion the variable tx is chosen to be the row variable and ptsd is chosen to be the column variable. There are no layer variables.
Click on the Statistics... button to display the statistics options. The statistics options are organized according to the scale of measurement of the variables included in the crosstabs. In order to be a wise user of crosstabs you need to understand the level of measurement of the variables included in the crosstabs.
The default statistic is chi-square. If you choose chi-square, several statistics are displayed, including a continuity-corrected chi-square (for 2 x 2 tables only), Fisher's exact test, the likelihood ratio, and a test of linear-by-linear association.
Statistics for nominal measures include the contingency coefficient, lambda, and the uncertainty coefficient.
Statistics for ordinal measures include gamma, Somers' d, Kendall's tau-b, and Kendall's tau-c.
There is one statistic, eta, designed for the case when one of the measures is nominal and the other is interval.
The correlation option yields both Spearman's rank order correlation (used for ordinal data) and Pearson's product moment correlation (used for interval data).
Other statistics include kappa, used as a measure of inter-rater reliability, a risk analysis test, and McNemar's test, used as a test of change in repeated measures designs.
The statistics discussed in this set of notes include chi-square and the nominal statistics (contingency coefficient, phi, Cramer's V, lambda, and the uncertainty coefficient).
The options for the information to be displayed in each cell of the crosstabs are displayed by clicking the Cells... button. The options are divided into three categories: counts, percentages, and residuals.
The default is the count of the number of observed cases in each cell. You can also choose to display the expected counts for each cell. The observed and expected counts are the basic elements used in the computation of chi-square and the other nominal level statistics.
Percentages within each cell can be based on the row totals (Row), the column totals (Column), and/or on the total number of cases (Total).
The residuals are based on the difference between the observed (O) and the expected (E) values. The unstandardized residual is the simple difference of the observed and expected values.
unstandardized residual = O - E.
The standardized residual is found by dividing the difference of the observed and expected values by the square root of the expected value.
standardized residual = (O - E) / SQRT(E)
The standardized residual can be interpreted like any standard score. The mean of the standardized residual is 0 and the standard deviation is 1.0. Standardized residuals are calculated for each cell in the design. They help in interpreting chi-square tables by showing which cells contribute to a significant chi-square. If the absolute value of the standardized residual is greater than 2.00, then that cell can be considered a major contributor to the overall chi-square value.
The adjusted standardized residuals are standardized residuals that are adjusted for the row and column totals. The adjusted standardized residual is defined as
adjusted standardized residual = (O - E) / SQRT[nA * nB * (1 - nA/N) * (1 - nB/N) / N]
where nA is the row total, nB is the column total, and N is the total number of cases (Haberman, 1978, p. 111).
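The three residual formulas above can be sketched in a few lines of Python. This is only an illustration, not SPSS's own code: the names `observed` and `residuals` are made up for the example, and the counts are the treatment-by-PTSD counts used throughout these notes.

```python
import math

# Observed counts: rows are Intervention #1 and #2,
# columns are "did not meet" and "did meet" PTSD criteria.
observed = [[14, 6], [9, 11]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)


def residuals(i, j):
    """Return (unstandardized, standardized, adjusted) residuals for cell (i, j)."""
    o = observed[i][j]
    e = row_totals[i] * col_totals[j] / n  # expected count
    raw = o - e
    standardized = raw / math.sqrt(e)
    # e equals nA * nB / N, so this matches the adjusted-residual formula above.
    adjusted = raw / math.sqrt(
        e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
    )
    return raw, standardized, adjusted


for i in range(2):
    for j in range(2):
        raw, std, adj = residuals(i, j)
        print(f"r{i + 1}c{j + 1}: raw={raw:+.2f} std={std:+.3f} adj={adj:+.3f}")
```

None of the standardized residuals comes close to 2.00 in absolute value, which is consistent with the nonsignificant chi-square reported for these data.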
The Format... options control the order of printing of the row variable. The default is to print the row variable according to the ascending order of its values. You could choose to print the row variable according to the descending order of its values.
The crosstabs table is shown in Table 2. The cell options included count, expected count, row percentage, and the unstandardized residual. There is no hard and fast rule about whether to report row or column percentages. In this case I wanted to know the percentages who met the PTSD diagnosis within each treatment condition so the row (treatment condition) percentage option was chosen.
If you were going to present the table in a thesis you would generally display only counts and percentages. The expected counts and residuals will be used in the discussion of the computation of the chi-square statistic.
| | | | PTSD Diagnosis After Treatment: Did not meet PTSD criteria | PTSD Diagnosis After Treatment: Did meet PTSD criteria | Total |
|---|---|---|---|---|---|
| Treatment Condition | Intervention #1 | Count | 14 | 6 | 20 |
| | | Expected Count | 11.5 | 8.5 | 20.0 |
| | | % within Treatment Condition | 70.0% | 30.0% | 100.0% |
| | | Residual | 2.5 | -2.5 | |
| | Intervention #2 | Count | 9 | 11 | 20 |
| | | Expected Count | 11.5 | 8.5 | 20.0 |
| | | % within Treatment Condition | 45.0% | 55.0% | 100.0% |
| | | Residual | -2.5 | 2.5 | |
| Total | | Count | 23 | 17 | 40 |
| | | Expected Count | 23.0 | 17.0 | 40.0 |
| | | % within Treatment Condition | 57.5% | 42.5% | 100.0% |
Recall that people were selected to participate in this study because they met the PTSD criteria prior to treatment. The 40 participants in the study were randomly assigned to the two treatments, intervention #1 and intervention #2. After participating in intervention #1, 14 people (70%) no longer met the PTSD criteria. The remaining 6 participants (30%) still met the PTSD criteria. After participating in intervention #2, 9 people (45%) no longer met the PTSD criteria. The remaining 11 participants (55%) still met the PTSD criteria. Overall, 23 of the 40 participants (57.5%) no longer met the PTSD criteria at the end of the study.
It looks as though intervention #1 may have been more successful than intervention #2. The question is, is this difference large enough to be statistically significant?
The chi-square statistics associated with the crosstabs in Table 2 above are given in the next table.
| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | 2.558(b) | 1 | .110 | | |
| Continuity Correction(a) | 1.637 | 1 | .201 | | |
| Likelihood Ratio | 2.588 | 1 | .108 | | |
| Fisher's Exact Test | | | | .200 | .100 |
| Linear-by-Linear Association | 2.494 | 1 | .114 | | |
| N of Valid Cases | 40 | | | | |
| a Computed only for a 2x2 table | | | | | |
| b 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.50. | | | | | |
The chi-square statistic answers the question "are the two variables independent?" In this case the question is, "is PTSD at posttreatment independent of the treatment?" The chi-square value is printed in the row headed "Pearson Chi-Square." It is named after Karl Pearson, the developer of chi-square. The chi-square is not significant, χ²(1, N = 40) = 2.56, p = .11, so we do not reject the hypothesis that the two variables are independent. The data give no evidence that the posttreatment PTSD diagnosis depends on the treatment. In other words, one treatment was not shown to be more effective than the other treatment.
Note: reporting chi-square statistics. When you report a value for chi-square you must include the degrees of freedom and the number of cases. The value of the chi-square should be rounded to 2 decimal places.
Computation of Chi-Square. The formula for chi-square is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency, and the sum is taken over all cells in the table. The unstandardized residual (see Table 2) is the value of O - E.
The observed values are the "counts" within each cell. The "expected" counts are the counts that would be expected if the two variables were unrelated (if they were independent). If the interventions and PTSD diagnosis were unrelated then we would expect that the percent of people with (or without) a PTSD diagnosis after treatment would be the same for intervention #1 as it was for intervention #2. After treatment, the overall percentage of people who were not diagnosed as PTSD was 57.5%, the remainder, 42.5%, were diagnosed as PTSD. If intervention and diagnosis were independent then we would expect the same percentage of people with the PTSD diagnosis in each of the treatment conditions.
For this discussion let's label the cells as follows: the rows will be labeled as r1 and r2 for interventions #1 and #2 respectively; the columns will be labeled as c1 (Did not meet the PTSD criteria) and c2 (Did meet the PTSD criteria). Each cell is labeled by its respective row and column number, e.g., r1c1 for the cell: intervention #1-Did not meet the PTSD criteria.
There were 20 people in intervention #1; 57.5% of 20 is 11.5, which is the expected count for cell r1c1. There were 20 people in intervention #2; 57.5% of 20 is 11.5, which is the expected count for cell r2c1. The expected count of cell r1c2 is 42.5% of 20 or 8.5. The expected count of cell r2c2 is also 42.5% of 20 or 8.5.
Another way to figure out the expected frequencies is to multiply the row total count by the column total count and divide by the total N. Using this method the expected count for cell r1c1 (Er1c1) would be
Er1c1 = (row total * column total) / N
= (20*23)/40
= 460/40
= 11.5
The steps for the calculation of the chi-square are shown in Table 4.
| Cell | O | E | O - E | (O - E)² | (O - E)² / E |
|---|---|---|---|---|---|
| r1c1 | 14 | 11.5 | 14 - 11.5 = 2.5 | 6.25 | 6.25 / 11.5 = 0.54348 |
| r1c2 | 6 | 8.5 | 6 - 8.5 = -2.5 | 6.25 | 6.25 / 8.5 = 0.73529 |
| r2c1 | 9 | 11.5 | 9 - 11.5 = -2.5 | 6.25 | 6.25 / 11.5 = 0.54348 |
| r2c2 | 11 | 8.5 | 11 - 8.5 = 2.5 | 6.25 | 6.25 / 8.5 = 0.73529 |
| | | | | | χ²(1, N = 40) = 2.55754 |
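The hand calculation in Table 4 can be checked with a short Python sketch. It is an illustration only; the function name `pearson_chi_square` is an assumption made for this example and is not part of SPSS. Expected counts are obtained with the (row total * column total) / N method described above.

```python
# Observed counts from Table 2 (rows: interventions, columns: PTSD outcome).
observed = [[14, 6], [9, 11]]


def pearson_chi_square(table):
    """Sum (O - E)^2 / E over every cell of a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for the cell
            chi_square += (o - e) ** 2 / e
    return chi_square


print(f"chi-square = {pearson_chi_square(observed):.5f}")
```

The result agrees with the 2.55754 obtained by hand in Table 4.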
Degrees of Freedom. The degrees of freedom for the chi-square are calculated as the number of rows minus 1 times the number of columns minus 1.
df = (r-1) * (c-1)
= (2-1) * (2-1)
= 1 * 1
= 1
Minimum Expected Count. If the minimum expected count for a chi-square is less than 5, then the chi-square value may not be accurate. In this instance the minimum expected frequency is 8.5, so the calculated chi-square is all right.
The index reported as the "Continuity Correction" is also known as Yates' correction. This correction is given for 2 x 2 tables. The continuity correction was thought to give a better approximation to the theoretical sampling distribution for chi-square when the observed frequencies in any cell were small (less than 5). However, research by Camilli and Hopkins (1978) has led to the recommendation that the continuity correction not be used for 2 x 2 tables because it results in an unnecessary loss of power (e.g., Hinkle, Wiersma, & Jurs, 1994).
You make the correction by adding .5 when the difference between observed-expected is negative and subtracting .5 when the difference between observed-expected is positive. The computation of the chi-square with the continuity correction is shown in Table 5.
| Cell | O | E | O - E | Corrected O - E | Corrected (O - E)² | Corrected (O - E)² / E |
|---|---|---|---|---|---|---|
| r1c1 | 14 | 11.5 | 2.5 | 2.5 - .5 = 2.00 | 4.00 | 4.00 / 11.5 = 0.34783 |
| r1c2 | 6 | 8.5 | -2.5 | -2.5 + .5 = -2.00 | 4.00 | 4.00 / 8.5 = 0.47059 |
| r2c1 | 9 | 11.5 | -2.5 | -2.5 + .5 = -2.00 | 4.00 | 4.00 / 11.5 = 0.34783 |
| r2c2 | 11 | 8.5 | 2.5 | 2.5 - .5 = 2.00 | 4.00 | 4.00 / 8.5 = 0.47059 |
| | | | | | | χ² with Continuity Correction(1, N = 40) = 1.63684 |
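The correction in Table 5 amounts to shrinking each |O - E| by .5 before squaring. A minimal Python sketch of that rule follows; the function name `yates_chi_square` is hypothetical, and the sketch assumes every |O - E| is at least .5 (as it is here), so it does not handle smaller differences.

```python
observed = [[14, 6], [9, 11]]  # counts from Table 2


def yates_chi_square(table):
    """Chi-square for a 2 x 2 table with each |O - E| reduced by 0.5."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            # Subtracting .5 from positive differences and adding .5 to
            # negative ones is the same as shrinking the absolute difference.
            corrected = abs(o - e) - 0.5
            total += corrected ** 2 / e
    return total


print(f"corrected chi-square = {yates_chi_square(observed):.3f}")
```

The value rounds to 1.637, the figure SPSS prints in the "Continuity Correction" row.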
The likelihood ratio is a statistic which is computed using a log-linear model. The trend in statistics is towards log-linear models and away from chi-square models. Log-linear models have the advantage of being able to decompose the variance based on the frequencies table into component parts, much like ANOVA. You can think of log-linear models as ANOVAs for categorical data. The symbol for the likelihood ratio according to the current APA Publication Manual is LR. Statistical texts often use the symbol L². The formula for the likelihood ratio is
$$LR = 2 \sum O \ln\left(\frac{O}{E}\right)$$
This statistic starts out the same as chi-square: you build a table of observed (O) and expected (E) frequencies for each cell in the design. Next find the ratio of the observed to the expected frequencies, O/E. Then find the natural log of that ratio, ln(O/E). Next, multiply ln(O/E) by the observed frequency (O). Finally, multiply the sum of all the O*ln(O/E) by 2 to find the likelihood ratio. The calculation of the likelihood ratio for these data is given in Table 6.
| Cell | O | E | O/E | ln(O/E) | O * ln(O/E) |
|---|---|---|---|---|---|
| r1c1 | 14 | 11.5 | 1.2173913 | 0.1967103 | 14 * 0.1967103 = 2.7539441 |
| r1c2 | 6 | 8.5 | 0.7058824 | -0.3483067 | 6 * -0.3483067 = -2.0898402 |
| r2c1 | 9 | 11.5 | 0.7826087 | -0.2451225 | 9 * -0.2451225 = -2.2061021 |
| r2c2 | 11 | 8.5 | 1.2941176 | 0.2578291 | 11 * 0.2578291 = 2.8361202 |
| | | | | Sum | 1.2941220 |
| | | | | | LR(1, N = 40) = 2 * 1.2941220 = 2.58824 |
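The steps in Table 6 translate directly into code. The sketch below is illustrative; the function name `likelihood_ratio` is an assumption for this example, and the code assumes every observed count is greater than zero, since ln(0) is undefined.

```python
import math

observed = [[14, 6], [9, 11]]  # counts from Table 2


def likelihood_ratio(table):
    """Return 2 * sum of O * ln(O / E) over all cells (all O must be > 0)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += o * math.log(o / e)  # math.log is the natural log
    return 2 * total


print(f"LR = {likelihood_ratio(observed):.5f}")
```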
Fisher's exact test is reported for 2 x 2 crosstabs tables. It can be used instead of the chi-square value when one or more of the cells has an expected value of less than 5.
The linear-by-linear association measure is appropriate only if both the row and column variables are at least ordinal. This measure is discussed in the notes on crosstabs measures for ordinal data.
The magnitude of the chi-square statistic is related to the sample size. If the percentages within each cell are constant, then the larger the sample size the larger the chi-square value. Suppose that you continued to collect data in the present experiment until you doubled the sample size from 40 to 80 participants and that the percentages in each cell of the 80-participant data were exactly the same as those in the 40-participant data. The data for the 80-participant study are stored in ctptsd80.sav. You could run the chi-square with the file ctptsd80.sav to verify the results discussed in this section.
Compare the chi-square values for the 40-participant example with the chi-square values for the 80-participant example in Table 8.
| | df | N = 40: Value | N = 40: Sig. (2-sided) | N = 80: Value | N = 80: Sig. (2-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | 1 | 2.56(b) | .110 | 5.12(c) | .024 |
| Continuity Correction(a) | 1 | 1.64 | .201 | 4.14 | .042 |
| Likelihood Ratio | 1 | 2.59 | .108 | 5.18 | .023 |
| Fisher's Exact Test | | | .200 | | .041 |
| Linear-by-Linear Association | 1 | 2.49 | .114 | 5.05 | .025 |
| a Computed only for a 2x2 table | | | | | |
| b 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.50. | | | | | |
| c 0 cells (.0%) have expected count less than 5. The minimum expected count is 17.00. | | | | | |
The value of the chi-square for this 80-participant example is 5.12, p < .05, while the chi-square for the 40-participant example was only 2.56, p > .05. Because the percentages in each cell are the same for the two sample sizes, the differences in the chi-square value are the result of the different sample sizes. The measures of association described in the following sections were designed to control for sample size.
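The effect of sample size can be reproduced by doubling every cell count and recomputing the statistic. The sketch below is an illustration under stated assumptions: the helper names are made up, and the p-value helper relies on the fact that, for 1 degree of freedom only, the upper-tail chi-square probability equals erfc(sqrt(chi-square / 2)).

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e
    return total


def p_value_df1(chi_square):
    """Upper-tail probability of chi-square; valid for df = 1 only."""
    return math.erfc(math.sqrt(chi_square / 2))


table_40 = [[14, 6], [9, 11]]
# Same cell percentages, twice as many participants.
table_80 = [[2 * o for o in row] for row in table_40]

chi40 = pearson_chi_square(table_40)
chi80 = pearson_chi_square(table_80)
p40 = p_value_df1(chi40)
p80 = p_value_df1(chi80)
print(f"N = 40: chi-square = {chi40:.2f}, p = {p40:.3f}")
print(f"N = 80: chi-square = {chi80:.2f}, p = {p80:.3f}")
```

Doubling every count doubles both O - E and E in each cell, so each (O - E)² / E term doubles and the chi-square doubles with it.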
The statistics options for nominal measures include: the contingency coefficient, phi, Cramér's V, lambda, and the uncertainty coefficient. The output for those statistics is organized into two tables, one table for symmetric measures (phi, Cramér's V, and the contingency coefficient) and one table for directional measures (lambda, Goodman and Kruskal's tau, and the uncertainty coefficient). A measure is directional if the value of the statistic depends on which of the variables is designated as the dependent variable. A measure is symmetric if the value of the statistic is the same no matter which variable is designated as the dependent variable. This section describes symmetric measures.
These statistics are all measures of association that are conceptually similar to the correlation coefficient, another measure of association. The possible values for the correlation coefficient range from -1 to +1. The possible values for these statistics will vary, but they tend to fall within the range of 0 to +1. They do not take on negative values because the values of nominal measures are not ordered. As we shall see in the crosstabs notes for ordinal measures, measures of association based on ordinal data can take on negative values.
The value of the chi-square itself is difficult to interpret because it is a function of the sample size, the amount of independence between the variables, and the degrees of freedom. In order to overcome this difficulty of interpretation, several statistics have been created that measure the "degree of association" between the two nominal variables. Three of these measures, phi, Cramér's V, and the contingency coefficient, are based on the chi-square itself. The crosstabs output for these symmetric statistics is shown in Table 9.
The values of the statistics are the same for both sample sizes. Because these measures are all based on the chi-square statistic, the significance level is also based on the significance of the chi-square statistic.
Phi is a measure of association based on chi-square which controls for sample size. Phi can range from 0 to +1. It is most appropriate for 2 x 2 contingency tables.
It is calculated as the square root of the value of chi-square divided by N, the total sample size.
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
The phi calculations for the 40- and 80-person samples are shown in Table 10.
| N = 40 | N = 80 |
|---|---|
| φ = SQRT(χ²/N) = SQRT(2.55754/40) = SQRT(0.0639385) = .253 | φ = SQRT(χ²/N) = SQRT(5.11509/80) = SQRT(0.0639386) = .253 |
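The Table 10 arithmetic can be checked with a tiny Python sketch. The function name `phi` is chosen for this example; the chi-square values are the ones computed earlier in these notes.

```python
import math


def phi(chi_square, n):
    """Phi: square root of chi-square divided by the total sample size."""
    return math.sqrt(chi_square / n)


phi_40 = phi(2.55754, 40)  # 40-participant sample
phi_80 = phi(5.11509, 80)  # 80-participant sample
print(round(phi_40, 3), round(phi_80, 3))
```

Both samples give the same phi because doubling the sample doubled the chi-square, so the ratio of chi-square to N is unchanged.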
Cramér's V is appropriate for tables that are larger than 2 x 2. It also uses chi-square and corrects for table size. Cramér's V can range from 0 to +1.
The formula for Cramér's V is --
$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
where N is the total number of cases and k is the smaller of the number of rows and columns.
For 2 x 2 tables k = 2, so the k - 1 term becomes 1. Consequently, for 2 x 2 tables Cramér's V is equal to phi.
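A short sketch of the Cramér's V formula shows the reduction to phi for a 2 x 2 table. The function name `cramers_v` and its parameter names are assumptions made for this illustration.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V: chi-square scaled by N and by the smaller table dimension."""
    k = min(n_rows, n_cols)  # smaller of the number of rows and columns
    return math.sqrt(chi_square / (n * (k - 1)))


v_40 = cramers_v(2.55754, 40, n_rows=2, n_cols=2)
phi_40 = math.sqrt(2.55754 / 40)
print(round(v_40, 3), round(phi_40, 3))  # identical for a 2 x 2 table
```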
The contingency coefficient (CC) is another way of correcting for sample sizes and tables that are larger than 2 x 2. The problem with the contingency coefficient is that its maximum value depends upon the size of the table. The maximum CC value for a 2 x 2 table is 0.707. The maximum CC value for a 4 x 4 table is 0.87. This makes it nearly impossible to compare CC values across different size tables. I would recommend using phi or Cramér's V rather than the Contingency Coefficient.
The formula for CC is --
$$CC = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
The contingency coefficient calculations for the 40- and 80-person samples are shown in Table 11.
| N = 40 | N = 80 |
|---|---|
| CC = SQRT[χ²/(χ² + N)] = SQRT[2.55754/(2.55754 + 40)] = SQRT(.0601) = .245 | CC = SQRT[χ²/(χ² + N)] = SQRT[5.11509/(5.11509 + 80)] = SQRT(.0601) = .245 |
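The contingency coefficient and its table-size ceiling can be sketched as follows. The function names are made up for this example. The ceiling helper uses the standard result that, for a square k x k table, the largest possible CC is sqrt((k - 1) / k); that formula is not stated in these notes but reproduces the maxima quoted above.

```python
import math


def contingency_coefficient(chi_square, n):
    """CC = sqrt(chi-square / (chi-square + N))."""
    return math.sqrt(chi_square / (chi_square + n))


def max_contingency_coefficient(k):
    """Largest attainable CC for a square k x k table."""
    return math.sqrt((k - 1) / k)


cc_40 = contingency_coefficient(2.55754, 40)
cc_80 = contingency_coefficient(5.11509, 80)
print(round(cc_40, 3), round(cc_80, 3))
print(round(max_contingency_coefficient(2), 3))  # ceiling for a 2 x 2 table
print(round(max_contingency_coefficient(4), 2))  # ceiling for a 4 x 4 table
```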
The directional measures of association for the 80 person example are shown in Table 12. The values of these measures for the 40 person sample would be the same, although the significance levels would be larger (less significant).
| | | | Value | Asymp. Std. Error(a) | Approx. T(b) | Approx. Sig. |
|---|---|---|---|---|---|---|
| Nominal by Nominal | Lambda | Symmetric | .189 | .135 | 1.302 | .193 |
| | | Treatment Condition Dependent | .250 | .126 | 1.747 | .081 |
| | | PTSD Diagnosis After Treatment Dependent | .118 | .175 | .634 | .526 |
| | Goodman and Kruskal tau | Treatment Condition Dependent | .064 | .055 | | .025(c) |
| | | PTSD Diagnosis After Treatment Dependent | .064 | .055 | | .025(c) |
| | Uncertainty Coefficient | Symmetric | .047 | .041 | 1.158 | .023(d) |
| | | Treatment Condition Dependent | .047 | .040 | 1.158 | .023(d) |
| | | PTSD Diagnosis After Treatment Dependent | .047 | .041 | 1.158 | .023(d) |
| a Not assuming the null hypothesis. | | | | | | |
| b Using the asymptotic standard error assuming the null hypothesis. | | | | | | |
| c Based on chi-square approximation | | | | | | |
| d Likelihood ratio chi-square probability. | | | | | | |
Lambda is based on an entirely different concept than is chi-square. Lambda is based on the idea of a proportional reduction in error. The basic formula is as follows--
Lambda = ( P(1) - P(2) ) / P(1)
where P(1) is the overall probability of making an incorrect classification. P(1) is equal to 1 minus the proportion of cases in the modal category of the dependent variable, without taking the classification variable into account. P(2) is the probability of making an error after taking into account the classification variable. P(2) is equal to 1 minus the sum of the modal-category proportions (each modal count divided by N) at each level of the classification variable.
To see how this works let's take another look at the data from the 80-participant study.
| | | | PTSD Diagnosis After Treatment: Did not meet PTSD criteria | PTSD Diagnosis After Treatment: Did meet PTSD criteria | Total |
|---|---|---|---|---|---|
| Treatment Condition | Intervention #1 | Count | 28 | 12 | 40 |
| | | % within Treatment Condition | 70.0% | 30.0% | 100.0% |
| | Intervention #2 | Count | 18 | 22 | 40 |
| | | % within Treatment Condition | 45.0% | 55.0% | 100.0% |
| Total | | Count | 46 | 34 | 80 |
| | | % within Treatment Condition | 57.5% | 42.5% | 100.0% |
Suppose that you were to guess whether or not a participant had met the criteria for PTSD. If you did not know the intervention condition for a participant, you would maximize the number of correct guesses by guessing the modal (largest) category for everybody. In this case we would guess that each participant did not meet the PTSD criteria. We would be correct for 46 participants (57.5%) and thus incorrect for 34 participants (42.5%). So P(1), the probability of an incorrect classification, would be .425.
If you took the intervention into account and made predictions of whether or not the participant met the PTSD criteria, then you would guess that a person who had received intervention #1 would not meet the PTSD criteria and that a person who had received intervention #2 would meet the PTSD criteria. You would be correct for 28 of the participants who had received intervention #1 and for 22 of the participants who had received intervention #2. Our error rate, taking into account the intervention condition is 30 people (12 in intervention #1 and 18 in intervention #2) or 37.5% (30/80 = .375) of the total N. So P(2) is .375.
Lambda would be
Lambda = ( P(1) - P(2) ) / P(1)
= (.425 - .375) / .425
= .050/.425
= .118
The proportional reduction in errors, given that we take the intervention into account, is 11.8%. We originally had a 42.5% error rate; we reduced it to a 37.5% error rate. This decrease of 5 percentage points was 11.8% of the original error rate (42.5%).
Lambda is an asymmetric measure. That is, the value of lambda depends upon which variable is considered to be the dependent variable. In this example, we used the intervention (the independent variable) to predict the incidence of PTSD (the dependent variable). In the CROSSTABS output in Table 12 this corresponds to the lambda row labeled "PTSD Diagnosis After Treatment Dependent".
Let's compute lambda using the intervention as the dependent measure. Looking at the totals back in our example for this section, note that there were an equal number of people (40) in each intervention, so we could use either intervention #1 or #2 as the modal category; here we use intervention #1. Thus, if we know nothing about whether or not a participant met the PTSD criteria, we would predict intervention #1 for each person in the sample and we would be wrong for the 40 people who had received intervention #2. Our error rate, P(1), would be 40/80 or .500. If we now take meeting the PTSD criteria into account, we would predict that those who did not meet the PTSD criteria would be in the intervention #1 condition and we would make incorrect guesses for the 18 people who were actually in intervention #2. Knowing that a person did meet the PTSD criteria, we would predict that the person was in intervention #2 and we would make an incorrect guess for the 12 people who were actually in intervention #1. Our total number of errors, taking PTSD into account, is 30 people. The probability of making an error when we take PTSD into account, P(2), is 30/80 or .375. Lambda then becomes --
Lambda = ( P(1) - P(2) ) / P(1)
= (.500 - .375) / .500
= .125/.500
= .250
We have reduced our error rate by 25% when we take PTSD into account in predicting the intervention condition. In the CROSSTABS output in Table 12 this corresponds to the lambda row labeled "Treatment Condition Dependent".
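The two hand calculations of lambda follow the same recipe, which can be sketched in Python. The function name `asymmetric_lambda` and its `dependent` argument are hypothetical names for this illustration; the counts are those of the 80-participant table.

```python
# Rows: Intervention #1, #2. Columns: did not meet, did meet PTSD criteria.
table_80 = [[28, 12], [18, 22]]


def asymmetric_lambda(table, dependent="columns"):
    """Proportional reduction in error when predicting the dependent variable."""
    if dependent == "rows":
        # Transpose so that the dependent variable always runs across columns.
        table = [list(col) for col in zip(*table)]
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    p1 = 1 - max(col_totals) / n                 # error rate ignoring the predictor
    p2 = 1 - sum(max(row) for row in table) / n  # error rate using the predictor
    return (p1 - p2) / p1


lambda_ptsd = asymmetric_lambda(table_80, dependent="columns")  # PTSD dependent
lambda_tx = asymmetric_lambda(table_80, dependent="rows")       # treatment dependent
print(round(lambda_ptsd, 3), round(lambda_tx, 3))
```

The two values round to .118 and .250, matching the hand calculations above.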
The symmetric value for lambda is a kind of average of the two asymmetric values.
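These notes do not give the formula for the symmetric value. One common definition, used here as an assumption, pools the errors from both prediction directions: add the errors made without the predictor in each direction, subtract the errors made with the predictor in each direction, and divide by the pooled errors made without the predictor. The function name `symmetric_lambda` is hypothetical.

```python
table_80 = [[28, 12], [18, 22]]  # counts from the 80-participant table


def symmetric_lambda(table):
    """Pooled proportional reduction in error over both prediction directions."""
    n = sum(sum(row) for row in table)
    columns = [list(col) for col in zip(*table)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in columns]
    # Errors when guessing the modal category without using the other variable.
    errors_without = (n - max(col_totals)) + (n - max(row_totals))
    # Errors when guessing the modal category within each level of the other variable.
    errors_with = (n - sum(max(row) for row in table)) + (
        n - sum(max(col) for col in columns)
    )
    return (errors_without - errors_with) / errors_without


lam = symmetric_lambda(table_80)
print(round(lam, 3))
```

For these data the pooled errors drop from 74 to 60, giving 14/74, which rounds to the .189 that SPSS prints in the "Symmetric" row.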
Can you see why lambda would be the same for our example with 40 people?
Notes on the interpretation of lambda --
Significance of Lambda
CROSSTABS provides you with an approximate significance level for lambda. The approximate significance level when "PTSD Diagnosis After Treatment" is used as the dependent variable is p = .526. The approximate significance level when the "Treatment Condition" is used as the dependent variable is p = .081.
Conceptual Interpretation of Lambda
Another consideration when interpreting lambda is recognizing that (a) lambda is a proportional reduction in error (PRE) measure, (b) that it is an asymmetric measure (the value of lambda depends on which variable is considered to be the dependent variable), and (c) it is used with nominal measures.
For example, a lambda of .35 means that there was a 35% reduction in error in predicting the dependent variable when the independent variable was taken into account.
The Goodman and Kruskal Tau is similar to lambda. According to Goodman and Kruskal (1954) it can be interpreted as the relative decrease in the proportion of incorrect predictions when we go from predicting the row category based only on the row marginal probabilities (as in Lambda) to predicting the row category based on the conditional proportions of both row and column.
For further reading about Goodman and Kruskal's tau see Bishop, Fienberg, and Holland (1975) or Goodman and Kruskal (1954).
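As a rough sketch of the idea, tau can be computed by comparing two expected error rates: one when categories are guessed in proportion to the marginal distribution of the dependent variable, and one when they are guessed in proportion to its distribution within each level of the other variable. The code below uses this standard formulation as an assumption (these notes do not give the formula), treats the column variable as dependent, and assumes no row total is zero. The function name is hypothetical.

```python
table_80 = [[28, 12], [18, 22]]  # rows: interventions, columns: PTSD outcome


def goodman_kruskal_tau(table):
    """Tau with the column variable as the dependent variable."""
    n = sum(sum(row) for row in table)
    col_props = [sum(col) / n for col in zip(*table)]
    # Expected error rate when guessing from the column marginals alone.
    error_marginal = 1 - sum(p * p for p in col_props)
    # Expected error rate when guessing from the proportions within each row.
    error_conditional = 1 - sum(
        cell * cell / sum(row) for row in table for cell in row
    ) / n
    return (error_marginal - error_conditional) / error_marginal


tau_ptsd = goodman_kruskal_tau(table_80)
# Transposing the table makes the treatment condition the dependent variable.
tau_tx = goodman_kruskal_tau([list(col) for col in zip(*table_80)])
print(round(tau_ptsd, 3), round(tau_tx, 3))
```

Both directions round to .064, the value shown for Goodman and Kruskal tau in Table 12.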
The uncertainty coefficient is similar to lambda. The value it gives is an estimate of the reduction in error based on knowing a person's position on the independent variable. The formula takes into account the entire distribution rather than just the mode. There is a description of the formula in an old SPSS manual (Nie, Hull, Jenkins, Steinbrenner, & Bent, 1975) on pages 226-227.
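A common way to write the uncertainty coefficient is in terms of entropy: the proportion by which uncertainty about the dependent variable drops once the other variable is known. The sketch below uses that entropy formulation as an assumption; whether it matches the presentation in the cited manual exactly has not been checked here. The function names are made up for the example, and the column variable is treated as dependent.

```python
import math

table_80 = [[28, 12], [18, 22]]  # rows: interventions, columns: PTSD outcome


def entropy(counts):
    """Entropy (natural log) of the distribution given by a list of counts."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def uncertainty_coefficient(table):
    """Proportional reduction in column-variable uncertainty from knowing the row."""
    n = sum(sum(row) for row in table)
    h_columns = entropy([sum(col) for col in zip(*table)])
    # Average the within-row entropy, weighting each row by its share of cases.
    h_columns_given_rows = sum(sum(row) / n * entropy(row) for row in table)
    return (h_columns - h_columns_given_rows) / h_columns


u_ptsd = uncertainty_coefficient(table_80)
print(round(u_ptsd, 3))
```

With PTSD diagnosis as the dependent variable the result rounds to .047, in line with Table 12.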
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.
Camilli, G., & Hopkins, K. D. (1978). Applicability of chi-square to 2 x 2 contingency tables with small expected cell frequencies. Psychological Bulletin, 85, 163-167.
Goodman, L. A., & Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732-764.
Haberman, S. J. (1978). Analysis of qualitative data: Vol. 1. Introductory topics. New York: Academic Press.
Hinkle, D. E., Wiersma, W., & Jurs, S. G. (1994). Applied statistics for the behavioral sciences (3rd ed.). Boston: Houghton Mifflin.
Nie, N. H., Hull, C. H., Jenkins, J. G., Steinbrenner, K., & Bent, D. H. (1975). SPSS: Statistical package for the social sciences (2nd ed.). New York: McGraw-Hill.
©Lee A. Becker, 1997, 1998 -revised 09/24/99