GLM for
Single-Factor, Between-Subjects Designs

Reading: SPSS Base 9.0 User's Guide, Chapter 20, GLM Univariate Analysis
Homework: GLM_1way
Download: glm_1way.sav

  1. Overview
  2. The Data
  3. Testing the Assumptions
  4. Running GLM: Univariate
  5. The Basic Output
  6. Post Hoc Tests
  7. References

1. Overview

Single-factor experiments have one independent variable. SPSS has several procedures that will analyze single-factor experiments: means, oneway, and the general linear model (GLM). The means and oneway procedures are limited to analyzing single-factor, between-subjects designs. The GLM procedure can analyze nearly any analysis of variance design, including designs with more than one independent variable, unequal n designs, and repeated measures designs. For that reason we will focus on using the GLM procedures for all analysis of variance (ANOVA) problems.

The GLM procedure can provide you with:
(a) a test of the homogeneity of cell variances,
(b) a test of trend,
(c) multiple comparison (post-hoc) tests between the means, and
(d) planned (a priori) comparisons between means.

What doesn't it do?
(a) It does not test the assumption of normal distributions within each cell of the design. You still need to run the Explore procedure to test the normality assumption.



2. The Data

The data for this set of notes come from a study by Friedman, Harper, Becker, Wilson, and Tinker (1997). They hypothesized (a) that children who have experienced psychological trauma will display symptoms that are similar to those of children who suffer from attention-deficit/hyperactivity disorder (ADHD) and (b) that the ADHD and trauma group scores would be higher than those of the control group. Three groups of children and their parents were recruited from the community. One group was recruited to participate in a posttraumatic stress disorder (PTSD) treatment outcome study (n = 16). A second group was recruited to participate in a study of children who suffer from ADHD (n = 14). The third group, the control group, suffered from neither PTSD nor ADHD (n = 22). Friedman et al. looked at the hyperactivity and attention scales of the Behavior Assessment System for Children (BASC) - Parent Rating Form (Reynolds & Kamphaus, 1992). The hyperactivity scores are reported as T scores, which have a mean of 50 and a standard deviation of 10. As the name of the form implies, the ratings of hyperactivity and attention were made by the parent of the child. We will be looking at the results of the BASC hyperactivity scale for this set of notes. The variables in the data file, oneway.sav, are shown in Table 1.

Table 1. Variables in the oneway.sav Data File
Variable    Variable Label                Value Labels                        Missing Values
id
group       Group                         1 = Control, 2 = ADHD, 3 = trauma
hyperact    BASC Hyperactivity T score



3. Testing the Assumptions

The assumptions for a single-factor, between-subjects ANOVA design are shown in Table 2.

Table 2. Oneway ANOVA Assumptions
1. The observations in each of the groups (or cells) are independent.
2. The scale of measurement for the dependent variable is at least interval.
3. The distributions in each of the groups are normal (or at least symmetric).
4. The variances in each of the groups are homogeneous.

Assumption 1 (independence). In this example the three groups are made up of separate individuals so the data are independent.

Assumption 2 (scale of measurement). The scale of measurement for the BASC hyperactivity scale is interval.

Assumption 3 (normality). It is assumed that the distributions in each of the groups are normal. The analysis of variance is robust if each of the distributions is symmetric or if all the distributions are skewed in the same direction. This assumption can be tested by running the normality tests that are available in the Explore procedure. Open the Explore dialog box and add the BASC hyperactivity score to the Dependent List and the Group variable to the Factor List. In the Statistics... dialog box make sure that Descriptives is checked so that the skewness and kurtosis scores will be displayed. In the Plots... dialog box check Normality plots with tests to display the Kolmogorov-Smirnov and Shapiro-Wilk normality tests.

The normality statistics are shown in Table 3. The Shapiro-Wilk tests are not significant in any of the three conditions (all p > .05), so the assumption of normally distributed scores is not rejected. The Kolmogorov-Smirnov statistic is significant for the control group, but that statistic is more appropriate for larger sample sizes.

Table 3. Tests of Normality

                                        Kolmogorov-Smirnov(a)        Shapiro-Wilk
                            Group       Statistic   df   Sig.        Statistic   df   Sig.
BASC hyperactivity T score  control     .206        22   .016        .912        22   .055
                            ADHD        .166        14   .200(*)     .940        14   .442
                            trauma      .151        16   .200(*)     .900        16   .084
* This is a lower bound of the true significance.
a Lilliefors Significance Correction

The skewness statistics are shown in Table 4. The skewness and kurtosis scores indicate that the scores in the ADHD and trauma conditions are normally distributed. There is some positive skewness in the control condition. 

Table 4. Means, Skewness, and Kurtosis Statistics

                            Group       Measure     Statistic   Std. Error
BASC hyperactivity T score  control     Mean        43.82       2.20
                                        Skewness    .973        .491
                                        Kurtosis    .341        .953
                            ADHD        Mean        60.14       2.71
                                        Skewness    -.235       .597
                                        Kurtosis    -1.066      1.154
                            trauma      Mean        64.75       3.61
                                        Skewness    -.407       .564
                                        Kurtosis    -1.289      1.091
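
One common rule of thumb, which is not part of these notes and is offered only as a rough aid, is to divide each skewness or kurtosis statistic by its standard error and look for ratios beyond about plus or minus 2. The Python sketch below applies that heuristic to the values in Table 4. The function name ratio_to_se and the cutoff of 2 are assumptions made for illustration.

```python
# Skewness and kurtosis statistics with their standard errors, copied from Table 4.
table4 = {
    "control": {"skewness": (0.973, 0.491), "kurtosis": (0.341, 0.953)},
    "ADHD": {"skewness": (-0.235, 0.597), "kurtosis": (-1.066, 1.154)},
    "trauma": {"skewness": (-0.407, 0.564), "kurtosis": (-1.289, 1.091)},
}


def ratio_to_se(statistic, std_error):
    """Return the statistic divided by its standard error (hypothetical helper)."""
    return statistic / std_error


# Ratio for every group and measure, rounded to two decimals for display.
ratios = {
    group: {name: round(ratio_to_se(*pair), 2) for name, pair in measures.items()}
    for group, measures in table4.items()
}

for group, measures in ratios.items():
    for name, value in measures.items():
        # The cutoff of 2 is a rule of thumb, not a formal test.
        flag = "look closer" if abs(value) >= 2 else "ok"
        print(f"{group:8s} {name:9s} ratio = {value:6.2f}  ({flag})")
```

The control group's skewness ratio is the largest of the six and sits right at the rule-of-thumb cutoff, which is consistent with the positive skew noted for that group.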

 

Assumption 4 (homogeneity). The variances in each of the cells should be the same. Levene's test for homogeneity of variances can be displayed from either the GLM procedure or from Explore. The Explore procedure provides additional information about transformations that could be used on the data if the variances are not homogeneous.

Boxplots will give a visual comparison of the variances (and central tendency) across the groups in the design.  They will be displayed when you check the Boxplots: Factor levels together option in the Plots... dialog box.  The boxplots are presented in Figure 1.

Figure 1. Boxplots for the control, ADHD, and trauma groups.

Visual inspection of the box lengths indicates that the trauma group has greater variability than the control group. We will want to check the Levene statistics to see if there are significant differences in variability between the three groups. The differences between the medians of the three groups suggest that the analysis of variance will support the hypotheses.

In Explore, the Levene test is displayed by checking one of the Spread vs Level with Levene Test:  options found in the Plots... dialog box.  

Spread vs. Level Options:
   -None will suppress the test of homogeneity (the default).
   -Power estimation will produce plots of the natural log of the interquartile range (the spread) on the y-axis with the natural log of the median (the level) on the x-axis. If you are just beginning to explore the homogeneity issue for your study then this is a good starting point. The plot will provide you with a "power" estimate that can be used to transform the data to make the variances more homogeneous.
   -Transformed will transform the data prior to plotting the data and running the statistics. You will need to select a transformation from the list of options that will appear in the Power: box. You can use the information provided by the power estimation plots to select the transformation to apply to the data.
   -Untransformed will display plots of the raw data, with the interquartile range (the spread) on the y-axis and the median (the level) on the x-axis. A "power" estimate is not provided if you check this option.

The Levene statistics as displayed by the Explore procedure are shown in Table 5. Recall that the Levene statistic is a test of the null hypothesis that the variances are homogeneous. The Levene statistic (based on the mean) is not significant, F(2, 49) = 2.284, p = .113, so the null hypothesis of homogeneous variances is retained.

Table 5. Test of Homogeneity of Variance

                                                                   Levene Statistic   df1   df2      Sig.
BASC hyperactivity T score  Based on Mean                          2.284              2     49       .113
                            Based on Median                        1.687              2     49       .196
                            Based on Median and with adjusted df   1.687              2     43.838   .197
                            Based on trimmed mean                  2.223              2     49       .119

If the variances had not been homogeneous, then you would look at the spread vs. level plot and consider applying a transformation to the data in an attempt to reduce the problem. Figure 2 shows the spread vs. level plot for the untransformed scores. The natural log (LN) of the interquartile range (spread) is plotted on the vertical axis and the natural log (LN) of the median (level) is plotted on the horizontal axis.

If there is a linear relationship between the spread and level, that is, if the slope is not equal to zero, then a power transformation can be used to make the variances more homogeneous. In this case the BASC variable would be transformed by raising it to the power of -.270 (Power for transformation = -.270) using the following compute command:

* Power transformation suggested by the spread vs. level plot.
COMPUTE trbasc = hyperact**(-.270).
EXECUTE.

Note that the transformation is applied to the entire set of BASC scores, not just to the scores for the offending cell. After applying that transformation the Levene statistic (based on the mean) was lowered from 2.284 to 0.917, with a corresponding change in the significance level from p = .113 to p = .406. If the variances had not been homogeneous, this change might have been enough to render the transformed data homogeneous. Of course, in this example the original variances were homogeneous, so the power transformation would not be applied to the data.
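
The link between the spread vs. level plot and the power estimate can be sketched in Python. The sketch assumes the usual convention that the suggested power is 1 minus the slope of ln(spread) regressed on ln(level). The medians and interquartile ranges below are hypothetical values, constructed so that the slope comes out at 1.27 and the power at -.27. They are not the values behind Figure 2, and the function names are illustrative.

```python
import math


def spread_level_power(medians, iqrs):
    """Least-squares slope of ln(IQR) on ln(median), and the power 1 - slope."""
    xs = [math.log(m) for m in medians]
    ys = [math.log(s) for s in iqrs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, 1 - slope


def power_transform(score, power):
    """Raise a (positive) score to the given power, like COMPUTE x**power."""
    return score ** power


# HYPOTHETICAL group medians; spreads are built so that the slope is exactly 1.27.
medians = [42.0, 61.0, 66.0]
iqrs = [m ** 1.27 for m in medians]

slope, power = spread_level_power(medians, iqrs)
print(f"slope = {slope:.3f}, power for transformation = {power:.3f}")

# Apply the estimated power to a few example T scores.
for t_score in (40, 50, 60):
    print(t_score, round(power_transform(t_score, power), 4))
```

Because the power is negative, the transformation reverses the order of the scores: larger T scores become smaller transformed values. Keep that in mind when interpreting the transformed means.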



4. Running GLM: Univariate

We will start out by looking at the basic output for a single-factor, between-subjects design.  The basic output includes the ANOVA source table (the F test), the means and standard deviations, and a plot of the means. If the main effect for group is significant then we will want to run a post hoc test to determine which of the means are different from each other.

The GLM procedure for this one-factor, between-subjects design is selected by clicking

Analyze
   General Linear Model
         Univariate ... 

The variables in the Data Editor are shown in the box at the left. Move hyperact to the Dependent Variable: box and group to the Fixed Factor(s): box.

Options...  

The Options... dialog box is divided into three sections. The top section, Estimated Marginal Means, is used to select which means are displayed in the output. The means to be displayed are selected by moving an effect from the Factor(s) and Factor Interactions box to the Display Means for: box. In this example, if you moved the group effect to the Display Means for: box, then the group main effect means would be displayed.

The middle section, Display, has a number of options. At this point we are most interested in the Descriptive statistics option, which will display the means, ns, and standard deviations for each cell in the design. The Homogeneity tests option will display the Levene test of homogeneity, the same test that we looked at using the Explore procedure. The Spread vs. level plot option will produce two plots: the standard deviation vs. the mean and the variance vs. the mean. (The Explore procedure plotted the interquartile range and the median.) The Spread vs. level plots provided by the GLM procedure do not give the slope or the power for the transformation. You will need to run the Explore procedure to obtain those values. We will omit the homogeneity tests because we have already looked at them in the Explore output.

The bottom section of the Options... dialog box allows you to set the significance level and the corresponding confidence intervals.  By default the significance level is set at .05 and the confidence intervals are set at 95%. 

Let's select the Descriptive statistics option.

Post Hoc... 

The Post Hoc... dialog box provides a list of post-hoc tests for comparing the observed means.  

Plots...

The Plots... dialog box allows you to display simple plots of the effects. If you want to see a plot of the means for the group main effect, move group to the horizontal axis and then add it to the list of plots. 

Contrasts... 

The Contrasts... dialog box allows you to test a particular contrast (e.g., polynomial contrast) for an effect.  We will look at contrasts later in this set of notes.

Model...

The Model... dialog box allows you to specify which effects are to be included in the model. The default is a full factorial where all the effects and interactions are included in the model.  You can also specify the Sums of Squares type to be used when computing the ANOVA, and whether or not the intercept is to be included in the model. 

After you have finished selecting the various options, click the OK button to run the ANOVA.



5. The Basic Output

The basic output from GLM: Univariate  includes descriptive statistics (shown in Table 6), the ANOVA statistics (shown in Table 7), and a plot of the three means (shown in Figure 3).

The descriptive statistics are displayed for each cell and for the combined scores, see Table 6.  Descriptive statistics include the mean, standard deviation, and number of cases for each cell in the design, as well as the overall (total) mean, standard deviation, and number of cases.

The ANOVA statistics are shown in Table 7. 

The corrected model, with 2 df, is the overall model. It includes the variance due to all of the effects in the design.  In this study there is only one effect, the main effect for group. So the sum of squares for the corrected model is equal to the sum of squares for the group main effect. If the design included two factors, A and B, then the corrected model would include the sums of squares for the two main effects and the interaction. In psychology it is unusual to report statistics for the corrected model. We are typically interested in the main effect and interaction effects rather than the model as a whole.

There is an R Squared associated with the corrected model, see note a. The R squared is the proportion of variance in the dependent variable that is accounted for by the corrected model. In this case the main effect of group accounts for 41% of the variance in the scores. The R squared for a particular sample tends to be larger than the R squared for the population from which the sample was drawn, because R squared takes advantage of chance variation in the sample that will not be present in the population as a whole. The Adjusted R Squared is an estimate of the predictability of the model in the population as a whole. It is always smaller than R squared. How much smaller depends upon the number of variables in the model and the sample size. The smaller the sample size, holding constant the number of variables, the larger the correction. The larger the number of variables in the model, holding sample size constant, the larger the correction. In this case the model is expected to account for 38% of the variance in the dependent variable in the general population. The adjusted R squared is sometimes called the shrunken R squared.
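
The size of the correction can be checked with a short Python sketch. It assumes the standard adjustment, 1 - (1 - R squared)(N - 1)/(N - k - 1), where k is the model degrees of freedom. It also recovers R squared from the F ratio reported for the group effect, F(2, 49) = 16.94, which works here only because the group effect is the whole corrected model. The function names are illustrative and are not SPSS output.

```python
def r_squared_from_f(f_ratio, df_effect, df_error):
    """R squared implied by an F ratio when the effect is the whole model."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


def adjusted_r_squared(r_squared, n_cases, df_model):
    """Shrunken R squared: 1 - (1 - R^2)(N - 1)/(N - k - 1)."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - df_model - 1)


n_cases = 22 + 14 + 16          # control + ADHD + trauma
r2 = r_squared_from_f(16.94, df_effect=2, df_error=49)
adj_r2 = adjusted_r_squared(r2, n_cases, df_model=2)

print(f"R squared          = {r2:.3f}")      # about .41
print(f"Adjusted R squared = {adj_r2:.3f}")  # about .38

# The correction grows as the sample shrinks (same R squared, same model df).
for n in (52, 26, 13):
    print(n, round(adjusted_r_squared(r2, n, df_model=2), 3))
```

The two printed values round to the 41% and 38% quoted in the text, and the loop shows the adjusted value dropping as N decreases with the model held fixed.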

The intercept term in this ANOVA is a test of whether the grand mean is different from zero. Because all the dependent variable scores are positive the grand mean is necessarily different from zero. Therefore the test of the intercept is not of interest to us.

The source identified as GROUP is the main effect of group (control vs. ADHD vs. trauma). It is the effect of interest in this study. It indicates that there are significant differences between the three means, F(2, 49) = 16.94, p < .0005. The question now becomes which of those means are significantly different from each other.

The source identified as Error (with 49 degrees of freedom) is the within-cells error term. The mean square for the Error term is used as the denominator of the F test for the Group main effect.

The sum of squares for the total includes the intercept, the main effect, the interaction (in designs with more than one factor), and the error term. If there are equal n's in each cell of the design, then the total sum of squares will exactly equal the sum of the sums of squares for each of those terms.

The sums of squares for the corrected total is the sum of the sums of squares for the corrected model and the error terms.

Profile Plots

The profile plot for the group main effect is shown in Figure 3. 

Figure 3. Mean BASC hyperactivity T scores for the participants diagnosed as ADHD, PTSD, and a control group without either diagnosis.

The profile plots displayed by the GLM procedure are difficult to interpret because they do not include error bars. A more informative presentation can be made using the Graphs menu.

Graphs
    Interactive
         Error Bar

Move the BASC T score to the y-axis and the Group variable to the x-axis. At the bottom of the window select Confidence Interval for the Mean in the Error Bars Represent box. Next, go to the error bar folder and select the shape and direction of the desired error bars. 

The graph is shown in Figure 4. The confidence intervals help to show that the means for the ADHD and trauma groups are not significantly different from each other because the 95% C.I. of one mean includes the other mean. The control group mean is significantly smaller than the means of the other two groups because the 95% C.I. for the control group does not include the mean of either the ADHD group or the trauma group.
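
The reading of the error bars can be reproduced approximately from the means and standard errors in Table 4. The Python sketch below assumes that each interval is built from that group's own standard error and a two-tailed t critical value with n - 1 degrees of freedom. The critical values are hard-coded from a standard t table, and the function names are illustrative.

```python
# (mean, standard error of the mean, n) for each group, copied from Table 4.
groups = {
    "control": (43.82, 2.20, 22),
    "ADHD": (60.14, 2.71, 14),
    "trauma": (64.75, 3.61, 16),
}

# Two-tailed .05 critical values of t for df = n - 1 (from a standard t table).
T_CRIT_975 = {21: 2.080, 13: 2.160, 15: 2.131}


def confidence_interval(mean, std_error, n):
    """95% confidence interval for a single group mean."""
    half_width = T_CRIT_975[n - 1] * std_error
    return mean - half_width, mean + half_width


intervals = {name: confidence_interval(*stats) for name, stats in groups.items()}


def includes(interval, value):
    """True if value falls inside the (lower, upper) interval."""
    lower, upper = interval
    return lower <= value <= upper


for name, (lower, upper) in intervals.items():
    print(f"{name:8s} 95% C.I. = [{lower:.2f}, {upper:.2f}]")

print("ADHD C.I. includes trauma mean:  ", includes(intervals["ADHD"], 64.75))
print("trauma C.I. includes ADHD mean:  ", includes(intervals["trauma"], 60.14))
print("control C.I. includes ADHD mean: ", includes(intervals["control"], 60.14))
```

Comparing intervals in this way is a visual aid rather than a formal test. The formal pairwise comparisons are the post hoc tests.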

Figure 4. Mean BASC hyperactivity T scores, ns, and 95% confidence intervals for the participants diagnosed as ADHD, PTSD, and a control group without either diagnosis.



6. Post Hoc Tests

The GLM procedure provides a wide array of possibilities for running post hoc tests. 

Click on Post Hoc...  

Move the group main effect from the Factor(s) window to the Post hoc test for window. The post hoc options at the bottom of the dialog box become available. The options are divided into two main groupings: Equal Variances Assumed (14 tests) and Equal Variances Not Assumed (4 tests).

The post hoc tests that are available are shown in Table 8. Range tests are used to identify subsets of means that do not differ from each other. Pairwise tests compare the differences between each pair of means.

(Note: SPSS Advanced Models 9.0 has a description of these tests on pages 334-337)

Table 8. Post Hoc Tests Available in GLM: Univariate
(Each entry gives the test, whether it is reported as a range test, a pairwise test, or both, and then the name of the test and comments.)

Equal Variances Assumed

LSD (pairwise test)
Least Significant Difference. Equivalent to running simple t tests for each pair of means. The alpha level is not controlled.

Bonferroni (pairwise test)
Assumes that you wish to test all possible pairs of means. The actual alpha level is the significance level defined in the dialog box divided by the number of possible pairs of means (called C, the number of comparisons).

corrected alpha = alpha/C

For example, if there are three means then there are three possible pairs of means (1 v 2, 1 v 3, and 2 v 3). If the significance level in the dialog box is set at .05, then the Bonferroni corrected alpha level is

corrected alpha = .05/3 = .01667

Sidak (pairwise test)
Assumes that you wish to test all possible pairs of means. The Sidak formula is:

corrected alpha = 1 - (1 - alpha)^(1/C)

where C is the number of comparisons.

For example, if there are three means then there are three possible pairs of means (1 v 2, 1 v 3, and 2 v 3). If the significance level in the dialog box is set at .05, then the Sidak corrected alpha level is

corrected alpha = 1 - (1 - .05)^(1/3)
                = 1 - (.95)^.3333
                = 1 - (.983048)
                = .01695

The Sidak procedure has slightly more power than the Bonferroni procedure when alpha = .05. When alpha = .01 the two procedures are nearly identical.

Scheffe (pairwise test)
Assumes you wish to test all possible pairs and all possible combinations of means. Note that this is a very conservative test. For example, if you have three means, then there are six possible comparisons (1 v 2, 1 v 3, 2 v 3, 1+2 v 3, 1+3 v 2, and 2+3 v 1). Do not use the Scheffe method if you do not intend to test all possible pairs and all possible combinations of means.

For three means and an original alpha of .05, the corrected alpha level would be

corrected alpha = .05/6 = .0083

Scheffe is exact for unequal group sizes.

R-E-G-W F (range test)
Ryan-Einot-Gabriel-Welsch F.

R-E-G-W Q (range test)
Ryan-Einot-Gabriel-Welsch range test.

S-N-K (range test)
Student-Newman-Keuls. This is a stepwise test for ordered means where the alpha level depends upon the number of "steps apart" the means are from each other.

Tukey (range test and pairwise test)
Honest Significant Difference (HSD).

Tukey's-b (range test)
Tukey's alternative procedure. This is a stepwise test for ordered means (see S-N-K). It uses the average of the Tukey and the S-N-K procedure at each step.

Duncan (range test)
Multiple Range Test. This is a stepwise test for ordered means (see S-N-K).

Hochberg's GT2 (range test and pairwise test)

Gabriel (range test and pairwise test)

Waller-Duncan (range test)

Dunnett (pairwise test)
Tests a control mean against all other means.

Equal Variances Not Assumed

Tamhane's T2 (pairwise test)
Uses the Welch procedure for determining degrees of freedom for the SE of the contrast. Uses Student's t distribution. Uses the Sidak procedure to find the alpha level. Slightly more conservative than the Games-Howell procedure. Appropriate when variances are unequal or when variances and group sizes are unequal.

Dunnett's T3 (pairwise test)

Games-Howell (pairwise test)
Uses the Welch procedure for determining degrees of freedom for the SE of the contrast. Uses the studentized range distribution. Appropriate when variances are unequal or when variances and group sizes are unequal.

Dunnett's C (pairwise test)
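
The Bonferroni and Sidak corrections described in Table 8 are simple enough to verify directly. The Python sketch below follows the two formulas given in the table, with C taken as the number of possible pairs of means. The function names are illustrative.

```python
from math import comb


def number_of_pairs(n_means):
    """Number of possible pairwise comparisons among n_means means."""
    return comb(n_means, 2)


def bonferroni_alpha(alpha, n_comparisons):
    """Bonferroni corrected alpha: alpha / C."""
    return alpha / n_comparisons


def sidak_alpha(alpha, n_comparisons):
    """Sidak corrected alpha: 1 - (1 - alpha)^(1/C)."""
    return 1 - (1 - alpha) ** (1 / n_comparisons)


c = number_of_pairs(3)  # 1 v 2, 1 v 3, and 2 v 3

for alpha in (0.05, 0.01):
    print(
        f"alpha = {alpha}: C = {c}, "
        f"Bonferroni = {bonferroni_alpha(alpha, c):.5f}, "
        f"Sidak = {sidak_alpha(alpha, c):.5f}"
    )
```

The Sidak value is always a little larger than the Bonferroni value, which is the sense in which Sidak has slightly more power. The gap narrows as alpha gets smaller.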

Most graduate level statistics textbooks spend several pages discussing alternative post hoc tests. It is up to you to understand the post hoc test that you have chosen to use. You should not pick a test at random, nor pick one based on the results of the tests.

As a general rule, if you want to make all possible pairwise comparisons between means then many statistics books recommend the Tukey HSD test, the Fisher protected least significant difference test, or the Bonferroni corrected test (which is also known as the Dunn procedure).

Let's look at the Tukey HSD post hoc test. The HSD is both a range test and a pairwise test. The output from the HSD pairwise test is shown in Table 9; the output from the HSD range test is shown in Table 10.

Table 9. Multiple Comparisons (Pairwise)
Dependent Variable: BASC hyperactivity T score
Tukey HSD

                                                                      95% Confidence Interval
(I) Group   (J) Group   Mean Difference (I-J)   Std. Error   Sig.     Lower Bound   Upper Bound
control     ADHD        -16.32(*)               3.999        .000     -25.99        -6.66
            trauma      -20.93(*)               3.843        .000     -30.22        -11.64
ADHD        control     16.32(*)                3.999        .000     6.66          25.99
            trauma      -4.61                   4.281        .533     -14.95        5.74
trauma      control     20.93(*)                3.843        .000     11.64         30.22
            ADHD        4.61                    4.281        .533     -5.74         14.95
Based on observed means.
* The mean difference is significant at the .05 level.

In Table 9 the mean of each group is compared to the mean of each of the other groups. For example, the mean of the control group is compared to the mean of the ADHD group and to the mean of the trauma group. This makes for some redundancy in the table. For example, the comparison between the control group and the ADHD group is the same as the comparison between the ADHD group and the control group.

Let's look at the first row in Table 9. The mean of the control group (Group I) is 43.82. The mean of the ADHD group (Group J) is 60.14. The mean difference (I - J) is -16.32(*). The asterisk indicates that the mean difference is significant at the .05 level (see the asterisk note at the bottom of the table). The standard error of the difference is found using the formula

SE = sqrt( MSerror * (1/ni + 1/nj) )

where MSerror is the within cells error term from the analysis of variance, ni is the number of cases in group i, and nj is the number of cases in group j. For the control vs. ADHD comparison

SE = sqrt( 136.816 * (1/22 + 1/14) )
   = sqrt( 15.991 )
   = 3.999
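
The standard errors in Table 9 can be checked with a few lines of Python. The sketch assumes the usual form of the standard error of a difference between two means, sqrt(MSerror * (1/ni + 1/nj)), and takes MSerror = 136.816 from the Mean Square(Error) note reported with Table 10. The function name is illustrative.

```python
import math

MS_ERROR = 136.816  # within-cells mean square from the ANOVA
GROUP_N = {"control": 22, "ADHD": 14, "trauma": 16}


def se_of_difference(ms_error, n_i, n_j):
    """Standard error of the difference between two group means."""
    return math.sqrt(ms_error * (1 / n_i + 1 / n_j))


pairs = [("control", "ADHD"), ("control", "trauma"), ("ADHD", "trauma")]
standard_errors = {
    pair: se_of_difference(MS_ERROR, GROUP_N[pair[0]], GROUP_N[pair[1]])
    for pair in pairs
}

for (group_i, group_j), se in standard_errors.items():
    print(f"{group_i} vs. {group_j}: SE = {se:.3f}")
```

The three values match the Std. Error column of Table 9 (3.999, 3.843, and 4.281). They differ from one comparison to the next only because the group sizes differ.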

Table 10. Range Tests
BASC hyperactivity T score

Tukey HSD(a,b,c)

                    Subset
Group      N        1         2
control    22       43.82
ADHD       14                 60.14
trauma     16                 64.75
Sig.                1.000     .495
Means for groups in homogeneous subsets are displayed.
Based on Type III Sums of Squares
The error term is Mean Square(Error) = 136.816
a Uses Harmonic Mean Sample Size = 16.724
b The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
c Alpha = .05

The range tests in Table 10 identify subsets of means that do not differ from each other. There are two subsets for the hyperactivity scores. Subset 1 contains only the mean for the control group. The control group mean is significantly different from the other two means because the other two means are not a part of subset 1. Subset 2 contains the means for the ADHD and the trauma groups indicating that those two means are not significantly different from each other.

In this unequal n study the confidence intervals for the paired comparisons were different for each comparison because the standard error was a function of the differing cell ns. In order to compute homogeneous subsets you need to have a common confidence interval for each of the comparisons. The harmonic mean is used as the number of cases in each group rather than the group ns. The harmonic mean sample size is found by the following formula

Nh = p / (1/N(1) + 1/N(2) + ... + 1/N(p)) 

where p is the number of cells in the ONEWAY analysis and N(i) is the number of cases in cell i.

The harmonic mean of the ns for all cells in this analysis would be

Nh = 3/(1/22 + 1/14 + 1/16)
   = 16.724
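
The harmonic mean sample size is easy to confirm in Python. The sketch below writes the formula out directly and, as a cross-check, compares it with harmonic_mean from Python's standard statistics module. The function name harmonic_mean_n is illustrative.

```python
from statistics import harmonic_mean


def harmonic_mean_n(cell_sizes):
    """Nh = p / (1/N(1) + 1/N(2) + ... + 1/N(p)), where p is the number of cells."""
    p = len(cell_sizes)
    return p / sum(1 / n for n in cell_sizes)


cell_sizes = [22, 14, 16]  # control, ADHD, trauma

n_h = harmonic_mean_n(cell_sizes)
print(f"Harmonic mean sample size = {n_h:.3f}")
print(f"statistics.harmonic_mean  = {harmonic_mean(cell_sizes):.3f}")
```

The harmonic mean (16.724) is smaller than the arithmetic mean of the three group sizes (about 17.33) because it gives more weight to the smaller groups.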



7. References

Friedman, M. C., Harper, M. L., Becker, L. A., Wilson, S. A., & Tinker, R. H. (1997, November). A comparison of attention deficit/hyperactivity disorder and posttraumatic stress disorder symptomatology in children. Poster presented at the annual meeting of the International Society for Traumatic Stress Studies, Montreal, Canada.



© Lee A. Becker, 1997-1999 - revised 11/09/99