FREQUENCIES

Reading: SPSS Base 9.0: Chapter 11, Frequencies
                                        : Chapter 7, Working with Output
                                        : Chapter 8, Draft Viewer
Homework: Frequencies
Datafile: Use the skills99.sav data file you created or 
                Download:  skills99.sav           Download tips

  1. Overview
  2. Select the Frequencies Procedure
  3. Select One or More Variables
    -Run the Procedure and Interpret the Output
  4. Statistics Options
  5. Chart Options
  6. Format Options
  7. Saving the syntax commands
  8. Save or Print the Output
  9. Save the Output in an MS Word File

1. Overview

The FREQUENCIES and DESCRIPTIVES procedures have two uses. One important use is to make another check on whether you have entered the data correctly. You should make it a habit to run FREQUENCIES on all your variables before you start running other statistics or tests of your hypotheses. Carefully examine each variable, looking for values that may have been entered by mistake. Check for out-of-range values. For example, are there 6's in a variable that should only have values between 1 and 5? Check to make sure that the data make sense. If you know that about 60% of your subjects were females and the percent females turns out to be only 20%, then you know you have a problem in the data set or in the DATA DEFINITION.
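
The out-of-range check described above can also be sketched outside SPSS. The short Python example below is only an illustration: the function name, the sample responses, and the 1-to-5 range are hypothetical and are not part of the skills99.sav file.

```python
def out_of_range(values, low, high):
    """Return (position, value) pairs for entries outside low..high.

    Missing entries (None) are skipped; they are not treated as errors here.
    """
    return [(i, v) for i, v in enumerate(values)
            if v is not None and not (low <= v <= high)]


# Hypothetical responses to an item that should be coded 1 through 5.
responses = [3, 5, 1, 6, 2, None, 4, 0]
problems = out_of_range(responses, 1, 5)
print(problems)  # [(3, 6), (7, 0)]
```

Each pair gives the position of a suspicious entry and the value found there, which is the same information you would look for by eye in a frequencies table.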

A second important use is to provide descriptive statistics. Descriptive statistics are often used to describe the demographic characteristics of the sample. What percent of males and females are there? What was the range of ages and what was the median age? Survey data is often reported in terms of frequencies.

We have already used the FREQUENCIES procedure. Now we get to look at all the various possible options and statistics that are available within the FREQUENCIES procedure.

The output created by running the FREQUENCIES procedure is displayed in the Viewer window. We will discuss how to manipulate the output in the Viewer window, how to save the output, and how to print the output from that window.

The data that we will be using is your Skills Survey data, skills99.sav. The discussion below assumes that the file skills99.sav has been opened in the SPSS Data Editor.

As always, you should open an SPSS window and follow along by actually running the SPSS procedures.

The general strategy for running any SPSS procedure is as follows:

  1. Select the Procedure
  2. Select the Variables
  3. Select the Options
  4. Run the Procedure
  5. Interpret the Output
  6. Save or Print the Output


2. Select the Frequencies Procedure

The frequencies dialog box is opened by clicking

Analyze
      Descriptive Statistics
            Frequencies

The dialog box shows the variable labels for each of the variables in the data set in the left box. The variables are organized according to their order in the active file.  The label for the variable is displayed rather than the 8-character name of the variable.  Variables that are preceded by the "#" symbol (within a diamond shape) are numeric variables.  Variables that are preceded by the "A>" symbol (within a box) are string variables. You can see additional information about a variable by right clicking a variable and then selecting the Variable Information option. The variables to be analyzed will be moved to the empty box to the right.

Note: You have the option of displaying the names of the variables either in the order that they appear in the file, or alphabetically according to the name of the variable.  You also can choose to display either the variable names or the variable labels.  These options can be changed in the SPSS Options dialog box.

Edit (in the top row of buttons)
     Options...
          General (Tab)
               See the Variable Lists section

The Statistics..., Charts..., and output Format... options are accessed by the buttons at the bottom of the dialog window. The buttons to the far right will run the procedure (OK), Paste the commands into the syntax window, Reset the variables to be analyzed, Cancel the frequencies procedure, and provide Help. You can turn the frequencies tables output on or off by checking the Display frequency tables option.


3. Select One or More Variables

You can select a variable to be analyzed by double clicking the variable.   The variable will move from the left box to the Variable(s): box on the right.  Double clicking a variable in the Variable(s): box will unselect the variable.  You could also single click on a variable to highlight it and then press the right arrow button (located between the two boxes) to move it to the Variable(s): box.

Selecting Multiple Variables

Multiple variables can be selected in one of three ways.

a) Selecting another variable and moving it to the Variable(s): box by double clicking the variable or by highlighting the variable and pressing the right arrow button.

b) You can press and hold the ctrl key while selecting (highlighting) a set of variables and then move the entire set to the Variable(s): box. For example, you could select all the knowledge of statistics items for analysis by holding down the ctrl key while selecting the variables for knowledge of analysis of variance (ANOVA), chi square, correlation, factor analysis, frequency distribution, multiple regression, repeated measures ANOVA, and t test, and then pressing the right arrow button to move the entire set to the Variable(s): box.

c) If the variables you wish to select are contiguous to each other, you can press and hold the shift key while selecting a set of variables. For example, you could select all the variables for analysis by clicking the variable at the top of the list, pressing and holding the shift key, and then clicking the variable at the bottom of the list. Or you could hold down the shift key while moving the cursor down the variable list.

If you wanted to run frequencies on nearly all the variables you could use the shift key to select all the variables and then use the ctrl key to unselect the unwanted variables. For example, suppose that you wanted to run frequencies on all the variables except ID. First, use the shift key to select all the variables and then press and hold the ctrl key while you click on the ID variable.

Run the Procedure

Once a variable is selected the OK and Paste buttons become available. Make sure the Display frequency tables option is checked and then press the OK button to run the frequencies procedure on the selected variable.

Interpret the Output

The results are shown in an SPSS Viewer window. The left pane of the Viewer window contains an outline of the output; the right pane contains the output itself.

The output is divided into four sections: title, notes, a statistics summary for all the variables selected, and frequencies tables for each of the selected variables. Each element in the output is an "object" that can be edited, saved, and/or printed.

Title. The default title is the name of the procedure that was run. In this case the title is Frequencies. You edit the title by moving the cursor to the title and then double clicking the left mouse button. If you selected all the statistics questions from the skills survey you could change the title to, say, Frequencies: Knowledge of Statistics.

Notes. The notes section contains information about when the statistics were run, some information about the data file, how missing values were handled, the syntax commands that were generated, and the resources used. The notes section is normally closed (the icon next to "Notes"  is a closed book rather than an open book). You can open (display) the notes section by double clicking on the closed book icon in front of "Notes." Double click on the open book icon to close the Notes section.

Statistics. The statistics section includes statistics for all the selected variables. The default statistics include the number of valid cases and the number of missing cases for each variable analyzed.

Frequencies tables. The final section includes frequencies tables for each of the selected variables. The first column of the table lists the value labels for the valid values, followed by the missing values and then the total number of cases. Value labels are listed for each non-null value (i.e., for each value that has a frequency of at least 1). The second column lists the frequencies for each value. The third column lists the percent of cases for each value. This percent is based on the total number of cases. The fourth column lists the valid percent of cases for each value. The valid percent is based on the number of valid cases. If there are no missing values then the percent and valid percent columns will be identical. The last column lists the cumulative percent. The cumulative percent is based on the valid percent column.
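
To make the difference between the percent, valid percent, and cumulative percent columns concrete, here is a minimal Python sketch that builds the same columns by hand. It is not how SPSS computes the table internally; the function name and the sample answers are hypothetical, and None is assumed to stand for a missing value.

```python
from collections import Counter


def frequency_table(values):
    """Build rows of (value, count, percent, valid_percent, cumulative_percent).

    None marks a missing value. Percent is based on all cases; valid percent
    and cumulative percent are based on the non-missing cases only.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    n_valid = len(valid)
    counts = Counter(valid)

    rows = []
    cumulative = 0.0
    for value in sorted(counts):  # ascending order of the values
        count = counts[value]
        percent = 100.0 * count / total
        valid_percent = 100.0 * count / n_valid
        cumulative += valid_percent
        rows.append((value, count, percent, valid_percent, cumulative))
    return rows, n_valid, total - n_valid


# Hypothetical answers: six valid cases and two missing cases.
answers = [1, 2, 2, 3, 3, 3, None, None]
rows, n_valid, n_missing = frequency_table(answers)
for value, count, pct, valid_pct, cum_pct in rows:
    print(f"{value:>5} {count:>5} {pct:>7.1f} {valid_pct:>7.1f} {cum_pct:>7.1f}")
print(f"Valid: {n_valid}  Missing: {n_missing}")
```

Because two of the eight cases are missing, the percent column sums to 75 while the valid percent column sums to 100. With no missing cases the two columns would be identical.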


4. Statistics Options

Click the Statistics... button at the bottom of the Frequencies dialog box to display the statistics that are available.

The statistics are grouped into four sections: measures of central tendency (mean, median, mode, and sum); measures of dispersion (standard deviation, variance, range, minimum, maximum, and standard error of the mean); measures of distribution (skewness and kurtosis); and percentile values.

Measures of central tendency (mean, median, mode, and sum). You should all recognize these central tendency measures.

Measures of dispersion (standard deviation, variance, range, minimum, maximum, and standard error of the mean). These measures are also very common. The standard error of the mean is found by dividing the standard deviation by the square root of the number of valid cases.
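
The standard error calculation just described can be checked by hand. The Python sketch below is an illustration only: the function name and the scores are hypothetical, and it assumes the sample standard deviation (the n - 1 version) and treats None as a missing value.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Standard deviation divided by the square root of the number of valid cases."""
    valid = [v for v in values if v is not None]
    # statistics.stdev is the sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(valid) / math.sqrt(len(valid))


# Hypothetical scores with one missing case.
scores = [2, 4, 4, 4, 5, 5, 7, 9, None]
print(round(standard_error_of_mean(scores), 3))
```

Note that the missing case is dropped before counting, so the divisor is the square root of 8 rather than 9.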

Measures of distribution (skewness and kurtosis). The terms skewness and kurtosis refer to distribution shapes that deviate from the shape of a normal distribution.

A skewed distribution is characterized by a tail that trails off towards the high end of the scale (a positive skew) or towards the low end of the scale (a negative skew). If the distribution has no skewness, then the skewness statistic will be zero. If the distribution has a positive skew, then the skewness statistic will be positive. If the distribution has a negative skew, then the skewness statistic will be negative.

A distribution with kurtosis is characterized by the distribution being too narrow and peaked (a leptokurtic distribution) or too wide and flat (a platykurtic distribution). Again, if there is no kurtosis, the kurtosis statistic will be zero. If the distribution is leptokurtic, then the kurtosis statistic will be positive. If the distribution is platykurtic, then the kurtosis statistic will be negative.

A normal distribution has neither skewness nor kurtosis. As for any statistic, the actual values of the skewness and kurtosis statistics rarely turn out to be exactly zero. That is, if you randomly sampled a set of values from a population that was perfectly normal, it is unlikely that the skewness and kurtosis statistics would both be equal to zero. The question becomes, are the skewness or kurtosis scores so different from zero that we have to reject the hypothesis that they represent a normal distribution? We answer this by setting up a 95% confidence interval (C.I.) around the skewness score and another 95% confidence interval around the kurtosis score. If the 95% confidence interval includes the value zero then we cannot reject the hypothesis that the distribution has no skewness (or no kurtosis).

The 95% confidence intervals are defined as

95% C.I. = skewness statistic ± 1.96 * (standard error of skewness)

and

95% C. I. = kurtosis statistic ± 1.96 * (standard error of kurtosis).

For example, suppose the skewness statistic for the knowledge of correlations question was -.339 and the standard error of skewness was .388. Is the distribution for the correlation question negatively skewed? The 95% confidence interval is found as follows:

95% C. I. = skewness statistic ± 1.96 * (standard error of skewness)
                = -.339   ± 1.96 * .388
                = -.339   ± 0.760
                = (-.339 - 0.760) to (-.339 + 0.760)
                = -1.099 to 0.421

A graphic representation of the 95% confidence interval for this skewness value is shown in Figure 1.

Figure 1. 95% Confidence Interval for the Skewness Value

The 95% confidence interval ranges from -1.099 (through zero) to 0.421. Because the 95% confidence interval includes zero we say that there is no evidence to reject the hypothesis that the distribution is not skewed. Or more simply, the distribution is not skewed.

Further suppose that the kurtosis statistic for the correlation question was .705 and that the standard error of kurtosis was .759. Is the distribution leptokurtic? The 95% confidence interval is found as follows:

95% C. I. = kurtosis statistic ± 1.96 * (standard error of kurtosis)
                = .705   ± 1.96 * .759
                = .705   ± 1.488
                = (.705 - 1.488) to (.705 + 1.488)
                = -0.783 to 2.193

A graphic representation of the 95% confidence interval for this kurtosis value is shown in Figure 2.

Figure 2. 95% Confidence Interval for the Kurtosis Value

The 95% confidence interval ranges from -0.783 (through zero) to 2.193. Because the 95% confidence interval includes zero we say that there is no evidence to reject the hypothesis that the distribution has no kurtosis. Or more simply, the distribution has no kurtosis. Because there is no kurtosis and no skewness, the correlation scores are said to be normally distributed.
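
If you would like to check this kind of arithmetic yourself, the two worked examples can be reproduced with a few lines of Python. This is only a sketch of the formula given above (statistic ± 1.96 * standard error); the function names are hypothetical, and the inputs are the skewness and kurtosis values quoted in the examples.

```python
def confidence_interval_95(statistic, standard_error):
    """Return (lower, upper) for statistic ± 1.96 * standard_error."""
    margin = 1.96 * standard_error
    return statistic - margin, statistic + margin


def includes_zero(interval):
    """True if zero lies inside the interval, i.e. we cannot reject 'no skew/kurtosis'."""
    lower, upper = interval
    return lower <= 0.0 <= upper


# Values quoted in the worked examples for the correlation question.
skew_ci = confidence_interval_95(-0.339, 0.388)
kurt_ci = confidence_interval_95(0.705, 0.759)

for label, ci in (("skewness", skew_ci), ("kurtosis", kurt_ci)):
    lower, upper = ci
    print(f"{label}: {lower:.3f} to {upper:.3f}  includes zero: {includes_zero(ci)}")
```

Both intervals include zero, which matches the conclusion reached by hand. The printed limits should agree with the hand calculations to within rounding.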

The concept of a confidence interval is basic to understanding statistics. Confidence intervals are a standard part of the output of many SPSS procedures.

Percentile values. The percentile values option will print the values at a given percentage. If you select quartiles the scores at the 25th percentile, the 50th percentile (the median), and the 75th percentile will be given. You can choose to find the cut points that divide the scores into n equal groups. For example if you choose 5 equal groups, then the following scores will be given: 20th percentile, 40th percentile, 60th percentile, and the 80th percentile. The percentile(s) option allows you to select any given percentile score.

The percentile values options are useful if you want to define groups of participants. For example, if you wish to divide your participants into the top, middle, and bottom third on the basis of their IQ scores you could find the cut points by selecting the option to find the cut points that divides the group into three equal groups.
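
The idea of cut points for n equal groups can be illustrated with Python's statistics.quantiles function (Python 3.8 or later). This is a sketch under stated assumptions: the function name cut_points and the scores are hypothetical, and Python's default interpolation rule may not give exactly the same values that SPSS reports for small or tied data sets.

```python
import statistics


def cut_points(values, groups):
    """Return the (groups - 1) cut points that split the valid values into equal groups."""
    valid = [v for v in values if v is not None]
    return statistics.quantiles(valid, n=groups)


# Hypothetical scores: the whole numbers 1 through 10.
scores = list(range(1, 11))

quartiles = cut_points(scores, 4)  # 25th, 50th, and 75th percentiles
quintiles = cut_points(scores, 5)  # 20th, 40th, 60th, and 80th percentiles
thirds = cut_points(scores, 3)     # cut points for top, middle, and bottom third

print(quartiles)
print(quintiles)
print(thirds)
```

Asking for n groups always returns n - 1 cut points, and the middle quartile is the median.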


5. Chart Options

The chart options include bar charts, pie charts, histograms, and histograms with a normal curve. The values for the charts can be expressed as either frequencies or percentages.

Bar charts and pie charts are commonly used when you have categorical data such as gender or race. Any value that is empty (no one selected that value) is not included in these charts.

Histograms are commonly used when you have interval data such as age or IQ scores. Histograms show all the values from the lowest to the highest scores. Empty values between the lowest and highest scores are included in histograms.

You can ask to display a normal distribution curve on top of the histogram chart. The normal distribution displayed is what the histogram should look like if the data were normally distributed. You can visually compare the histogram data to the superimposed normal curve to get a visual sense of whether or not your data are normally distributed. If it looks like there is a problem with the data you could then compute the 95% confidence intervals for skewness and kurtosis to see if the shape is statistically different from a normal distribution.


6. Format Options

    Order by. The format options refer to how the frequencies tables are formatted. The values for a variable are normally listed in ascending order of the values themselves. For example, if you displayed a frequencies table for the ID variable, the values would begin with ID = 1 and end with the highest ID value. You can also have the values listed in descending order. In the ID example the values would begin with the highest ID value and end with ID = 1.

The other two options allow you to order the output according to the counts for each value. Ordering by counts is very useful in some contexts. Suppose that you are doing a survey for one of the fast food chains in the area. One of the questions is "When you think of fast food restaurants, which one comes to your mind first?" You then get a bunch of answers including Wendy's, Burger King, McDonalds, etc. It would be very helpful to have the FREQUENCIES for this question ordered by descending counts.
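
The effect of ordering by descending counts can be seen in a small Python sketch of the fast food example. The list of answers below is made up for illustration; it is not survey data.

```python
from collections import Counter

# Hypothetical answers to "which restaurant comes to your mind first?"
answers = ["Wendy's", "McDonalds", "Burger King",
           "McDonalds", "Wendy's", "McDonalds"]

# most_common() returns (value, count) pairs sorted by descending count.
ordered = Counter(answers).most_common()
for name, count in ordered:
    print(f"{name:<12} {count}")
```

The most frequently named restaurant appears first, which is usually what the reader of such a table wants to see.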

    Multiple Variables.  The Compare variables option will display a statistics table that includes all the selected variables. Separate frequencies tables will be displayed for each variable even though the Compare variables option has been selected. The Organize output by variables option will display the statistics table and then the frequencies table for each of the selected variables.

   Display frequency tables. This format option allows you to suppress the printing of frequencies tables based on the number of categories. This is typically used to suppress the printing for variables with a large number of categories. For example you may not wish to print out frequencies tables for the age variable, preferring instead to display a histogram of those values.


7. Saving the syntax commands

You can save the syntax commands that SPSS creates by clicking the Paste button on the right side of the Frequencies dialog box. The syntax editor window will open displaying all the commands for the current settings. You can save the commands in the normal way from the syntax editor window. Saving the commands will allow you to easily run the same set of commands again without having to select all the options from the various dialog boxes.


8. Save or Print the Output

The output displayed in the Output Navigator window can be printed or saved to a disk.

Print the output. Select the output to be printed by highlighting elements in the outline pane. The highlighted elements with the open book icon will be printed. You can open or close the book on an individual element by double clicking on the book icon at the front of each element.

You can preview how the document will look when printed by selecting

File
   Print Preview

Save the output as an *.spo file. If you press the disk icon or

File
   Save

then the highlighted elements in the outline pane will be saved. As with the print option, only those highlighted elements with the open book icon will be saved. The default file extension is *.spo (SPss Output). The *.spo file can be opened only in the SPSS Viewer window. Use the

File
   Open

sequence in SPSS to open the *.spo file. You cannot open an *.spo file in your favorite word processor.

The Save As option gives you the option of saving the file as a Navigator file (*.spo) or as "all files (*.*)". However, saving the output navigator file with the *.doc extension still will not allow the file to be opened successfully with MS Word or with MS WordPad.

Export the output as an .htm or .txt file. Another possibility is to Export one or more objects. The export option saves objects in either .htm or .txt format. The .htm format can be used in web pages or read by MS Word 95.  The .txt format can be read by any ASCII editor.  You could insert either an .htm or .txt file into MS Word 97, but I do not recommend inserting a .txt file into Word 97.  Table boundaries in .txt files are identified by dashes (----) and bars (|) rather than by solid lines, and the .txt tables are cumbersome to edit.  Use the

File
   Export

option or right click on an object and select Export to open the Export Output dialog box.  The default file name is OUTPUT; you will probably want to change the name to reflect the content that is being exported. Select the file type (*.htm or *.txt) as desired. When you have finished selecting the desired options, press OK to export the file.


9. Save the Output in an MS Word File

You can save an SPSS Viewer object directly into a word processor such as MS Word by (a) right clicking the object, (b) selecting the copy option, and (c) opening an MS Word file and pasting the object into the open file. The object is created as a normal table in Word.  You can edit the elements of the table within Word.  (Note: this option does not work with objects created in SPSS 7.5.)

It is also possible to use the copy objects command rather than the copy command to copy and paste an object from the SPSS Viewer to a Word file.  The object is created as an object in Word.  Although it is possible to edit the contents of an object in MS Word, I have found the process to be tedious and unreliable. It is easier to do all your editing in the output navigator prior to copying the objects to MS Word.

In summary, I recommend using copy and paste to move objects from the SPSS Viewer to an MS Word file.


©Lee A. Becker, 1997, 1998 -revised 09/21/99