Data Structures for SPSS.

Reading: SPSS Base 9.0 User's Guide, Chapter 3, Data Files
Activities: Run the SPSS tutorials "Overview" and "Getting Data: Basic Structure of an SPSS data file"
Homework: none

I. Overview

II. The Layout of an SPSS Data File
     A. Cases
     B. Variables
     C. Values


I. Overview


1. Create or read a data file

2. Select the procedure and the variables

3. Run the procedure

4. Interpret the output

5. Publish your paper



II. The Layout of an SPSS Data File

[Figure: a rectangular data file, with Variables across the columns, Cases down the rows, and Values in the cells.]

Most data files are rectangular in shape. They have three components:

1. Cases (typically the rows of the rectangular file). Cases are the individual participants in your study.

2. Variables (typically the columns of the file). Variables are the characteristics recorded for each case, for example age, gender, or attitude towards the death penalty.

3. Values (the intersection of cases and variables).

Here is an example of some data from a survey about attitudes towards the death penalty. In this study, information was collected about age, gender, and attitude towards the death penalty (the variables) for each of 4 research participants (the cases). Attitude towards the death penalty was measured on a 6-point continuum with the following labels: strongly opposed, opposed, slightly opposed, slightly approve, approve, and strongly approve.

Table 1. Original Data

Participant    Age    Gender    Death Penalty
Jones, W.      25     Male      strongly opposed
Anderson, S           Female    slightly opposed
Perez, C.      18     Female
Smith, L.      41     Male      approve
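
As a rough illustration of the rectangular layout (a sketch in Python, not SPSS itself), Table 1 can be held as a list of rows, one per case, with one position per variable. The names VARIABLES, CASES, and value are hypothetical, and None is assumed here to stand for a blank cell.

```python
# Table 1 as a rectangle: one row per case, one column per variable.
# None is an assumption of this sketch; it stands for a blank cell.
VARIABLES = ["Participant", "Age", "Gender", "Death Penalty"]

CASES = [
    ["Jones, W.", 25, "Male", "strongly opposed"],
    ["Anderson, S", None, "Female", "slightly opposed"],
    ["Perez, C.", 18, "Female", None],
    ["Smith, L.", 41, "Male", "approve"],
]


def value(case_index, variable):
    """Return the value at the intersection of one case and one variable."""
    return CASES[case_index][VARIABLES.index(variable)]


print(len(CASES), "cases by", len(VARIABLES), "variables")
print(value(3, "Age"))
```

Every row has the same number of entries as there are variables, which is what makes the file rectangular.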

The values that are entered into a data file are typically numeric (containing only numbers) rather than alphanumeric (alphanumeric or 'string' values contain letters, or combinations of letters and numbers). One reason for using numeric values is that many SPSS procedures will only accept numeric values. For example, an analysis of variance can only be run using numeric values. Although some SPSS procedures (e.g., frequencies) will accept either numeric or alphanumeric values, numeric values can be used by a wider range of procedures. Another reason to use numeric values is that it is easier to enter a single digit to refer to a value (e.g., 1) than it is to enter a whole series of letters (e.g., strongly opposed).

In this example the values for Age are numeric, and the values for Gender and Death Penalty are alphanumeric. The alphanumeric values should be coded as numeric values prior to entering them into a raw data file. Any numeric values could be used to code the nominal variable of Gender. Let's arbitrarily decide to code females as "1" and males as "2". Death Penalty can be considered to be an interval variable, and the values should range from 1 to 6. Strongly opposed could be coded as either "1" or "6". Suppose the responses to the death penalty scale were coded as follows: 1 = strongly opposed; 2 = opposed; 3 = slightly opposed; 4 = slightly approve; 5 = approve; and 6 = strongly approve. The values 1 through 6 would be entered into the data file. The descriptive text attached to each of those values (e.g., strongly opposed) is called a value label.
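
The coding scheme just described can be sketched as a pair of lookup tables. This is an illustration in Python rather than SPSS syntax; the names GENDER_CODES, DEATH_PENALTY_CODES, DEATH_PENALTY_LABELS, and code_value are hypothetical, and returning None for a blank response is an assumption of the sketch.

```python
# Numeric codes for the alphanumeric responses, as chosen in the text.
GENDER_CODES = {"Female": 1, "Male": 2}

DEATH_PENALTY_CODES = {
    "strongly opposed": 1,
    "opposed": 2,
    "slightly opposed": 3,
    "slightly approve": 4,
    "approve": 5,
    "strongly approve": 6,
}

# The value labels are the reverse lookup: numeric code -> descriptive text.
DEATH_PENALTY_LABELS = {code: label for label, code in DEATH_PENALTY_CODES.items()}


def code_value(raw, codes):
    """Return the numeric code for a response, or None if it was left blank."""
    raw = raw.strip()
    return codes[raw] if raw else None


print(code_value("Male", GENDER_CODES))
print(code_value("slightly opposed", DEATH_PENALTY_CODES))
print(DEATH_PENALTY_LABELS[5])
```

Keeping the labels as a separate lookup mirrors the idea that the data file stores only the numbers, while the labels describe what each number means.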

The APA ethical guidelines stipulate that the data collected from research participants are to be confidential. Rather than entering the names of the participants into the raw data file, you should create an ID variable and number each of the participants. With those changes, the data now look like this:

Table 2. Data with Assigned Values

ID     Age    Gender    Death Penalty
001    25     2         1
002           1         3
003    18     1
004    41     2         5

You have probably noticed that the age value is missing for participant #002 and that the death penalty value is missing for participant #003. When you are entering data you can leave those values blank. SPSS will consider them to be system-missing values and will handle them correctly when running an analysis. There are other options for how to enter missing values, and SPSS offers several ways of dealing with missing values in each of its procedures. We will have much more to say about this topic in later sections.
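
To make the idea of a blank value concrete, here is a minimal Python sketch (an analogy, not a description of how SPSS works internally) that stores Table 2 with None for each blank cell and computes a mean from only the cases that have a value. The names CASES, mean_of, and n_missing are hypothetical.

```python
# Table 2 as a list of cases; None stands in for a blank (missing) cell.
CASES = [
    {"id": "001", "age": 25, "gender": 2, "death_penalty": 1},
    {"id": "002", "age": None, "gender": 1, "death_penalty": 3},
    {"id": "003", "age": 18, "gender": 1, "death_penalty": None},
    {"id": "004", "age": 41, "gender": 2, "death_penalty": 5},
]


def n_missing(variable, cases):
    """Count the cases that have no value for the given variable."""
    return sum(1 for case in cases if case[variable] is None)


def mean_of(variable, cases):
    """Mean of one variable, using only the cases where it is present."""
    present = [case[variable] for case in cases if case[variable] is not None]
    return sum(present) / len(present)


print(n_missing("age", CASES))
print(mean_of("age", CASES))
print(mean_of("death_penalty", CASES))
```

Skipping the missing cases variable by variable is only one possible strategy; it is used here because it is the simplest to show.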

In order to run a statistical analysis of your data you first need to create a data file. You can create a data file using the Data Editor within SPSS, or you can create one using your favorite word processor, some spreadsheet programs (e.g., Excel, Lotus 1-2-3, and others that use the SYLK format), or some database programs (e.g., dBase). See the SPSS Base 9.0 User's Guide, Chapter 3, Data Files, for more information on how to read a data file that was not created by the SPSS Data Editor.

How to create a data file using the SPSS Data Editor is described in "Entering Data Using the SPSS Data Editor". How to create a data file using a word processor is described in "Overview of ASCII Data Files".



© Lee A. Becker, 1998, 1999 - revised 08/20/99