00011. Overview of ASCII Data Files

Reading: SPSS Base 8.0 User's Guide, Chapter 3, Data Files
Activities: Run the SPSS tutorial "Reading ASCII Data"
Homework: None

  1. Introduction
  2. Fixed-Column ASCII Format
  3. Freefield ASCII Format

1. Introduction

ASCII is the abbreviation for American Standard Code for Information Interchange.

An ASCII data file contains just the values of variables, stored in what is called ASCII format. An ASCII file is distinct from a normal word processing file in that the latter contains formatting information such as font sizes, margin information, header and footer information, and so forth. An ASCII data file is distinct from an SPSS systems file in that the systems file contains both the values of the variables and the variable definition information.

ASCII data files are sometimes called raw data files because they contain just the data. That is, no variable definition information is included in a raw data file.

You can create an ASCII data file using the text or DOS text save options in your word processor. Computer programs used to collect experimental data often store the data they collect in ASCII files.

The general structure of an ASCII data file that you create using a word processor, spreadsheet program, or a form program like the one that created the Skills Survey data is the same as the data file that is created by the SPSS Data Editor. A data file contains cases, variables, and values.

Table 1. The Structure of a Data File
          Variables
Cases     Values

See also: 00001. Data Structures for SPSS.

ASCII data files can be either fixed column format or freefield format.



2. Fixed-Column ASCII Format

Fixed column format means that the values for a variable are always located in the same columns. Four variables, ID, FIRSTNAME, AGE, and GENDER, are shown in Table 2. ID is always located in columns 1-2, FIRSTNAME is always located in columns 3-12, AGE is always located in columns 15-16, and GENDER is always located in column 18 (1 = "Female"; 2 = "Male"; 9 = "No Gender Information"). In this example the case with ID = 03 is a 10-year-old female whose first name is Suzanne.

Table 2. Fixed Column Format
01Martha      18 1
02            53 9
03Suzanne     10 1
04Debbie         1
07Fernandez   21 2

The values can be right next to each other (e.g., ID and FIRSTNAME) or they can be separated by one or more spaces (e.g., AGE and GENDER). The rule is that the values for a variable must always be located in the same columns.

Variable types. A wide array of data types is available in fixed column ASCII format, including numeric, several date types, string, and dollar.

Missing values. Missing values can be either system missing or user-defined missing when you use a fixed-column ASCII data file. A system missing value is defined by a set of blanks across the entire field. For example, the age for the case with ID = 04 is system missing. The gender for the case with ID = 02 is user missing. When you define your variables you would need to specifically assign 9 as missing for the gender variable.

String variables. String variables can be used in fixed-column format data files. By default SPSS expects the values of string variables to be left justified, as they are in this example.
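The column rule can be illustrated outside SPSS. The following Python sketch is only an illustration of the idea, not the way SPSS itself reads the file: it slices each record of Table 2 at the column positions given above, treats an all-blank numeric field as system missing (None), and flags 9 as user missing for GENDER. The names COLUMNS, NUMERIC, USER_MISSING, read_fixed, and is_missing are hypothetical.

```python
# Column positions from the text (1-based, inclusive). All names here are
# hypothetical; this is a sketch of the rule, not SPSS's own reader.
COLUMNS = {"ID": (1, 2), "FIRSTNAME": (3, 12), "AGE": (15, 16), "GENDER": (18, 18)}
NUMERIC = {"AGE", "GENDER"}
USER_MISSING = {"GENDER": {9}}  # 9 = "No Gender Information"

TABLE_2 = [
    "01Martha      18 1",
    "02            53 9",
    "03Suzanne     10 1",
    "04Debbie         1",
    "07Fernandez   21 2",
]

def read_fixed(records):
    """Cut every record at the same column positions."""
    cases = []
    for record in records:
        case = {}
        for name, (start, end) in COLUMNS.items():
            field = record[start - 1:end]  # convert 1-based columns to a slice
            if name in NUMERIC:
                # An all-blank numeric field is treated as system missing.
                case[name] = int(field) if field.strip() else None
            else:
                # String values are left justified, so drop trailing blanks.
                case[name] = field.rstrip()
        cases.append(case)
    return cases

def is_missing(name, value):
    """True for system missing (None) or a declared user-missing code."""
    return value is None or value in USER_MISSING.get(name, set())

cases = read_fixed(TABLE_2)
for case in cases:
    print(case)
```

Because every value is found by position, the blank AGE for ID = 04 does not disturb the GENDER value that follows it.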



3. Freefield ASCII Format

In freefield format the variables for each case must appear in the same order and the values for each variable must be separated by one or more spaces or commas. A space or comma is called a common delimiter. The same data for ID, FIRSTNAME, AGE, and GENDER are shown in freefield format in Table 3. The data in Table 3 look strange because of the irregular spacing and the use of multiple commas, but each value meets the rules that the values must be separated by one or more spaces or commas and the variables are entered in the same order for each case.

Table 3. Freefield Format - Example 1
01 Martha 18 1
   02  " "  53         9
03,,,,,,Suzanne,,,,,10 , , 1 
04,Debbie,-99,1
07,   Fernandez    21    2 

The example in Table 3 is somewhat restricted in that each case begins on a new record. The same data are shown in Table 4, which is also a properly formed freefield data file, although this chaotic structure is not recommended.

How does SPSS read this mess? First you need to define the number and the order of the variables in the data file. In this example there are four variables in the following order: ID, FIRSTNAME, AGE, and GENDER. This specifies that the first four values encountered will be ID, FIRSTNAME, AGE, and GENDER for the first case, that the next four values will be ID, FIRSTNAME, AGE, and GENDER for the second case, and so forth. For example, when SPSS gets to the fifth value it knows that it is the value for ID for the second case, and that the ninth value is the value for ID for the third case.

Table 4. Freefield Format - Example 2
01 Martha 18 1 02 "  " 53 9 03,Suzanne 10
 1  04    Debbie
-99,1 07,Fernandez,21
2 
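The counting rule described above can be sketched in Python. This is an illustration of the rule as stated in this handout (values separated by one or more spaces or commas, quoted strings kept whole), not a reproduction of the SPSS reader; the names TOKEN and read_freefield are hypothetical, and all values are left as text.

```python
import re

VARIABLES = ["ID", "FIRSTNAME", "AGE", "GENDER"]

# A value is either a quoted string or a run of characters that contains
# no space and no comma (assumed rule, taken from the text above).
TOKEN = re.compile(r'"([^"]*)"|([^\s,]+)')

TABLE_3 = """\
01 Martha 18 1
   02  " "  53         9
03,,,,,,Suzanne,,,,,10 , , 1
04,Debbie,-99,1
07,   Fernandez    21    2
"""

TABLE_4 = """\
01 Martha 18 1 02 "  " 53 9 03,Suzanne 10
 1  04    Debbie
-99,1 07,Fernandez,21
2
"""

def read_freefield(text, variables=VARIABLES):
    """Collect all values in order, then deal them out one case at a time."""
    values = [m.group(1) if m.group(1) is not None else m.group(2)
              for m in TOKEN.finditer(text)]
    n = len(variables)
    return [dict(zip(variables, values[i:i + n]))
            for i in range(0, len(values), n)]

cases_3 = read_freefield(TABLE_3)
cases_4 = read_freefield(TABLE_4)
print([case["ID"] for case in cases_4])
```

Record boundaries play no role in the sketch: only the order of the values matters, which is why Tables 3 and 4 yield the same IDs, ages, and genders.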

Variable types. Only numeric and string variables are allowed in freefield format ASCII data files.

Missing values. It is easy to make mistakes with freefield format, especially if you have missing data. You cannot use system missing values when you enter data in freefield format. A system missing value is normally defined as a set of blank spaces, but blanks are common delimiters in freefield format. If you left a set of blanks for the missing age for the case with ID = 04, then the next value SPSS reads (1) would be considered to be the AGE for that case, and the next value (07) would be considered to be the GENDER for that case. Every value after the improperly used 'system missing' value would be incorrectly read! The same problem occurs if you forget to enter a value. Everything after the nonentered value would be misread.

If you have missing values and you are using freefield format, then you must use user-defined missing values. Enter the value to be defined as missing in the data file and then make sure that those values are identified as missing when you define the variables. In Tables 3 and 4 the value -99 was entered when AGE was missing and 9 was entered when GENDER was missing.
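The shift caused by a blank "missing" value can be demonstrated with a small Python sketch. It assumes the simple rule that values are runs of characters separated by spaces or commas; the names group_values, blank_age, and coded_age are hypothetical, and the two-case data are cut down from Table 3 for illustration.

```python
import re

VARIABLES = ["ID", "FIRSTNAME", "AGE", "GENDER"]

def group_values(text):
    """Split on spaces and commas, then assign values four at a time."""
    values = re.findall(r"[^\s,]+", text)
    n = len(VARIABLES)
    return [dict(zip(VARIABLES, values[i:i + n]))
            for i in range(0, len(values), n)]

# AGE for ID = 04 left blank: the blanks are read as delimiters, not as a value.
blank_age = "04 Debbie     1\n07 Fernandez 21 2\n"
# AGE for ID = 04 entered as the user-defined missing code -99.
coded_age = "04 Debbie -99 1\n07 Fernandez 21 2\n"

shifted = group_values(blank_age)
aligned = group_values(coded_age)

print(shifted[0])  # AGE picks up the gender code, GENDER picks up the next ID
print(aligned[0])  # every value stays with its own variable
```

In the first data set only seven values are found, so every value after the blank lands in the wrong variable and the last case is left incomplete.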

String variables. Special care needs to be taken when string variables are used in freefield ASCII format. If the value of a string variable includes blanks (e.g., John Anderson, or Microsoft Word) then the blank, a common delimiter, will cause each element of the string to be read as a separate variable. In addition, if a comma is used as a part of the string value (e.g., Denver, CO) then the comma, a common delimiter, will cause each element to be read as a separate variable. For example, SPSS will read
Colorado Springs, CO
as the values for three different variables.

One solution is to create different variables whenever possible. For example, create firstname and lastname variables instead of a single name variable, or create city and state variables rather than a single variable for place of residence.

A more general solution is to place quotation marks around string variables when freefield ASCII format is used. For example, the string values
"Colorado Springs","Colorado"
will be read as two variables.

A string value that is all blanks must be enclosed in quotes, e.g., the value of the firstname for the 2nd case.
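The effect of the quotation marks can be checked with a short Python sketch. It assumes the rules given in this section (spaces and commas delimit values unless they fall inside double quotes) and is not a copy of the SPSS reader; the names TOKEN and split_values are hypothetical.

```python
import re

# Either a double-quoted string (kept whole, quotes dropped) or a run of
# characters with no space and no comma.
TOKEN = re.compile(r'"([^"]*)"|([^\s,]+)')

def split_values(text):
    """Return the values found in one stretch of freefield text."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in TOKEN.finditer(text)]

unquoted = split_values('Colorado Springs, CO')
quoted = split_values('"Colorado Springs","Colorado"')
blank_name = split_values('02 " " 53 9')

print(unquoted)    # three separate values
print(quoted)      # two values, the first containing a blank
print(blank_name)  # the quoted blank still counts as the FIRSTNAME value
```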



©1997,1998-Lee A. Becker
-revised 07/07/99