00101. How to Read a Freefield, ASCII Data File

Reading: SPSS Base 8.0, Chapter 3, Data Files
                SPSS Base 8.0, Chapter 9, Working with Command Syntax
Homework: Read the Skills Survey Data
Download: skills98.dat        (Download Tips)

  1. Overview
  2. Give the name of the ASCII freefield file.
  3. Define the Variable Names and Variable Types.
  4. Read in the Data File.
  5. Enter Labels and Missing Values.
  6. Go to the Data Editor and Verify the Values
  7. Save the Data File.
  8. Save the Freefield Definitions in a Syntax File.
  9. Common Freefield Errors

1. Overview

The purpose of this set of notes is to describe how to create and read a freefield, ASCII data file. If you have not read section 00011. Overview of ASCII Data Files, you should do so now. The steps are as follows:

   Give the name of the ASCII freefield file.
   Define the variable names and variable types.
   Read in the data file.
   Enter labels and missing values.
   Go to the Data Editor and verify the values.
   Save the data file.
   Save the freefield definitions in a syntax file.

On the first day of this course you filled out a Survey of Skills and Knowledge. The data from that survey were stored as an ASCII data file. These notes use that file as the running example of how to read freefield data into SPSS. Table 1 shows the Codebook for the Skills Survey; Table 2 shows some representative data from your responses to the survey.

Table 1. Codebook for the 1998 Skills Survey
Name Variable Type Variable Label / Value Labels
ID Numeric 2.0 Case Identification Number
Demographic Information
  Numeric 1.0 Undergraduate Major /
   1 = Psychology
   2 = Sociology
   3 = Anthropology
   4 = Exercise Science
   5 = Guidance and Counseling
   6 = Communications
   7 = Business
   8 = Biology
   9 = Other
  String 20 Other Major
   n/a = not applicable, Other Major not checked
  Date (mm/dd/yyyy) Today's Date (mm/dd/yyyy)
  Numeric 1.0 Are you taking the course for: /
  1 = undergraduate credit; 2 = graduate credit; 3 = no credit
  Numeric 1.0 Do you have a computer at home?/ 0 = No; 1 = Yes
  Numeric 1.0 Do you have a modem?/ 0 = No; 1 = Yes
  String 20 What is your favorite word processing package?
Statistical Knowledge (all items use the same value labels)
  Numeric 1.0 Knowledge of frequency distributions /
   1 = I know nothing about this area
   2 = I have a limited understanding of this area
   3 = I understand this area reasonably well
   4 = I consider myself to be an expert in this area
  Numeric 1.0 Knowledge of chi square
  Numeric 1.0 Knowledge of correlations
  Numeric 1.0 Knowledge of t test
  Numeric 1.0 Knowledge of simple anova
  Numeric 1.0 Knowledge of repeated measures anova
  Numeric 1.0 Knowledge of multiple regression
  Numeric 1.0 Knowledge of factor analysis
Computer Knowledge and Skills (same value labels as for Statistical Knowledge)
  Numeric 1.0 UNIX commands
  Numeric 1.0 EMACS commands
  Numeric 1.0 MS-DOS commands
  Numeric 1.0 word processing on a PC
  Numeric 1.0 programming language (e.g., BASIC, PASCAL, C+)
  Numeric 1.0 how to use Windows 95
  Numeric 1.0 how to use spreadsheets
  Numeric 1.0 how to run SPSS on a mainframe (e.g., on a Unix system)
  Numeric 1.0 how to run SPSS for Windows

 

Table 2. Example of Freefield ASCII Data from the Skills Survey
01,01,"n/a",09/9/1998,1,1,1,"Microsoft Word"
4,3,4,4,3,2,2,3
3,1,2,4,1,3,2,2,2
02,01,"n/a",09/9/1998,1,0,0,"WORKGROUPS"
4,2,2,3,3,2,2,2
3,2,2,4,1,4,1,2,2
03,01,"fine arts",09/9/1998,1,1,1,"Microsoft word"
3,2,3,3,2,2,1,1
1,1,1,3,1,3,3,1,2
04,01,"n/a",09/9/1998,1,1,1,"Window's 95"
3,2,3,2,2,1,2,1
1,1,1,3,1,3,2,1,2
05,02,"n/a",09/8/1998,1,1,1,"microsoft word"
2,2,2,2,2,1,2,1
2,1,1,3,1,2,2,2,1

Here are a couple of things to be careful of when writing a program to output data from a Form, as in this example, or from a computer program used to collect experimental data. First, write the program so that all string values are enclosed in quotes. Note that the values for the two string variables (other major and favorite word processing package) are enclosed in quotes. As a general rule any string value that might include embedded blanks must be enclosed in quotes.

Second, the forms program entered "n/a" (not applicable) as the default value for the Other Major variable. A freefield data file must have a value entered for every variable, because SPSS assigns values to variables strictly in the order in which it finds them. For that reason a default value, "n/a", was entered if the person did not fill out the text box for Other Major.
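
The way freefield data are consumed can be pictured with a short sketch. The Python below is not SPSS and is not part of the course software. It assumes that values are separated only by commas, as in Table 2, and the names RAW, N_VARS, and read_freefield are made up for this illustration. It shows that the values are treated as one long stream and handed out to the 25 variables of Table 1 in order, so the line breaks in the file carry no meaning.

```python
import csv

# The first two cases from Table 2. Each case happens to span three records.
RAW = '''01,01,"n/a",09/9/1998,1,1,1,"Microsoft Word"
4,3,4,4,3,2,2,3
3,1,2,4,1,3,2,2,2
02,01,"n/a",09/9/1998,1,0,0,"WORKGROUPS"
4,2,2,3,3,2,2,2
3,2,2,4,1,4,1,2,2
'''

# 8 demographic + 8 statistical + 9 computer variables in the codebook.
N_VARS = 25


def read_freefield(text, n_vars):
    """Return (complete cases, number of leftover values)."""
    # Flatten every record into a single stream of values.
    # csv.reader keeps a quoted string such as "Microsoft Word" as one value.
    values = [v for row in csv.reader(text.splitlines()) for v in row]
    leftover = len(values) % n_vars
    usable = len(values) - leftover
    # Every n_vars consecutive values form one case.
    cases = [values[i:i + n_vars] for i in range(0, usable, n_vars)]
    return cases, leftover


cases, leftover = read_freefield(RAW, N_VARS)
print(len(cases), leftover)   # 2 0
print(cases[1][0], cases[1][7])
```

If one value were missing anywhere in the file, leftover would no longer be 0 and every later value would be assigned to the wrong variable, which is why a default such as "n/a" is needed.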

Assume that the data was entered exactly as shown in Table 2. Would it be possible to read that data as a fixed-column ASCII data file? Why or why not?

You should be able to read and understand the data in Table 2 by knowing the information from the Codebook in Table 1. How knowledgeable about correlations is the person whose ID # is 3?

 


2. Give the Name of the ASCII Freefield File.

File
   Read ASCII Data
         Freefield
            File:
      (browse until you find your freefield file, then open it)


3. Define the Variable Names and Variable Types.

 Name:    (name the first variable in the data file)
   Data Type (define the data type as numeric or string; if string, also give the width of the string)
      Add (click Add to enter the variable name and type)

Repeat the name and data type steps until you have defined all the variables in the data file, in the order in which they occur for each case.

The only data types allowed in freefield data are numeric and string. The skills data has two string variables (Other Major and Favorite Word Processor) enclosed in quotes. The remaining variables, except date, are clearly numeric. What data type should be assigned to the date variable in this data set?


4. Read in the Data File.

After entering all the variable names, click OK. SPSS will create a new data file in the Data Editor and try to enter all the values into that new data file. If SPSS finds any errors when entering the data into the new data file, it will open the Output Navigator and describe the errors it encountered.


5. Enter Labels and Missing Values.

Use the Define Variable dialog box in the Data Editor to enter variable labels, value labels, and missing values. (See 00010. Entering Data Using the SPSS Data Editor.)

Default numeric data type. All freefield numeric variables are defined as data type F8.2 by default. You may wish to change the data types. For example, you may want to redefine the number of decimals as "0" for all integers.

Date variables. The date variable in this example was read as a string variable. You can change the data type to "date" in the Define Variable dialog box.


6. Go to the Data Editor and Verify the Values.

Whenever you read in a file created outside of SPSS you should go to the SPSS Data Editor and visually scan the data to make sure that everything seems to have been entered correctly. Be sure to check out the ID values to make sure they are all present and that they are within the range of values that are reasonable for your study. If something is amiss in the way that values were assigned to variables it is likely to show up in the ID variable. You often can tell where the error occurred by checking the point at which the ID values become out of sequence.
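
The reason the ID variable is such a good place to look can be seen in a small sketch. This is Python, not SPSS, and it uses a made-up file with only 4 variables per case and IDs that run 1, 2, 3, ...; the names chunk and first_id_break are hypothetical. One value has been left out of the second case, so from that point on every value slides into the wrong variable.

```python
def chunk(values, n_vars):
    """Cut the stream of values into cases of n_vars values each."""
    return [values[i:i + n_vars] for i in range(0, len(values), n_vars)]


def first_id_break(ids, start=1):
    """Return the 0-based position of the first ID that is out of sequence.

    Assumes the IDs should run start, start + 1, start + 2, ...
    Returns None when every ID is in sequence.
    """
    for pos, value in enumerate(ids):
        if value != start + pos:
            return pos
    return None


# Four cases were intended, but the second case is one value short.
stream = [1, 5, 5, 5,  2, 6, 6,  3, 7, 7, 7,  4, 8, 8, 8]

cases = chunk(stream, 4)
ids = [case[0] for case in cases]
print(ids)                   # [1, 2, 7, 8]
print(first_id_break(ids))   # 2
```

The IDs go wrong at the third case, which points to a misentered value in or just before that case.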


7. Save the Data File

Save the new data as an SPSS system file by clicking

File
   Save

(See the Data Editor notes on Saving Files for additional information)


8. Save the Freefield Definitions in a Syntax File

When you exit SPSS the Freefield Definitions (the variable names, the order of the variables, and the data types) will be lost. If you want to read the same freefield ASCII data file again you will have to open the Define Freefield Variables dialog window and redefine all the variables and data types again.

If you think you might ever want to read the freefield ASCII file again, you should save the commands that SPSS used to open and read the file. You can do this by going back to the Define Freefield Variables dialog window by clicking

File
   Read ASCII Data
         Freefield

All your work should still be there. Click on the

Paste

button and the commands that SPSS created will be displayed in a syntax window. The syntax window is like a word processor window, so you can edit any of the commands in it. You should save the commands as an SPSS command file. The default extension for SPSS command files is *.sps. To save the commands in the syntax window click

File
   Save

select the disk or subdirectory in which to save the file and enter a filename. SPSS will automatically add the .sps extension.

How to use the syntax file to read the freefield ASCII data

In order to use this syntax file to read your freefield ASCII data file you just open the syntax file

File
   Open

Select File of type: Syntax (*.sps) and double click on the filename you wish to open. The syntax file will open in the SPSS Syntax Editor window. Make any necessary changes to the commands and then click

Run
   All

to read the data into the SPSS Data Editor. Note that all the numeric variables have been assigned the default data type of F8.2. Also note that there are no variable labels or value labels. You can open the Define Variable box for each variable and redefine the variables. Or, better yet, you can use the dictionary of your previously saved system file to define the variable labels, value labels, and missing values.

How to use a previously saved SPSS (*.sav) file to define the variables

Suppose that you have defined the variable names, the order of the variables, and the data types for a freefield data file and have successfully read the data into the SPSS Data Editor. You still have to define the variable labels, value labels, and missing values, redefine numeric variables for which the default F8.2 data type is not appropriate, and redefine the date variables that were read in as strings to a date data type. You can use the variable definition information from a previously saved SPSS (*.sav) file to define the variables in the current file. Click

File
   Apply Data Dictionary...

then double click on the name of the previously saved SPSS (*.sav) file (or single click the name and then click the Open button). The variable definition information from the previously saved file will be applied to the variables in the current Data Editor.

The new variable definition will apply to those variables that have exactly the same name in both files. The new variable definition will not be applied if the data types are different, e.g., if the variable DATE is numeric in the previously saved file but is string in the Data Editor, then the variable definition will not be applied.
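
The matching rule just described can be summarized in a short sketch. This is Python, not SPSS, and it models only what is stated above: a definition is copied when the name is the same in both files and the data types agree. The function name apply_dictionary, the variable name MAJOR, and the dictionary layout are all made up for this illustration.

```python
def apply_dictionary(current, saved):
    """Copy labels from saved to current where name and data type both match.

    current and saved map a variable name to {"type": ..., "label": ...}.
    Returns (names that were updated, names skipped for a type mismatch).
    """
    applied, skipped = [], []
    for name, definition in current.items():
        source = saved.get(name)
        if source is None:
            continue                   # no variable of that name in the saved file
        if source["type"] != definition["type"]:
            skipped.append(name)       # same name, different data type
            continue
        definition["label"] = source["label"]
        applied.append(name)
    return applied, skipped


# Freshly read freefield data: no labels yet, DATE was read as a string.
current = {
    "ID": {"type": "numeric", "label": ""},
    "MAJOR": {"type": "numeric", "label": ""},
    "DATE": {"type": "string", "label": ""},
}
# Previously saved file, in which DATE is numeric.
saved = {
    "ID": {"type": "numeric", "label": "Case Identification Number"},
    "MAJOR": {"type": "numeric", "label": "Undergraduate Major"},
    "DATE": {"type": "numeric", "label": "Today's Date"},
}

applied, skipped = apply_dictionary(current, saved)
print(applied)   # ['ID', 'MAJOR']
print(skipped)   # ['DATE']
```

In this example DATE keeps its empty label until its data type in the Data Editor is changed to match the saved file.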


9. Common Freefield Errors

Here are some common errors that occur when reading freefield data.

Each numbered Error Type below is followed by its Probable Warning.

1. An extra value was entered or a value was inadvertently omitted.

With freefield data this error surfaces while SPSS is reading the last case: it reaches the end of the file before every variable in that case has been given a value. This is a serious error. It indicates that somewhere in your data file a value was misentered. Perhaps you forgot to enter a value for a case, or perhaps you entered an extra value somewhere.

-unexpected end of file
>Warning # 522
>An unexpected end of file has been found in the middle of reading a case. The partial case will be ignored. Check your input for a possible missing record
2. Attempted to use system missing values (a blank).

-unexpected end of file
>Warning # 522

3. Missing values were appropriately entered as user defined missing values, but they were not identified when the variables were defined in the Data Editor.

No warning or error given
4. String values with embedded commas or blanks were not enclosed in quotes.

Suppose that a) you have a variable called STATE, b) that value for STATE in the data file is New Mexico (without quotation marks), and c) that the variable following STATE was a numeric variable. SPSS would read New as the value for STATE and try to read the string value Mexico as the value for the next (numeric) variable. The warning is printed because string values are not valid numeric values.

String variables in fixed-column format ASCII files do not need to be enclosed in quotes. Any string information found in the columns assigned to the string variable will be valid.

-invalid numeric field
>Warning # 1102
>An invalid numeric field has been found. The result has been set to the system-missing value.

>Command line: n Current case: n Current splitfile group: n
>Field contents: 'xxxxxxx'
>Record number: n Starting column: n Record length: n

and/or

-unexpected end of file
>Warning # 522

5. The letter "O" was used in place of a zero (0) in a numeric variable. If a variable contains any letters it is considered to be a string variable. This would cause a numeric variable to be set to system missing.

This error can also occur when reading fixed-column ASCII data files.

-invalid numeric field
>Warning # 1102
6. A string value is longer than the width of the string variable. Suppose you define a string variable as 20 characters wide. If a value for that variable in the freefield file is 45 characters long, then only the first 20 characters will be read and warning # 1115 will be given.

This may or may not be a serious error. You need to decide if the remaining characters provide useful information. If so, then you should redefine the width of your string variable.

This error does not occur with fixed-column ASCII data.

-input text too long
>Warning # 1115
>The input text was too long. It has been truncated.

>Command line: n Current case: n Current splitfile group: 1
>Field contents: 'xxxxxxxxx xxxxxxxx xxxxxx xxxxxxxxx'
>Record number: nn Starting column: n Record length: nn
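
Errors 4, 5, and 6 can be reproduced in miniature with the sketch below. It is Python, not SPSS, and it only imitates the behavior described above: values are split on blanks or commas, quoted strings are kept whole, an invalid numeric value becomes None (standing in for system-missing), and an over-long string is cut to its width. The name read_values and the messages it returns are made up; they are not the text of real SPSS warnings.

```python
import re

# A value is a quoted string, or a run of characters with no blank or comma.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|[^\s,]+')


def unquote(token):
    """Remove one pair of matching quotes around a value, if present."""
    if len(token) >= 2 and token[0] in "\"'" and token[-1] == token[0]:
        return token[1:-1]
    return token


def read_values(record, widths):
    """Assign the values in record to variables, in order.

    widths has one entry per variable: an integer is the width of a
    string variable, None marks a numeric variable.
    """
    tokens = [unquote(t) for t in TOKEN.findall(record)]
    values, warnings = [], []
    for token, width in zip(tokens, widths):
        if width is None:                      # numeric variable
            try:
                values.append(float(token))
            except ValueError:
                values.append(None)            # stands in for system-missing
                warnings.append("invalid numeric field: %r" % token)
        else:                                  # string variable
            if len(token) > width:
                warnings.append("input text too long: %r" % token)
            values.append(token[:width])
    if len(tokens) != len(widths):
        warnings.append("expected %d values, found %d" % (len(widths), len(tokens)))
    return values, warnings


layout = [10, None]   # STATE as a 10-character string, then one numeric variable

print(read_values('"New Mexico" 12', layout))   # read correctly
print(read_values('New Mexico 12', layout))     # error 4: quotes missing
print(read_values('"New Mexico" 1O', layout))   # error 5: letter O for zero
print(read_values('"Microsoft Works for Windows" 3', [20, None]))  # error 6
```

In the unquoted example the sketch simply reports one value too many. In a real freefield file that extra value would be carried into the next case, which is why error 4 can also end in an unexpected end of file warning.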


Homework: Read the Skills Survey Data

Value: 15 points
Due: At the beginning of the next class

The purpose of this homework is to create an SPSS data file from a freefield, ASCII data file and to save the file on a floppy disk.

The data from the Survey of Skills and Knowledge that you completed the first day of class is available as a freefield, ASCII data file. The name of the file is skills98.dat.

A codebook for the Skills Survey is shown in Table 1 above. The order of the variables in the data file corresponds to the order of the variables in the codebook.

Note: there are no user-defined missing values in this data file.

1. Use the codebook and the freefield data file skills98.dat to create an SPSS data file called skills98.sav. Save your data file on your own floppy disk. You will need to make up your own variable names for each of the variables. Make sure you include variable labels for all variables except ID. Make sure you include value labels for all variables except OTHER MAJOR, TODAY'S DATE, and FAVORITE WORD PROCESSOR.

You should have no errors or warnings when you read the data into SPSS. If you do have errors, correct them prior to handing in your data file. Check the data in the Data Editor to make sure that they seem reasonable, given what you know about the data.

2. Save the freefield definitions to a syntax file called skills98.sps.

3. Label your disk with your name. Turn in your disk with the skills98.sav and skills98.sps files at the beginning of class.

Point distribution:

skills98.sav - 10 points

3 - all variables from skills98.dat are included in skills98.sav
3 - all values from skills98.dat are correct
2 - variable labels all ok
2 - value labels all ok

skills98.sps - 5 points

When we read skills98.dat using your file skills98.sps, there should be only one warning given by SPSS. That warning is the following:

>Warning # 1115
>The input text was too long. It has been truncated.

>Command line: 23 Current case: 14 Current splitfile group: 1
>Field contents: 'Microsoft Works for Windows'
>Record number: 40 Starting column: 31 Record length: 58

-1 point for each additional error or warning.
-5 points if any variable is missing or the order of the variables is incorrect.

 


© Lee A. Becker, 1997, 1998      -revised 07/07/99