Delimited ASCII Data Files

Reading: SPSS Base 9.0 User's Guide, Chapter 3: Data Files
Homework: Read a Delimited ASCII Data File
Download: delimited1.dat      Download Tips
                  delimited2.dat
                   delimited3.dat

  1. Overview
  2. Delimited ASCII Format
  3. How to Read a Delimited Datafile into SPSS
  4. Completing the Data Definition Process
  5. Another Example of a Delimited Datafile
  6. How to Use the Syntax File to Read the ASCII Data.
  7. How to Use the TextWizard Predefined Format file to Read the ASCII Data 



1. Overview

This set of notes describes the delimited format datafile and how to read it into SPSS.


2. Delimited ASCII Format

In delimited ASCII format the variables for each case must appear in the same order and the values for each variable must be separated by a delimiter.  A delimiter can be a comma, space, tab, semicolon, or some other symbol chosen by the user.  The data that we used in the fixed-column notes for id, firstnam, age, and gender (1 = "Female" 2 = "Male", and 9 = "No gender information) are redisplayed in Table 1.

Table 1. Data for a Fixed-Column ASCII Datafile
01Martha      18 1
02            53 9
03Suzanne     10 1
04Debbie         1
07Fernandez   21 2

Those same data are shown in a delimited format in Table 2. The data from Table 2 are stored in the file delimited1.dat.  The example in Table 2 uses a combination of commas and spaces as delimiters.  The first row of data uses a space as a delimiter.  The second row of data uses a comma as a delimiter. Commas and spaces are randomly used in the remainder of the file.  A typical delimited datafile will only use a single delimiter rather than a combination of delimiters as shown here. Notice that the file is not fixed-column format because the values for a particular variable do not always fall in the same column for every case.

Table 2. Delimited Format - Example 1 (delimited1.dat)
01 Martha 18 1
02,,53,9
03,Suzanne 10 1
04 Debbie  1
07 Fernandez,21,2

Variable types. A wide array of data types are available in delimited ASCII format including: numeric, several date types, string, dollar, comma, and dot. These are the same data types that are available in fixed-column ASCII data file.

Missing values. Notice how the missing value for the variable firstnam for the second case is handled.  An extra delimiter is used to "stand in" for the missing value. The rule is that consecutive delimiters indicate missing values. If both firstnam and age were missing the values for the second case would look like this:

02,,,9

The missing value for age in case #4 is indicated by consecutive blanks. The use of blanks as a delimiter may pose problems because consecutive blanks are hard to visually detect in a datafile.

String variables. String variables can be used in delimited format data files but they pose an interesting problem. Suppose that you wanted to enter a city and state, e.g.,

Colorado Springs, CO

If you defined both a space and a comma as delimiters, then Colorado Springs would be read as two different variables rather than a single variable (e.g., city). CO would be appropriately read as another variable.  In freefield format this problem would have been fixed by surrounding the two city word with quotation marks, as in  "Colorado Springs" .  This solution will not work with delimited formatted data files.  The only way that I have found to get around this problem is to avoid using a space as a delimiter when you have string values that include spaces as a part of the value.

The example in Table 2 is somewhat restricted in that each case begins on a new record. The same data are shown in Table 3. That data is also a properly formed delimited data file although this chaotic structure is not recommended if you are creating the datafile with a word processor.  The carriage return at the end of a line will be read as delimiter by default.

Table 3. Delimited Format - Example 2 (delimited2.dat)
01 Martha
18 1
02,,53,9
03
Suzanne 10
1
04 Debbie  1
07 Fernandez
21,2

The example in Table 4 shows the same data in another properly formatted delimited data file. Each value is on a separate line.  Missing values are indicated by a blank row.  Note that the row should be totally empty.  A space on a blank line would indicate two missing values rather than one missing value if a space were defined as a delimiter.  When reading this form of a delimited datafile do not define any delimiters.  The default carriage return delimiter will work just fine. This case is pertinent because the data file created by the Skills Survey web form is of this type.

Table 4. Delimited Format - Example 3 (delimited3.dat)
******************
01
Martha
18
1
******************
02

53
9
******************
03
Suzanne
10
1
******************
04
Debbie
1
******************
07
Fernandez
21
2

The asterisks are automatically entered by the web form program.  They are used to visually separate one case from the next case. The row of asterisks would be interpreted as an extra variable. When reading in this type of file you would declare "Do not import" as the data format for that variable.

top


3. How to Read a Delimited Datafile into SPSS

Lets try to read the data from the file delimited1.dat. Open the data file in SPSS with the following sequence of commands:

File
   Read Text Data

Step 1 -

Does your data match a predefined format?

The answer here is no.  A predefined format means that you have previously stored the formatting information in a file somewhere.  Press NEXT to go to Step 2.

Step 2 -

The first question in Step 2 asks if the data file is delimited by specific characters or is in fixed column format.  Select delimited format.

The second question asks whether or not the variable names are included at the top of the file. In our dataset the first row of data is the first case, not a list of variable names, so select no.

Step 3 -

The first case begins on line 1.

All the data for a case are presented on one line, so check the option: Each line represents a case. Reading the files for examples 2 (delimited2.dat) or 3 (delimited3.dat) would require you to enter the number of variables per case.

We want to read (import)  all the cases.

Step 4 -

This step defines the delimiters used in the datafile. The data in example1 (delimited1.dat) used both a space and a comma as delimiters.  Both of those should be checked.

Look at what happens to the data when you do not check either a space or a comma as a delimiter. You can tell if you have chosen the right combination by looking at the pattern of the data.  Are all the id numbers lined up under the first variable, are all the names lined up under the second variable, etc.  If the values do not line up properly, then you have not selected the correct delimiters, or the data file has not correctly used delimiters.

Check the values in the spreadsheet with the from Table 1 to make sure that the data has been read correctly.

Click NEXT to go to the next step.

Step 5 -

The next step is to enter the variable names and the data format for each variable. Click on V1 and enter "id" as the variable name and make sure that the variable format is numeric.  You cannot choose the width of the variable at this time, only the basic format of the variable (numeric, string, dollar, etc.).

Note that there is an option of not importing a variable.  If you do not wish to include a particular variable within the datafile you are creating, then click "Do not import" as the data format.

Go ahead and enter the names and formats for the other three variables (firstnam, string; age, numeric; and gender, numeric).  Notice that you are asked for the length of the string when you enter a string variable.

When you have finished, click Next.

Step 6

If at any time in the future you might want to read the text file again, then you should answer yes to either of the questions in this step. For example, you may be in the middle of the process of collecting data from your web page or computer program and you will want to reread the text data file after more data has been collected.

There are two tools for making the rereading process easier. The first saves all the information you just entered in a file called a "TextWizard Predefined Format".  The default extension of this type of file is ".tpf."   Just to see how this works click yes in response to the question "Would you like save this file format for future use."  Then press the Save As button, select the directory to save the file and enter a file name.  I recommend using the same file name as the original ASCII file.  In this case "delimited1."  The extension ".tpf" will be appended automatically.

The other tool is to create a syntax file out of the information that you entered.  To try this out click yes  in response to the question "Would you like to paste the syntax?"

Press Finish to continue the process.

A syntax window will open.  The syntax commands will look similar to those in the next table.

data list list(',', ' ')
file='C:\temp\delimited1.dat.' 
/ id(f8.2) firstnam(a9) age(f8.2) gender(f8.2).
EXECUTE .

The SPSS command "data list" is used to read a text file.  The data list command defines:

(a) the data file type as "list."  In a "list" type datafile there is only one record per case. That is, all the variables for a case appear on a single line of dat.  The symbols in parentheses define the two delimiters, the comma (",") and the space(' ').  The delimiter definitions are delimited by a comma.
(b) the location and name of the ASCII file (file='C:\temp\data\delimited1.dat' );and
(c) the names of each of the variables, and the data type

/ id(f8.2) firstnam(a9) age(f8.2) gender(f8.2).

For example, the variable id is the first variable, its data type is numeric8.2 (f8.2). The variable firstname is the second variable, its data type is string9 (a9).  The variable age is the third variable, its data type is numeric8.2 (f8.2). The fourth variable is gender, its data type is numeric8.2 (f8.2). The data list command uses the fortran formatting notation for identifying data types.  The fortran format "f" refers to a numeric variable; the fortran format "a" stands for a string (or alphanumeric, or string) variable. Numeric variables are set to the default numeric format, 8.2.

The EXECUTE. command will cause the data to be read the entered in to the Data Editor.

Save the syntax commands.  The reason for creating the syntax commands is to use them in the future.  To do that you will need to save the commands as a syntax file. I recommend saving the syntax commands in a file that has the same name as the ASCII file, e.g., "delimited1" in this example.  SPSS will automatically assign the extension ".sps" to a syntax file.

Run the syntax commands. At this point no data have been transferred to the SPSS Data Editor. Run all the syntax commands to read the data into the Data Editor.

top


4. Completing the Data Definition Process

Running the syntax file will read the data into the Data Editor.  Go back to the Data Editor and check to make sure that the data has been correctly read.

Complete the data definition process by adding variable labels, value labels, and user defined missing values where appropriate.

top


5. Another Example of a Delimited Datafile

Lets look at the issues raised by trying to read the delimited datafile delimited3.dat.

Open the data file in SPSS with the following sequence of commands:

File
   Read Text Data

Step 1 -

Does your data match a predefined format?

The answer here is no.  We have not previously stored the formatting information for the delimited3.dat datafile. Press NEXT to go to Step 2.

Step 2 -

The first question in Step 2 asks if the data file is delimited by specific characters or is in fixed column format.  Select Fixed width format.   

There are some interesting issues raised by the choice of Fixed Width rather than Delimited.  In what sense is this data fixed width?  Well, each variable always begins in the same column (column 1) of a particular line of data. What about the strings? They are not all the same width. You can set the width of the string variable so that it is wide enough to encompass the longest response.  The width of the string does not impact the fixed placement of the next variable because the next variable always begins in column 1 of the next line of data.

If you choose Delimited as the data arrangement, then the Text Wizard will not read the data correctly if you have any missing data.  I think this is a bug in the program.  It looks as though the data is read correctly in the Text Wizard data preview spreadsheet, but the data are not correctly read into the Data Editor itself. 

The second question asks whether or not the variable names are included at the top of the file. In our dataset the first row of data is the first case, not a list of variable names, so select no.

Step 3 -

The first case begins on line 1.

There are five lines of data for each case.  One line for each of the four variables (ID, firstname, age, and gender) and one additional line for the row of asterisks that separate the data for each case. .

We want to read (import)  all the cases.

Step 4 -

This step defines the breakpoints for each individual variable. 

The breakpoints for numeric variables are straightforward. For example, if you go to Line within case: 2 line of 5, you will find a vertical line (the breakpoint) at the beginning of the row.

You need to be careful about the breakpoints for string variables. The two main concerns are:

(1)  The default width of the string variable is defined by the length of the text for the first case. In our example the name in the first case is "Martha", which is 6 letters long.  The width of the string variable is automatically set to String6.  The remaining names are trunkated at 6 letters.  For example, the name for the third case, "Suzanne", is trunkated to "Suzann" . 

There are a couple of solutions to this problem. One solution is to paste and edit the resulting syntax commands, making the width of the firstname variable long enough to read the longest name in the datafile.  Another solution would be to insert another breakline beyond the longest string in the datafile.  Perhaps at column 15 for the data in example 3. That would create another string variable to be inserted in the datafile after the firstnam variable.  You could then select Do not Import as the data format for the extra variable. 

(2) Strings with embedded blanks may be read as two variables. For example, if you entered the name and first initial (e.g., Martha S.) for the first case, then the text wizard would set a delimiter at column 0 and another one just before the "S", creating two string variables.  I think this only happens when the first case has embedded blanks. The solution to the problem is to delete the second breakline.  You may still have the problem that the width of the variable is defined by the length of the string for the first case.  But, as discussed above, you can solve that problem.

Step 5 -

The next step is to enter the variable names and the data format for each variable.

Select Do Not Import as the data format for the first variable (V1,  the asterisks). This option could be used for any variable in the datafile. For example, if you only want to run analyses on a subset of the variables in the original datafile, then select Do Not Import for the variables that you do not want to be read by the Data Editor.

Define the variable names and data types for the remaining variables in the normal manner.

When you have finished, click Next.

Step 6

Lets look at the syntax file that was created for delimited3.dat.
 
 

Press Finish to continue the process.

A syntax window will open.  The syntax commands will look similar to those in the next table.

data list free()
file='C:\temp\delimited3_dat.'
 /#1(a) id(f8.2) firstnam(a9) age(f8.2) gender(f8.2).
EXECUTE .

The SPSS command "data list" is used to read a text file.  The data list command defines:

(a) the data file type as "free"  In a "free" type datafile the data for each case can appear on more than one record. The variables must appear in the same order for each case.  If we had defined any delimiters, they would have appeared between the two parentheses following the keyword "free."
(b) the location and name of the ASCII file ( file='C:\temp\delimited3.dat' ); and
(c) the names of each of the variables, and the data type

/#1 id(f8.2) firstnam(a9) age(f8.2) gender(f8.2).

The "#1" symbol after the slash indicates that the values for the first case begin on the first line (record) of the datafile.

You can make with width of firstnam variable longer.  For example, you might want to make the varible type (a15) rather than (a9) to make sure that it reads the longest name in the datafile.

The EXECUTE. command will cause the data to be read the entered in to the Data Editor.

Save the syntax commands.  If you save the syntax commands I recommend that you use the file name delimited3.sps. Remember that SPSS will automatically assign the extension ".sps" to a syntax file.

Run the syntax commands. At this point no data have been transferred to the SPSS Data Editor. Run all the syntax commands to read the data into the Data Editor.

After running the data go to the SPSS Data Editor and complete the data definition process by adding variable and value labels and assign user defined missing values for gender.

Check the values in the SPSS Data Editor to make sure that the data has been read correctly.

Try reading delimited2.dat into the Data Editor.

top


6. How to Use the Syntax File to Read the ASCII Data.


To use the syntax file to read the ASCII data
(a) Open the SPSS data editor.
(b) Click File Open... .Find and open the syntax file that you want to use, e.g., fixed.sps. By default, SPSS assumes that you
want to open an SPSS systems file to it limits the display of files to those with a .sav extension. Change the extension to .sps to
display syntax files.
(c) Run the syntax commands.

The commands should run assuming that you have not moved the ASCII data file. That is, the ASCII data file must reside in the
directory that is identified in the name=' part of the data list command. You may have to edit the directory name if the
location of the ASCII file has been changed.

top


7. How to Use the TextWizard Predefined Format file to Read the ASCII Data


To use the TextWizard Predefined Format file to read the ASCII data
(a) Open the SPSS data editor.
(b) Click File Read Text Data .
(c) Find the ASCII data file and open it.
(d) Answer yes to question "Does your text match a predefined format."; Click Browse to find the .tpf file and open it.

At this point you can either check each of the steps in the Text Wizard, or you can press Finish to go skip to the last step of the
process.

top


© Lee A. Becker, 1999.  All rights reserved. Last revised on 07/13/00 .