Reading: SPSS Base 9.0: Chapter 5, Data Transformations
Homework: Compute
Download: compute2.sav (Download Tips)
SPSS provides very powerful tools for transforming both numeric and string variables. The numeric transformations available in SPSS include: COMPUTE, COUNT, DO REPEAT-END REPEAT, LEAVE, NUMERIC, RECODE, and TEMPORARY. This set of notes looks at the COMPUTE command.
This discussion may seem somewhat abstract to you now. But, as you come to do analyses on your data you will find that COMPUTE and RECODE are your friends. Get to know them well.
There are two important goals of this lecture: first, to describe how the COMPUTE transformations work, and second, to carefully describe how system missing and user missing values influence the outcomes of the various transformations. The effects of missing values are more subtle.
You use compute by creating an algebraic expression,
compute Target Variable = Numeric Expression.
The compute word at the beginning of the line is the SPSS syntax word that tells SPSS to evaluate the expression that follows. The Target Variable holds the result of the numeric expression. For example, the compute statement
compute varx = var1 + var2
will sum the values of var1 and var2 and place the result in the target variable varx.
A numeric expression can include variable names, as in the above example, and constants, which are simply numeric values. For example, the compute statement
compute varz = (var1 + var2)/2
will sum the values of var1 and var2, divide the sum by the numeric constant 2 and place the result in the target variable varz.
There are four different types of numeric expressions: arithmetic operations, arithmetic functions, statistical functions, and missing value functions. Examples of arithmetic operations include: add, subtract, multiply, divide, and exponentiation (raise to a power). Examples of arithmetic functions include: square root and truncate. Examples of statistical functions include: mean, sum, and standard deviation. Missing value functions determine how user- and system-missing values are handled.
| CAUTION: Whenever you transform your data you should get a listing showing the values of the variables that went into the transformation as well as the variables that are the result of the transformation. Check this listing carefully to make sure that SPSS is doing what you expected it to do. Look for errors in the logic that you used to create the transformation. Look to see how SPSS dealt with missing values. Is it what you expected? Usually you will not have to check the values for every case, but check enough cases so that you feel confident that the transformation is working as you planned it to work. |
Suppose you have a scale which measures attitudes towards mental hospitals (AMH). The scale has four items. The data are saved in an SPSS system file called compute2.sav. The variables in compute2.sav are shown in Table 1. There are 6 cases in the data file.
| Variable | Variable Label / Value Labels / Missing Values |
|---|---|
| age | Age / Missing Value = 99 |
| birthdat | Birthdate of the respondent (mm/dd/yyyy) |
| smonth | Month the survey was submitted |
| sday | Day the survey was submitted |
| syear | Year the survey was submitted |
| amh1 | The criminally insane should be in mental hospitals rather than prisons / 1 'strongly disagree' 2 'disagree' 3 'neutral' 4 'agree' 5 'strongly agree' 9 "I don't understand the question" / Missing Value = 9 |
| amh2 | Mental hospitals provide very valuable services in our culture/ Same value labels as amh1/ Missing Value = 9 |
| amh3 | Another similar question / Same value labels as amh1 / Missing Value = 9 |
| amh4 | Another similar question / Same value labels as amh2 / Missing Value = 9 |
The compute dialog box is opened by clicking
Transform
Compute
The dialog box shows the variable names in the data set in the box at the lower left. The Target Variable: box will hold the name of the target variable. The numeric expression will be created in the Numeric Expression box. Arithmetic and logic operators are presented as buttons. The box on the lower right presents all the possible arithmetic functions, statistical functions, and missing value functions.
The use of conditional statements (e.g. if) and logic operators (e.g., greater than, less than) will be presented in a later set of notes.
Suppose that you want to find the overall attitude towards mental hospitals by averaging the scores on the items. Table 2 shows the syntax of four compute commands that are used to find the mean of the four attitude items (amh1, amh2, amh3 and amh4), together with their results. This would seem like a trivial problem, yet the four compute commands yield four different answers. Our task for this set of notes is to understand the different outcomes. We will return to this set of commands at the end of our discussion of the compute command.
| Command Syntax |
|---|
| COMPUTE avg1=(amh1+amh2+amh3+amh4)/4 . |
| COMPUTE avg2=MEAN(amh1, amh2, amh3, amh4) . |
| COMPUTE avg3=MEAN.2(amh1 TO amh4) . |
| COMPUTE avg4=(VALUE(amh1)+VALUE(amh2)+VALUE(amh3)+VALUE(amh4))/4 . |

Output

| Case | amh1 | amh2 | amh3 | amh4 | avg1 | avg2 | avg3 | avg4 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 9 | 9 | 9 | . | 1.00 | . | 7.00 |
| 2 | 5 | 5 | 2 | 1 | 3.25 | 3.25 | 3.25 | 3.25 |
| 3 | 3 | 9 | 3 | 5 | . | 3.67 | 3.67 | 5.00 |
| 4 | 4 | 1 | . | 9 | . | 2.50 | 2.50 | . |
| 5 | . | . | . | . | . | . | . | . |
| 6 | 9 | 4 | 2 | 9 | . | 3.00 | 3.00 | 6.00 |
In this section we will look at the arithmetic operations of add (+), subtract (-), multiply (*), divide (/), and raise to a power (**). The focus will be on how missing values are handled by arithmetic operations and on the order of arithmetic operations. You should open SPSS and run these transformations. The results of your transformations should look like those in the Output section of Table 3.
| Command Syntax |
|---|
| COMPUTE MONTHS = AGE*12 . |
| COMPUTE NEW1 = amh1 + amh2 . |
| COMPUTE NEW2 = amh1 + amh2 / amh3 . |
| COMPUTE NEW3 = (amh1+amh2)/2 . |
| COMPUTE NEW4 = amh1+amh2/2 . |
| COMPUTE NEW5 = amh1 * 0 . |
| COMPUTE NEW6 = amh1**2 . |
| COMPUTE NEW7 = amh1**-2 . |
| FORMATS MONTHS (F4) NEW1 (F2) NEW2 TO NEW7 (F5.2). |

Output

| Case | AGE | amh1 | amh2 | amh3 | amh4 | MONTHS | NEW1 | NEW2 | NEW3 | NEW4 | NEW5 | NEW6 | NEW7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 18 | 1 | 9 | 9 | 9 | 216 | . | . | . | . | .00 | 1.00 | 1.00 |
| 2 | 99 | 5 | 5 | 2 | 1 | . | 10 | 7.50 | 5.00 | 7.50 | .00 | 25.00 | .04 |
| 3 | 47 | 3 | 9 | 3 | 5 | 564 | . | . | . | . | .00 | 9.00 | .11 |
| 4 | . | 4 | 1 | . | 9 | . | 5 | . | 2.50 | 4.50 | .00 | 16.00 | .06 |
| 5 | 23 | . | . | . | . | 276 | . | . | . | . | .00 | . | . |
| 6 | 65 | 9 | 4 | 2 | 9 | 780 | . | . | . | . | .00 | . | . |
It is important to understand what happens when the transformation involves user- or system-missing values. The general rule is that "the result is system missing when there is not enough information to evaluate the transformation." For COMPUTE statements that use arithmetic operations the result is usually system-missing if any of the component values are either user-missing or system-missing. But look at the example for NEW5. Because 0 times anything is 0, the transformation can be evaluated and the result is 0 for all cases, even those with user- or system-missing values. The following expressions can be evaluated even when the value of expression is missing:
| Expression | Explanation |
|---|---|
| 0*expression = 0 | zero times anything is 0 |
| 0/expression = 0 | zero divided by anything is zero |
| expression**0 = 1 | anything raised to a zero power is 1 |
| 0**expression = 0 | zero raised to any power is 0 |
| MOD(0,expression) = 0 | the remainder when zero is divided by anything is 0 |
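The rules in the table above can be mimicked outside SPSS. What follows is a minimal Python sketch, not SPSS syntax: `None` stands in for a user- or system-missing value, and the helper names `add`, `mul`, and `power` are hypothetical. It assumes the shortcuts apply only when the other operand is missing.

```python
# None stands in for a missing value (user- or system-missing).

def add(a, b):
    """Addition: the result is missing if either operand is missing."""
    if a is None or b is None:
        return None
    return a + b

def mul(a, b):
    """Multiplication: 0 times anything is 0, even a missing value."""
    if a == 0 or b == 0:
        return 0
    if a is None or b is None:
        return None
    return a * b

def power(base, exponent):
    """Exponentiation: missing**0 is 1 and 0**missing is 0."""
    if base is None and exponent == 0:
        return 1
    if exponent is None and base == 0:
        return 0
    if base is None or exponent is None:
        return None
    return base ** exponent

# amh1 for the six cases; case 5 is system-missing, case 6 is user-missing
amh1 = [1, 5, 3, 4, None, None]

new5 = [mul(x, 0) for x in amh1]     # like COMPUTE NEW5 = amh1 * 0
new6 = [power(x, 2) for x in amh1]   # like COMPUTE NEW6 = amh1**2
print(new5)   # [0, 0, 0, 0, 0, 0]
print(new6)   # [1, 25, 9, 16, None, None]
```

NEW5 can be evaluated for every case, while NEW6 stays missing wherever amh1 is missing, which is the pattern you see in Table 3.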
It is also important to understand the concept of the "order of operations" when performing transformations. Consider the following example: 1 + 2 / 3 = ? If you first add 1 and 2 (1 + 2 = 3) and then divide by 3, the answer would be 1 (3 / 3 = 1). But if you first divide 2 by 3 (2 / 3 = .67) and then add the 1, the answer would be 1.67 (.67 + 1 = 1.67).
You can specify the order in which arithmetic operations are to be performed by using parentheses. SPSS always performs operations within parentheses first. In the computation for NEW3 the two variables within the parentheses are first summed, and then the sum is divided by 2, the number of questions summed.
If you do not specify the order in which arithmetic operations are to be done, then SPSS uses the following order:
| (1) functions (e.g., SQRT(Y2)); (2) exponentiation (e.g., Y2**2); (3) multiplication, division, and unary - ; and (4) addition and subtraction. |
Using these rules SPSS would have evaluated 1 + 2 / 3 as 1 + (2/3) resulting in the value of 1.67.
Looking at the COMPUTE examples in this section you find that computation of NEW4 is not the same as NEW3. In NEW4, the first thing that happens is that amh2 is divided by 2. The result of that division is added to amh1. This follows from the order rule that division is done before addition. How is NEW2 computed?
You need to memorize the order rules, or you need to use parentheses to clearly specify the order of operations. You will get into less trouble if you use parentheses to specify the order of operations.
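Python happens to apply the same precedence to these operators, so a short sketch can illustrate the point. This is an illustration in Python, not SPSS syntax; the values are those of case 2 in compute2.sav.

```python
# Values for case 2: amh1 = 5, amh2 = 5, amh3 = 2
amh1, amh2, amh3 = 5, 5, 2

new2 = amh1 + amh2 / amh3    # division first: 5 + (5 / 2)
new3 = (amh1 + amh2) / 2     # parentheses first: (5 + 5) / 2
new4 = amh1 + amh2 / 2       # division first: 5 + (5 / 2)

print(new2, new3, new4)      # 7.5 5.0 7.5
print(round(1 + 2 / 3, 2))   # 1.67
print((1 + 2) / 3)           # 1.0
```

The results agree with the NEW2, NEW3, and NEW4 values for case 2 in Table 3.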
The examples in this section create new numeric variables. The default format for a new numeric variable is F8.2, and the variable is initialized to system-missing.
The formats command
FORMATS MONTHS (F4) NEW1 (F2) NEW2 TO NEW7 (F5.2)
changes the formats for months to F4, new1 to F2, and new2 to new7 to F5.2. You could also change the formats by going to each variable and changing the variable type one at a time.
In this section we look at the square root (SQRT), round (RND), and truncate (TRUNC) arithmetic functions.
SQRT finds the square root of a number.
TRUNC truncates, or drops off, the decimal part of a number without rounding. It changes a decimal number to an integer.
RND rounds the value of a number to the nearest integer. It changes a decimal number to an integer.
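| Command Syntax |
|---|
| COMPUTE NEW1 = SQRT(amh1). |
| COMPUTE NEW2 = AGE/17 . |
| COMPUTE NEW3 = TRUNC(NEW2). |
| COMPUTE NEW4 = RND(NEW2). |
| FORMATS NEW3 NEW4 (F2.0). |

Output

| Case | AGE | amh1 | amh2 | amh3 | amh4 | NEW1 | NEW2 | NEW3 | NEW4 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 18 | 1 | 9 | 9 | 9 | 1.00 | 1.06 | 1 | 1 |
| 2 | 99 | 5 | 5 | 2 | 1 | 2.24 | . | . | . |
| 3 | 47 | 3 | 9 | 3 | 5 | 1.73 | 2.76 | 2 | 3 |
| 4 | . | 4 | 1 | . | 9 | 2.00 | . | . | . |
| 5 | 23 | . | . | . | . | . | 1.35 | 1 | 1 |
| 6 | 65 | 9 | 4 | 2 | 9 | . | 3.82 | 3 | 4 |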
This example shows that a newly created variable can be used as a variable in a later transformation. For example, the computation of NEW3 involves the values of NEW2, a variable that has just been created by the previous COMPUTE statement. Why is this possible? COMPUTE is a transformation; it does not get processed until a procedure is encountered. In this example the procedure LIST causes the first case to be read in, and each COMPUTE statement is processed in the order in which it appears in the command file. In the above example, NEW1 is created and added to the end of the active file, then NEW2 is created and added to the end of the active file, then NEW3 is created and added to the end of the active file. Then the next case is read in and each of the transformations is performed on that case. If the command COMPUTE NEW3 = TRUNC(NEW2) had appeared prior to COMPUTE NEW2 = AGE/17, then SPSS would have given you an error message to the effect that it could not find the variable NEW2.
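The case-by-case processing described above can be pictured with a small Python sketch. It is only an analogy, not how SPSS is implemented: each case is a dictionary, each transformation is a hypothetical function, and `None` stands in for a missing AGE. SPSS would report the missing variable when it reads the command, whereas this sketch fails when it reaches the first case.

```python
def compute_new2(case):
    """Like COMPUTE NEW2 = AGE/17: missing if AGE is missing."""
    case["NEW2"] = None if case["AGE"] is None else case["AGE"] / 17

def compute_new3(case):
    """Like COMPUTE NEW3 = TRUNC(NEW2): needs NEW2 to exist already."""
    value = case["NEW2"]          # raises KeyError if NEW2 was never created
    case["NEW3"] = None if value is None else int(value)

def run(cases, transformations):
    """Read one case at a time and apply every transformation in order."""
    for case in cases:
        for transform in transformations:
            transform(case)
    return cases

# AGE for cases 1 to 3; the user-missing 99 is represented by None
cases = [{"AGE": 18}, {"AGE": None}, {"AGE": 47}]
run(cases, [compute_new2, compute_new3])
print([c["NEW3"] for c in cases])   # [1, None, 2]

# Reversing the order fails because NEW2 does not exist yet
try:
    run([{"AGE": 18}], [compute_new3, compute_new2])
except KeyError as err:
    print("cannot find variable", err)
```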
The general rule of thumb regarding missing values for compute statements using arithmetic functions is that the result is system-missing if any of the component values are either user-missing or system-missing.
Because TRUNC and RND return integers, I changed the formats for NEW3 and NEW4 to integers by applying the F2.0 format to both those variables.
FORMATS NEW3 NEW4 (F2.0).
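To see the difference between truncating and rounding, here is a minimal Python sketch that reproduces the NEW3 and NEW4 columns. The helper names `trunc` and `rnd` are hypothetical, `None` stands in for a missing value, and the sketch assumes that RND rounds exact halves away from zero (Python's built-in `round` rounds halves to the nearest even number, so it is not used here).

```python
import math

def trunc(x):
    """Drop the decimals without rounding; missing stays missing."""
    return None if x is None else math.trunc(x)

def rnd(x):
    """Round to the nearest integer, halves away from zero; missing stays missing."""
    if x is None:
        return None
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

# AGE for the six cases; the user-missing 99 and the system-missing value are None
ages = [18, None, 47, None, 23, 65]

new2 = [None if a is None else a / 17 for a in ages]   # like COMPUTE NEW2 = AGE/17
new3 = [trunc(v) for v in new2]                        # like COMPUTE NEW3 = TRUNC(NEW2)
new4 = [rnd(v) for v in new2]                          # like COMPUTE NEW4 = RND(NEW2)
print(new3)   # [1, None, 2, None, 1, 3]
print(new4)   # [1, None, 3, None, 1, 4]
```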
This section looks at the following statistical functions: SUM, MEAN, SD, MIN and MAX. Statistical functions have the following syntax:
function.n(variable list),
where function is the statistical function (e.g., SUM, MEAN, SD, etc.);
.n is the number of nonmissing values needed to evaluate the function; and
variable list is the list of variables to be evaluated by the function.
For example, the expression
MEAN.8(var01 to var20)
states that the mean will only be computed if there are at least 8 valid values among the variables from var01 to var20. If there are fewer than 8 valid values, then the result is system-missing.
If the .n parameter is not used [e.g., MEAN(var01 to var20)] then a default value of .n is used. Here is a list of the default .n values for the statistical functions:
| Statistical Function | Default value of .n | Statistical Function | Default value of .n |
|---|---|---|---|
| SUM | 1 | SD | 2 |
| MEAN | 1 | VARIANCE | 2 |
| MIN | 1 | CFVAR | 2 |
| MAX | 1 | | |
Several examples of the use of statistical functions are given in Table 8.
| Command Syntax |
|---|
| COMPUTE SUM = SUM(amh1,amh2,amh3,amh4) . |
| COMPUTE MEAN = MEAN.2(amh1,amh2,amh3,amh4) . |
| COMPUTE STDEV = SD(amh1 TO amh4) . |
| COMPUTE MIN = MIN.3(amh1 TO amh4) . |
| COMPUTE MAX = MAX(amh1 TO amh4) . |

Output

| Case | amh1 | amh2 | amh3 | amh4 | SUM | MEAN | STDEV | MIN | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 9 | 9 | 9 | 1.00 | . | . | . | 1.00 |
| 2 | 5 | 5 | 2 | 1 | 13.00 | 3.25 | 2.06 | 1.00 | 5.00 |
| 3 | 3 | 9 | 3 | 5 | 11.00 | 3.67 | 1.15 | 3.00 | 5.00 |
| 4 | 4 | 1 | . | 9 | 5.00 | 2.50 | 2.12 | . | 4.00 |
| 5 | . | . | . | . | . | . | . | . | . |
| 6 | 9 | 4 | 2 | 9 | 6.00 | 3.00 | 1.41 | . | 4.00 |
The variables within the parentheses must be separated by a single comma.
In the compute dialog box, the statistical functions do not automatically display the ".n" argument (the number of valid values needed to evaluate the expression). If you wish to include the .n argument, move the cursor to the point just before the opening parenthesis and enter both the period and the value for n.
The "TO" in the expression
MAX(amh1 TO amh4)
indicates that you want to include a range of variables. In this instance the numeric expression can be read as "find the maximum value of all the variables from amh1 to amh4." It will include the first and last variables (amh1 and amh4). SPSS will refer back to the variables in the Data Editor, find the first variable, amh1, look for the last variable, amh4, and include all the variables within that range. If, for example, the variables in the Data Editor were ordered as follows: amh1 amh2 AGE amh3 amh4, then the variable AGE would have been included in the search for the maximum value! The variable list in the dialog boxes may be arranged alphabetically. Do not assume that the variables are arranged in that same order in the Data Editor. You can specify how the variable lists in the dialog boxes are arranged by going to:
Edit
Options
General
Variable Lists
You can choose between displaying variable labels or names. And you can choose between displaying the variables alphabetically or in the order that they occur in the file. If you intend to use the "TO" convention in a transformation, then it is more convenient to display the variable lists in the order that they occur in the file.
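The way "TO" depends on file order can be pictured with a small Python sketch. The function `expand_to` is hypothetical; it simply takes everything between the two named variables in whatever order the file stores them.

```python
def expand_to(file_order, first, last):
    """Return every variable from first to last, inclusive, in file order."""
    start = file_order.index(first)
    stop = file_order.index(last)
    return file_order[start:stop + 1]

expected_order = ["age", "amh1", "amh2", "amh3", "amh4"]
surprising_order = ["amh1", "amh2", "age", "amh3", "amh4"]

print(expand_to(expected_order, "amh1", "amh4"))
# ['amh1', 'amh2', 'amh3', 'amh4']
print(expand_to(surprising_order, "amh1", "amh4"))
# ['amh1', 'amh2', 'age', 'amh3', 'amh4']
```

With the second ordering, age is swept into the range, which is why it pays to check the file order before relying on "TO".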
The general rule of thumb regarding missing values for compute statements using statistical functions is as follows: the result is system missing if the number of valid values is not equal to or greater than the minimum specified for that function.
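This rule can be sketched in a few lines of Python. The helper `stat` is hypothetical and `None` stands in for both user- and system-missing values; the thresholds passed in correspond to the .n values used in the commands above (with the defaults of 1 for SUM and MAX and 2 for SD).

```python
import statistics

def stat(func, values, n):
    """Apply func to the valid values, or return None if fewer than n are valid."""
    valid = [v for v in values if v is not None]
    return func(valid) if len(valid) >= n else None

# amh1 to amh4 for the six cases; user-missing 9s and system-missing are None
cases = [
    [1, None, None, None],
    [5, 5, 2, 1],
    [3, None, 3, 5],
    [4, 1, None, None],
    [None, None, None, None],
    [None, 4, 2, None],
]

results = []
for row in cases:
    results.append((
        stat(sum, row, 1),                # like SUM(...)
        stat(statistics.mean, row, 2),    # like MEAN.2(...)
        stat(statistics.stdev, row, 2),   # like SD(...)
        stat(min, row, 3),                # like MIN.3(...)
        stat(max, row, 1),                # like MAX(...)
    ))

for r in results:
    print(r)
```

Case 1 has only one valid value, so the sum and maximum can be computed but the mean (which here needs 2), the standard deviation (2), and the minimum (3) cannot.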
There are several different types of missing-value functions. Two of the functions check to see if a value for a variable is system-missing (SYSMIS) or either user- or system-missing (MISSING). Another two of the functions will count the number of missing values (NMISS) or the number of valid values (NVALID) in a list of variables. Finally, there is a missing-value function that will ignore the missing value flag for user-defined missing values (VALUE). That is, all values, whether defined as user-missing or not, will be considered to be valid values. Examples of these missing-value functions are shown in Table 9.
| Command Syntax |
|---|
| COMPUTE MISS1 = MISSING(amh1). |
| COMPUTE SYSMIS1 = SYSMIS(amh1). |
| COMPUTE NUMMISS = NMISS(amh1 TO amh4). |
| COMPUTE NUMVALID = NVALID(amh1 TO amh4). |
| COMPUTE SUM = VALUE(amh1)+VALUE(amh2)+VALUE(amh3)+VALUE(amh4). |

Output

| Case | amh1 | amh2 | amh3 | amh4 | MISS1 | SYSMIS1 | NUMMISS | NUMVALID | SUM |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 9 | 9 | 9 | .00 | .00 | 3.00 | 1.00 | 28.00 |
| 2 | 5 | 5 | 2 | 1 | .00 | .00 | .00 | 4.00 | 13.00 |
| 3 | 3 | 9 | 3 | 5 | .00 | .00 | 1.00 | 3.00 | 20.00 |
| 4 | 4 | 1 | . | 9 | .00 | .00 | 2.00 | 2.00 | . |
| 5 | . | . | . | . | 1.00 | 1.00 | 4.00 | .00 | . |
| 6 | 9 | 4 | 2 | 9 | 1.00 | .00 | 2.00 | 2.00 | 24.00 |
MISSING returns a value of 1 if the value of the variable is either user- or system missing. If the value of the variable is valid, then MISSING returns a 0. In the example above, the only missing values for amh1 are for the cases #5 and #6. The value of MISS1 is 1 for both those cases. It is 0 for the other cases.
SYSMIS returns a value of 1 if the value of the variable is system-missing. If the value of the variable is valid or user-missing, then SYSMIS returns a 0. In the example above, the only system-missing value for amh1 is for case #5. The value of SYSMIS1 is 1 for that case. It is 0 for the other cases.
NMISS counts the number of user- and system-missing values in the variable list. Look at case #4 in the above example. The value of 2.00 for the newly created variable NUMMISS occurs because of the missing values for amh3 (system missing) and amh4 (user missing).
NVALID counts the number of valid values in the variable list. Again look at case #4. The value of 2.00 for the newly created variable NUMVALID occurs because of the valid values for amh1 and amh2 for that case.
VALUE ignores any user-missing value flags. Any values (except system-missing) are considered valid values. For example, the value of the newly computed variable SUM is 28 for the first case because the 9's for amh2, amh3, and amh4 are all considered to be valid values.
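The five functions can be imitated with a short Python sketch. It is an illustration only: `None` stands in for system-missing, the set `USER_MISSING` holds the declared missing value 9 for the amh items, and the function names are hypothetical lower-case counterparts of the SPSS functions.

```python
SYSMIS = None           # stand-in for system-missing
USER_MISSING = {9}      # amh1 to amh4 declare 9 as user-missing

def sysmis(x):
    """1 if the value is system-missing, otherwise 0."""
    return 1 if x is SYSMIS else 0

def missing(x):
    """1 if the value is user- or system-missing, otherwise 0."""
    return 1 if x is SYSMIS or x in USER_MISSING else 0

def nmiss(values):
    """Count the user- and system-missing values."""
    return sum(missing(v) for v in values)

def nvalid(values):
    """Count the valid values."""
    return len(values) - nmiss(values)

def value_sum(values):
    """Sum the raw values, ignoring user-missing flags, like VALUE(a)+VALUE(b)+..."""
    if any(v is SYSMIS for v in values):
        return None
    return sum(values)

case1 = [1, 9, 9, 9]
case4 = [4, 1, SYSMIS, 9]
case6 = [9, 4, 2, 9]

print(missing(case6[0]), sysmis(case6[0]))   # 1 0
print(nmiss(case4), nvalid(case4))           # 2 2
print(value_sum(case1), value_sum(case4))    # 28 None
```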
If the data type of a variable is a date (e.g., mm/dd/yyyy) then SPSS stores the values for that variable as the number of seconds from October 14, 1582 to the given date value (e.g., the number of seconds from October 14, 1582 to January 1, 2000).
The DATE.DMY and DATE.MDY functions are used to compute a date variable from the day, month, and year of an event. For example, the date that the skills survey was submitted was stored in the skills.dat file as the month, the day, and the year. DATE.DMY or DATE.MDY can be used to convert the values of those three variables (month, day, and year) to a single date type variable (e.g., mm/dd/yyyy). The syntax for the compute command is shown in Table 9.
| Command Syntax |
|---|
| COMPUTE SURVDATE = DATE.MDY(smonth,sday,syear). |
| FORMATS SURVDATE (ADATE10). |
The compute command shown in Table 9 will return a date variable, survdate. By default, the values of survdate will be displayed in the SPSS Data Editor as the number of seconds from October 14, 1582 to the date of the survey. You can convert the display to any one of a number of other formats (e.g., mm/dd/yyyy) using the Define Variable window for the survdate variable. You could also run the FORMATS syntax command to change the display type to ADATE (American date).
The CTIME.DAYS function is very helpful if you need to keep track of the number of days between dates. For example, given that you know the birthday of a person, the CTIME.DAYS function can be used to compute the age of that person as of the day that the SPSS program was run. That is, you can determine a person's current age every time the database is run without having to change any of the variables in the database itself. The CTIME.DAYS function can compute the number of days between, say, the date of arrest and the date that the person was seen in the court system, or the number of days between the first administration of a memory test and subsequent administrations of that test.
Recall that SPSS stores date variables as the number of seconds from October 14, 1582. The CTIME.DAYS function converts the number of seconds to the number of days.
Here is an important date variable, $JDATE. The variable $JDATE is an SPSS system variable. It is automatically created by SPSS to keep track of the current date. It looks at the date function of your computer to find the current date. The names of SPSS system variables begin with a dollar sign, $. Another common system variable is $CASENUM, which is the sequential numbering of all the cases in the file in the Data Editor.
The syntax for CTIME.DAYS is as follows
COMPUTE target = CTIME.DAYS(date)
where "date" is a date variable. The value returned to the target variable is the number of days from the beginning of the Gregorian calendar, October 15, 1582, to the date supplied as the argument for CTIME.DAYS. Examples of the use of CTIME.DAYS are shown in Table 10.
| Command Syntax |
|---|
| COMPUTE GREGAGE = $JDATE . |
| COMPUTE DAYSOLD = $JDATE - CTIME.DAYS(birthdat). |
| COMPUTE YEARSOLD = ($JDATE - CTIME.DAYS(birthdat))/365.25 . |
The command
COMPUTE GREGAGE = $JDATE
returns the number of days from October 15, 1582 to the current date, that is, the date the transformation was run.
The command
COMPUTE DAYSOLD = $JDATE - CTIME.DAYS(birthdat)
finds the number of days between the date in the CTIME.DAYS function and the current date, $JDATE.
The command
COMPUTE YEARSOLD = ($JDATE - CTIME.DAYS(birthdat))/365.25
finds the number of years between the date in the CTIME.DAYS function and the current date. It is the equivalent of dividing the DAYSOLD variable by 365.25. Add your birthday to one of the missing date values and run the daysold transformation to find out how many days old you are. Note the use of parentheses to make sure that the difference between the dates is divided by 365.25.
It is easy to find the number of days between two dates. Suppose that the date of admission to prison was called dateadm and the date of release from prison was daterel. Then the number of days spent in prison, jailtime, would be computed as follows
COMPUTE jailtime = CTIME.DAYS(daterel) - CTIME.DAYS(dateadm).
Subtracting the earlier date from the later date will give you a positive number of days.
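The date arithmetic can be checked with a short Python sketch. It assumes, as stated earlier in these notes, that a date is stored as the number of seconds from October 14, 1582. The helpers `spss_date` and `ctime_days` are hypothetical stand-ins, and the admission and release dates are made up for the illustration.

```python
from datetime import datetime

SPSS_EPOCH = datetime(1582, 10, 14)   # reference date given in the text
SECONDS_PER_DAY = 86400

def spss_date(year, month, day):
    """Seconds from the reference date to midnight on the given date."""
    return (datetime(year, month, day) - SPSS_EPOCH).total_seconds()

def ctime_days(seconds):
    """Convert a number of seconds to a number of days."""
    return seconds / SECONDS_PER_DAY

dateadm = spss_date(2000, 1, 1)   # hypothetical admission date
daterel = spss_date(2000, 3, 1)   # hypothetical release date

# like COMPUTE jailtime = CTIME.DAYS(daterel) - CTIME.DAYS(dateadm)
jailtime = ctime_days(daterel) - ctime_days(dateadm)
print(jailtime)   # 60.0 (31 days in January plus 29 in February 2000)
```

Swapping the two dates in the subtraction would give -60.0, which is why the later date goes first.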
How would you find the age in years of the respondent at the time the survey was completed?
SPSS has several other date and time functions. Check them out by using the Help button while in the compute dialog box.
Sometimes a date is specified by three variables: the year of the event, the month of the event, and the day of the event. For example, the date that the survey was submitted was stored as three variables in the skills99.dat file. The YRMODA function will transform those three variables into a date variable where the value is the number of days between October 14, 1582 and the date entered. Suppose that the three variables were called yrsubm (the year the survey was submitted), mosubm (the month the survey was submitted) and dasubm (the day the survey was submitted). Then the following compute command will return the number of days between October 14, 1582 and the date that the survey was submitted:
COMPUTE SUBMIS = YRMODA(yrsubm, mosubm, dasubm).
You could also find the number of days to a particular date using the YRMODA function by entering the specific year, month, and day. For example,
COMPUTE USBIRTH = YRMODA(1776,7,4)
will return the number of days from the beginning of the Gregorian calendar to the birthday of the United States.
How would you compute, as of today, the age of the United States (in days)?
We began this discussion of transformations by finding the mean of the four attitude items. You should go back to Table 2 and make sure you understand how the values for the target variables were computed and why there are missing values for some of the target variables.
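As a check on your understanding, the following Python sketch imitates the four ways of averaging the items. It is an illustration, not SPSS syntax: 9 is treated as the declared user-missing value, `None` stands in for system-missing, and the function names are hypothetical.

```python
USER_MISSING = 9   # declared missing value for amh1 to amh4; None is system-missing

def valid(row):
    """Keep only the values that are neither user- nor system-missing."""
    return [v for v in row if v is not None and v != USER_MISSING]

def avg1(row):
    """Like (amh1+amh2+amh3+amh4)/4: missing if any item is missing."""
    return sum(row) / 4 if len(valid(row)) == 4 else None

def mean_n(row, n=1):
    """Like MEAN.n(...): mean of the valid items if at least n are valid."""
    v = valid(row)
    return sum(v) / len(v) if len(v) >= n else None

def avg4(row):
    """Like summing VALUE() of each item: only system-missing blocks the result."""
    return sum(row) / 4 if None not in row else None

cases = [
    [1, 9, 9, 9],
    [5, 5, 2, 1],
    [3, 9, 3, 5],
    [4, 1, None, 9],
    [None, None, None, None],
    [9, 4, 2, 9],
]

results = [(avg1(r), mean_n(r), mean_n(r, 2), avg4(r)) for r in cases]
for r in results:
    print(r)
```

Each printed row corresponds to avg1, avg2, avg3, and avg4 for one case, so you can compare the pattern of missing results with the output of the four COMPUTE commands.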
© Lee A. Becker, 1997-1999 - revised 10/06/99