Dummy Variables

 

Menu location: Data_Dummy Variables

 

This function creates dummy (or design) variables from one categorical variable.

 

The reference cell coding model is used (Kleinbaum et al., 1998): one category acts as the reference and is coded 0 on every dummy variable, while each other category is coded 1 on its own dummy variable and 0 on the rest.

The source data may be numerical or text, representing categories. The coding scheme is applied to your data in reverse alphanumeric order for the k categories found. For three categories, say race equal to black, white or other, white (being the last in an alphabetical sorting) is coded 1,0,0, which reduces to dummy variables X (3) = 1, X (2) = 0.
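
The coding can be sketched in Python as follows. This is a minimal illustration, not the program's own implementation: the function name `reference_cell_codes` is hypothetical, and the sketch assumes that the category sorting first is the reference and that each dummy variable is numbered by its category's position in the sorted order.

```python
def reference_cell_codes(values):
    """Map each category to its reference cell codes.

    Assumption: categories are sorted alphanumerically, the first one is the
    reference (all zeros), and the category in position i (counting from 1)
    is coded 1 on dummy variable i and 0 on the others.
    """
    categories = sorted(set(values))
    k = len(categories)
    codes = {}
    for position, category in enumerate(categories, start=1):
        # Dummy variables are numbered 2..k; the reference has none set to 1.
        codes[category] = {i: int(i == position) for i in range(2, k + 1)}
    return codes


race = ["black", "white", "other", "white", "black"]
codes = reference_cell_codes(race)
print(codes["white"])  # {2: 0, 3: 1}
print(codes["black"])  # {2: 0, 3: 0}
```

Here white, the last category alphabetically, is 1 on dummy variable 3 and 0 on dummy variable 2, while black, the first, is 0 on both.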

 

In order to represent a categorical variable with more than two levels in a regression model you may wish to convert it to a series of dummy variables using this function.

 

Say a linear regression model is specified with three predictors: the first and third are continuous, and the second is a classifier (categorical data) with three levels. The second predictor should be converted to two dichotomous dummy variables (as in the example below) and entered into the multiple linear regression as two predictors.
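
As a rough sketch of what this means for the data passed to the regression, the Python below builds one row of predictors per observation: an intercept, the first continuous predictor, the two dummy variables, and the third predictor. The function name `design_rows` and the data values are made up for illustration, and the first level in sorted order is assumed to be the reference.

```python
def design_rows(x1, group, x3):
    """Build regression design rows: intercept, x1, two dummies, x3."""
    levels = sorted(set(group))
    if len(levels) != 3:
        raise ValueError("this sketch expects a classifier with three levels")
    rows = []
    for a, g, c in zip(x1, group, x3):
        # The first level is the reference, so it is 0 on both dummies.
        d2 = int(g == levels[1])
        d3 = int(g == levels[2])
        rows.append([1.0, a, d2, d3, c])
    return rows


# Made-up illustrative data: two continuous predictors and one classifier.
x1 = [5.1, 4.8, 6.0, 5.5]
group = [1, 2, 3, 2]
x3 = [20.0, 31.0, 27.0, 24.0]
rows = design_rows(x1, group, x3)
for row in rows:
    print(row)
```

The three-level classifier contributes two columns, so the model is fitted with four predictors plus the intercept rather than three.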

 

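Each dummy variable takes the name of the original variable plus a bracketed suffix. If there are only two categories, the single dummy variable is suffixed with (1); where there are j+1 categories, giving rise to j dummy variables, the suffixes run up to (j+1), as with Group ID (2) and Group ID (3) in the example below.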

 

In general form, a regression model where the jth predictor variable is a classifier with k levels can be interpreted in terms of the following quantities, provided the jth variable is converted to dummy variables: Y is the outcome variable, b is a regression coefficient, D is a dummy variable for a classifier variable of k levels and x is a non-classifier predictor variable.
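
One way to write such a model, using the symbols Y, b, D and x defined here, is sketched below. The total number of predictors p, the double subscript on the dummy coefficients and the numbering of the dummy variables from 2 to k are assumptions made for this illustration.

```latex
Y = b_0 + b_1 x_1 + \dots + b_{j-1} x_{j-1}
    + \sum_{m=2}^{k} b_{j,m} D_m
    + b_{j+1} x_{j+1} + \dots + b_p x_p
```

Under this form, each coefficient b_{j,m} is the expected difference in Y between level m of the classifier and the reference level, with the other predictors held constant.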

 

Example

Group ID   --->   Group ID (2)   Group ID (3)
1                 0              0
1                 0              0
1                 0              0
1                 0              0
2                 1              0
2                 1              0
2                 1              0
2                 1              0
3                 0              1
3                 0              1
3                 0              1
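
The example can be reproduced with a short Python sketch. The function name `dummy_columns` is hypothetical and this is not the program's own code; it assumes the lowest category is the reference and, for a two-category variable, that the second category is the one coded 1.

```python
def dummy_columns(name, values):
    """Expand one categorical column into named 0/1 dummy columns."""
    levels = sorted(set(values))
    if len(levels) == 2:
        # Two categories: a single dummy variable suffixed with (1).
        # Assumption: the second category in sorted order is coded 1.
        return {f"{name} (1)": [int(v == levels[1]) for v in values]}
    columns = {}
    # Skip the first (reference) level; number the rest from 2 upwards.
    for number, level in enumerate(levels[1:], start=2):
        columns[f"{name} ({number})"] = [int(v == level) for v in values]
    return columns


group_id = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
columns = dummy_columns("Group ID", group_id)
for label, column in columns.items():
    print(label, column)
```

This prints the Group ID (2) and Group ID (3) columns shown above, with group 1 as the reference coded 0 on both.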