DISCOVER

# How to use dummy variables in Excel regression

Updated March 23, 2017

Microsoft's popular Excel program has data analysis capabilities that include conducting regression analysis with dummy variables. Dummy variables are categorical variables numerically expressed as 1 or 0 to indicate the presence or absence of a particular quality or characteristic. Excel does not require any special functions when a regression model includes a dummy variable among the independent variables. However, regression models with dependent dummy variables require additional add-ins, programs that expand Excel's options and features.

Load the data analysis tool from the Excel add-ins, included in all versions of Excel. You must do this to conduct a regression or any other type of data analysis. Clicking "Tools" opens a drop-down menu. Select "Add-ins" and from the menu that opens, check "Analysis ToolPak" and click "OK." "Data Analysis" should appear in your Tools menu.

Enter the data you will use for your regression into an Excel worksheet, coding any dummy variables with the value 1 or 0, depending on whether the subject has the characteristic in question. Gender is an example of a dummy variable, since a study's subjects can be only male or female. A study of college entrance examination scores that included subjects' gender, for example, could code female students with a 1. Using dummy variables among your independent variables requires no special functions in Excel. Remember that if a dummy variable has only two categories (such as male or female), only one variable is needed to represent the two categories.

Code categorical variables with more than two categories as multiple dummy variables, making sure the number of variables is one less than the number of categories (n-1, in statistical terms). For example, the category ethnicity expressed as five levels (white, black, Hispanic, Asian, American Indian) would require four separate dummy variables. For example, if you were studying college entrance examination scores, you could create the following dummy variables: black, Hispanic, Asian and American Indian, coding each a 1 if the student in question fits that ethnic category.

Expand Excel's capacity for regression with dummy variables with an add-in that will allow the program to conduct regressions with dummy dependent variables. One such program is XLStat, available for purchase and download from the maker, Addinsoft. Programs such as this allow you to conduct regressions in which the dependent variable takes the values or either 1 or 0.

#### Warning

A common mistake among beginners is to use as many dummy variables as there are categories (such as, for example, two dummy variables for male and female). Such an approach would create multicollinearity, in which two independent variables are highly correlated, making it almost impossible to determine their separate effects on the dependent variable. Remember that the number of dummy variables should always one less than the number of categories.