How to Calculate Propensity Scores

Propensity scores are usually used to help compare two or more groups of subjects (most often people) in an observational study where there may be selection bias.

When we have data on more than a few variables about each person, it can be simpler to summarise that information into a single score and then use that score to match people. Alternatively, we can use the score itself as a covariate.
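To make the idea of matching on a single score concrete, here is a minimal Python sketch. It assumes the propensity scores have already been estimated. The subject IDs, the score values, the function name match_nearest and the caliper of 0.05 are all hypothetical choices made for illustration, and greedy nearest-neighbour matching is only one of several possible matching schemes.

```python
# Hypothetical propensity scores keyed by subject ID (illustrative values only).
treated = {"T1": 0.60, "T2": 0.33, "T3": 0.90}
controls = {"C1": 0.30, "C2": 0.58, "C3": 0.66, "C4": 0.41}


def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Each treated subject is paired with the unused control whose score is
    closest, provided the gap is no larger than `caliper`.
    """
    available = dict(controls)
    pairs = {}
    for t_id, t_score in treated.items():
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs[t_id] = c_id
            del available[c_id]  # each control is used at most once
    return pairs


pairs = match_nearest(treated, controls)
print(pairs)
```

In this toy data T3 stays unmatched because no control has a score within the caliper. Subjects without a close counterpart are normally left out of the matched comparison.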

Collect the data and enter them into a statistics program, such as SAS, R, SPSS or Stata. Data are usually obtained from ongoing studies. They can be entered into a spreadsheet and then read into the statistics program, or they can be entered directly using the program's built-in functions. Many statistics packages offer other ways to import data as well.
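As a rough illustration of reading spreadsheet data into a program, the following Python sketch parses comma-separated text of the kind a spreadsheet can export. Python is not one of the packages named above, and the column names (treated, age, smoker) and the values are hypothetical. The in-memory string stands in for a real file.

```python
import csv
import io

# Stand-in for a file exported from a spreadsheet; the column names
# (treated, age, smoker) and the values are hypothetical.
raw = io.StringIO(
    "treated,age,smoker\n"
    "1,54,1\n"
    "0,47,0\n"
    "1,61,0\n"
    "0,39,1\n"
)

rows = []
for record in csv.DictReader(raw):
    # csv returns every field as text, so convert to numbers explicitly.
    rows.append({
        "treated": int(record["treated"]),
        "age": float(record["age"]),
        "smoker": int(record["smoker"]),
    })

# Basic check before modelling: the group indicator must be coded 0 or 1.
assert all(r["treated"] in (0, 1) for r in rows)
print(len(rows), "rows;", sum(r["treated"] for r in rows), "treated")
```

Checking the coding of the group indicator at this stage is cheap and catches a common source of confusing model output later.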

Use the software's logistic regression function. The dependent variable is the group indicator (for example, 1 if the subject was selected and 0 otherwise), so that the model estimates the probability of being selected; the independent variables are the covariates. The proper way to perform logistic regression varies with the program. For example, in SAS, PROC LOGISTIC is appropriate; in R, the glm() function with family = binomial is appropriate. Each statistics package has extensive documentation. According to authorities such as Rosenthal and Rosnow, it is better to include more variables rather than fewer.

Output the conditional probability of being selected, given the covariates. For example, in SAS, you would use a statement similar to that used by Parsons:

OUTPUT OUT=STUDY.Propen PROB=prob;

Here, the data set "Propen" in the library "STUDY" would have a variable called "prob". This is the propensity score.
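To show what the logistic regression and its output amount to, here is a self-contained Python sketch that fits the model and computes a propensity score for each subject. It is an illustration under stated assumptions, not what SAS or R do internally: the data are made up, the covariates (age, smoker) are hypothetical, and the fit uses plain gradient ascent, whereas statistics packages use faster and more robust routines.

```python
import math

# Hypothetical covariates (age, smoker) and 0/1 group indicator per subject.
covariates = [(54, 1), (47, 0), (61, 0), (39, 1), (58, 1),
              (44, 0), (66, 1), (35, 0), (50, 0), (52, 1)]
treated = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]


def standardise(columns):
    """Rescale each covariate to mean 0 and SD 1 so gradient ascent behaves."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out.append([(v - mean) / sd for v in col])
    return out


def propensity(beta, x):
    """Estimated probability of being in the treated group, given covariates x."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, y, rate=0.5, steps=5000):
    """Maximise the logistic log-likelihood by plain gradient ascent."""
    beta = [0.0] * (len(rows[0]) + 1)  # intercept plus one slope per covariate
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, yi in zip(rows, y):
            resid = yi - propensity(beta, x)
            grad[0] += resid
            for j, v in enumerate(x, start=1):
                grad[j] += resid * v
        beta = [b + rate * g / len(y) for b, g in zip(beta, grad)]
    return beta


rows = list(zip(*standardise(list(zip(*covariates)))))
beta = fit_logistic(rows, treated)
scores = [propensity(beta, x) for x in rows]  # one propensity score per subject
for x, t, s in zip(covariates, treated, scores):
    print(x, t, round(s, 3))
```

The list scores plays the same role as the "prob" variable in the SAS output: one estimated probability of selection per subject.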

Check the model. The propensity score is meant to balance the covariates between the groups, so begin by checking that the logistic regression fits the data adequately. In SAS, you can use the "lackfit" option on the model statement, which implements the Hosmer-Lemeshow test of goodness of fit. A large p-value means the test found no evidence of poor fit; it does not prove that the fit is good, so look at the tabular output of observed and expected counts as well.
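The calculation behind the Hosmer-Lemeshow test can be sketched in a few lines of Python. This is a simplified illustration rather than a reproduction of the SAS implementation: the scores and group indicators are hypothetical, the function name hosmer_lemeshow is made up, and five groups are used instead of the customary ten because the toy data set is small.

```python
# Hypothetical propensity scores and 0/1 group indicators for 20 subjects.
scores = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.50,
          0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.90, 0.95]
treated = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
           1, 0, 1, 1, 0, 1, 1, 1, 1, 1]


def hosmer_lemeshow(scores, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic for predicted probabilities.

    Subjects are sorted by score and split into `groups` bins of near-equal
    size; observed and expected counts of treated subjects are compared.
    Assumes every bin is non-empty and its expected count is not 0 or the
    full bin size.
    """
    ranked = sorted(zip(scores, outcomes))
    n = len(ranked)
    stat = 0.0
    table = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
        table.append((size, observed, round(expected, 2)))
    return stat, table


stat, table = hosmer_lemeshow(scores, treated, groups=5)
print("statistic:", round(stat, 3))
for size, observed, expected in table:
    print(size, observed, expected)
```

The statistic is compared with a chi-square distribution with groups - 2 degrees of freedom (here 3, for which the 5% critical value is 7.815). A statistic below the critical value corresponds to the large p-value described above.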
