In statistics, residuals are what is left over after fitting a model: the differences between the observed values and the values the model predicts. Many statistical methods (such as multiple regression) make assumptions about the residuals, and these assumptions can be checked with plots. In addition, in any model, residuals may reveal data entry errors or otherwise problematic points. Plotting residuals is therefore an essential part of statistical analysis.
Things you need
- Statistical software, such as SAS, R, SPSS or other software.
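Plot the residuals against the fitted values from the model. In a linear regression, this plot should appear as a shapeless blob centred on zero, with no pattern. This is a way of checking the assumption of homoscedasticity, or equality of variance: a funnel shape, where the spread of the residuals grows or shrinks as the fitted values increase, suggests the variance is not constant, while a curve suggests the relationship is not linear.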
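As a rough illustration of this step, the sketch below fits a straight line to a small made-up data set and computes the fitted values and residuals that would go on the two axes. The data, the helper name `fit_line` and the restriction to a single predictor are assumptions made for the example; a statistical package would normally do this for you.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Each row is one point of the plot: fitted value (horizontal axis)
# and residual (vertical axis).
for f, r in zip(fitted, residuals):
    print(f"{f:8.3f} {r:8.3f}")
```

The printed pairs can be passed to whatever plotting tool is at hand. With an intercept in the model, the residuals always sum to zero, so the blob should sit around the horizontal line at zero.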
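Plot the square roots of the absolute values of the residuals against the fitted values from the model. (Residuals can be negative, so the absolute value is taken first; many packages also standardise the residuals before doing so.) This checks similar issues to the first plot, but it shows the size of the residuals more clearly because every plotted value is non-negative, so a trend in the spread is easier to see.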
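The values for this plot can be sketched as follows, assuming a single-predictor regression on made-up data. The residuals are standardised by dividing by their estimated standard deviation, which uses the leverage of each point; standardising is a common convention rather than something this article requires, and the names `leverage` and `sqrt_abs_std` are chosen just for the example.

```python
from math import sqrt
from statistics import mean

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(x)

# Least-squares fit with one predictor.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residual variance estimate: two parameters (intercept, slope) were fitted.
s2 = sum(r ** 2 for r in residuals) / (n - 2)

# Leverage of each point in a one-predictor regression.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Standardised residuals, then the square root of their absolute values.
standardised = [r / sqrt(s2 * (1 - h)) for r, h in zip(residuals, leverage)]
sqrt_abs_std = [sqrt(abs(r)) for r in standardised]

# Each row is one point: fitted value, then the non-negative vertical value.
for f, v in zip(fitted, sqrt_abs_std):
    print(f"{f:8.3f} {v:8.3f}")
```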
Make a normal quantile plot (also called a Q-Q plot) of the residuals to visualise how close they are to a normal distribution. Linear regression assumes that the residuals are normally distributed; if they are, the points fall close to a straight line.
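A minimal sketch of the numbers behind such a plot is given below. It pairs the sorted residuals with the quantiles of a standard normal distribution, using the plotting positions (i - 0.5)/n, which is one common choice among several. The residual values are made up for the example.

```python
from statistics import NormalDist

# Made-up residuals, for illustration only.
residuals = [-0.35, 0.12, 0.48, -0.20, 0.05, -0.61, 0.27, 0.24]
n = len(residuals)

# Sample quantiles: the residuals in ascending order.
sample_quantiles = sorted(residuals)

# Theoretical quantiles of the standard normal distribution at the
# plotting positions (i - 0.5) / n for i = 1..n.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
theoretical_quantiles = [
    standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)
]

# Each row is one point: theoretical quantile (horizontal axis)
# and ordered residual (vertical axis).
for t, s in zip(theoretical_quantiles, sample_quantiles):
    print(f"{t:8.3f} {s:8.3f}")
```

If the residuals are roughly normal, these points lie near a straight line; systematic bending at the ends points to heavy or light tails.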
Plot Cook's distance against the residuals. Cook's distance is a measure of the influence of a particular data point on a regression equation. If a point has high influence, deleting it would make a big difference to the regression equation. It is a general principle that small changes in the input data should produce only small changes in the output, and high influence points violate this principle.
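To make the idea concrete, the sketch below computes Cook's distance for a one-predictor regression using the standard formula D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2), where e_i is the residual, h_i the leverage, p the number of fitted parameters and s^2 the residual variance estimate. The data are made up, with the last point deliberately placed far from the others so that it has high influence.

```python
from statistics import mean

# Made-up data, for illustration only. The last point sits far to the
# right and well below the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 15.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 10.0]
n = len(x)
p = 2  # fitted parameters: intercept and slope

# Least-squares fit with one predictor.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in residuals) / (n - p)

# Leverage of each point in a one-predictor regression.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Cook's distance for each point.
cooks = [
    (r ** 2 * h) / (p * s2 * (1 - h) ** 2)
    for r, h in zip(residuals, leverage)
]

# Position (0-based) of the point with the largest Cook's distance.
most_influential = max(range(n), key=lambda i: cooks[i])

# Each row is one point: residual (horizontal axis), Cook's distance (vertical).
for r, d in zip(residuals, cooks):
    print(f"{r:8.3f} {d:8.3f}")
print("most influential point index:", most_influential)
```

In this example the planted point at the end of the list is flagged as the most influential, which is the kind of observation worth checking for a data entry error before deciding what to do with it.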
Tips and warnings
- Any of these plots may be created in any good statistical package, such as SAS, R or SPSS.