A probability plot is used to assess the shape of your data distribution. This type of plot is more useful than a histogram when your data set is small. The plot compares the sample data with the values you would expect if the population from which the sample was drawn were normally distributed. Because the plot relies on a mathematical rescaling of one axis so that normal data fall along a straight line, use a computer software program such as SAS, Microsoft Excel or Minitab to generate the plot.
Things you need
- Computer data analysis software program such as SAS, Microsoft Excel or Minitab
Understand that the horizontal or x-axis of a probability plot contains the values in your data set. The data analysis software rescales the percentage axis (the vertical axis in Minitab, for example) so that normally distributed data fall along a straight line.
Look at the vertical or y-axis, which represents cumulative percentages. Reading across from any percentage shows which data value sits at that point in your sample. For example, you can tell what values are at the 10th, 20th or 40th percentile. Note that the printed scale usually stops short of 0 and 100, commonly at 1 and 99, because a normal distribution reaches those percentages only at infinity.
Understand that each data point is plotted at its data value on the x-axis and its estimated cumulative percentage on the y-axis. If the highest number in your data set is 20, for example, there would be a dot at the x-axis value of 20 and near the top of the y-axis. Its exact percentage depends on the sample size and on the formula the software uses, so it is high but generally not exactly 99.
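To make the placement of each dot concrete, here is a minimal Python sketch of how plotting positions can be calculated. It assumes Benard's median-rank formula, (rank - 0.3) / (n + 0.4), which is one common convention; your software may use a different formula. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist


def plotting_positions(values):
    """Pair each sorted value with a cumulative percentage and a normal score."""
    ordered = sorted(values)
    n = len(ordered)
    rows = []
    for rank, value in enumerate(ordered, start=1):
        # Benard's median-rank approximation (assumed; packages differ).
        fraction = (rank - 0.3) / (n + 0.4)
        # The normal score is what the rescaled axis effectively plots.
        z_score = NormalDist().inv_cdf(fraction)
        rows.append((value, 100 * fraction, z_score))
    return rows


sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]  # hypothetical data
positions = plotting_positions(sample)
for value, percent, z_score in positions:
    print(f"{value:>4}  {percent:5.1f}%  {z_score:+.3f}")
```

With ten values, the largest one (20) lands at roughly the 93rd percentage point under this formula, not the 99th.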
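Look at the curved lines on either side of the straight fitted line. The software typically draws these in a contrasting colour such as blue. These lines mark an approximate 95 per cent confidence band around the fitted line. They do not mean that 95 per cent of the values in your data set fall between them; they show how far points can stray from the line through sampling variation alone when the data are normally distributed.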
Examine the individual data points. If most of the points fall within the two confidence lines and the data resemble a straight line, statisticians conclude that the data are consistent with a normal distribution.
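If you want a number to go with the visual check, one option is the correlation between the sorted data and their normal scores: the closer it is to 1, the straighter the plot. The sketch below is an illustration under stated assumptions, not what SAS, Excel or Minitab report. It reuses Benard's median-rank formula, and the function name and both samples are hypothetical.

```python
from statistics import NormalDist, mean


def probability_plot_correlation(values):
    """Correlation between the sorted data and their normal scores."""
    ordered = sorted(values)
    n = len(ordered)
    # Normal scores from Benard's median ranks (assumed convention).
    scores = [NormalDist().inv_cdf((i - 0.3) / (n + 0.4)) for i in range(1, n + 1)]
    mean_score, mean_value = mean(scores), mean(ordered)
    sxy = sum((s - mean_score) * (v - mean_value) for s, v in zip(scores, ordered))
    sxx = sum((s - mean_score) ** 2 for s in scores)
    syy = sum((v - mean_value) ** 2 for v in ordered)
    return sxy / (sxx * syy) ** 0.5


roughly_normal = [9, 10, 11, 12, 13, 14, 15, 16, 17, 20]  # hypothetical data
skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20, 60]  # hypothetical data

r_normal = probability_plot_correlation(roughly_normal)
r_skewed = probability_plot_correlation(skewed)
print(f"roughly normal sample: r = {r_normal:.3f}")
print(f"skewed sample:         r = {r_skewed:.3f}")
```

The skewed sample gives a clearly lower correlation, which matches the bent shape you would see on its probability plot.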
Tips and warnings
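- SAS and Minitab report the Anderson-Darling test as an adjunct to the probability plot; Excel typically needs an add-in or a manual calculation. It is the p-value of the test, not the Anderson-Darling statistic itself, that is compared with 0.05. A p-value above 0.05 means the test found no significant evidence against normality, which supports what the plot shows.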
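For readers curious about what the test calculates, here is a rough Python sketch of the Anderson-Darling statistic for normality. It is an illustration, not the routine any of the packages use. It assumes the mean and standard deviation are estimated from the sample, applies the usual small-sample adjustment, and compares the result with 0.752, the commonly tabulated 5 per cent critical value for that case. A statistic below the critical value corresponds to a p-value above 0.05. The function name and the samples are hypothetical.

```python
from math import log
from statistics import NormalDist, mean, stdev

CRITICAL_5_PERCENT = 0.752  # assumed tabulated value, parameters estimated


def anderson_darling_normal(values):
    """Adjusted Anderson-Darling statistic against a fitted normal distribution."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    cdf = [fitted.cdf(x) for x in ordered]
    total = sum(
        (2 * i - 1) * (log(cdf[i - 1]) + log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a_squared = -n - total / n
    # Small-sample adjustment for estimated mean and standard deviation.
    return a_squared * (1 + 0.75 / n + 2.25 / n**2)


roughly_normal = [9, 10, 11, 12, 13, 14, 15, 16, 17, 20]  # hypothetical data
skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20, 60]  # hypothetical data

ad_normal = anderson_darling_normal(roughly_normal)
ad_skewed = anderson_darling_normal(skewed)
for label, stat in (("roughly normal", ad_normal), ("skewed", ad_skewed)):
    verdict = "no evidence against normality" if stat < CRITICAL_5_PERCENT else "evidence against normality"
    print(f"{label}: adjusted A-squared = {stat:.3f} ({verdict})")
```

Note the direction of the comparison: a small statistic, like a large p-value, is the outcome that supports normality.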