Regression analysis is a statistical technique for analysing numerical data that consists of an independent variable (the variable that we deliberately fix) and a dependent variable. The purpose of the analysis is to find a relationship between the two sets of data. The data is plotted on a scatter graph, with the independent variable on the x axis and the dependent variable on the y axis. When a straight line, or regression line, is drawn through the centre of the plotted points, the graph gives a good visual picture of the relationship between the two variables. The strength and direction of a straight-line relationship can be summarised by a single number, the correlation coefficient R.

- Skill level: Moderate


## Instructions

1. Construct a scatter graph from your data. Use it to judge whether there is likely to be a linear relationship between the two variables, that is, whether you expect the correlation coefficient R to be close to +1, close to -1 or close to 0.

2. Prepare a table of five columns. In the first column, list your x data. In the second, list your y data. In the third, calculate x-squared for each data point. In the fourth, calculate y-squared. In the final column, calculate x multiplied by y (xy).
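A minimal Python sketch of the five-column table, assuming the data are held in two plain lists of equal length. The values in `x_data` and `y_data` and the helper name `build_table` are hypothetical, chosen only for illustration.

```python
# Hypothetical example data (not from the article): x is the independent
# variable, y the dependent variable, one pair per data point.
x_data = [0.0, 2.2, 4.6]
y_data = [1.0, 3.1, 5.9]


def build_table(xs, ys):
    """Return one row per data point: (x, y, x squared, y squared, x * y)."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    return [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]


table = build_table(x_data, y_data)

# Print the table with a header row, in the same column order as the article.
print(f"{'x':>6} {'y':>6} {'x^2':>8} {'y^2':>8} {'xy':>8}")
for x, y, xx, yy, xy in table:
    print(f"{x:6.2f} {y:6.2f} {xx:8.3f} {yy:8.3f} {xy:8.3f}")
```

Each row holds x, y, x-squared, y-squared and xy for one data point, so the later steps only need to work down the columns.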

3. Calculate the average value of each of your five columns by adding up its values and dividing by the number of data points. For example, if your x data consists of the values 0, 2.2 and 4.6, your average x value will be (0 + 2.2 + 4.6) / 3 = 6.8 / 3 ≈ 2.27.
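Averaging a column means summing it and dividing by the number of data points. This sketch reuses the x values from the worked example (0, 2.2 and 4.6) with made-up y values; the helper name `column_averages` and the dictionary keys are hypothetical.

```python
def column_averages(xs, ys):
    """Average of each of the five table columns: x, y, x^2, y^2 and xy."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need the same, non-zero number of x and y values")
    return {
        "x": sum(xs) / n,
        "y": sum(ys) / n,
        "x^2": sum(x * x for x in xs) / n,
        "y^2": sum(y * y for y in ys) / n,
        "xy": sum(x * y for x, y in zip(xs, ys)) / n,
    }


# x values from the worked example; the y values are hypothetical.
averages = column_averages([0, 2.2, 4.6], [1.0, 3.1, 5.9])
print(round(averages["x"], 2))  # prints 2.27
```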

4. Let n be the total number of data points. Calculate Sxy as follows: (n multiplied by the average of xy) - (n multiplied by average x multiplied by average y). Calculate Sxx as follows: (n multiplied by the average of x-squared) - (n multiplied by the square of average x).
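A sketch of Sxx and Sxy in Python, assuming the standard definitions Sxx = n × (average of x²) - n × (average x)² and Sxy = n × (average of xy) - n × (average x) × (average y), where n is the number of data points. These are the same quantities as Σ(x - x̄)² and Σ(x - x̄)(y - ȳ). The function name and sample data are hypothetical, picked so the arithmetic can be checked by hand.

```python
def s_xx_and_s_xy(xs, ys):
    """Return (Sxx, Sxy) computed from column averages, with n data points:

    Sxx = n * (average of x^2) - n * (average x)^2
    Sxy = n * (average of xy) - n * (average x) * (average y)
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need the same, non-zero number of x and y values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xx = sum(x * x for x in xs) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    s_xx = n * mean_xx - n * mean_x ** 2
    s_xy = n * mean_xy - n * mean_x * mean_y
    return s_xx, s_xy


# Hypothetical data: x = 1..4, y roughly doubles with x.
print(s_xx_and_s_xy([1, 2, 3, 4], [2, 4, 5, 8]))  # prints (5.0, 9.5)
```

Working from averages is convenient by hand, but when the values are large and close together the two subtracted terms are nearly equal and floating-point precision can suffer; summing squared deviations from the mean avoids that.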

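5. Calculate Syy in the same way as Sxx, but from the y data: (total number of data points multiplied by the average of y-squared) - (total number of data points multiplied by the square of average y). Then calculate R, which is Sxy divided by the square root of (Sxx multiplied by Syy).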
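Putting the steps together, here is a minimal Python sketch of R using the standard Pearson formula R = Sxy / √(Sxx × Syy), where Syy is computed from the y column exactly as Sxx is from the x column. The function name and sample data are hypothetical. Python 3.10 and later also ship `statistics.correlation`, which can be used to cross-check the result.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient R = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Each S term is n * (average of the product) - n * (product of averages).
    s_xx = n * (sum(x * x for x in xs) / n) - n * mean_x ** 2
    s_yy = n * (sum(y * y for y in ys) / n) - n * mean_y ** 2
    s_xy = n * (sum(x * y for x, y in zip(xs, ys)) / n) - n * mean_x * mean_y
    # Guard against division by zero: Sxx (or Syy) is zero when every x
    # (or every y) value is the same, and R is then undefined.
    if s_xx <= 0 or s_yy <= 0:
        raise ValueError("R is undefined when all x or all y values are equal")
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation_coefficient([1, 2, 3, 4], [2, 4, 5, 8])
print(round(r, 3))  # prints 0.981
```

For this data Sxx = 5, Syy = 18.75 and Sxy = 9.5, so R = 9.5 / √93.75, close to +1 as the rising scatter of points would suggest.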

#### Tips and warnings

- Sense-check your answer using the scatter graph. You should be able to see at a glance whether R should have a value close to -1, +1 or 0.
- If the data points cluster closely around a straight line, there is a strong linear relationship between the x and y data. If the points are scattered randomly across the graph, there is no linear relationship between the variables, and R will be close to zero.
- The correlation coefficient, R, measures how closely the data points follow a straight line, that is, how little they deviate from the regression line. R always lies between -1 and +1. A value of +1 or -1 represents a perfect linear correlation, with every point lying exactly on the line. If R is positive, the regression line rises from left to right on the graph; if it is negative, the line falls. An R value of zero indicates no linear correlation, although the variables may still be related in a non-linear way.
- If your value for R falls outside the -1 to +1 scale, you have made a mistake.
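The sense checks above can be tried on three small hypothetical data sets: points on a rising line, points on a falling line, and a symmetric arch. This sketch computes the sums of squares from deviations about the mean, which is algebraically equivalent to the averages-based formulas, and also reports the least-squares slope Sxy / Sxx. Its sign always matches that of R because both share the numerator Sxy. The function name and data are illustrative assumptions.

```python
import math


def correlation_and_slope(xs, ys):
    """Return (R, slope), where slope = Sxy / Sxx is the least-squares slope.

    Uses the deviation form Sxx = sum((x - mean_x)^2) and so on, which is
    algebraically equal to the averages-based formulas.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy), s_xy / s_xx


xs = [1, 2, 3, 4]
examples = {
    "rising line y = 2x + 1": [3, 5, 7, 9],
    "falling line y = 10 - 3x": [7, 4, 1, -2],
    "symmetric arch": [2, 5, 5, 2],
}
for label, ys in examples.items():
    r, slope = correlation_and_slope(xs, ys)
    # Allow a tiny tolerance: floating-point rounding can push |R| a hair past 1.
    if not -1 - 1e-12 <= r <= 1 + 1e-12:
        raise ValueError(f"R = {r} is outside [-1, +1]: check the working")
    print(f"{label}: R = {r:+.2f}, slope = {slope:+.2f}")
```

The rising and falling lines give R = +1 and R = -1 with slopes of +2 and -3, while the arch gives R = 0 even though y clearly depends on x, because that dependence is not linear.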