# How to find the correlation coefficient 'R' for a scatter plot

Written by Jayne Thompson

Regression analysis is a statistical technique for analysing numerical data that consists of an independent variable (the variable that we deliberately fix) and a dependent variable. The purpose of the analysis is to find a relationship between the two sets of data. The data is plotted on a scatter graph, with the independent variable on the x axis and the dependent variable on the y axis. When a straight line, or regression line, is drawn through the centre of the plotted data points, the graph gives a good visual picture of the relationship between the two variables.

Skill level:
Moderate

## Instructions

1. Construct a scatter graph from your data. Use this to see whether there is likely to be a linear relationship between the two variables, that is, whether you expect R to be close to +1, -1 or 0.

2. Prepare a table of five columns. In the first column, list your x data. In the second, list your y data. In the third, calculate the value of x-squared. In the fourth, calculate the value of y-squared. In the final column, calculate x multiplied by y (xy).

3. Calculate the average value for each of your five columns. For example, if your x data consists of the values 0, 2.2 and 4.6, your average x value will be (0 + 2.2 + 4.6) / 3 = 2.27 (to two decimal places).

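4. Calculate Sxy as follows: (total number of data points multiplied by average of xy) - (total number of data points multiplied by average x multiplied by average y). Calculate Sxx as follows: (total number of data points multiplied by average x-squared) - (total number of data points multiplied by the square of average x). Calculate Syy in the same way as Sxx, using average y-squared and average y.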

5. Calculate R. This is Sxy divided by the square root of (Sxx multiplied by Syy), where Syy is the y-data counterpart of Sxx.
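The table method in the steps above can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function name `correlation_coefficient` and the y values are made up for the example (only the x values 0, 2.2 and 4.6 come from the article), and it uses the standard forms Sxy = n × mean(xy) - n × mean(x) × mean(y), with Sxx and Syy built the same way from the squared columns.

```python
import math


def correlation_coefficient(xs, ys):
    """Return R for paired data using the five-column table method."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    n = len(xs)

    # Step 2: build the three derived columns of the table.
    x_squared = [x * x for x in xs]
    y_squared = [y * y for y in ys]
    xy = [x * y for x, y in zip(xs, ys)]

    # Step 3: average each of the five columns.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_x2 = sum(x_squared) / n
    mean_y2 = sum(y_squared) / n
    mean_xy = sum(xy) / n

    # Step 4: note that n multiplies both terms in each expression.
    s_xy = n * mean_xy - n * mean_x * mean_y
    s_xx = n * mean_x2 - n * mean_x ** 2
    s_yy = n * mean_y2 - n * mean_y ** 2

    # R is undefined if all x values (or all y values) are identical.
    if s_xx <= 0 or s_yy <= 0:
        raise ValueError("R is undefined when x or y does not vary")

    # Step 5: divide Sxy by the square root of (Sxx * Syy).
    return s_xy / math.sqrt(s_xx * s_yy)


x_data = [0, 2.2, 4.6]    # x values from the averaging example
y_data = [1.0, 3.1, 4.9]  # hypothetical y values, for illustration only
r = correlation_coefficient(x_data, y_data)
print(round(r, 3))
```

Because the made-up y values rise almost in step with x, the printed value should be close to +1.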

#### Tips and warnings

• Sense-check your answer by using the scatter graph. You should be able to see at a glance whether R should have a value close to -1, +1, or 0.
• If the data points cluster around the straight line, there is a strong linear relationship between the x and y data. If the points are randomly scattered across the graph, there is no linear relationship between the variables, or zero correlation.
• The correlation coefficient, or R, measures how closely the data points follow a straight line. R lies on a scale of -1 to +1. +1 and -1 each represent a perfectly linear correlation between the data. If the R value is positive, the regression line rises from left to right on the graph. If it is negative, the line falls. An R value of zero indicates no linear correlation at all.
• If your value for R falls outside the -1 to +1 scale, you have made a mistake.
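The checks in these tips can also be automated. The sketch below is one possible approach, not a standard routine: `pearson_r` computes R from deviations about the means (algebraically equivalent to the table method), and `describe_r` is a hypothetical helper that rejects any value outside -1 to +1 and reports which way the regression line should slope. The small `tolerance` allowance for floating-point rounding is an assumption of this example.

```python
import math


def pearson_r(xs, ys):
    """Compute R from deviations about the mean of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def describe_r(r, tolerance=1e-9):
    """Hypothetical helper: validate R and describe the expected slope."""
    # A value outside -1 to +1 means the calculation went wrong.
    if r < -1 - tolerance or r > 1 + tolerance:
        raise ValueError(f"R = {r} is outside -1 to +1; recheck the working")
    if r > 0:
        return "rising"
    if r < 0:
        return "falling"
    return "flat"


x = [1, 2, 3, 4]
rising_y = [3, 5, 7, 9]   # made-up points lying exactly on y = 2x + 1
falling_y = [9, 7, 5, 3]  # the same points in reverse order

print(describe_r(pearson_r(x, rising_y)))
print(describe_r(pearson_r(x, falling_y)))
```

Points that lie exactly on a straight line, as in this made-up data, should give R equal to +1 or -1 up to rounding, which makes them a convenient test of any hand calculation.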
