34 Simple Linear Correlation


 

 

 

1. Introduction

 

What is a measure of correlation?

 

Correlation refers to the linear relationship among variables. For example, the blood pressure of a patient may be correlated with age, food habits, family history and so on. The study of the degree of relationship among such variables is known as correlation.

 

In this module, we are going to discuss the various types of correlation and the different methods of calculating correlation. The estimation of correlation depends upon the type of data. If the data are in actual values, we can calculate Karl Pearson’s correlation coefficient. If the data are somewhat qualitative in nature, we have to calculate rank correlation.

 

2.  Objectives

 

1.      To study the various types of correlation

2.      To study the methods of estimating correlation

 

3.  Types of correlation

 

Simple or Partial or Multiple:

 

If the relationship between two variables is analysed, it is called simple correlation. When more than two variables are considered, the correlation between two of them when all other variables are held constant, i.e., when the linear effects of all other variables on them are removed, is called partial correlation. When more than two variables are considered, the correlation between one of them and its estimate based on the group consisting of the other variables is called multiple correlation.

 

Linear or Non-linear or No Correlation:

 

When we plot the values of X and Y, if all the points lie on a line or are scattered around a line, it is called linear correlation. When all the points lie exactly on a curve or are scattered around a curve, there is non-linear correlation between the two variables.

 

When the points are scattered neither around a line nor around a curve, there is no correlation between the two variables. The following diagrams show these three kinds.

Methods:

 

The following four methods are available under simple linear correlation; among them, the product moment method is the best one.

 

i) Scatter Diagram

ii) Karl Pearson’s correlation coefficient or product moment correlation coefficient (r)

iii) Spearman’s rank correlation coefficient (ρ)

iv) Correlation coefficient by concurrent deviation method (rc)

 

SCATTER DIAGRAM

 

When we plot the values of X and Y on a graph sheet, the resulting diagram with N points is called a scatter diagram.

 

Possible types of scatter diagram under simple linear correlation are as given below. From a diagram, it can be found out whether the correlation is positive or negative and whether it is perfect or high or low.

 

The merits of this method are as follows. It is easy to draw, non-mathematical and simple to understand, and it does not involve computations. The greatest demerit is that it is not quantitative. As no numerical value is computed, comparison is sometimes not possible. Decisions based on it are not as accurate as those based on correlation coefficients.

 

KARL PEARSON’S COEFFICIENT OF CORRELATION (r)

 

This is also called the product moment correlation coefficient and is denoted by r. It is the covariance between the two variables divided by the product of their standard deviations. It can be calculated by using any one of several formulae; the choice of formula depends on the nature of the data. Different formulae are seen under the following examples.
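The definition above (covariance divided by the product of the standard deviations) can be sketched directly in Python. This is a minimal illustration, not one of the module's own formulae; the function name `pearson_r` and the data are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population forms (divide by n); the 1/n factors cancel in the ratio,
    # so dividing by n - 1 instead would give the same r.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Note: a constant X or Y has zero standard deviation, and r is undefined.
    return cov_xy / (sd_x * sd_y)

# Illustrative (made-up) data, not taken from the module's examples.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

For these five made-up points the sum of products of deviations is 6 and the sums of squared deviations are 10 and 6, so r = 6/√60, roughly 0.77.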

  • Properties:
  1. -1 ≤ r ≤ +1, i.e., the correlation coefficient cannot be greater than 1 numerically.
  2. The correlation coefficient is independent of change of origin. Here u = (X − a)/c and v = (Y − b)/d are the transformed values, with c and d positive. That is why we do not add a or b when we use u and v, although we have subtracted them from X and Y while finding u and v.
  3. The correlation coefficient is independent of change of scale. That is why we do not multiply by c or d when we use u and v, although we have divided X and Y by them while finding u and v.
  4. The correlation coefficient is a pure number. It is not in any unit of measurement.
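Properties 2 and 3 can be checked numerically. The sketch below assumes the usual transformation u = (X − a)/c and v = (Y − b)/d with positive c and d; the data and the constants a, b, c, d are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation from sums of deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative (made-up) data and constants.
X = [10, 20, 30, 40, 50]
Y = [120, 150, 140, 180, 200]
a, c = 30, 10   # origin and scale chosen for X (assumed, c > 0)
b, d = 150, 10  # origin and scale chosen for Y (assumed, d > 0)

u = [(x - a) / c for x in X]
v = [(y - b) / d for y in Y]

r_xy = pearson_r(X, Y)
r_uv = pearson_r(u, v)
print(round(r_xy, 4), round(r_uv, 4))  # the two values agree
```

Working with u and v keeps the arithmetic small when computing by hand, and no back-transformation is needed because r is unchanged. If c and d had opposite signs, the sign of r would flip, which is why both are taken as positive here.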

Interpretation of r: r = 0 indicates absence of linear correlation. r = +1 and r = -1 indicate perfect positive and perfect negative correlation respectively. According to certain statisticians, 0 < r < 0.5 indicates low positive correlation, 0.5 ≤ r < 1 indicates high positive correlation, -1 < r ≤ -0.5 indicates high negative correlation and -0.5 < r < 0 indicates low negative correlation.
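The verbal scale above can be written as a small lookup. This is only a sketch of the 0.5 cut-off quoted in the text, which is a convention of certain statisticians rather than a fixed rule; the function name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Map a correlation coefficient to the verbal scale (0.5 cut-off convention)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 1:
        return "perfect positive"
    if r == -1:
        return "perfect negative"
    if r == 0:
        return "no linear correlation"
    strength = "high" if abs(r) >= 0.5 else "low"
    sign = "positive" if r > 0 else "negative"
    return f"{strength} {sign}"

for value in (1, 0.8, 0.5, 0.3, 0, -0.3, -0.5, -1):
    print(value, "->", describe_r(value))
```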

 

Coefficient of Determination: The square of the coefficient of correlation (r) is the coefficient of determination (r²). It indicates the proportion of the variation in the dependent variable which is due to the independent variable. The remaining variation in the dependent variable is because of other factors.

 

If r = 0.5, then r² = 0.25, and so 25% (0.25 × 100) of the variation in the dependent variable is attributable to the independent variable.
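One way to see this reading of r² is to fit the least-squares line of Y on X and compare the share of variation it explains with the square of r. The sketch below assumes that "variation due to the independent variable" means the variation explained by that fitted line; the data and function name are made up.

```python
def r_and_explained_share(xs, ys):
    """Return (r, share of the variation in Y explained by the least-squares line on X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # total variation in Y
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    # Variation left unexplained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r, 1 - ss_res / syy

# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, explained = r_and_explained_share(x, y)
print(round(r * r, 4), round(explained, 4))  # both are 0.6 for this data
```

Here 60% of the variation in y is accounted for by x, and the remaining 40% is left to other factors.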

 

5. SPEARMAN’S RANK CORRELATION COEFFICIENT (ρ)

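$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$

where d is the difference between the ranks of the paired values of X and Y, and N is the number of pairs.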

For the maximum value of X, 42, the rank is 1; for the next lower value, 37, the rank is 2. Similarly, for 47 of Y, the rank is 1 and for 43, the rank is 2.

Alternatively, rank 1 may be assigned to the least value of X, rank 2 to the next higher value, and so on. If so, the least value of Y is also to be assigned rank 1, the next higher value rank 2, and so on.
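The ranking step and the formula ρ = 1 − 6Σd²/(N(N² − 1)) can be sketched as follows for data without repeated values. The data are made up (they are not the module's example), and the names `ranks` and `spearman_rho` are hypothetical.

```python
def ranks(values, largest_first=True):
    """Rank 1 goes to the largest (or smallest) value; assumes no repeated values."""
    order = sorted(values, reverse=largest_first)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys, largest_first=True):
    """Rank correlation from the squared rank differences (no ties assumed)."""
    n = len(xs)
    rx = ranks(xs, largest_first)
    ry = ranks(ys, largest_first)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Illustrative (made-up) data.
X = [10, 20, 30, 40, 50]
Y = [12, 25, 22, 48, 40]

rho_desc = spearman_rho(X, Y, largest_first=True)   # rank 1 = largest value
rho_asc = spearman_rho(X, Y, largest_first=False)   # rank 1 = smallest value
print(rho_desc, rho_asc)
```

Both ranking directions give Σd² = 4 for this data, so the coefficient is 1 − 24/120 = 0.8 either way, provided the same direction is used for X and Y.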

 

Tied Ranks:

 

When one or more values are repeated, two aspects are to be considered: the ranks of the repeated values and the change in the formula.

 

Each repeated value is to be considered separately. If a value has occurred m times, each occurrence is assigned the average of the ranks which would have been assigned to them if they had differed slightly. This does not affect the ranks of the other values.
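Average ranking can be sketched as below, together with the tie correction of adding m(m² − 1)/12 to Σd² once for each value repeated m times. The marks are made up for illustration, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank 1 for the largest value; repeated values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    result = []
    for v in values:
        first = order.index(v) + 1   # first rank the repeated value would occupy
        m = order.count(v)           # number of times the value occurs
        result.append(first + (m - 1) / 2)  # mean of first .. first + m - 1
    return result

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    counts = [values.count(v) for v in set(values)]
    return sum(m * (m * m - 1) / 12 for m in counts if m > 1)

def spearman_rho_with_ties(xs, ys):
    """Rank correlation with average ranks and the tie correction added to sum(d^2)."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))

# Illustrative (made-up) marks; 60 is repeated in xs and 70 in ys.
xs = [50, 60, 60, 70, 80]
ys = [55, 65, 70, 70, 90]
print(average_ranks(xs))               # the two 60s share rank 3.5
print(spearman_rho_with_ties(xs, ys))
```

In this example Σd² = 1.5 and each series contributes 2(2² − 1)/12 = 0.5, so the adjusted sum is 2.5 and the coefficient is 1 − 15/120 = 0.875.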

 

For each such repeated value, m(m² − 1)/12 is to be added to Σd² once in the formula.

Example: Find the rank correlation coefficient for the percentage of marks secured by a group of 8 students in Economics and Statistics.

CONCLUSION

 

Let us summarise. There are various types of correlation, such as perfect positive correlation, negative correlation, high degree positive correlation, high degree negative correlation, low degree positive correlation and no correlation. Various methods of calculating correlation, namely the scatter diagram method, Karl Pearson’s correlation coefficient, rank correlation and the concurrent deviation method, are discussed. The scatter diagram method is very easy compared to the other methods, though it will not reveal the exact relationship. Pearson’s correlation is applied to many practical problems when the data are in actual values. The rank correlation method is used when the data are qualitative in nature. The concurrent deviation method can be used only to know the direction of change and relationship. This module gives some idea of how to calculate correlation by these methods. To become familiar with the calculations, practice in solving problems from textbooks is necessary.
