35 Simple Linear Regression

S. Gandhimathi

epgp books

 

 

 

  1. Introduction 

What is regression?

 

The correlation coefficient indicates the direction of covariation and the closeness of the linear relation between two variables. If two variables are related, the mathematical equation that describes their relation is called the regression equation. A regression equation gives the value of the dependent variable corresponding to any specified value of the independent variable. Regression analysis measures the cause and effect relationship: it requires deciding which variable is treated as the cause (the independent variable) and which as the effect (the dependent variable). However, measuring a cause and effect relationship in this way is possible only if the two variables are correlated.

 

Objective of the module

 

In this module we discuss the relationship between correlation and regression analysis and the methods of estimating the regression equations.

 

Difference between correlation and regression

 

Let us discuss the difference between correlation and regression.

 

The differences between correlation and regression are given in the following table.

 

For the pairs of values X and Y, there are two regression lines.

When X is the independent variable and Y is the dependent variable, the line is called the regression line of Y on X. It is obtained by using the method of least squares. When Y is the independent variable and X is the dependent variable, the line is called the regression line of X on Y.

The form of the regression equation of Y on X is

Y = a + bX

The form of the regression equation of X on Y is

X = a + bY

 

The following two normal equations are solved to estimate the regression equation of Y on X:

ΣY = Na + bΣX and

ΣXY = aΣX + bΣX²

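The following two normal equations are solved to estimate the regression equation of X on Y:

ΣX = Na + bΣY and

ΣXY = aΣY + bΣY²

Here a and b are the constants of the X on Y equation; in general they differ from the a and b of the Y on X equation.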
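Where the normal equations for Y on X come from can be sketched as follows. This is the standard least-squares argument, which the module does not spell out; the symbol Q for the sum of squared deviations is introduced here only for illustration.

```latex
Q(a, b) = \sum (Y - a - bX)^2

\frac{\partial Q}{\partial a} = -2 \sum (Y - a - bX) = 0
  \;\Longrightarrow\; \sum Y = Na + b \sum X

\frac{\partial Q}{\partial b} = -2 \sum X (Y - a - bX) = 0
  \;\Longrightarrow\; \sum XY = a \sum X + b \sum X^2
```

Interchanging the roles of X and Y in the same argument gives the two normal equations for X on Y.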

Methods of Forming the Regression Equations

  1. Regression Equations on the basis of Normal Equations.
  2. Regression Equations on the basis of X̅, Y̅, bYX and bXY.

Both the methods are based on the principle of least squares. They give the same equations.
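The second method can be sketched in Python as below. The sketch assumes the usual textbook formulas, which this module does not write out: bYX = (NΣXY − ΣXΣY) / (NΣX² − (ΣX)²), bXY = (NΣXY − ΣXΣY) / (NΣY² − (ΣY)²), and the lines Y − Y̅ = bYX (X − X̅) and X − X̅ = bXY (Y − Y̅). The function name `regression_from_means` is hypothetical, and the five data pairs are those of the module's worked example.

```python
def regression_from_means(xs, ys):
    """Return ((a, b_yx), (a2, b_xy)) for Y = a + b_yx*X and X = a2 + b_xy*Y.

    Assumed formulas (not given in the module text):
      b_yx = (N*SXY - SX*SY) / (N*SXX - SX^2)
      b_xy = (N*SXY - SX*SY) / (N*SYY - SY^2)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)

    x_bar, y_bar = sx / n, sy / n
    common = n * sxy - sx * sy          # numerator shared by both coefficients
    b_yx = common / (n * sxx - sx ** 2)
    b_xy = common / (n * syy - sy ** 2)

    # Y - y_bar = b_yx (X - x_bar)  =>  Y = (y_bar - b_yx * x_bar) + b_yx * X
    a_yx = y_bar - b_yx * x_bar
    # X - x_bar = b_xy (Y - y_bar)  =>  X = (x_bar - b_xy * y_bar) + b_xy * Y
    a_xy = x_bar - b_xy * y_bar
    return (a_yx, b_yx), (a_xy, b_xy)


xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
y_on_x, x_on_y = regression_from_means(xs, ys)
print("Y on X (intercept, slope):", y_on_x)
print("X on Y (intercept, slope):", x_on_y)
```

Up to floating-point rounding, the intercepts and slopes agree with those obtained by solving the normal equations for the same data, which illustrates that the two methods give the same equations.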

 

Method 1: Regression Equations on the basis of Normal Equations

 

Example: From the following data, obtain the two regression equations:

 

X 6 2 10 4 8
Y 9 11 5 8 7

(Use normal equations)

 

Solution:

Steps:

  1. A table is formed with the values of X and Y in the first two columns.
  2. XY, X² and Y² are found and written in the next three columns.
  3. Totals of the columns are found.
  4. The normal equations for the regression equation of Y on X are considered. Values from the table are substituted. The equations are then solved and the values of a and b are obtained. By substituting those values in Y = a + bX, the required equation is obtained.
  5. The normal equations for the regression equation of X on Y are considered next. Values from the table are substituted. The equations are then solved and the values of a′ and b′ are obtained. By substituting those values in X = a′ + b′Y, the required equation is obtained.
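The five steps above can be followed mechanically. The sketch below does so in Python for the example data, solving each pair of normal equations by Cramer's rule with exact fractions. The helper name `solve_normal_equations` and the variable names are illustrative and do not come from the module.

```python
from fractions import Fraction

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]

# Steps 1-2: one row per observation with columns X, Y, XY, X^2, Y^2
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]

# Step 3: column totals
n = len(rows)
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))


def solve_normal_equations(n, s_ind, s_dep, s_prod, s_ind_sq):
    """Solve  s_dep  = n*a     + b*s_ind
              s_prod = a*s_ind + b*s_ind_sq   for (a, b) by Cramer's rule."""
    det = n * s_ind_sq - s_ind * s_ind
    if det == 0:
        raise ValueError("the independent variable has no variation")
    a = Fraction(s_dep * s_ind_sq - s_ind * s_prod, det)
    b = Fraction(n * s_prod - s_ind * s_dep, det)
    return a, b


# Step 4: Y on X (X is the independent variable)
a, b = solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2)

# Step 5: X on Y (Y is the independent variable)
a2, b2 = solve_normal_equations(n, sum_y, sum_x, sum_xy, sum_y2)

print("Totals:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(f"Y on X: Y = {float(a)} + ({float(b)}) X")
print(f"X on Y: X = {float(a2)} + ({float(b2)}) Y")
```

For these data the totals are ΣX = 30, ΣY = 40, ΣXY = 214, ΣX² = 220 and ΣY² = 340, which give Y = 11.9 − 0.65X and X = 16.4 − 1.3Y.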

Properties of Regression Lines and Coefficients

 

The two regression equations are generally different and are not to be interchanged in their usage. The regression equation of Y on X is to be used to find the value of Y corresponding to any specified value of X. Similarly, the regression equation of X on Y is to be used to find the value of X corresponding to any specified value of Y.

 

The two regression equations become one and the same when r = -1 or +1. In such cases both X and Y are to be found from that equation.

 

When there are two distinct regression lines, they intersect at (X̅, Y̅). Hence, the values obtained for X and Y by solving the two regression equations simultaneously are X̅ and Y̅ respectively.
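This property can be checked numerically. The sketch below fits both lines to the module's example data, solves the two equations simultaneously, and compares the solution with the means. The helper `fit` and all variable names are illustrative assumptions, and the slope formula used is the standard least-squares one rather than anything stated in the module.

```python
from fractions import Fraction

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
n = len(xs)


def fit(ind, dep):
    """Least-squares (intercept, slope) of dep on ind, in exact arithmetic."""
    s_i, s_d = sum(ind), sum(dep)
    s_id = sum(i * d for i, d in zip(ind, dep))
    s_ii = sum(i * i for i in ind)
    slope = Fraction(n * s_id - s_i * s_d, n * s_ii - s_i ** 2)
    intercept = Fraction(s_d, n) - slope * Fraction(s_i, n)
    return intercept, slope


a, b = fit(xs, ys)      # Y = a + b*X
a2, b2 = fit(ys, xs)    # X = a2 + b2*Y

# Substitute X = a2 + b2*Y into Y = a + b*X and solve for Y:
#   Y = a + b*(a2 + b2*Y)  =>  Y = (a + b*a2) / (1 - b*b2)
# This needs b*b2 != 1, i.e. the two lines must be distinct.
y_star = (a + b * a2) / (1 - b * b2)
x_star = a2 + b2 * y_star

x_bar, y_bar = Fraction(sum(xs), n), Fraction(sum(ys), n)
print("intersection:", (x_star, y_star), "means:", (x_bar, y_bar))
```

The division by 1 − b·b2 is also why the statement is restricted to distinct lines: when r = ±1 the product of the two coefficients is 1 and the lines coincide.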

 

The correlation coefficient is the geometric mean of the two regression coefficients. That is, the correlation coefficient is the square root of the product of the two regression coefficients: r = ±√(bYX × bXY), where the sign of r is the common sign of the two regression coefficients.
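As a quick numerical check of this relation, the sketch below computes both regression coefficients for the module's example data and compares the signed square root of their product with r computed directly. It assumes the usual product-moment formula for r, which the module does not state; the variable names are illustrative.

```python
import math

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)

num = n * sxy - sx * sy                 # shared numerator
b_yx = num / (n * sxx - sx ** 2)        # regression coefficient of Y on X
b_xy = num / (n * syy - sy ** 2)        # regression coefficient of X on Y

# Geometric mean of the two coefficients, carrying their common sign
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# r computed directly from the sums (assumed product-moment formula)
r_direct = num / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(round(r_from_b, 4), round(r_direct, 4))
```

Both values come out at about −0.919 for these data, negative because both regression coefficients are negative.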

 

The two regression coefficients and the correlation coefficient have the same sign. In other words, there are only two possibilities: bXY, bYX and r are all positive, or bXY, bYX and r are all negative.

 

Both regression coefficients cannot be numerically greater than 1 at the same time. That is, when the signs are ignored, if one of bXY and bYX is greater than 1, the other must be less than 1.
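The reason can be sketched from the geometric-mean property stated earlier, taking that property as given:

```latex
b_{YX} \, b_{XY} = r^2 \le 1
\quad\Longrightarrow\quad
|b_{YX}| > 1 \;\text{ implies }\; |b_{XY}| = \frac{r^2}{|b_{YX}|} < 1
```

In the module's example data, for instance, the coefficient of X on Y is −1.3 and the coefficient of Y on X is −0.65, so only one of them exceeds 1 numerically.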

 

Regression coefficients are independent of change of origin but are affected by change of scale.
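A small numerical illustration of this property is sketched below, using the module's example data. The shift constants (6 and 8) and the scale factors (h = 2, k = 5) are arbitrary choices made for the illustration, and the helper `slope` is a hypothetical name.

```python
from fractions import Fraction


def slope(ind, dep):
    """Least-squares regression coefficient of dep on ind (exact arithmetic)."""
    n = len(ind)
    s_i, s_d = sum(ind), sum(dep)
    s_id = sum(i * d for i, d in zip(ind, dep))
    s_ii = sum(i * i for i in ind)
    return Fraction(n * s_id - s_i * s_d) / Fraction(n * s_ii - s_i ** 2)


xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
b_yx = slope(xs, ys)

# Change of origin: U = X - 6, V = Y - 8 (any constants would do)
b_shifted = slope([x - 6 for x in xs], [y - 8 for y in ys])

# Change of scale: U = X / h, V = Y / k
h, k = 2, 5
b_scaled = slope([Fraction(x, h) for x in xs], [Fraction(y, k) for y in ys])

print(b_yx, b_shifted, b_scaled)
```

Shifting the origin leaves the coefficient unchanged, while rescaling multiplies it by h/k: here the coefficient of V on U equals (2/5) × bYX.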

 

Each regression coefficient indicates the amount of change in the dependent variable corresponding to a unit increase in the independent variable.

 

  5. Conclusion

 

To summarize, the difference between correlation and regression analysis was discussed. Two methods of estimating the regression lines were discussed: one based on the two normal equations and the other based on X̅, Y̅, bYX and bXY. The regression equations reveal the cause and effect relationship. Hence, regression models are formed to identify the determinants of dependent variables in research. Estimating a regression equation together with R² reveals the fit of the model and the explained variation in the dependent variable, and through this the unexplained variation in the model can also be identified. The value of F reveals the significance of the explained variation. If it is statistically insignificant, the model is respecified and re-estimated. To become familiar with regression models, collect data on practical problems and estimate the regression equations using the above methods.
