15 Analysis of Variance
Prof. Aslam Mahmood
E) Content
The ‘t’ test gives us a basis for testing the significance of the difference between two sample means. However, there are situations where we have more than two sample means and have to test the null hypothesis that the universe means of all the samples do not differ. The technique used for such a test of significance is the “Analysis of Variance”, popularly known as ANOVA.
Consider four normal universes with means M1, M2, M3 and M4 and equal variance. The universe means may or may not all be equal. Using sample values drawn from these universes, we can test the hypothesis that all the universe means are equal with the technique known as “Analysis of Variance”.
If we draw random samples of equal size from these universes, the null hypothesis H0 to be tested is that all the universe means are equal:
H0: M1 = M2 = M3 = M4, against the alternative hypothesis
H1: M1, M2, M3 and M4 are not all equal.
The basic assumption in this case is that each sample value of a variable is made up of a common effect µ, on which are superimposed the effect of the universe Rj (which remains uniform throughout a universe) and the random effect eij (which varies from universe to universe and also from observation to observation within a universe). The effect of random error is minor and operates in both positive and negative directions, and hence cannot cause significant differences between the means of different universes. However, we have to examine the effect due to the universes. If these effects are large, the means of the samples selected from different universes will differ significantly.
Thus, if Yij is the ith value of the variable in universe j, it can be written as:
Yij = µ + Rj + eij
where µ is the value of the common effect, Rj is the effect of the jth universe and eij is the effect of the random error on the ith observation in the jth universe. The error varies from observation to observation even within the same universe, but its effect is small, so the values of Y within the sample from one universe will be quite homogeneous. The universe effect Rj plays the important role: it has the same effect on all the sample observations from a specific universe j, so it does not disturb the homogeneity of the values within that universe. The effect R changes when we move from one universe to another.
As we move from Universe 1 to Universe 2, the fixed effect R2 of that universe operates on all the values of the sample from Universe 2. If there is a significant difference between the universe effects R1 and R2, the values of the variable in Universe 1 and Universe 2 will not be homogeneous with each other. The same is true for the sample values from Universes 3 and 4, with effects R3 and R4.
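The model Yij = µ + Rj + eij can be illustrated with a small simulation. This is only a sketch: the function name `simulate_samples`, the normal error with standard deviation `sigma`, and all numerical values are assumptions made for illustration and do not come from the module's data.

```python
import random


def simulate_samples(mu, universe_effects, k, sigma, seed=0):
    """Draw k values from each universe under Y_ij = mu + R_j + e_ij."""
    rng = random.Random(seed)
    samples = []
    for r_j in universe_effects:
        # e_ij: random error with mean zero, acting in both directions
        sample = [mu + r_j + rng.gauss(0.0, sigma) for _ in range(k)]
        samples.append(sample)
    return samples


# Hypothetical common effect, universe effects and error spread
samples = simulate_samples(mu=70.0, universe_effects=[-5.0, 0.0, 2.0, 3.0],
                           k=5, sigma=1.0)
for j, sample in enumerate(samples, start=1):
    print(f"Universe {j}: sample mean = {sum(sample) / len(sample):.2f}")
```

With a small `sigma` the values inside each sample stay close together, while the sample means differ by roughly the differences between the assumed universe effects.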
Thus the essence of the “Analysis of Variance” is to examine the variations between the sample means in relation to the variations due to random factors. If the variations between the means from different universes are found to be significantly larger, we reject the null hypothesis that M1 = M2 = M3 = M4 and accept the alternative hypothesis that the universe means M1, M2, M3 and M4 are not all equal.
If the between-sample variations are not significantly larger, the null hypothesis cannot be rejected and is accepted, i.e. we conclude that all the means are equal.
Calculation of Total Variation and Variation Between Samples from Different Universes
Suppose there are n universes (n = 4 in the present case, j = 1, 2, 3, 4), from each of which a random sample of size k is drawn, and suppose:
Xij is the ith value in the jth sample (j = 1, 2, 3, 4 and i = 1, 2, 3, ..., k).
The total variation in the values of the variable across all the samples from all the universes is the sum of squares (S.S.) of the deviations of all the sample values from their overall mean, say M, i.e.
Total S.S. = Σ Σ (Xij – M)²
The variation between the samples is the sum of the squared deviations of each sample mean (m.j) from the overall mean M, multiplied by the number of observations in each sample, k:
Between sample S.S. = k Σ (m.j – M)²
where m.j is the mean of the jth sample of size k (j = 1, 2, ..., n, with n = 4 in the present case).
Within or Error S.S. = Total S.S. – Between sample S.S.
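The subtraction rule for the error sum of squares rests on a standard identity, written out below in the notation of this section. The explicit form of the within-sample term is not stated in the text above and is added here as the usual textbook expression; the block assumes the amsmath package.

```latex
\begin{aligned}
\underbrace{\sum_{j=1}^{n}\sum_{i=1}^{k}\left(X_{ij}-M\right)^{2}}_{\text{Total S.S.}}
&= \underbrace{k\sum_{j=1}^{n}\left(m_{.j}-M\right)^{2}}_{\text{Between sample S.S.}}
 + \underbrace{\sum_{j=1}^{n}\sum_{i=1}^{k}\left(X_{ij}-m_{.j}\right)^{2}}_{\text{Within (Error) S.S.}}
\end{aligned}
```

The last term measures how far each value lies from the mean of its own sample, which is why it reflects only the random variation within the universes.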
The logical basis of (one way) ANOVA is that if the average sum of squares between samples (the sum of squares divided by its degrees of freedom) is close to the average error sum of squares (caused by random factors), the variation between the samples cannot be considered significantly different from random variation, and hence the null hypothesis cannot be rejected. We accept the null hypothesis and conclude that the means of all the universes are equal.
However, if it is significantly larger than the average error sum of squares, we reject the null hypothesis and accept the alternative hypothesis that the universe means are not all equal.
To compare the average between-sample and error sums of squares we use the F-ratio test. The procedure is also set out in the ANOVA table, Table 1, given below.
Table 1: Analysis of Variance, Calculation of F-ratio
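The calculations that a one way ANOVA table summarizes can be sketched in a few lines. The function name `one_way_anova` and the small data set are hypothetical (they are not the values of the module's tables); the degrees of freedom n – 1 and n(k – 1) are the usual ones for n samples of equal size k.

```python
def one_way_anova(samples):
    """One-way ANOVA for n samples of equal size k (lists of numbers)."""
    n = len(samples)                      # number of universes (samples)
    k = len(samples[0])                   # observations per sample
    values = [x for sample in samples for x in sample]
    grand_mean = sum(values) / (n * k)    # overall mean M
    sample_means = [sum(s) / k for s in samples]   # sample means m.j

    total_ss = sum((x - grand_mean) ** 2 for x in values)
    between_ss = k * sum((m - grand_mean) ** 2 for m in sample_means)
    error_ss = total_ss - between_ss      # within (error) S.S. by subtraction

    df_between, df_error = n - 1, n * (k - 1)
    ms_between = between_ss / df_between  # average S.S. between samples
    ms_error = error_ss / df_error        # average error S.S.
    return {
        "between": (between_ss, df_between, ms_between),
        "error": (error_ss, df_error, ms_error),
        "total": (total_ss, n * k - 1),
        "F": ms_between / ms_error,
    }


# Hypothetical data (NOT the values of Table 2): 3 samples of size 3
result = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(result["between"])  # (42.0, 2, 21.0)
print(result["error"])    # (6.0, 6, 1.0)
print(result["F"])        # 21.0
```

For four samples of five observations each, the same function gives (3, 16) degrees of freedom for the F-ratio.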
Example 1
To test the effect of soil type, agricultural productivity (in Rs. 000/acre, assumed normally distributed) is given in Table 2 for 5 farms randomly drawn from each of four regions R1, R2, R3 and R4 with different soil types. Test the hypothesis that productivity varies significantly between regions.
Table 2: Agricultural Productivity (Rs. 000/acre) of four samples from different regions
Table 3: Results of ANOVA, F-ratio
The table value of the F-ratio for (3, 16) degrees of freedom at the 1% level of significance is 5.29.
The results of the “Analysis of Variance” given in Table 3 show that the calculated F-ratio is much higher than the table value of the F-ratio even at the 1% level of significance (5.29). We therefore reject the null hypothesis that the regions do not differ in their agricultural productivity (i.e. M1 = M2 = M3 = M4) and conclude that mean agricultural productivity differs significantly between regions. The regions differ in their soil type, so soil type can be the main factor causing the variations in the agricultural productivity of the crop.
TWO WAY ANALYSIS OF VARIANCE
In the above exercise we examined the effect of the soil types of different regions on the production of a crop, replicating the production experiment five times in each region. These five experiments could have been carried out under similar conditions. But we can also use five different varieties of fertilizer in each region and analyse the effect of the five varieties of fertilizer on crop production along with the effect of the soil types of the four regions, with the help of “Two Way Analysis of Variance”. Thus, if each of the five rows corresponds to a different row effect (fertilizer effect) throughout all four columns, the two way analysis of variance model can be specified as given below:
Yij = µ + Fi + Rj + eij
where:
Yij is the crop production of the plot using the ith variety of fertilizer in region j,
µ is the overall level of production per plot of land,
Fi is the effect of the ith variety of fertilizer,
Rj is the effect of the soil type of the jth region, and
eij is the error effect in the plot using the ith variety of fertilizer on the soil type of the jth region.
The basic assumption again is that the random error eij is very small, operates in both positive and negative directions, and has an average of zero or close enough to zero that it can be ignored.
We can compare the variations due to both F and R with the variation of eij and evaluate their contribution to Yij. If the variation due to either of them is not found to be significantly different from the variation of the error term eij, that factor is ignored.
Hypothesis to be tested
Two hypotheses are tested in the two way analysis of variance. One corresponds to column differences and the other to row differences.
For column differences the null hypothesis would be:
H0: Means of all the columns (soil types) are equal (M.1 = M.2 = M.3 = M.4), against the alternative hypothesis
H1: Means of all the columns (soil types) are not equal.
If the F-ratio between the columns is found to be insignificant, the null hypothesis is accepted and we conclude that the column differences can be ignored. If the F-ratio is found to be significant, we reject the null hypothesis and conclude that the column means are not equal, i.e. the differences due to soil type are significant and can’t be ignored.
And for row differences the null hypothesis would be:
H0: Means of all the rows (fertilizer effect) are equal (M1. = M2. = M3. = M4. = M5.), against the alternative hypothesis
H1: Means of all the rows (fertilizer effect) are not equal.
If the F-ratio between the rows is found to be insignificant, the null hypothesis is accepted and we conclude that the differences in productivity due to rows (fertilizers) can be ignored. If the F-ratio is found to be significant, we reject the null hypothesis and conclude that the row means are not equal, i.e. the differences in agricultural productivity due to rows (fertilizers) are significant and can’t be ignored.
The “Analysis of Variance” will help us in identifying the factors which show significant variations between the columns or between the rows or between both.
The structure of a two way analysis of variance table is given below in Table 4:
Table 4: Calculation of F-ratio (Two Way ANOVA)
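The calculations behind a two way ANOVA table with one observation per cell can be sketched as follows. The function name `two_way_anova` and the 3 x 3 data set are hypothetical and serve only to show the arithmetic; the error degrees of freedom are taken as (rows – 1)(columns – 1), the usual value for this design.

```python
def two_way_anova(table):
    """Two-way ANOVA, one observation per cell; table[i][j] is row i, column j."""
    r = len(table)                        # number of rows (e.g. fertilizers)
    c = len(table[0])                     # number of columns (e.g. regions)
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    total_ss = sum((x - grand_mean) ** 2 for row in table for x in row)
    row_ss = c * sum((m - grand_mean) ** 2 for m in row_means)
    col_ss = r * sum((m - grand_mean) ** 2 for m in col_means)
    error_ss = total_ss - row_ss - col_ss  # what rows and columns leave unexplained

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = error_ss / df_error
    return {
        "rows": (row_ss, df_rows, row_ss / df_rows),
        "columns": (col_ss, df_cols, col_ss / df_cols),
        "error": (error_ss, df_error, ms_error),
        "F_rows": (row_ss / df_rows) / ms_error,
        "F_columns": (col_ss / df_cols) / ms_error,
    }


# Hypothetical 3 x 3 data (NOT the values of Table 5)
result2 = two_way_anova([[1, 2, 6], [2, 4, 6], [3, 6, 6]])
print(result2["rows"])       # (6.0, 2, 3.0)
print(result2["columns"])    # (24.0, 2, 12.0)
print(result2["error"])      # (4.0, 4, 1.0)
print(result2["F_rows"], result2["F_columns"])  # 3.0 12.0
```

For five rows and four columns the same function gives 4, 3 and 12 degrees of freedom for rows, columns and error respectively.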
Example 2
The example given above is repeated for two way ANOVA, with each row representing the treatment with one specific variety of fertilizer across all four regions.
Agricultural productivity of a crop (in Rs. 000/acre) is given in four columns and five rows in Table 5. Each of the 4 columns corresponds to a region with a different soil type, as mentioned above, whereas each of the 5 rows shows the effect of one of the five varieties of fertilizer used. Test the hypothesis that the average production is the same in each region and also the same for each variety of fertilizer.
Table 5: Agricultural Productivity across regions using treatment of different varieties of fertilizers
The sum of squares between the 4 regions (columns) of the above table has already been found in Example 1 (Table 3). The sum of squares between rows is calculated in the same way as the sum of squares between columns, using the row means given in Table 5 and the 4 observations in each row. Thus, the sum of squares between rows = 4 (76.25 – 74.40)² + 4 (72.25 – 74.40)² + 4 (73.50 – 74.40)² + 4 (80.00 – 74.40)² + 4 (70.00 – 74.40)² = 238.3. Putting these values in the ANOVA table given below in Table 6, we get the F-ratios for both rows and columns:
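The arithmetic of the sum of squares between rows can be checked with a short script. The first row mean is taken here as 76.25, the value for which the five row means average to the stated overall mean of 74.40 and the total comes to 238.3; the variable names are illustrative.

```python
# Row means and overall mean as quoted in the text (first mean taken as 76.25)
row_means = [76.25, 72.25, 73.50, 80.00, 70.00]
grand_mean = 74.40
values_per_row = 4          # one observation per region in each row

row_ss = values_per_row * sum((m - grand_mean) ** 2 for m in row_means)
row_ms = row_ss / (len(row_means) - 1)   # average S.S. for rows, 4 d.f.

print(round(sum(row_means) / len(row_means), 2))  # 74.4
print(round(row_ss, 2))                           # 238.3
print(round(row_ms, 3))                           # 59.575
```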
Table 6: Results of Two Way Analysis of Variance
To test the significance of the variations due to regional differences, i.e. due to soil type (columns), we look at the F-ratio for (3, 12) degrees of freedom, which is found in the above table to be 9.157. The table value of the F-ratio for (3, 12) degrees of freedom is 5.95 at the 1% level of significance and 3.49 at the 5% level. The calculated value of 9.157 is considerably higher than the value of 5.95 given at the 1% level of significance in the F table in the appendix. Thus we reject the null hypothesis that the column means are equal and accept the alternative hypothesis that the column means are not equal.
When a value is found to be significant at the 1% level of significance, it is automatically significant at levels higher than this. So there is no need to check it further at the 5% or higher levels of significance.
To test the significance of the variations between rows (fertilizers), we find the calculated value of the F-ratio to be 0.458; the corresponding table value of the F-ratio for (4, 12) degrees of freedom in the appendix is 5.41 at the 1% level and 3.26 at the 5% level of significance. The calculated value is insignificant even at the 5% level of significance. Thus, in the case of row means, the null hypothesis that all the row means are equal can’t be rejected and is accepted.
The above results of the “Analysis Of Variance” suggest that the agricultural productivity is significantly affected by the soil types (Column effect). The analysis also suggests that the effect of fertilizers (Row effect) is not found to be significant.
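The comparison of calculated and table values of the F-ratio amounts to a simple decision rule, sketched below with the figures quoted in this example. The helper name `is_significant` is hypothetical.

```python
def is_significant(f_calculated, f_table):
    """Reject H0 when the calculated F-ratio exceeds the table value."""
    return f_calculated > f_table


# (calculated F, table F) as quoted in the text
comparisons = {
    "columns (soil types), d.f. (3, 12), 1% level": (9.157, 5.95),
    "rows (fertilizers), d.f. (4, 12), 5% level": (0.458, 3.26),
}
for name, (f_calc, f_tab) in comparisons.items():
    verdict = "reject H0" if is_significant(f_calc, f_tab) else "do not reject H0"
    print(f"{name}: F = {f_calc}, table value = {f_tab} -> {verdict}")
```

The column effect is tested against the stricter 1% value because it passes even that; the row effect fails at the more lenient 5% value, so no stricter level needs to be checked.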
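- When two variances are compared with the F-ratio tables, the larger variance is taken as the numerator, so the F-ratio is always equal to or greater than unity.
- In such a comparison, if the F-ratio comes out less than unity, interchange the numerator and the denominator (and their degrees of freedom), i.e. use 1/F.
- In ANOVA the numerator is always the between-sample (row or column) average sum of squares, so a calculated F-ratio below unity, such as 0.458 for rows in Example 2, is simply not significant.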
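- As a further improvement on ANOVA we have ANCOVA, i.e. “Analysis of Covariance”.
- ANCOVA adjusts the comparison of means for the effect of a related quantitative variable (covariate); interaction between columns and rows is examined through two way ANOVA with replication.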