38 Analysis of Variance

S. Gandhimathi

epgp books

1.   Introduction

 

Analysis of variance is a statistical technique that tests the homogeneity (uniformity) of several groups by comparing them. Generally, the total variance is divided into two components, viz.,

 

Total variance = Variance ‘between’ samples + variance ‘within’ samples

 

Thus the total variance is the sum of these two components. The purpose of separating the variance in this way is to identify the influence of the different forces at work. The differences between the group sample means are due to the influence of the factor under experiment as well as to sampling variation between the samples.
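This decomposition can be verified numerically. The following sketch uses three small hypothetical samples (any data would do) to show that the total sum of squares splits exactly into a between-samples part and a within-samples part:

```python
# Verify that the total sum of squares splits exactly into a
# "between samples" part and a "within samples" part.
# The three samples below are hypothetical illustration data.
groups = [[16, 8, 12, 14], [14, 10, 10, 6], [4, 10, 8, 8]]

all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# Between samples: squared deviation of each group mean from the
# grand mean, weighted by the group size.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within samples: squared deviation of each item from its own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
print(ss_total == ss_between + ss_within)  # True: 136 = 50 + 86
```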

 

Application of Analysis of Variance: Today the analysis of variance technique is applied in nearly every type of experimental design, in the natural sciences as well as the social sciences. It is predominantly applied in the following fields.

 

Testing the significance of differences between several means: Unlike Student's 't' test, it is not limited to two small-sample means; it tests the significance of the differences among the means of more than two samples. This helps in concluding whether the different samples have been drawn from the same universe.

 

Testing the significance of differences in variances: The analysis of variance is also applied to test the significance of differences in variances.

 

Testing homogeneity in two-way classifications: When the samples are divided into several categories on the basis of two attributes, this technique is still helpful in testing the significance of homogeneity.

 

Testing the correlation ratio and regression: The analysis of variance provides exact tests of significance for the correlation ratio, departure from linearity of regression and the multiple correlation coefficient.

 

In this way, analysis of variance is an important and useful technique of research work in various fields of knowledge.

 

Assumptions underlying analysis of variance: The analysis of variance technique is based upon the following fundamental assumptions.

 

  1. Normality: Each of the samples is drawn from a normal population. However, when the sample sizes are large, the assumption of normality is approximately fulfilled (by the central limit theorem).
  2. Homogeneity of variance: The population variances should be equal, i.e., σ1² = σ2² = … = σk².
  3. Additive property: The total variance should be equal to the sum of the variances from the different sources. In other words, the variances due to different sources should add up to the total variance.
  4. Independence: The samples should be selected randomly and independently. If the samples are not independent, the inferences drawn from the analysis of variance are affected.

 

It should be kept in mind that, theoretically, whenever any of these assumptions is not fulfilled, the analysis of variance technique cannot be applied to give valid results. This is confirmed by many economic and business experiments. In situations where these assumptions are not fully met, however, suitably transformed data may be used for the analysis of variance.

 

In practice, researchers find no significant loss in the adequacy of the test if one or more of these assumptions are 'bent', provided the data come reasonably close to meeting them. If the underlying distributions are bimodal or highly skewed, however, the F-test inferences will not be valid.

 

If the sample means are widely dispersed around the grand mean relative to the variance within the samples, the samples are very likely not random samples from the same population. Conversely, when the sample means are narrowly dispersed around the grand mean, the samples are likely to be random samples from a common population.

 

Techniques of Analysis of Variance: The various methods of analysis of variance can be studied under the following classification.

One-way Classification: Under one-way classification, the influence of only one factor is considered; in other words, the data are classified according to a single criterion.

 

(a) Direct Method: The following steps will be helpful in reaching a valid inference from the analysis of variance (ANOVA).

 

(1) Null Hypothesis: We may set up the null hypothesis as,

H0: µ1 = µ2 = … = µk, i.e., the arithmetic means of the populations from which the k samples are randomly drawn are all equal.

Alternative Hypothesis: The alternative hypothesis in this case can be set up as, H1: not all the µi are equal (at least two of µ1, µ2, …, µk differ), i.e., the means of the populations from which the k samples were randomly drawn are not all equal.

 

(2) Test Statistic: Under H0, we take as test statistic the F-ratio (variance ratio),

F = S1² / S2²

where S1² = variance between the samples and S2² = variance within the samples.

 

(1) Computation of Variance between the Samples (SSC): The variance between samples (groups) is based on the differences between the mean of each group and the overall mean, weighted by the number of observations in each group. It measures the variation from one group to another. To calculate the variance between the samples, we take the total of the squares of the deviations of the means of the various samples from the grand mean and divide this total by the degrees of freedom. The calculation may be performed as below:

 

(i) Calculate the mean of each sample (column), i.e., X̅1, X̅2, X̅3, X̅4, etc.

 

(ii) Compute the grand mean based on all the items irrespective of column grouping. It is obtained as,

X̿ = (ΣX1 + ΣX2 + ΣX3 + …) / (N1 + N2 + N3 + …)

where N1, N2, N3, … are the numbers of observations in the respective samples.

 

(iii) Compute the difference between the mean of each sample (column) and the grand mean, i.e.,

(X̅1 − X̿), (X̅2 − X̿), etc.

 

(iv) Square these deviations (each weighted by the number of observations in the sample when the sample sizes differ) and sum them. This gives the sum of squares between the samples, also known as SSC.

 

(v) Divide the total obtained in (iv) by the degrees of freedom (ν = C − 1), where C is the number of samples (columns). This gives the mean sum of squares between the columns and the grand mean, designated MSC. This is called the variance between the samples (amongst columns) and indicates the variation explained by differences between the groups.
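Steps (i)–(v) can be sketched in a few lines of Python; the three samples below are hypothetical illustration data of equal size:

```python
# Steps (i)-(v): variance between samples for three hypothetical columns.
samples = [[16, 8, 12, 14], [14, 10, 10, 6], [4, 10, 8, 8]]

# (i) mean of each sample (column)
col_means = [sum(s) / len(s) for s in samples]

# (ii) grand mean over all items
n_total = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / n_total

# (iii)-(iv) squared deviations of the column means from the grand mean,
# weighted by column size, summed to give SSC
ssc = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, col_means))

# (v) divide by degrees of freedom (C - 1) to get MSC
c = len(samples)
msc = ssc / (c - 1)
print(ssc, msc)  # 50.0 25.0 for these data
```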

 

(2) Computation of Variance within the Samples: This is the sum of the squares of the deviations between the individual items and their respective sample means (column means), and is denoted by SSE. It measures the variation within each sample that is due to chance alone. Since this variability is not attributable to group differences, it can be considered a measure of the random variation of values within a group. The steps in calculating the variance within the samples are:

 

(i) Compute the mean of each sample, i.e., x̅1, x̅2, x̅3, etc.

(ii) Compute the deviations of the individual items in each sample from the mean of the respective sample, i.e., (x − x̅1) for the items of sample 1, (x − x̅2) for the items of sample 2, etc.

 

(iii) Square these deviations and obtain the total, which gives the sum of squares within the samples; and

 

(iv) Divide the total obtained in (iii) by the degrees of freedom (ν = N − C), where C is the number of samples and N the total number of observations. This gives the mean sum of squares within samples, MSE.
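These steps can likewise be sketched in Python, again with three hypothetical samples:

```python
# Steps (i)-(iv): variance within samples for three hypothetical columns.
samples = [[16, 8, 12, 14], [14, 10, 10, 6], [4, 10, 8, 8]]

# (i)-(iii) deviations of each item from its own sample mean, squared
# and summed to give SSE
sse = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)

# (iv) divide by degrees of freedom (N - C) to get MSE
n = sum(len(s) for s in samples)
c = len(samples)
mse = sse / (n - c)
print(sse, round(mse, 2))  # 86.0 9.56 for these data
```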

(3) Computation of the Total Sum of Squares (SST): The total sum of squares is the sum of SSC and SSE. It is the sum of the squares of the deviations between the individual items and the grand mean.

This, when divided by the degrees of freedom (N − 1), gives the total variation, comprising both the explained and the unexplained variance.

 

Analysis of Variance (Anova) Table

Source of Variation         Sum of Squares   Degrees of Freedom   Mean Square
Between samples (columns)   SSC              C − 1                MSC = SSC / (C − 1)
Within samples              SSE              N − C                MSE = SSE / (N − C)
Total                       SST              N − 1

  • SST = Total sum of squares of variation
  • SSC = Sum of squares between samples (columns)
  • SSE = Sum of squares within samples
  • MSC = Mean square between samples
  • MSE = Mean square within samples

Remark:

 

The same procedure for the analysis of variance applies to both equal and unequal sample sizes.

(4) Computation of the F-Ratio: The F-value is the ratio of the explained variance (MSC) to the unexplained variance (MSE), i.e., F = MSC / MSE. By convention, the larger of the two variances is placed in the numerator, so that F ≥ 1.

Decision: Compare the computed value of F with the critical (tabulated) value of F for (C − 1, N − C) degrees of freedom at a specified level of significance (generally 5% or 1%) and use the following rules to reach a conclusion.

(i) If the calculated value of F is greater than the tabulated value, i.e., Fc > FT, the difference is significant: reject the null hypothesis (accept the alternative hypothesis).

(ii) If Fc < FT, the difference is insignificant: accept the null hypothesis and reject the alternative hypothesis. The observed differences have arisen merely from fluctuations of sampling.
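The decision rule can be sketched as follows; the MSC, MSE and table values shown are illustrative (MSC = 25, MSE = 86/9, and a 5% critical value F(2, 9) = 4.26 read from an F-table):

```python
# Illustrative F-test decision, assuming mean squares already computed.
msc, mse = 25.0, 86.0 / 9          # between- and within-sample mean squares
f_calc = msc / mse                 # F = MSC / MSE, about 2.62 here

f_table = 4.26                     # F(2, 9) at the 5% level, from a table
if f_calc > f_table:
    print("Reject H0: the sample means differ significantly")
else:
    print("Accept H0: the differences are due to sampling fluctuation")
```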

Rationale of the F-Ratio Test: The variation between the sample means reflects not only the effect of the chance forces that produce the variation within the samples, but also the effect of any forces which cause the various sample means to differ from one another. Thus, if any such force exists, i.e., if the null hypothesis is false, the variation between the sample means will tend to be larger than the variation within the samples. This is precisely what the test is designed to detect.

Example: The following table gives the yields on 15 sample plots under three varieties of seed.

We have to find out whether the average yields of land under the different varieties of seed show a significant difference.

 

Solution: Here we shall apply the F-ratio test.

 

Null Hypothesis: H0: the average yields of land under the different varieties of seed do not show a significant difference.

Alternative Hypothesis: H1: the average yields of land under the different varieties of seed do show a significant difference.

 

Decision: The calculated value of F (8.14) is greater than the table value of F at the 5% level of significance for ν1 = 2, ν2 = 12 degrees of freedom (3.89); hence the difference in the mean values of the varieties is significant and we reject the null hypothesis. We conclude, therefore, that the average yields of land under the different varieties of seed show a significant difference.

 

Example: To test the significance of the variation of the retail prices of a commodity in three principal cities, Bombay, Calcutta and Delhi, four shops were chosen at random in each city, and the prices observed (in rupees) were as follows:

Bombay 16 8 12 14
Calcutta 14 10 10 6
Delhi 4 10 8 8

 

Do the data indicate that the prices in the three cities are significantly different?

 

Solution:

 

Since the data form a one-way classification, we can test them by the F-ratio test.

Decision: Since the calculated value (2.62) is less than the table value of F at the 5% level of significance for 2 and 9 degrees of freedom (4.26), the difference is not significant. Hence we accept the null hypothesis and conclude that the prices in the three cities are not significantly different.
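The whole computation for this example can be reproduced in a few lines of plain Python (a library-free sketch of the direct method):

```python
# Reproduce the three-cities retail-price example end to end.
prices = {
    "Bombay":   [16, 8, 12, 14],
    "Calcutta": [14, 10, 10, 6],
    "Delhi":    [4, 10, 8, 8],
}

samples = list(prices.values())
n = sum(len(s) for s in samples)                  # N = 12 observations
c = len(samples)                                  # C = 3 cities
grand_mean = sum(sum(s) for s in samples) / n     # 10.0

# Sum of squares between and within the samples
ssc = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
sse = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)

msc = ssc / (c - 1)                               # 25.0
mse = sse / (n - c)                               # about 9.56
f_ratio = msc / mse
print(round(f_ratio, 2))  # 2.62, below the 5% critical value F(2, 9) = 4.26
```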

 

(B) Shortcut Method: The above method of calculating the sums of squares between and within the samples is not generally followed in practice, as it is time-consuming. An easier method, known as the shortcut method, is usually followed. The procedure of ANOVA under the shortcut method is as follows:
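In outline, the standard textbook shortcut works from raw totals rather than means, using the correction factor T²/N (T = grand total, N = total number of observations). A sketch, applied to the three-cities data above, under the assumption that this is the shortcut variant intended here:

```python
# Shortcut method: sums of squares from raw totals and the
# correction factor T^2 / N, with no means computed.
samples = [[16, 8, 12, 14], [14, 10, 10, 6], [4, 10, 8, 8]]

grand_total = sum(sum(s) for s in samples)          # T = 120
n = sum(len(s) for s in samples)                    # N = 12
cf = grand_total ** 2 / n                           # correction factor = 1200

sst = sum(x ** 2 for s in samples for x in s) - cf  # total sum of squares
ssc = sum(sum(s) ** 2 / len(s) for s in samples) - cf
sse = sst - ssc
print(sst, ssc, sse)  # 136.0 50.0 86.0 -- matches the direct method
```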

Conclusion

 

Let us summarize: the analysis of variance is used to compare the means of more than two groups. The value of F is calculated as F = mean square between samples / mean square within samples. The sum of squares between samples measures the explained variation, while the sum of squares within samples measures the unexplained variation. In this module we discussed both the direct method and the shortcut method.

 

The shortcut method is easier than the direct method. Based on the above procedures, practise more application-oriented problems; this will deepen your understanding.

 

 
