28 Analysis of Variance and Experimental Design: One-Way ANOVA

Prof. Pankaj Madan

 

  • Introduction
  • Role of Analysis of Variance
  • Analysis of Variance Approach
  • Assumptions for Analysis of Variance
  • Sum of Squares, Degrees of Freedom, Variance and Error
  • Fundamental Principles of Analysis of Variance: One-Way ANOVA
  • Limitations of Analysis of Variance
  • Summary
  • Self-Check Exercise with Solutions

 

Quadrant-I

 

Analysis of Variance and Experimental Design: One-Way ANOVA

 

Learning Objectives:

 

After completing this module, the student will understand:

  • An Introduction to Analysis of Variance
  • Role of Analysis of Variance
  • Analysis of Variance Approach
  • Assumptions for Analysis of Variance
  • Sum of Squares, Degrees of Freedom, Variance and Error
  • Fundamental Principles of Analysis of Variance
  • Limitations of Analysis of Variance

 

1.  Introduction

 

When we note the observations from an experiment pertaining to yield or the measurement of any other character, we find that the observations vary greatly from one another. This variation is due to a number of factors, known as Sources of Variation, and the portions of variation caused by the different sources are known as Components of Variation. Statistical analysis aims at assessing the total variation present and then apportioning it among the various factors responsible for it. The analysis of variance is a simple arithmetical process of sorting out the components of variation in a given set of data. In the words of Fisher, ‘It is a tool by which the total variation may be split up into several physically assignable components’.

 

2.   Role of Analysis of Variance

 

The analysis of variance does more than the sorting described above; it plays a dual role:

 

(A) In one, it sorts and estimates the variance components.

 

(B)  In the other, it provides the test of significance.

 

3.  Analysis of Variance Approach

 

The first step in the analysis of variance is to partition the total variation in the sample data into the following two component variations, in such a way that the contribution of each factor causing variation can be estimated.

  1. The amount of variation among the sample means, or the variation attributable to the differences among sample means. This variation is either on account of differences between populations/samples (treatments) or due to the element of chance. This component is denoted by SSC or SSTR.
  2. The amount of variation within the sample observations. This variation is attributed to chance causes or experimental (random) errors. The variation in the values of the various elements in a sample due to chance is called error and is denoted by SSE.

 

4.  Assumptions for Analysis of Variance

 

The following assumptions are required for analysis of variance:

  1. Each population under study is normally distributed with a mean µj that may differ from population to population, but with a common variance σ².
  2. Each sample is drawn randomly and is independent of other samples.

 

5. Sum of Squares, Degrees of Freedom, Variance and Error

 

Before we start learning the principles of analysis of variance it is essential to know the relation among the variance, sum of squares, degrees of freedom and error.

 

(i)  Sum of Squares:

 

It is the sum of the squares of the deviations of the variates from their mean. If X1, X2, …, XN are the N variates with mean X̄, then

Sum of Squares (S.S.) = ∑(X − X̄)²

or, expanding the square,

S.S. = (X1² + X2² + ⋯ + XN²) − (∑X)²/N

Here, ∑X = X1 + X2 + ⋯ + XN = T (the grand total),

and (∑X)²/N = T²/N = C.F.

This term is known as the ‘Correction Factor’.

Thus, Sum of Squares = ∑X² − C.F.
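The two forms above always give the same result; a minimal Python sketch with a small hypothetical sample verifies the shortcut:

```python
# Sum of squares computed two ways: directly from the deviations, and via the
# correction-factor shortcut. The sample values here are purely illustrative.
x = [6, 5, 4, 5, 6, 4]                          # N variates (hypothetical)
N = len(x)
mean = sum(x) / N                               # X-bar

ss_direct = sum((xi - mean) ** 2 for xi in x)   # SS = sum of (X - X-bar)^2
cf = sum(x) ** 2 / N                            # Correction Factor = T^2 / N
ss_shortcut = sum(xi ** 2 for xi in x) - cf     # SS = sum of X^2 - C.F.

print(ss_direct, ss_shortcut)                   # 4.0 4.0 -- both forms agree
```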

 

(ii)  Degrees of Freedom:

 

The number of degrees of freedom is one less than the number of variates in the sample concerned; in the above case it is (N − 1). If the number of populations/samples (treatments) is t, the degrees of freedom will be (t − 1), where

N = number of observations

t = number of categories

 

(iii) Variance

 

It is obtained by dividing the Sum of Squares by the corresponding Degrees of Freedom.

Variance = Sum of Squares / Degrees of Freedom

 

For finding out the components of variation due to different factors (sources of variation), it is necessary to calculate the sum of squares and the corresponding degrees of freedom for every factor and then the variance by the above formula.

 

The ‘Variance’ is generally spoken of as the ‘Mean Square’ or ‘Mean Sum of Squares’.

 

(iv) Error

 

The component of variation in the total variation, which remains unexplained by the different sources of variation, is considered to be due to ‘Error’.

 

6.      Fundamental Principles of Analysis of Variance: One-way ANOVA

 

Many business applications involve experiments in which different populations (or groups) are classified with respect to only one attribute of interest, such as (i) the percentage of marks secured by students in a course, (ii) the flavor preference of ice cream among customers, (iii) the yield of a crop due to varieties of seed, and so on. In all such cases the observations in the sample data are classified into several groups on the basis of a single attribute, and the classification is called a one-way classification.

 

As mentioned before, for all theoretical purposes we refer to populations (i.e., several groups classified on the basis of a single factor or criterion in the sample data) as populations/samples (treatments).

 

6.1. Table of one-criterion classification of data

With r samples (one for each treatment) and k observations per sample, the data may be arranged as below, where xij denotes the ith observation in the jth sample:

Sample 1    Sample 2    …    Sample r
x11         x12         …    x1r
x21         x22         …    x2r
…           …           …    …
xk1         xk2         …    xkr
Mean x̄1     Mean x̄2     …    Mean x̄r

The values x̄j are called the sample means, and x̿ is the grand mean of all observations (or measurements) in all the samples.

Since there are k rows and r columns in the table, the total number of observations is rk = n, provided each sample has an equal number of observations. But if the number of observations varies from sample to sample, then the total number of observations is n1 + n2 + … + nr = n.

 

6.2. Step I: Assumption of null hypothesis

 

State the null and alternative hypotheses to test the equality of the population means:

H0: µ1 = µ2 = … = µr (null hypothesis)
H1: not all µj are equal, j = 1, 2, …, r (alternative hypothesis)

where µj denotes the mean of the jth population.

 

6.3. Step II: Calculate total variation

 

If a single sample of size n is taken from the population, then the estimate of the population variance based on this sample is given by

s² = ∑(x − x̄)²/(n − 1)

where s² = sample variance (the estimate of the population variance)

(x − x̄) = deviation of a sample value from the sample mean

The numerator in s² is called the sum of squares of deviations of sample values about the sample mean x̄ and is denoted by SS. Consequently, the ‘sum of squares’ is a measure of variation. Thus, when SS is divided by its degrees of freedom, the result is often called the mean square, which is an alternative term for the sample variance.

 

Total variation is represented by the ‘sum of squares total’ (SST) and is equal to the sum of the squared differences between each sample value and the grand mean x̿:

SST = ∑ ∑ (xij − x̿)²

where the inner sum runs over i = 1, …, nj and the outer sum over j = 1, …, r, and

SST = total sum of squares

r = number of samples/populations (or treatment levels)

nj = size of the jth sample

xij = the ith observation within the sample from the jth population
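A minimal Python sketch of SST for a small hypothetical data set (three samples of unequal sizes, so it also illustrates the nj notation):

```python
# SST: the squared deviation of every observation from the grand mean.
samples = [[3, 5, 4], [8, 7, 9, 8], [5, 6]]          # r = 3 hypothetical samples

n = sum(len(s) for s in samples)                     # n = 9 observations in all
grand_mean = sum(x for s in samples for x in s) / n  # grand mean = 55/9

sst = sum((x - grand_mean) ** 2 for s in samples for x in s)
print(round(sst, 2))                                 # 32.89
```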

 

The total variation is divided into two parts as shown below:

 

(i) Variation between (or among) sample means {also called the sum of squares for populations/samples (treatments)}

 

(ii)  Variation within the sample values (also called the sum of squares for error)

 

6.4. Step III: Calculate variation between sample means

 

This is usually called the ‘sum of squares between’ and measures the variation between samples due to populations/samples (treatments). In statistical terms, the variation between sample means is also called the between-columns variance. The procedure is as follows:

(a) Calculate the mean values x̄1, x̄2, …, x̄r of all r samples.

(b) Calculate the grand mean x̿ = (n1x̄1 + n2x̄2 + … + nrx̄r)/n, i.e., the mean of all n observations.

(c) Calculate the difference between the mean of each sample and the grand mean, i.e., (x̄1 − x̿), (x̄2 − x̿), …, (x̄r − x̿). Square each difference, multiply it by the number of observations in the corresponding sample, and add. The total gives the sum of the squared differences between the sample means and the grand mean and is denoted by SSC or SSTR:

SSTR = ∑ nj(x̄j − x̿)²  (summed over j = 1, …, r)

This sum of squares is also called the sum of squares for populations/samples (treatments), where

SSTR = sum of squares due to populations/samples (treatments)

nj = size of the jth sample

(x̄j − x̿) = difference between the mean of the jth sample and the grand mean
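Continuing the same hypothetical samples, a minimal sketch of SSTR, weighting each squared difference between a sample mean and the grand mean by the sample size nj:

```python
# SSTR: sum of nj * (sample mean - grand mean)^2 over the r samples.
samples = [[3, 5, 4], [8, 7, 9, 8], [5, 6]]

n = sum(len(s) for s in samples)
grand_mean = sum(x for s in samples for x in s) / n

sstr = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
print(round(sstr, 2))                                # 28.39
```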

 

6.5. Step IV: Calculate variation within samples

 

This is usually called the ‘sum of squares within’ and measures the variation within samples due to chance error. Such variation is also called the within-samples variance. The procedure is as follows:

(a) Calculate the mean values x̄1, x̄2, …, x̄r of all r samples.

(b) Calculate the difference of each observation in the r samples from the mean value of its respective sample.

(c) Square all the differences obtained in step (b) and find their total. The total gives the sum of squares of the differences within the samples and is denoted by SSE:

SSE = ∑ ∑ (xij − x̄j)²  (summed over i = 1, …, nj and j = 1, …, r)

This sum is also called the sum of squares for error, and SSE = SST − SSTR, where

SSE = sum of squares due to error

SST = total sum of squares

SSTR = sum of squares due to populations/samples (treatments)

xij = the ith observation within the sample from the jth population

x̄j = mean value of the jth sample

(xij − x̄j) = difference of each observation from the mean value of its respective sample
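For the same hypothetical samples, a minimal sketch of SSE, which can also be checked against the identity SSE = SST − SSTR:

```python
# SSE: squared deviations of each observation from its own sample mean.
samples = [[3, 5, 4], [8, 7, 9, 8], [5, 6]]

sse = sum(sum((x - sum(s) / len(s)) ** 2 for x in s) for s in samples)
print(round(sse, 2))                                 # 4.5 (= 32.89 - 28.39)
```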

 

6.6. Step V: Calculate average variation between and within samples (mean squares)

 

Since r independent samples are being compared, r − 1 degrees of freedom are associated with the sum of squares among samples. As each of the r samples contributes nj − 1 degrees of freedom within itself, there are n − r degrees of freedom associated with the sum of squares within samples. Thus the total degrees of freedom equal the sum of the degrees of freedom associated with SSC (or SSTR) and SSE. That is,

 

Total df = Between samples/population (treatments) df + Within samples (error) df

n − 1 = (r − 1) + (n − r)

where n − 1 = total degrees of freedom

r − 1 = degrees of freedom between samples/populations (treatments)

n − r = degrees of freedom within samples (error)

 

When these sums of squares are divided by their associated degrees of freedom, we get the following variances, or mean square terms:

 

MSTR = SSTR/(r − 1); MSE = SSE/(n − r); MST = SST/(n − 1)

 

MSTR= Mean Sum of Square (population/sample/Treatment)

 

MSE= Mean Sum of Square (Error)

 

MST= Mean Sum of Square (Total)

 

It may be noted that the quantity MSE = SSE/(n − r) is a pooled estimate of σ² (a weighted average of all r sample variances, whether H0 is true or not).
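A minimal sketch of the degrees-of-freedom identity and the mean squares, using the SST, SSTR and SSE values from the hypothetical data above:

```python
# Degrees of freedom and mean squares for the hypothetical data set.
r, n = 3, 9
sst, sstr, sse = 32.8889, 28.3889, 4.5               # from the earlier sketches

assert n - 1 == (r - 1) + (n - r)                    # total df = between + within

mstr = sstr / (r - 1)                                # MSTR ~= 14.19
mse = sse / (n - r)                                  # MSE = 0.75
mst = sst / (n - 1)                                  # MST ~= 4.11
print(round(mstr, 2), round(mse, 2), round(mst, 2))
```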

 

6.7. Step VI: Apply the F-test statistic

Apply the F-test statistic with r − 1 degrees of freedom for the numerator and n − r degrees of freedom for the denominator:

 

F = σ²between / σ²within = [SSTR/(r − 1)] / [SSE/(n − r)] = MSTR/MSE

 

SSTR= Sum of Square due to population/sample (Treatment)

 

SSE= Sum of Square due to Error

 

MSTR= Mean Sum of Square (Population/Sample/Treatment)

 

MSE= Mean Sum of Square (Error)

 

 

6.8. Step VII: Make a decision regarding the null hypothesis

 

If the calculated value of the F-test statistic is more than its right-tail critical value Fα(r − 1, n − r) at the given level of significance α and degrees of freedom r − 1 and n − r, then reject the null hypothesis:

Reject H0 if the calculated value of F > its critical value Fα(r − 1, n − r); otherwise accept H0.
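A minimal sketch of Steps VI and VII for the hypothetical data used above; it assumes SciPy is available, whose scipy.stats.f.ppf gives the right-tail critical value Fα(r − 1, n − r):

```python
from scipy.stats import f

r, n, alpha = 3, 9, 0.05
mstr, mse = 14.1944, 0.75                 # mean squares from the earlier sketch

F = mstr / mse                            # calculated F ~= 18.93
F_crit = f.ppf(1 - alpha, r - 1, n - r)   # F_0.05(2, 6) ~= 5.14

print("Reject H0" if F > F_crit else "Accept H0")    # prints "Reject H0"
```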

 

7.      Limitations of Analysis of Variance

 

For the F-test in the analysis of variance to hold good, the necessary condition is that the experimental errors be independently and normally distributed with mean zero and variance σ², the population variance. It is also essential that the response (e.g., yield) be composed additively of environmental effects, sample/population (treatment) effects, and an error term with zero mean and constant variance. In practice, lack of normality in the data occurs frequently; hence, the data require transformation to some other scale so that a closer approach to normality may be obtained.

 

If a transformation is used, the best estimates of the treatment means on the original scale are obtained by taking the means of the transformed variate and then transforming these means back to the original scale.

 

8. Summary

 

 

One-way ANOVA partitions the total variation in the data (SST) into variation between samples (SSTR) and variation within samples (SSE) and tests the equality of the population means by comparing F = MSTR/MSE with its critical value. The notation used in this module is:

MSTR = Mean Sum of Squares (Sample/Population/Treatment)

MSE = Mean Sum of Squares (Error)

SSTR = Sum of Squares (Population/Sample/Treatment)

SSE = Sum of Squares (Error)

 

9.      Self-Check Exercise with solutions

 

Q.1. A study investigated the perception of corporate ethical values among individuals specializing in marketing. Using α = 0.05 and the following data (higher scores indicate higher ethical values), test for significant differences in perception among three groups.

Marketing Manager    Marketing Research    Advertising
6                    5                     6
5                    5                     7
4                    4                     6
5                    4                     5
6                    5                     6
4                    4                     6

 

Solution:

 

Assumption of null hypothesis (H0): Let us assume the null hypothesis that there is no significant difference in ethical values among individuals specializing in marketing. The calculations for the analysis of variance are as follows.

There are r = 3 treatments (samples/populations) with n1 = n2 = n3 = 6 and n = 18.

T = sum of all the observations in the three samples = ∑x1 + ∑x2 + ∑x3 = 30 + 27 + 36 = 93

CF = correction factor = T²/n = (93)²/18 = 480.50

SST = total sum of squares

= (∑x1² + ∑x2² + ∑x3²) − CF = (154 + 123 + 218) − 480.50 = 14.50

SSTR = sum of squares between the samples (treatments/samples/populations)

= ((∑x1)²/n1 + (∑x2)²/n2 + (∑x3)²/n3) − CF

= ((30)²/6 + (27)²/6 + (36)²/6) − 480.50

= (900/6 + 729/6 + 1296/6) − 480.50

= (150 + 121.5 + 216) − 480.50 = 7

SSE = SST − SSTR = 14.50 − 7 = 7.50

(SSE = sum of squares due to error; SST = total sum of squares; SSTR = sum of squares due to treatment/sample/population)

Degrees of freedom: df1 = r − 1 = 3 − 1 = 2 and df2 = n − r = 18 − 3 = 15

Thus MSTR = SSTR/df1 = 7/2 = 3.5

MSE = SSE/df2 = 7.50/15 = 0.5

F = MSTR/MSE = 3.5/0.5 = 7

 

The table value of F for df1 = 2, df2 = 15 and α = 0.05 is 3.68. Since the calculated value of F = 7 is more than its table value, the null hypothesis is rejected. Hence, we conclude that there is a significant difference in ethical values among individuals specializing in marketing.
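The hand computation can be cross-checked in Python (assuming SciPy is available); scipy.stats.f_oneway reproduces F = 7 and returns the p-value directly:

```python
from scipy.stats import f_oneway

managers    = [6, 5, 4, 5, 6, 4]          # data from Q.1
researchers = [5, 5, 4, 4, 5, 4]
advertisers = [6, 7, 6, 5, 6, 6]

F, p = f_oneway(managers, researchers, advertisers)
print(round(F, 2), round(p, 4))           # 7.0 0.0071 -- p < 0.05, reject H0
```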

 

Q.2. What is the square-root transformation for enumeration data?

 

Ans. Counts of numbers of individuals, such as insects caught in a trap or weeds in a plot, are found to be distributed with variance proportional to the mean and with non-additive effects. The most appropriate transformation is the √X transformation, where X is the actual count. If some of the counts are very small or zero, the appropriate transformed variate is √(X + 1/2).
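A minimal sketch of the transformation with hypothetical counts; √(X + 1/2) is used because some counts are zero or very small:

```python
import math

counts = [0, 1, 3, 8, 15, 2]                    # hypothetical trap counts
transformed = [math.sqrt(x + 0.5) for x in counts]
print([round(t, 2) for t in transformed])       # [0.71, 1.22, 1.87, 2.92, 3.94, 1.58]
```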

 

Learn More:

  1. Sharma, J.K. (2014). Business Statistics, 2nd ed., S. Chand & Company, New Delhi.
  2. Chandel, S.R.S. (2006). A Handbook of Agricultural Statistics, Anchal Prakashan Mandir, Kanpur.
  3. http://www.biostathandbook.com/onewayanova.html
  4. Hays, W.L. (1994). Statistics, 5th ed., Fort Worth: Harcourt Brace College Publishers.