27 Analysis of Variance and Experimental Design: testing for equality of k population means
Prof. Pankaj Madan
Introduction
- Experimental Design and Analysis of Variance
- Types of Experimental Design and Analysis of Variance
- Assumptions for ANOVA
- Steps of solving the experimental design
- Test for the Equality of k Population Means
- Test for the Equality of k Population Means: An Observational Study
- Summary
- Self-Check Exercise
Quadrant-I
Analysis of Variance and Experimental Design: testing for equality of k population means
Learning Objectives:
- After the completion of this module the student will understand:
- Experimental Design and Analysis of Variance
- Types of Experimental Design and Analysis of Variance
- Assumptions for ANOVA
- Steps of solving the experimental design
- Test for the Equality of k Population Means
- Test for the Equality of k Population Means: An Observational Study
Introduction to ANOVA
The analysis of variance, frequently referred to by the contraction ANOVA, is a statistical technique specially designed to test whether the means of more than two quantitative populations are equal.
This technique, developed by R. A. Fisher in the 1920s, is capable of fruitful application to a diversity of practical problems. Basically, it consists of classifying and cross-classifying statistical results and testing whether the means of the specified populations differ significantly.
Introduction to Experimental Design and Analysis of Variance
Statistical studies can be classified as either experimental or observational. In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest, whereas in an observational study no attempt is made to control the factors. An important point is that cause-and-effect relationships are easier to establish in experimental studies than in observational studies.
Analysis of variance (ANOVA) is used to analyze the data obtained from experimental or observational studies. In this context, a factor is a variable that the experimenter has selected for investigation, and a treatment is a level of a factor. For example, if location is a factor, then the treatments of that factor could be Delhi, Ghaziabad and Noida. Experimental units are the objects of interest in the experiment. A completely randomized design is an experimental design in which the treatments are randomly assigned to the experimental units.
Types of Experimental Design and Analysis of Variance
Three types of experimental designs are introduced.
- A completely randomized design
- A randomized block design
- A factorial experiment
Assumptions for ANOVA
- For each population, the response (dependent) variable is normally distributed.
- The variance of the response variable, denoted σ², is the same for all of the populations.
- The observations must be independent.
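In practice, these assumptions can be checked informally before running ANOVA. The sketch below is an illustrative aid only, assuming Python with SciPy is available; the three groups are hypothetical data, not taken from this module. It applies the Shapiro-Wilk test to each group for normality and Levene's test for equality of variances.

```python
# Minimal sketch: checking ANOVA assumptions on hypothetical data (not from this module).
from scipy import stats

group1 = [43, 39, 41, 44, 40]          # hypothetical observations, treatment 1
group2 = [38, 40, 42, 37, 39]          # hypothetical observations, treatment 2
group3 = [45, 41, 44, 42, 40, 43]      # hypothetical observations, treatment 3

# Normality within each group (Shapiro-Wilk test); a small p-value suggests non-normality.
for i, g in enumerate([group1, group2, group3], start=1):
    stat, p = stats.shapiro(g)
    print(f"Group {i}: Shapiro-Wilk p-value = {p:.3f}")

# Equality of variances across groups (Levene's test); a small p-value suggests unequal variances.
stat, p = stats.levene(group1, group2, group3)
print(f"Levene's test p-value = {p:.3f}")
```

Independence of the observations is a property of how the data were collected (for example, random assignment) rather than something that can be tested in this way.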
Steps for calculation
Between-Treatments Estimate of Population Variance σ²
The estimate of σ² based on the variation of the sample means is called the mean square due to treatments and is denoted by MSTR:
MSTR = SSTR / (k – 1),  where  SSTR = Σ nj (x̄j – x̄)²
and
k = the number of treatments (the number of samples)
nj = the number of observations in treatment j
x̄j = the sample mean of treatment j
x̄ = the overall mean, i.e. the average value of all the observations from all the treatments
Within-Treatments Estimate of Population Variance σ²
The estimate of σ² based on the variation of the sample observations within each sample is called the mean square due to error and is denoted by MSE:
MSE = SSE / (nT – k),  where  SSE = Σ (nj – 1) sj²
and sj² is the sample variance of treatment j and nT = Σ nj is the total number of observations.
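To make the two estimates concrete, the following short Python sketch computes MSTR and MSE directly from the formulas above. The group data are hypothetical placeholders, not the module's example data.

```python
import numpy as np

# Hypothetical treatment samples (placeholders, not the module's data)
groups = [np.array([43, 39, 41, 44, 40]),
          np.array([38, 40, 42, 37, 39]),
          np.array([45, 41, 44, 42, 40, 43])]

k = len(groups)                                  # number of treatments
n_T = sum(len(g) for g in groups)                # total number of observations
grand_mean = np.concatenate(groups).mean()       # overall mean of all observations

# Between-treatments estimate: MSTR = SSTR / (k - 1), SSTR = sum of nj*(mean_j - grand_mean)^2
SSTR = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
MSTR = SSTR / (k - 1)

# Within-treatments estimate: MSE = SSE / (n_T - k), SSE = sum of (nj - 1)*s_j^2
SSE = sum((len(g) - 1) * g.var(ddof=1) for g in groups)
MSE = SSE / (n_T - k)

print(f"MSTR = {MSTR:.3f}, MSE = {MSE:.3f}")
```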
Test for the Equality of k Population Means
Hypothesis
H0 (null hypothesis): μ1 = μ2 = μ3 = . . . = μk
Ha (alternative hypothesis): Not all population means are equal
Test Statistic
F = MSTR / MSE
(MSTR = mean square due to treatments, MSE = mean square due to error)
Rejection Rule
p-value Approach: Reject H0 if p-value < α
Critical Value Approach: Reject H0 if F > Fα
where the value of Fα is based on an F distribution with k – 1 numerator degrees of freedom and nT – k denominator degrees of freedom.
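The decision rule can be applied programmatically once MSTR and MSE are known. The sketch below uses placeholder values for MSTR and MSE (any values computed as in the previous sketch could be substituted) and scipy.stats.f for the critical value and p-value.

```python
from scipy.stats import f

MSTR, MSE = 12.5, 4.2          # placeholder values; in practice computed from the data
k, n_T = 3, 16                 # number of treatments and total number of observations
alpha = 0.05                   # chosen level of significance

F_stat = MSTR / MSE                      # test statistic F = MSTR / MSE
df1, df2 = k - 1, n_T - k                # numerator and denominator degrees of freedom
F_crit = f.ppf(1 - alpha, df1, df2)      # critical value F_alpha
p_value = f.sf(F_stat, df1, df2)         # P(F >= F_stat)

print(f"F = {F_stat:.3f}, F_crit = {F_crit:.3f}, p-value = {p_value:.4f}")
if F_stat > F_crit:                      # equivalently: p_value < alpha
    print("Reject H0: not all population means are equal.")
else:
    print("Do not reject H0.")
```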
Test for the Equality of k Population Means: An Observational Study
1. Determine one estimate of the population variance from the variance among the sample means (the between-samples estimate):
Estimate 1 = Σ nj (x̄j – x̄)² / (k – 1)
2. Determine a second estimate of the population variance from the variance within the samples (the within-samples estimate):
Estimate 2 = Σ (nj – 1) sj² / (nT – k)
3. Compare these two estimates: if they are approximately equal in value (i.e. their ratio F is close to 1), accept the null hypothesis.
where
x̄j = mean of the jth sample
x̄ = grand mean of all the observations
sj² = variance of the jth sample
k = number of samples
nj = size of the jth sample; nT = Σ nj = total sample size
Number of degrees of freedom in the numerator of the F ratio = (number of samples – 1) = k – 1
Number of degrees of freedom in the denominator of the F ratio = Σ (nj – 1) = nT – k
4. Look up the F-table (at the chosen significance level) for these degrees of freedom to find the value FTab.
5. If Fcal > FTab, reject the null hypothesis; if Fcal ≤ FTab, accept (do not reject) the null hypothesis.
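The whole step-by-step procedure is also available as a single library call. As a cross-check, the sketch below runs scipy.stats.f_oneway on three hypothetical samples (not the data of the worked example that follows); it returns the same Fcal and p-value as the manual calculation would.

```python
from scipy.stats import f_oneway

# Hypothetical samples (not the data of the worked example below)
sample1 = [43, 39, 41, 44, 40]
sample2 = [38, 40, 42, 37, 39]
sample3 = [45, 41, 44, 42, 40, 43]

F_cal, p_value = f_oneway(sample1, sample2, sample3)
print(f"F_cal = {F_cal:.3f}, p-value = {p_value:.4f}")
# Compare F_cal with the tabulated value F_tab for (k - 1, n_T - k) degrees of freedom.
```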
Example:
After the completion of the training program, the company's statistical staff chose 16 new employees who had been assigned at random to the three training methods, in order to study which of the three training programs is best.
Table 1. Daily production of the 16 new employees (sample sizes 5, 5 and 6)
HYPOTHESIS
H0 (null hypothesis): μ1 = μ2 = μ3, i.e. the three samples could have been drawn from populations having the same mean; the method of training does not influence the productivity of the employees.
H1 (alternative hypothesis): μ1, μ2 and μ3 are not all equal.
Step – 1: Calculate the variance among the sample means
Step – 2: Calculating the variance within the samples
Step – 3: Compare the two estimates of the population variance by computing their ratio (Fcal)
Step – 4: Testing of Hypothesis
Calculate the number of degrees of freedom in the numerator of F ratio.
Number of degrees of freedom for numerator = (number of samples – 1)
= 3 – 1
= 2
Number of degrees of freedom for denominator = Σ (nj – 1) = nT – k
= (5-1)+(5-1)+(6-1)
= 16 – 3
= 13
where nT = total sample size and k = number of samples (treatments)
Now, suppose the director wants to test the hypothesis at the 0.05 level of significance. Look up the F-table value for 2 numerator and 13 denominator degrees of freedom.
The table value of F = 3.81
Since the table value 3.81 sets the upper limit of the acceptance region and Fcal < FTab,
the null hypothesis is accepted: the method of training does not significantly influence the productivity of the employees.
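As a side note, the tabulated value of 3.81 quoted above can be reproduced programmatically; the sketch below assumes SciPy is available.

```python
from scipy.stats import f

# Critical F value at the 0.05 level with 2 numerator and 13 denominator degrees of freedom
F_tab = f.ppf(1 - 0.05, 2, 13)
print(round(F_tab, 2))   # 3.81, matching the table value used above
```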
Short-Cut Method
The values of SSTR and SSE can be calculated by applying the following short-cut method:
Calculate the grand total of all the observations in the samples, T = ΣΣ xij.
Calculate the correction factor, CF = T² / nT.
Find the sum of the squares of all the observations in all k samples and subtract CF from this sum to obtain the total sum of squares of deviations, SST = ΣΣ xij² – CF. The treatment sum of squares is then SSTR = Σ (Tj² / nj) – CF, where Tj is the total of the observations in sample j, and the error sum of squares is SSE = SST – SSTR.
- Coding Method: Sometimes the method explained above takes a lot of computational time because of the magnitude of the numerical values of the observations. The coding method is based on the fact that the F-test statistic used in the analysis of variance is a ratio of variances and is therefore free of the unit of measurement. Thus its value does not change if an appropriate constant is added to, subtracted from, multiplied by, or divided into each of the observations in the sample data. Such an adjustment reduces the magnitude of the numerical values in the sample data and hence the computational time needed to calculate the F value, without changing that value.
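The short-cut formulas and the coding idea can be illustrated with a small sketch. The function below computes CF, SST, SSTR and SSE as described above for hypothetical samples, and then shows that subtracting a constant from every observation (coding) leaves the F value unchanged.

```python
import numpy as np

def f_statistic(groups):
    """One-way ANOVA F computed via the short-cut formulas (CF, SST, SSTR, SSE)."""
    all_x = np.concatenate(groups)
    n_T, k = len(all_x), len(groups)
    T = all_x.sum()                                           # grand total of all observations
    CF = T ** 2 / n_T                                         # correction factor
    SST = (all_x ** 2).sum() - CF                             # total sum of squares
    SSTR = sum(g.sum() ** 2 / len(g) for g in groups) - CF    # treatment sum of squares
    SSE = SST - SSTR                                          # error sum of squares
    return (SSTR / (k - 1)) / (SSE / (n_T - k))               # F = MSTR / MSE

# Hypothetical samples (placeholders, not the module's data)
groups = [np.array([43, 39, 41, 44, 40]),
          np.array([38, 40, 42, 37, 39]),
          np.array([45, 41, 44, 42, 40, 43])]

F_original = f_statistic(groups)
F_coded = f_statistic([g - 40 for g in groups])   # coding: subtract the constant 40
print(round(F_original, 4), round(F_coded, 4))    # the two values agree (up to rounding)
```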
SUMMARY
This module provides a statistical test of whether the means of several groups are all equal; in its simplest form, ANOVA is equivalent to Student's t-test when only two groups are involved. ANOVA refers to statistical models and associated procedures in which the observed variance is partitioned into components due to different explanatory variables. If a statistically significant effect is found in ANOVA, one or more follow-up tests of an appropriate kind are carried out in order to assess which groups differ from which other groups, or to test various other focused hypotheses.
Learn More:
- Sharma, J. K. (2014), Business Statistics, S. Chand & Company, New Delhi.
- Bajpai, N. (2010), Business Statistics, Pearson, New Delhi.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, Springer.
- Darrell Huff (2010), How to Lie with Statistics, W. W. Norton, California.
- K. R. Gupta (2012), Practical Statistics, Atlantic Publishers & Distributors (P) Ltd., New Delhi.