31 Analysis of Variance and Experimental Design: Conclusion
Dr Deependra Sharma
Learning Objectives:
After the completion of this module, the student will understand:
- Why different designs of experiments are used
- The general purpose of ANOVA
- The ANOVA test procedure
- The summary table of One Way ANOVA
- The summary table of Two Way ANOVA
- The summary table for Analysis of Variance: Completely Randomized Design
- The summary table for Analysis of Variance: Randomized Block Design
1. Why different Designs of Experiments (Tests)?
Consider a manufacturing organization that runs production of goods in three shifts. If we want to set up an experiment (or test) to find out which shift is more productive, or whether production levels differ across machines, and so on, then we need to design this experiment or test. Each problem may require a different design. A design for the experiment defines the size and the number of experimental units, the manner in which the treatments are allotted to the units, and the appropriate type of grouping of the testing units. An experiment (test) starts with a problem, an answer to which is obtained from the interpretation of a suitably collected set of observations. For this purpose a set of experimental units and adequate experimental material are required.
Equal-sized plots of land, a single plant or a group of plants, etc. are used for agricultural experiments. For industrial experiments, machines, men, methods and materials form the experimental units. For social experiments (tests), products, services and persons form the experimental units. Different types of designs are required to ensure the validity, interpretability and accuracy of the results obtainable from an analysis of the observations. These objectives are served by the principles of
I. Randomization, which defines the manner of allocation of the treatments (ways of arrangement) to the experimental units.
II. Replication, which specifies the number of units to be provided for each of the treatments (the number of units in each arrangement).
III. Error control, which increases the precision by choosing the appropriate type of experimental units and also their grouping.
For these different arrangements of populations or samples, various experimental or testing techniques are used, e.g. One way ANOVA, Two way ANOVA, MANOVA, etc.
2. Statistical Models and Analysis of Variance
A statistical model is a linear relation of the effects of the different levels of a number of factors involved in an experiment, along with one or more terms representing error effects. The effects of any factor can be either fixed or random. For example, the effects of two well-defined levels of irrigation are fixed, as each irrigation level can reasonably be taken to have a fixed effect. Again, if the variety of a crop is taken as a factor with a number of varieties of the crop as its levels, then the effects of the varieties will be random if these varieties are selected at random from a large number of varieties. The random effects can again belong to a finite or an infinite population. The error effects are always random and may belong either to a finite or an infinite population.
A model in which each of the factors has fixed effects and only the error term is random is called a fixed model; models in which some factors have fixed effects and some have random effects are called mixed models. Again, models where all the factors have random effects are called random models. Depending on the finiteness or otherwise of the random-effect populations, mixed and random models can be of many different types.
In fixed effects models, the main objectives are to estimate the effects, to find a measure of variability among the effects of each of the factors, and finally to find the variability among the error effects. In random effects models the main emphasis is on estimating the variability among the effects of the different factors. The methodology for obtaining expressions of variability is, however, mostly the same in the different models, though the methods appropriate for testing them are different. A fixed effects model for, say, two factors is written as below:
$$Y_{ijk} = \mu + a_i + b_j + e_{ijk}$$
where $Y_{ijk}$ is an observation coming from a unit defined by the levels $i$, $j$, $k$ of the factors involved, $a_i$ is the effect of the $i$-th level of one factor, say $A$, $b_j$ is the effect of the $j$-th level of another factor, say $B$, and $e_{ijk}$ is an error effect which is assumed to be normally and independently distributed with zero mean and a constant variance. These assumptions regarding the behaviour of $e_{ijk}$ are necessary for drawing inference by adopting known statistical methodology. The methodology adopted is the analysis of variance technique, by which inference is drawn by applying the F-test. For the F-test it has to be ensured, by adopting the various designs to be discussed subsequently, that these assumptions are satisfied by the observations; otherwise no valid inference can be drawn from their analysis.
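To make the model concrete, the following Python sketch simulates observations from the two-factor fixed effects model above. The overall mean, the effect sizes, the error standard deviation and the number of replicates are illustrative assumptions, not values taken from this module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameters for the model Y_ijk = mu + a_i + b_j + e_ijk
mu = 20.0                             # overall mean
a = np.array([1.5, -0.5, -1.0])       # fixed effects a_i of factor A (3 levels)
b = np.array([2.0, 0.5, -1.0, -1.5])  # fixed effects b_j of factor B (4 levels)
sigma = 1.0                           # constant error standard deviation
reps = 5                              # replicate observations k per (i, j) cell

# e_ijk ~ N(0, sigma^2), independent across observations
obs = np.array([
    [mu + a[i] + b[j] + rng.normal(0.0, sigma, reps) for j in range(len(b))]
    for i in range(len(a))
])                                    # shape: (levels of A, levels of B, replicates)

print(obs.shape)   # (3, 4, 5)
print(obs.mean())  # close to mu, since the assumed effects sum to zero
```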
3. General purpose of ANOVA
ANOVA is a statistical technique that assesses potential differences in a scale-level dependent variable across the categories of one or more grouping variables. Researchers and students use ANOVA in many ways, and the choice depends on the research design. Commonly, ANOVA is used in three ways: One way ANOVA, Two way ANOVA and N-way ANOVA (MANOVA). For example, an ANOVA can examine potential differences in IQ scores by country. An ANOVA table is a standard table used to summarize the analysis of variance calculations and results.
One way ANOVA refers to the number of independent variables, not the number of categories in each variable. One way ANOVA has just one independent variable. One-way analysis of variance is a statistical technique in which only one criterion (variable or attribute) is used to analyse the difference between more than two population groups. For example, differences in IQ can be assessed by country, and the country variable can have 2, 20, or more categories to compare.
Two way ANOVA is also called factorial ANOVA. Factorial ANOVA can be balanced (the same number of participants in each group) or unbalanced (a different number of participants in each group). Two-way analysis of variance is a statistical technique in which two criteria (variables or attributes) are used to analyse the difference between more than two population means.
N-way ANOVA or MANOVA: A researcher can use many independent variables and this is an n-way ANOVA. For example, potential difference in IQ scores can be examined by country, gender, age group, ethnicity etc. simultaneously.
4. ANOVA Test procedure
In an ANOVA, the researcher first sets up the null and alternative hypotheses. The null hypothesis assumes that there is no significant difference among the group means. The alternative hypothesis assumes that there is a significant difference among the group means. The researcher then calculates the F-ratio and the probability (p-value) associated with it. Next, the p-value of the F-ratio is compared with the established alpha (α), the level of significance. In general terms, if the p-value associated with F is smaller than α = 0.05, the null hypothesis is rejected and the alternative hypothesis is accepted. On rejecting the null hypothesis, one concludes that the means of the groups are not all equal.
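As a minimal illustration of this procedure, the Python sketch below runs a one-way ANOVA with scipy.stats.f_oneway on three small assumed samples and compares the resulting p-value with α = 0.05.

```python
from scipy import stats

# Three small assumed samples, used only to illustrate the test procedure
sample_1 = [14, 16, 12, 26, 20, 17, 15, 18]
sample_2 = [16, 28, 24, 38, 30, 24, 22, 26]
sample_3 = [17, 16, 25, 39, 24, 24, 14, 16]

alpha = 0.05                                   # established level of significance
f_ratio, p_value = stats.f_oneway(sample_1, sample_2, sample_3)

print(f"F = {f_ratio:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the group means are not all equal.")
else:
    print("Fail to reject H0: no significant difference among the group means.")
```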
5. F-test
The aim of analysis of variance is to see whether there exists any real difference between the populations/samples (treatments), or whether the observed differences are only errors of sampling. For that, we start with the null hypothesis that all the populations/samples (treatments) are equal so far as their effects on the yield of the crop, or any other characteristic, are concerned, i.e., the differences between them are zero. If we write $\tau$ for the sample/population (treatment) effect, our null hypothesis will be
$$\tau_1 = \tau_2 = \tau_3 = \dots = \tau_n$$
This hypothesis will either be disproved or be accepted within the limits of chance error. It can be tested by calculating the ratio between the sample/population (treatment) variance and the error variance, and then testing its significance by comparing it with the expected value of the variance ratio at the desired probability level. This ratio between the two variances is denoted by the symbol F, and the test is known as the F-test.
If $V_1$ and $V_2$ are the two variances based on $v_1$ and $v_2$ degrees of freedom respectively, then
$$F = \frac{V_1}{V_2}$$
The observed value of F is compared with the value given in the F-table for $v_1$ and $v_2$ degrees of freedom at the desired probability level. The levels in which we are generally interested are 0.05, 0.01 and 0.001, i.e. 5%, 1% and 0.1%. For example, suppose we have
Population/sample (treatment) variance or mean square for treatment: $V_T = 7.5$
Error variance or mean square for error: $V_E = 3.5$
$$F = \frac{V_T}{V_E} = \frac{7.5}{3.5} = 2.14$$
Degrees of freedom: $v_1 = 4$ and $v_2 = 8$.
Now the value of F at the 5% level of significance for $v_1 = 4$ and $v_2 = 8$ is 3.84. The observed value of F, being less than $F_{5\%}$, is non-significant, showing thereby that there are no significant differences among the samples/populations (treatments).
Had the observed value of F been significant, the interpretation would have been that the differences among the samples/populations (treatments) are real and not due to errors of sampling.
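The worked example above can be reproduced with a short Python sketch that forms the variance ratio and looks up the tabulated 5% point of F for $v_1 = 4$ and $v_2 = 8$; only scipy's F distribution is assumed.

```python
from scipy import stats

# Numbers from the worked example above
v_t, v_e = 7.5, 3.5      # treatment and error mean squares (V_T, V_E)
v1, v2 = 4, 8            # numerator and denominator degrees of freedom

f_observed = v_t / v_e                   # 7.5 / 3.5, approximately 2.14
f_critical = stats.f.ppf(0.95, v1, v2)   # upper 5% point of F(4, 8), approximately 3.84

print(f"Observed F = {f_observed:.2f}, critical F (5%) = {f_critical:.2f}")
if f_observed > f_critical:
    print("Significant: the treatment differences appear real.")
else:
    print("Non-significant: the differences may be due to errors of sampling.")
```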
6. Summary Table of One Way ANOVA
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-ratio
Between samples/populations (treatments) | SSTR | r-1 | MSTR = SSTR/(r-1) | F = MSTR/MSE
Within samples (error) | SSE | n-r | MSE = SSE/(n-r) |
Total | SST | n-1 | |
where:
SST = total sum of squares
SSTR = sum of squares for samples/treatments/populations
SSE = sum of squares for error
n-1 = total degrees of freedom
r-1 = degrees of freedom between samples/populations (treatments)
n-r = within-samples (error) degrees of freedom
Since r independent samples are being compared, r - 1 degrees of freedom are associated with the sum of squares among samples. As each of the r samples contributes n_j - 1 degrees of freedom within itself, there are n - r degrees of freedom associated with the sum of squares within samples.
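The Python sketch below builds the one-way ANOVA quantities (SSTR, SSE, SST, their degrees of freedom, the mean squares and the F-ratio) from first principles; the three samples used are assumed purely for illustration.

```python
import numpy as np
from scipy import stats

# Assumed samples (r = 3 groups); any lists of measurements would do.
samples = [
    np.array([14.0, 16.0, 12.0, 26.0, 20.0, 17.0]),
    np.array([16.0, 28.0, 24.0, 38.0, 30.0, 24.0]),
    np.array([17.0, 16.0, 25.0, 39.0, 24.0, 24.0]),
]

all_obs = np.concatenate(samples)
n, r = all_obs.size, len(samples)
grand_mean = all_obs.mean()

# Between-samples (SSTR) and within-samples (SSE) sums of squares
sstr = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
sse = sum(((s - s.mean()) ** 2).sum() for s in samples)
sst = ((all_obs - grand_mean) ** 2).sum()      # total; equals SSTR + SSE

# Mean squares, F-ratio and p-value with degrees of freedom r-1 and n-r
mstr = sstr / (r - 1)
mse = sse / (n - r)
f_ratio = mstr / mse
p_value = stats.f.sf(f_ratio, r - 1, n - r)

print(f"SSTR = {sstr:.2f} (df = {r - 1}),  SSE = {sse:.2f} (df = {n - r})")
print(f"SST  = {sst:.2f} (df = {n - 1})")
print(f"F = {f_ratio:.3f}, p = {p_value:.4f}")
```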
7. Summary Table of Two Way ANOVA
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-ratio
Between columns | SSTR | c-1 | MSTR = SSTR/(c-1) | MSTR/MSE
Between rows | SSR | r-1 | MSR = SSR/(r-1) | MSR/MSE
Residual (error) | SSE | (c-1)(r-1) | MSE = SSE/[(c-1)(r-1)] |
Total | SST | rc-1 | |
where:
c = number of columns and r = number of rows
c-1 = degrees of freedom between columns
r-1 = degrees of freedom between rows
(c-1)(r-1) = degrees of freedom for residual error
SSTR = variation (sum of squares) between columns
SSR = variation (sum of squares) between rows
SSE = variation due to random error
SST = total variation
MSTR = mean square for variation between columns
MSR = mean square for variation between rows
MSE = mean square for error
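A minimal Python sketch of this two-way decomposition, assuming an r × c table with one observation per cell (the numbers themselves are illustrative only):

```python
import numpy as np
from scipy import stats

# Assumed r x c table with one observation per (row, column) cell
data = np.array([
    [14.0, 16.0, 12.0, 26.0],
    [16.0, 28.0, 24.0, 38.0],
    [17.0, 16.0, 25.0, 39.0],
])
r, c = data.shape
grand_mean = data.mean()
df_error = (c - 1) * (r - 1)

# Sums of squares: between columns, between rows, total and residual error
sstr = r * ((data.mean(axis=0) - grand_mean) ** 2).sum()
ssr = c * ((data.mean(axis=1) - grand_mean) ** 2).sum()
sst = ((data - grand_mean) ** 2).sum()
sse = sst - sstr - ssr

# Mean squares and F-ratios
mstr, msr, mse = sstr / (c - 1), ssr / (r - 1), sse / df_error
print(f"Columns: F = {mstr / mse:.3f}, p = {stats.f.sf(mstr / mse, c - 1, df_error):.4f}")
print(f"Rows:    F = {msr / mse:.3f}, p = {stats.f.sf(msr / mse, r - 1, df_error):.4f}")
```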
8. Summary table for Analysis of Variance: Completely Randomized Design
Source of Variation | Degrees of Freedom | Mean Square | F-ratio
Between samples/populations (treatments) | n-1 | Mean Square (Treatment) | MS Treatment / MS Error
Within samples (error) | N-n | Mean Square (Error) |
Total | N-1 | |
where:
(N-1) = total degrees of freedom (N = total number of observations)
(n-1) = sample/population (treatment) degrees of freedom (n = number of treatments)
(N-n) = remaining (error) degrees of freedom
9. Summary table for Analysis of Variance: Randomized Block Design
Source of Variation | Degrees of Freedom | Mean Square | F-ratio
Blocks | r-1 | VB | VB/VE
Samples/populations (treatments) | n-1 | VT | VT/VE
Error | (r-1)(n-1) | VE |
Total | nr-1 | |
where:
(nr-1) = total degrees of freedom
r = number of blocks
n = number of samples/populations (treatments)
VB = Mean Square (Block)
VT = Mean Square (Sample/Population/Treatment)
VE = Mean Square (Error)
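Computationally, the randomized block design is analysed like the two-way layout of the previous section, with blocks as rows and treatments as columns. The sketch below uses this section's notation VB, VT and VE on an assumed data set of 3 blocks and 4 treatments.

```python
import numpy as np
from scipy import stats

# Assumed yields arranged as r blocks (rows) x n treatments (columns)
yields = np.array([
    [12.0, 15.0, 19.0, 14.0],
    [14.0, 18.0, 21.0, 15.0],
    [11.0, 16.0, 18.0, 13.0],
])
r, n = yields.shape
grand_mean = yields.mean()
df_error = (r - 1) * (n - 1)

ss_blocks = n * ((yields.mean(axis=1) - grand_mean) ** 2).sum()
ss_treat = r * ((yields.mean(axis=0) - grand_mean) ** 2).sum()
ss_total = ((yields - grand_mean) ** 2).sum()
ss_error = ss_total - ss_blocks - ss_treat

vb = ss_blocks / (r - 1)     # VB: Mean Square (Block)
vt = ss_treat / (n - 1)      # VT: Mean Square (Treatment)
ve = ss_error / df_error     # VE: Mean Square (Error)

print(f"F (treatments) = {vt / ve:.3f}, p = {stats.f.sf(vt / ve, n - 1, df_error):.4f}")
print(f"F (blocks)     = {vb / ve:.3f}, p = {stats.f.sf(vb / ve, r - 1, df_error):.4f}")
```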
Question 1: The quantity of soap foam (in grams) filled by 6 machines in three different shifts is shown below. Each machine fills 8 bottles in one shift. Determine whether or not the productivity of each machine varies across the three shifts.
Machine | A | B | C | D | E | F
First Shift | 14 | 16 | 12 | 26 | 20 | 17
 | 15 | 18 | 16 | 22 | 23 | 15
 | 16 | 12 | 23 | 30 | 24 | 19
 | 17 | 18 | 20 | 32 | 30 | 18
 | 16 | 26 | 32 | 24 | 32 | 20
 | 19 | 12 | 22 | 25 | 28 | 18
 | 18 | 24 | 20 | 30 | 20 | 24
 | 16 | 12 | 22 | 38 | 24 | 18
Machine | A | B | C | D | E | F
Second Shift | 16 | 28 | 24 | 38 | 30 | 24
 | 22 | 26 | 24 | 30 | 22 | 20
 | 20 | 20 | 26 | 36 | 29 | 22
 | 19 | 24 | 28 | 26 | 40 | 22
 | 15 | 30 | 20 | 38 | 22 | 20
 | 17 | 20 | 24 | 36 | 32 | 26
 | 16 | 23 | 30 | 30 | 20 | 18
 | 19 | 20 | 28 | 30 | 24 | 22
Machine | A | B | C | D | E | F
Third Shift | 17 | 16 | 25 | 39 | 24 | 24
 | 14 | 16 | 27 | 28 | 19 | 30
 | 16 | 26 | 22 | 25 | 18 | 18
 | 15 | 30 | 21 | 35 | 21 | 16
 | 18 | 20 | 20 | 35 | 24 | 26
 | 16 | 18 | 19 | 40 | 26 | 20
 | 20 | 15 | 45 | 39 | 27 | 28
 | 15 | 27 | 38 | 27 | 24 | 16
Answer:
Table of cell totals (each total based on 8 observations):
Shift | A | B | C | D | E | F | Total
First Shift | 131 | 138 | 167 | 227 | 201 | 149 | 1013
Second Shift | 144 | 191 | 204 | 264 | 219 | 174 | 1196
Third Shift | 131 | 168 | 217 | 268 | 183 | 178 | 1145
Total | 406 | 497 | 588 | 759 | 603 | 501 | 3354
Correction factor = (3354)²/144 = 78120.25
Sum of squares due to bottles (cell totals) = 81807.75 - 78120.25 = 3687.50
Sum of squares due to machines = 81178.33 - 78120.25 = 3058.08
Sum of squares due to shifts = 78491.87 - 78120.25 = 371.62
Interaction sum of squares = 3687.50 - 3058.08 - 371.62 = 257.80
Total sum of squares = 84782.00 - 78120.25 = 6661.75
ANOVA table
** Indicates significant difference at 1% level of significance
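The sums of squares quoted in the answer can be reproduced with the Python sketch below, which transcribes the three shift tables into a 3 × 8 × 6 array (shifts × bottles × machines) and carries out the same decomposition. The F-ratios are formed by testing machines, shifts and their interaction against the within-cell (error) mean square, the usual choice for a two-factor layout with replication.

```python
import numpy as np
from scipy import stats

# Data from the three shift tables above: shape (shifts, bottles, machines)
data = np.array([
    [[14, 16, 12, 26, 20, 17], [15, 18, 16, 22, 23, 15],
     [16, 12, 23, 30, 24, 19], [17, 18, 20, 32, 30, 18],
     [16, 26, 32, 24, 32, 20], [19, 12, 22, 25, 28, 18],
     [18, 24, 20, 30, 20, 24], [16, 12, 22, 38, 24, 18]],
    [[16, 28, 24, 38, 30, 24], [22, 26, 24, 30, 22, 20],
     [20, 20, 26, 36, 29, 22], [19, 24, 28, 26, 40, 22],
     [15, 30, 20, 38, 22, 20], [17, 20, 24, 36, 32, 26],
     [16, 23, 30, 30, 20, 18], [19, 20, 28, 30, 24, 22]],
    [[17, 16, 25, 39, 24, 24], [14, 16, 27, 28, 19, 30],
     [16, 26, 22, 25, 18, 18], [15, 30, 21, 35, 21, 16],
     [18, 20, 20, 35, 24, 26], [16, 18, 19, 40, 26, 20],
     [20, 15, 45, 39, 27, 28], [15, 27, 38, 27, 24, 16]],
], dtype=float)

shifts, bottles, machines = data.shape                # 3, 8, 6
N = data.size                                         # 144 observations
cf = data.sum() ** 2 / N                              # correction factor = 78120.25

cell_totals = data.sum(axis=1)                        # (shift, machine) totals over 8 bottles
ss_cells = (cell_totals ** 2).sum() / bottles - cf                          # 3687.50
ss_machines = (data.sum(axis=(0, 1)) ** 2).sum() / (shifts * bottles) - cf  # approx. 3058.08
ss_shifts = (data.sum(axis=(1, 2)) ** 2).sum() / (bottles * machines) - cf  # approx. 371.62
ss_interaction = ss_cells - ss_machines - ss_shifts                         # approx. 257.8
ss_total = (data ** 2).sum() - cf                                           # 6661.75
ss_error = ss_total - ss_cells                        # within-cell (error) sum of squares

# Mean squares and F-ratios tested against the within-cell error
df_error = shifts * machines * (bottles - 1)          # 126
mse = ss_error / df_error
for name, ss, df in [("Machines", ss_machines, machines - 1),
                     ("Shifts", ss_shifts, shifts - 1),
                     ("Interaction", ss_interaction, (machines - 1) * (shifts - 1))]:
    f = (ss / df) / mse
    print(f"{name:11s} SS = {ss:8.2f}  df = {df:3d}  F = {f:6.2f}  "
          f"p = {stats.f.sf(f, df, df_error):.4f}")
```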
SUMMARY
In this module, we have learnt that each business or social problem may have a different statistical testing design; a design for the experiment defines the size and the number of experimental units, the manner in which the treatments are allotted to the units, and the appropriate type of grouping of testing units. We also understood that a statistical model involves a number of factors along with one or more terms representing error effects, and that the effects of any factor can be either fixed or random. Explanations and equations for One-way ANOVA, Two-way ANOVA and MANOVA have also been presented in this module, together with a generic procedure for solving ANOVA problems.
Learn More:
- http://www.statisticssolutions.com/manova-analysis-anova/
- Rutherford, A. (2001). Introducing ANOVA and ANCOVA: A GLM Approach. Thousand Oaks, CA: Sage Publications.
- http://www.biostathandbook.com/onewayanova.html
- Chandel, S.R.S. (2006). A Handbook of Agricultural Statistics. Kanpur: Anchal Prakashan Mandir.
- Sharma, J.K. (2014). Business Statistics, 2nd ed. New Delhi: S. Chand & Company.