30 Analysis of variance and Experimental Design: An Introduction to Experimental, Randomized and Block Design

Prof. Pankaj Madan

 

Learning Objectives:

After completing this module, the student will understand:
  • An Introduction to Experimental Design
  • Complete Randomized Design
  • Randomization of sample/population (treatments)
  • Merits and Demerits of Complete Randomized Design
  • Applications of complete randomization
  • Randomized Block Design
  • Advantages of the Randomized Block Design

 

1.    Introduction

 

Experimental design is a way to carefully plan experiments in advance so that your results are both objective and valid. Ideally your experimental design should:

 

Describe how participants are allocated to experimental groups. A common method is the completely randomized design, where participants are assigned to groups at random. A second method is the randomized block design, where participants are divided into homogeneous blocks before being randomly assigned to groups.

 

Minimize or eliminate confounding variables, which can offer alternative explanations for the experimental results.

 

Allow you to make inferences about the relationship between independent variables and dependent variables.

 

Reduce variability, to make it easier for you to find differences in sample/population (treatment) outcomes.

 

The choice of experimental design depends on the number and nature of the sample/population (treatments) under study. It also depends on the object of the experiment. Another consideration for the choice of the design is the question of available resources.

 

Some considerations under which the different designs are appropriate are as follows:

 

2.      Complete Randomized Design

 

This is the simplest type of design, in which the whole experimental material is divided into a number of experimental units depending upon the number of sample/population (treatments) and the number of replications for each. After that the sample/population (treatments) are allotted to the units entirely by chance. In the case of field experiments, the whole field is divided into the required number of equal plots and then the sample/population (treatments) are randomized over those plots.

 

If there are 5 sample/population (treatments) A, B, C, D and E and 4 replications of each, the number of plots will be 20, and each sample/population (treatment) will be allotted to four plots selected at random by means of random numbers.

 

3.      Randomization of sample/population (treatments)

 

Here, since the number of units is 20, a two-digit random number table will be consulted and a series of 20 distinct random numbers will be taken, excluding 00, repeats, and numbers greater than 20. Suppose the random numbers are 4, 18, 2, 14, 3, 7, 13, 1, 6, 10, 17, 20, 8, 15, 11, 5, 9, 12, 16, 19.

 

After this the plots will be serially numbered, and sample/population (treatment) A will be allotted to the plots bearing the serial numbers 4, 18, 2 and 14, sample/population (treatment) B to the plots bearing the serial numbers 3, 7, 13 and 1, and so on for the other sample/population (treatments) C, D and E.

 

In a similar way the randomization can be done for any number of sample/population (treatments).
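The allotment described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `randomize_crd` and the `seed` argument are assumptions introduced here, and a pseudo-random shuffle stands in for the printed random number table.

```python
import random

def randomize_crd(treatments, replications, seed=None):
    """Allot every treatment to `replications` plots chosen entirely at random.

    Hypothetical helper: a shuffle of the serial plot numbers replaces the
    random number table used in the text.
    """
    rng = random.Random(seed)
    n_plots = len(treatments) * replications
    plots = list(range(1, n_plots + 1))   # plots serially numbered 1..N
    rng.shuffle(plots)                    # random order of the plot numbers
    layout = {}
    for i, treatment in enumerate(treatments):
        # the first r shuffled numbers go to A, the next r to B, and so on
        chunk = plots[i * replications:(i + 1) * replications]
        layout[treatment] = sorted(chunk)
    return layout

layout = randomize_crd(["A", "B", "C", "D", "E"], replications=4, seed=1)
for treatment, plot_numbers in layout.items():
    print(treatment, plot_numbers)
```

Fixing the seed only makes the layout reproducible for record keeping; in practice a fresh randomization would be drawn for every experiment.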

 

4.      Structure of Analysis of Variance

 

If we suppose the number of sample/population (treatments) to be n, and the number of replications to be r for every sample/population (treatment), the total number of experimental units will be nr = N. If the sample/population (treatments) have varying numbers of replications, r1, r2, r3, …, rn, then the total number of units N will be given by

$$N = r_1 + r_2 + r_3 + \dots + r_n$$

where $r_i$ is the number of replications of the i-th sample/population (treatment).

 

In this design, the total number of degrees of freedom will be divided into two parts representing the independent comparisons. These two will be the two independent sources of variation.

 

(a)    Between sample/population (treatments)

 

(b)   Within sample/population (treatments). (The within sample/population (treatment) component provides a basis for the estimation of error.)

 

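Thus the structure of the analysis of variance will be as follows:

Sources of variation                               D.F.
Between sample/population (treatments)             n − 1
Within sample/population (treatments) (Error)      N − n
Total                                              N − 1

where n = number of sample/population (treatments) and N = total number of observations.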

 

Here, since the total number of observations is N, the total degrees of freedom will be (N-1). Similarly, population/sample (treatment) degrees of freedom will be (n-1), one less than the number of population/sample (treatments), and the remaining (N-n) degrees of freedom will be for ‘Within sample/population (Treatments)’ or ‘Error’.
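The partition of the degrees of freedom can be checked with a small helper. This is a sketch only; the function name `crd_degrees_of_freedom` is hypothetical, and the list passed to it holds the number of replications of each sample/population (treatment), which need not be equal.

```python
def crd_degrees_of_freedom(replications):
    """Split the total degrees of freedom of a completely randomized design.

    `replications` lists r1, r2, ..., rn, one entry per treatment.
    """
    n = len(replications)      # number of treatments
    N = sum(replications)      # total number of observations
    return {"treatments": n - 1, "error": N - n, "total": N - 1}

# Five treatments with four replications each, as in the example above
print(crd_degrees_of_freedom([4, 4, 4, 4, 4]))
# {'treatments': 4, 'error': 15, 'total': 19}

# Unequal replication is handled in the same way
print(crd_degrees_of_freedom([3, 5, 4]))
# {'treatments': 2, 'error': 9, 'total': 11}
```

In both cases the treatment and error degrees of freedom add up to the total, N − 1.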

 

5.      Standard Errors

 

The standard error of the difference between the two population/sample (treatment) means based on r1 and r2 replications is estimated by the following relation:

$$\text{S.E.}_{\text{diff.}} = \sqrt{V_E\left(\frac{1}{r_1} + \frac{1}{r_2}\right)}$$

 

Here $V_E$ is the pooled error variance, or error mean square. If $r_1 = r_2 = r$, where r is the number of replications, the formula reduces to $\sqrt{2V_E/r}$. The degrees of freedom for the t-test are the error degrees of freedom.
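As a quick numerical check of the relation above, the following sketch computes the standard error of a difference for equal and unequal replication. The function name `se_difference` is an assumption made for illustration; the values 3.47 and 4 are the error mean square and number of replications that appear in the self-check exercise later in this module.

```python
import math

def se_difference(error_ms, r1, r2):
    """Standard error of the difference between two treatment means
    based on r1 and r2 replications (error_ms is the error mean square)."""
    return math.sqrt(error_ms * (1.0 / r1 + 1.0 / r2))

# Equal replication: identical to sqrt(2 * VE / r)
print(round(se_difference(3.47, 4, 4), 3))   # 1.317

# Unequal replication: illustrative values only
print(round(se_difference(3.47, 4, 6), 3))
```

With equal replication the general formula and the reduced form give the same value, so a single function covers both cases.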

 

6.      Merits and Demerits of Complete Randomized Design

Merits

 

(i)  In this design any number of sample/population (treatments) and replicates may be used. The number of replicates can also be varied at will from population to population (treatment to treatment).

 

(ii) The statistical analysis of the data is very easy and it remains easy even if the numbers of replicates are not the same for all the sample/population (treatments).

 

(iii) The method of analysis remains simple when the results from some units are missing or rejected.

 

(iv) The relative loss of information due to missing data is smaller than that with any other design.

 

(v) This design is especially useful in small experiments where the supply of experimental material is scarce and homogeneous, as the whole of the material is utilized in the experiment.

 

(vi) The design provides the maximum number of degrees of freedom for the estimation of error as compared with other designs, for a given number of sample/population (treatments) and a given number of experimental units.

 

Demerits

 

The main objection to this design is on the grounds of accuracy. Since there is no restriction on the randomization of the population/sample (treatments), we cannot be sure that the units receiving one population/sample (treatment) are similar to those receiving another, and therefore the whole of the variation among the units enters into the experimental error.

 

 

7. Applications of Complete Randomization

 

This design is especially advantageous and appropriate under the following circumstances:

 

(i)  Where the experimental material is limited in quantity and homogeneous.

 

(ii) Where it is expected that some of the units will be destroyed or will fail to respond.

 

(iii) In small experiments where the increased accuracy from alternative designs is not enough to outweigh the loss of error degrees of freedom.

 

8.      Randomized Block Design

 

In this design the whole experimental material is divided into homogeneous groups, each of which constitutes a single replication. Each of these groups is further divided into a number of experimental units which are equal in all respects. The sample/population (treatments) are applied to these units by any random process. In the case of field experiments, if it is observed that the fertility gradient of the field is in one direction, the whole field may be divided into a number of equal blocks, and each block into equal plots. The number of plots in each block is equal to the number of sample/population (treatments), so that each block contains one replicate of every sample/population (treatment).

 

The following important points are to be kept in mind for this design:

 

(1) In this design the number of blocks must be equal to the number of replications fixed for each sample/population (treatment).

 

(2)   The number of plots in each block should be equal to the number of sample/population (treatments).

 

(3)   An essential point is that the experimental error within each block is to be kept as small as practically possible, and the variation from block to block as great as possible. In this way all the population/sample (treatments) assigned to one block experience the same type of environmental effects and are, therefore, comparable.

(4)   Randomization of population/sample (treatments) should be done afresh in each block.

 

Experiments other than the field experiments

 

In other types of experiments the replicates can be identified with sources of variation corresponding to position, time, classification of the experimental units, etc. For instance, if the experimental material is cows, they can be divided into groups according to breed, age, weight, lactation number, etc. If in a herd all sources of variation except the ‘lactation number’ are constant, the cows can be grouped according to ‘lactation number’. One group will consist of cows of the 1st ‘lactation number’, another group will consist of cows of the 2nd ‘lactation number’, and so on. Here the ‘lactation number’ groups will be the replicates and a cow will be the experimental unit.

 

9.      Randomization of Population/Sample (Treatments)

 

The sample/population (treatments) are assigned to the units (plots) within each group (block) entirely at random with the help of random numbers. It is important to note that the randomization should be done afresh for every group. The same set of random numbers should not be used for all the groups.
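A minimal sketch of this block-wise randomization is given below. The function name `randomize_rbd` and the `seed` argument are assumptions made for illustration, and a pseudo-random shuffle is used in place of a random number table. The point to notice is that the shuffle is repeated inside the loop, so every block receives its own independent arrangement.

```python
import random

def randomize_rbd(treatments, n_blocks, seed=None):
    """Randomize the treatments separately within each block.

    Returns {block number: list of treatments in plot order}.
    Hypothetical helper, not a prescribed procedure.
    """
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)   # every block holds each treatment once
        rng.shuffle(order)         # fresh randomization for this block
        layout[block] = order      # order[i] is the treatment on plot i + 1
    return layout

layout = randomize_rbd(["A", "B", "C", "D", "E", "F"], n_blocks=5, seed=1)
for block, order in layout.items():
    print(f"Block {block}: {' '.join(order)}")
```

Each printed row is one block (replicate); the number of plots in a row equals the number of sample/population (treatments).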

 

10.  Structure of Analysis of Variance

 

Taking the case of an agricultural experiment, if we suppose the number of sample/population (treatments) to be n and the number of replications to be r, the total number of degrees of freedom will be divided into three parts representing the independent comparisons:

 

(a)    Between blocks

(b)   Between sample/population (treatments)

(c)    Random variation, which provides a basis for the estimation of error.

Thus the structure of the analysis will be as follows:

Sources of variation                   D.F.               M.S.
Blocks                                 r − 1              VB
Sample/population (treatments)         n − 1              VT
Error                                  (n − 1)(r − 1)     VE
Total                                  nr − 1

where VB, VT and VE are the mean sums of squares for blocks, sample/population (treatments) and error respectively.

 

Here, since the total number of observations is nr the total degrees of freedom will be (nr-1). Similarly, since the blocks and the sample/population (treatments) are respectively r and n in number, their corresponding degrees of freedom will be (r-1) and (n-1).

 

11.  Standard Errors and Critical Difference

 

The standard error of the difference between the sample/population (treatment) means based on r replications is estimated by the relation

$$\text{S.E.}_{\text{diff.}} = \sqrt{\frac{2V_E}{r}}$$

where $V_E$ is the pooled error mean square.

 

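Critical difference at the 5% level of significance:

$$\text{C.D.}_{5\%} = \text{S.E.}_{\text{diff.}} \times t_{5\%}$$

where $t_{5\%}$ is the tabulated value of t at the 5% level for the error degrees of freedom.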

 

If some sample/population (treatments) receive extra replications, the general formula for the standard error is

$$\text{S.E.}_{\text{diff.}} = \sqrt{V_E\left(\frac{1}{r_1} + \frac{1}{r_2}\right)}$$

 

Where, r1 and r2 are the numbers of replications of the sample/population (treatments) to be compared.

 

12.  Advantages of the Randomized Block Design

 

(1)   When the material is heterogeneous, the residual variance can be reduced by choosing blocks or plots such that the plots within any block are fairly similar.

 

(2)   This design allows any number of sample/population (treatments) and any number of replications, but when the number of sample/population (treatments) is very large (approximately 20 or more) the efficiency of error control decreases.

(3)   A reduction in the number of replications leads to a larger standard error, but the experiment still furnishes a result of some value.

 

(4)   By means of grouping more accurate results are usually obtained than with the completely randomized design.

 

(5)   If we find that the experimental error variance is larger for some sample/population (treatments) than for others, we can obtain an unbiased error for testing any specific combination of the sample/population (treatment) means.

 

13.  Self-Check Exercise with solutions

 

Q.1. The following table gives the yields, in pounds per plot, of five varieties of wheat, each grown on 4 plots in a completely randomized design.

Analyse the data and state your conclusions.

 

Analysis

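$$\text{C.F.} = \frac{(224)^2}{20} = 2508.8$$

$$\text{Total S.S.} = (8^2 + 8^2 + \dots + 9^2 + 8^2) - \text{C.F.} = 207.2$$

$$\text{Variety S.S.} = \frac{32^2 + 44^2 + 64^2 + 48^2 + 36^2}{4} - \text{C.F.} = 155.2$$

$$\text{Error S.S.} = \text{Total S.S.} - \text{Variety S.S.} = 207.2 - 155.2 = 52.0$$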
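The arithmetic of this analysis can be retraced with the sketch below. It uses only the figures quoted above: the five variety totals (32, 44, 64, 48, 36), 4 plots per variety, and the total sum of squares 207.2. The individual plot yields are not reproduced here, so the total sum of squares is taken as given rather than recomputed; the variable names are illustrative.

```python
# Figures quoted in the analysis above
variety_totals = [32, 44, 64, 48, 36]
r = 4                      # plots (replications) per variety
total_ss = 207.2           # taken as given; raw yields are not available here

n = len(variety_totals)    # number of varieties
N = n * r                  # total number of plots
grand_total = sum(variety_totals)

cf = grand_total ** 2 / N                                   # correction factor
variety_ss = sum(t ** 2 for t in variety_totals) / r - cf   # between varieties
error_ss = total_ss - variety_ss                            # within varieties

variety_ms = variety_ss / (n - 1)
error_ms = error_ss / (N - n)
f_ratio = variety_ms / error_ms

print(f"C.F.        = {cf:.1f}")
print(f"Variety S.S. = {variety_ss:.1f} on {n - 1} d.f., M.S. = {variety_ms:.2f}")
print(f"Error S.S.   = {error_ss:.1f} on {N - n} d.f., M.S. = {error_ms:.2f}")
print(f"F            = {f_ratio:.2f}")
```

The error mean square obtained this way, about 3.47 on 15 degrees of freedom, is the quantity that enters the standard error of a difference between two variety means.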

 

Table for Analysis of Variance

 

 

Here, the F-test indicates that there are significant differences between the sample/population (treatment) means, since the observed value of the variance ratio is significant at the 0.1% level of significance. Now we wish to know which variety is the best and which varieties differ significantly among themselves. This can be done with the help of the critical difference and confidence intervals.

 

Standard error of the difference between two sample/population (treatment) means

 

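$$\text{S.E.}_{\text{diff.}} = \sqrt{\frac{2V_E}{r}} = \sqrt{\frac{2 \times 3.47}{4}} = 1.317$$

$$\text{C.D.} = \text{S.E.}_{\text{diff.}} \times t_{5\%} \ (\text{error d.f.} = 15) = 1.317 \times 2.131 = 2.81$$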

 

Here, F-test in the analysis of variance indicates significant differences between the varieties and therefore we are justified in comparing the individual varieties with the help of critical difference.

 

Summary of results

 

In agricultural experiments it is advisable to express the results in the commercial units of measurement like pounds per acre, or quintals per hectare, etc. This can be done by calculating the appropriate conversion factor, depending upon the area of each plot and unit of measurement and multiplying each sample/population (treatment) mean by this conversion factor.

 

The varieties which do not differ significantly have been underlined by a bar. This method of underlining the sample/population (treatments) which do not differ significantly is a concise way of indicating the significance and non-significance of individual comparisons.

 

Q.2. The yields of 6 varieties of a crop, in lbs., along with the plan of the experiment, are given below. The number of blocks is 5, the plot size is 1/20 acre, and the varieties are represented by A, B, C, D, E and F.

Test the significance of the variation due to strains.

 

Analysis

 

Tabulation:

Varieties                 Blocks                   Variety    Variety
               I     II    III     IV      V       totals     means
A             20     26     30     28     23         127       25.4
B              9     12     10      9      7          47        9.4
C             12     15     16     14     14          71       14.2
D             17     10     20     23     20          90       18.0
E             28     26     23     35     30         142       28.4
F             70     62     56     64     75         327       65.4
Totals       156    151    155    173    169         804     G.M. = 26.8

 

Sum of Squares for different sources of variation

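$$\text{C.F.} = \frac{(804)^2}{30} = 21547.2$$

$$\text{Total S.S.} = (20^2 + 9^2 + \dots + 30^2 + 75^2) - \text{C.F.} = 10646.8$$

$$\text{Block S.S.} = \frac{156^2 + \dots + 169^2}{6} - \text{C.F.} = 61.5$$

$$\text{Variety S.S.} = \frac{127^2 + \dots + 327^2}{5} - \text{C.F.} = 10167.2$$

$$\text{Error S.S.} = \text{Total S.S.} - \text{Block S.S.} - \text{Variety S.S.} = 10646.8 - 61.5 - 10167.2 = 418.1$$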
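The sums of squares above can be recomputed directly from the tabulated yields, as in the following sketch. The data are those of the tabulation; the variable names are illustrative. Small differences in the last decimal place arise because the hand calculation rounds the block sum of squares to 61.5 before subtracting.

```python
# Yields (lbs per plot) from the tabulation: one row per variety, one column per block I..V
yields = {
    "A": [20, 26, 30, 28, 23],
    "B": [9, 12, 10, 9, 7],
    "C": [12, 15, 16, 14, 14],
    "D": [17, 10, 20, 23, 20],
    "E": [28, 26, 23, 35, 30],
    "F": [70, 62, 56, 64, 75],
}
n = len(yields)                         # number of varieties (treatments)
r = len(next(iter(yields.values())))    # number of blocks (replications)

grand_total = sum(sum(row) for row in yields.values())
cf = grand_total ** 2 / (n * r)         # correction factor

total_ss = sum(y * y for row in yields.values() for y in row) - cf
block_totals = [sum(row[j] for row in yields.values()) for j in range(r)]
block_ss = sum(b * b for b in block_totals) / n - cf
variety_ss = sum(sum(row) ** 2 for row in yields.values()) / r - cf
error_ss = total_ss - block_ss - variety_ss

block_ms = block_ss / (r - 1)
variety_ms = variety_ss / (n - 1)
error_ms = error_ss / ((n - 1) * (r - 1))
f_varieties = variety_ms / error_ms

print(f"Blocks    d.f. {r - 1:2d}  S.S. {block_ss:9.1f}  M.S. {block_ms:8.2f}")
print(f"Varieties d.f. {n - 1:2d}  S.S. {variety_ss:9.1f}  M.S. {variety_ms:8.2f}  F {f_varieties:.2f}")
print(f"Error     d.f. {(n - 1) * (r - 1):2d}  S.S. {error_ss:9.1f}  M.S. {error_ms:8.2f}")
print(f"Total     d.f. {n * r - 1:2d}  S.S. {total_ss:9.1f}")
```

The variance ratio for varieties is the variety mean square divided by the error mean square; the block mean square is not needed for testing the variation due to varieties.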

 

Table of analysis of variance

Sources of      D.F.       S.S.        M.S.      F (Cal.)        Tabulated F
variation                                                      5%     1%    0.1%
Blocks            4        61.5       15.38
Varieties         5     10167.2     2033.44     97.25***      2.71   4.10   6.46
Error            20       418.1       20.91
Total            29     10646.8

***Significant at 0.1% level of significance.

 

It is clear from the table that the observed value of F is significant at the 0.1% level of significance, which indicates that there are significant differences between the variety means. Now we have to test the significance of the differences between the individual varieties, and that will be done with the help of the critical difference.
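One way this comparison might be carried out is sketched below. It uses the error mean square (20.91), the 5 replications and the variety means from this exercise. The tabulated t value of 2.086 (5% level, 20 error degrees of freedom) is an assumption taken from a standard t table and is not given in this module; the function name `critical_difference` is likewise illustrative.

```python
import math

def critical_difference(error_ms, r, t_crit):
    """C.D. = (S.E. of the difference between two means) x (tabulated t)."""
    se_diff = math.sqrt(2.0 * error_ms / r)
    return se_diff * t_crit

error_ms = 20.91    # error mean square from the analysis of variance
r = 5               # number of blocks, i.e. replications per variety
t_5pct = 2.086      # ASSUMPTION: t at 5% for 20 d.f., from a standard t table

cd = critical_difference(error_ms, r, t_5pct)
print(f"C.D. at 5% = {cd:.2f} lbs per plot")

# Variety means from the tabulation, ranked from highest to lowest
means = {"A": 25.4, "B": 9.4, "C": 14.2, "D": 18.0, "E": 28.4, "F": 65.4}
ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)

# Compare each variety with the next one in the ranking
for (v1, m1), (v2, m2) in zip(ranked, ranked[1:]):
    diff = m1 - m2
    verdict = "significant" if diff > cd else "not significant"
    print(f"{v1} - {v2} = {diff:.1f} ({verdict})")
```

Only neighbouring varieties in the ranking are compared here, to keep the sketch short; any other pair can be judged in the same way by comparing the difference of their means with the critical difference.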

 

Summary

 

Randomization, replication and error control are the three main principles of the design of experiments. Randomization defines the manner of allocation of the sample/population (treatments) to the experimental units; replication specifies the number of units to be provided for each of the sample/population (treatments); and error control increases precision through the choice of an appropriate type of experimental unit and the grouping of the units. We have seen that in a completely randomized design no local control is adopted, except that the experimental units should be homogeneous. Usually, when experiments require a large number of experimental units, completely randomized designs cannot ensure precision of the estimates of sample/population (treatment) effects. An improvement over the completely randomized design can be obtained by providing error control measures, as in the randomized block design. The error control in this design consists of arranging the units into groups such that the units within each group are homogeneous. These groups are commonly known as blocks, and the experimental units in the blocks are known as plots. This homogeneous grouping of the experimental units and the random allocation of sample/population (treatments) separately in each block are the two main characteristic features of the randomized block design.
