Completely Randomized Design-II
Dr. Harmanpreet Singh Kapoor
Module 33: Completely Randomized Design-II
- Learning Objectives
- Introduction
- Checking the Adequacy of the Model
- Testing for the Equality of the Variance
- Testing the Treatment Means Using Least Square Methods
- Confidence Interval for Difference of Treatment Means
- Summary
- Suggested Readings
- Appendix 1
- Learning Objectives
This module is a continuation of the module “Completely Randomized Design-I”. In this module, we will discuss further topics such as checking the adequacy of the model, testing for the equality of variances, post hoc analysis, comparison of two treatments using different parametric tests, confidence intervals for the parameters, and a brief introduction to the random effects model. This module will help you understand how the validity of the assumptions must be checked before applying ANOVA. It is also essential to know how to proceed when the null hypothesis is rejected; for this, testing pairs of treatment population mean parameters as well as constructing confidence intervals for the difference of means of pairs of treatments will be discussed. Some examples will be worked through for better understanding.
- Introduction
We have already covered the topic of completely randomized design in the module “Completely Randomized Design”, with a complete derivation starting from the building of the model, the definition of the null and alternative hypotheses, the method of least squares to estimate the parameters of the model including the variance σ², the construction of the ANOVA table, and the derivation of the values of the different sums of squares. In this module, we will discuss the adequacy of the assumptions, since a model can be applied to data only if the data satisfy certain assumptions. This module therefore covers the methods available in the literature to test these assumptions. Many such methods exist; we will discuss only some of the most important ones. We will also discuss confidence intervals for the parameters as well as the comparison of two treatments when the null hypothesis is rejected.
- Checking the Adequacy of the Model
In this section, we will discuss the validity of the assumptions that must be checked before applying ANOVA. We have to check the assumption of normality of the error term. Graphical methods such as the histogram, dot plot, line chart, Q-Q plot and cumulative frequency (P-P) plot, and formal tests such as the Kolmogorov-Smirnov (KS) test, W/S test, Jarque-Bera test, Shapiro-Wilk test and D’Agostino test, can be used to assess the normality of the error term. Graphical inspection is not an exact method: values may look normally distributed when they are not. Graphical methods are also of little use when the sample size is very small, since the histogram of the data may not look normal even though the data actually follow a normal distribution. Statistical tests give actual probabilities concerning the normality of the data. In this testing of hypothesis, we want to test whether the data were drawn from a normal population or not.
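For illustration, here is a minimal Python sketch (an assumption of this write-up; the module itself does not prescribe any software) of two of the graphical checks mentioned above, a histogram and a normal Q-Q plot of the residuals:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(size=30)   # illustrative error terms; use your model residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.hist(residuals, bins=8, edgecolor="black")
ax1.set_title("Histogram of residuals")

# Q-Q plot: points close to the reference line suggest normality.
stats.probplot(residuals, dist="norm", plot=ax2)
ax2.set_title("Normal Q-Q plot")

plt.tight_layout()
plt.show()
```

As the text notes, such plots are only a visual aid; the formal tests below are needed for a defensible conclusion.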
The hypotheses for testing the normality of the data are:
H₀: The sample data come from a normal distribution, i.e. the sample data are not significantly different from a normal population.
H₁: The sample data do not come from a normal distribution, i.e. the sample data are significantly different from a normal population.
Testing this hypothesis differs from the usual tests of differences, where the main motive is to detect a difference in the mean, median or variance of the data.
In such tests of differences, our interest is in rejecting the null hypothesis: if the probability value (p-value) is less than 5%, we reject the null hypothesis. Here, in testing for the normality of the data, our main interest is to see a p-value greater than 5%, so that we do not reject the null hypothesis.
So if the p-value > 0.05, the data can be taken to follow a normal distribution; otherwise the data are significantly different from a normal distribution.
Note: As the sample size increases, by the central limit theorem the data can be treated as approximately normal, so for very large data sets the testing for normality becomes less important. It is also generally assumed that the data follow a normal distribution when the sample size is more than 30.
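To make the decision rule concrete, here is a minimal sketch in Python, assuming illustrative residuals and a 5% level; scipy's Shapiro-Wilk and Kolmogorov-Smirnov implementations stand in for the tests listed above:

```python
import numpy as np
from scipy import stats

# Illustrative residuals; in practice use the error terms of your fitted model.
rng = np.random.default_rng(42)
residuals = rng.normal(loc=0.0, scale=1.0, size=40)

alpha = 0.05  # 5% significance level

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution.
stat_sw, p_sw = stats.shapiro(residuals)

# Kolmogorov-Smirnov test against a normal law with the sample mean and sd
# (estimating the parameters from the sample makes this p-value approximate).
stat_ks, p_ks = stats.kstest(residuals, 'norm',
                             args=(residuals.mean(), residuals.std(ddof=1)))

for name, p in [("Shapiro-Wilk", p_sw), ("Kolmogorov-Smirnov", p_ks)]:
    if p > alpha:
        print(f"{name}: p = {p:.3f} > 0.05, do not reject H0 (consistent with normality)")
    else:
        print(f"{name}: p = {p:.3f} <= 0.05, reject H0 (data differ from normality)")
```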
3.1 W/S Test for Normality
This is a very simple test that requires only the standard deviation and the range of the data. The test statistic is defined as
q = r / s
where q is the test statistic, r is the range of the data and s is the standard deviation.
Steps for applying this statistic:
- (i) First evaluate the standard deviation s of the data, which is the square root of the variance.
- (ii) Second, find the range of the data, that is, the highest value of the data minus the lowest value of the data (r = H − L).
- (iii) Evaluate the ratio q = r/s.
The W/S test uses critical values to test the hypothesis. If the calculated value of the test statistic falls within the critical range given in the W/S table, then we do not reject the null hypothesis; if the calculated value falls outside the range, then we reject the null hypothesis.
Sometimes we are interested in the p-value of the test statistic: if the p-value associated with the W/S statistic q is greater than 0.05, then the data are not significantly different from a normal distribution.
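A minimal sketch of this computation follows; the data are assumed for the example, and the critical bounds q_lower and q_upper below are placeholders, not values from the actual W/S table, which must be consulted for your sample size and significance level:

```python
import numpy as np

def ws_statistic(x):
    """W/S test statistic q = r/s: range of the data over its standard deviation."""
    x = np.asarray(x, dtype=float)
    r = x.max() - x.min()     # step (ii): range r = H - L
    s = x.std(ddof=1)         # step (i): sample standard deviation
    return r / s              # step (iii): q = r/s

# Illustrative data (assumed for this example).
x = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.0]

q = ws_statistic(x)

# Placeholder bounds -- these numbers are NOT from the W/S table; look up the
# actual (lower, upper) critical values for n = len(x) and your chosen alpha.
q_lower, q_upper = 2.67, 3.69

if q_lower <= q <= q_upper:
    print(f"q = {q:.3f} lies inside [{q_lower}, {q_upper}]: do not reject H0")
else:
    print(f"q = {q:.3f} lies outside [{q_lower}, {q_upper}]: reject H0")
```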
3.2 Jarque-Bera Test
This test is similar to a goodness-of-fit test. Here the sample data are tested for whether the skewness and kurtosis of the data match those of a normal distribution.
It is considered the simplest of all these tests, but it has the drawback that it is not available in some statistical software packages such as SPSS.
- Jarque-Bera: This test statistic is basically used to test normality through skewness and kurtosis, and it is very effective.
- D’Agostino: It is the most powerful among all these tests.
- Kolmogorov-Smirnov: This test is generally used in practice, though it is not sensitive to problems in the tails. It gives more appropriate results only when the sample size is greater than 50. It is generally available in statistical software.
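As a sketch of how these three tests can be run side by side, assuming scipy and illustrative data (scipy.stats.normaltest implements D’Agostino-Pearson's combination of skewness and kurtosis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)   # illustrative sample; use your error terms in practice

# Jarque-Bera: compares sample skewness and kurtosis with those of a normal law.
jb_stat, jb_p = stats.jarque_bera(x)

# D'Agostino-Pearson K^2: also based on skewness and kurtosis.
da_stat, da_p = stats.normaltest(x)

# Kolmogorov-Smirnov against N(mean, sd); more appropriate when n > 50.
ks_stat, ks_p = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))

for name, p in [("Jarque-Bera", jb_p), ("D'Agostino", da_p),
                ("Kolmogorov-Smirnov", ks_p)]:
    verdict = "do not reject H0" if p > 0.05 else "reject H0"
    print(f"{name:20s} p = {p:.3f} -> {verdict}")
```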
Note: We have discussed the validity of the assumptions and how to verify them, but what is the problem if the data (error terms) are not from a normal distribution? As already discussed, our results are valid only if the data meet the assumptions. If the errors do not follow a normal distribution, any statement the experimenter makes about the problem may have serious consequences.
So it is essential to cross-check the normality of the data not just by graphical methods but also through the above tests.
In the next section, we check the validity of the second assumption, that is, the equality of variances of the dataset.
- Testing for the Equality of the Variance
The most widely used test statistic for testing the equality of variances is Bartlett’s test. The hypotheses are
H₀: σ₁² = σ₂² = ⋯ = σₐ²
H₁: the equality above does not hold for at least one pair of treatments.
The test statistic is defined as
χ₀² = 2.3026 q/c
where
q = (N − a) log₁₀ Sₚ² − Σᵢ (nᵢ − 1) log₁₀ Sᵢ²,
c = 1 + [Σᵢ (nᵢ − 1)⁻¹ − (N − a)⁻¹] / [3(a − 1)],
Sₚ² = Σᵢ (nᵢ − 1) Sᵢ² / (N − a),
Sᵢ² is the sample variance of the i-th treatment, nᵢ is the number of observations in the i-th treatment, a is the number of treatments and N = n₁ + ⋯ + nₐ is the total number of observations. Under H₀, χ₀² follows approximately a chi-square distribution with a − 1 degrees of freedom, so we reject H₀ at level α when χ₀² exceeds the upper α point of that chi-square distribution.
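A minimal sketch of Bartlett's test under these definitions; the three treatment samples are illustrative assumptions, and scipy.stats.bartlett is used to cross-check a direct computation of χ₀²:

```python
import numpy as np
from scipy import stats

# Illustrative samples from a = 3 treatments (assumed data).
t1 = [9.8, 10.2, 10.1, 9.9, 10.0]
t2 = [10.5, 10.1, 10.3, 10.4, 10.2]
t3 = [9.7, 10.0, 9.9, 10.1, 9.8]
samples = [np.asarray(t, dtype=float) for t in (t1, t2, t3)]

a = len(samples)
n = np.array([len(t) for t in samples])
N = n.sum()
s2 = np.array([t.var(ddof=1) for t in samples])      # S_i^2

# Pooled variance S_p^2 and the quantities q and c from the text.
sp2 = np.sum((n - 1) * s2) / (N - a)
q = (N - a) * np.log10(sp2) - np.sum((n - 1) * np.log10(s2))
c = 1 + (np.sum(1.0 / (n - 1)) - 1.0 / (N - a)) / (3 * (a - 1))
chi2_0 = 2.3026 * q / c

# Reject H0 when chi2_0 exceeds the upper alpha point of chi-square with a-1 df.
crit = stats.chi2.ppf(0.95, df=a - 1)
print(f"chi2_0 = {chi2_0:.4f}, critical value (alpha = 0.05) = {crit:.4f}")

# Cross-check with scipy's implementation (which uses natural logs internally).
stat, p = stats.bartlett(*samples)
print(f"scipy bartlett: statistic = {stat:.4f}, p-value = {p:.4f}")
```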
- Summary
… and perform the task of testing the mean parameters of pairs of treatments using the t-test. In this module, we also discussed the confidence interval for the difference of the population mean parameters of a pair of treatments.
- Suggested Readings
- Chakrabarti, M. C., Mathematics of Design and Analysis of Experiments, Asia Publishing House, 1970.
- Cochran W. G. and G. M. Cox, Design of Experiments, Wiley, 1992.
- Das, M. N. and N. C. Giri, Design and Analysis of Experiments, New Age International Publishers, 1986.
- Kempthorne, O., Design and Analysis of Experiments Vol I-II, Wiley, 2007.
- Montgomery, D. C., Design and Analysis of Experiments, Wiley, 2004.
- Raghavarao, D., Construction and Combinatorial Problems in Design of Experiments, Wiley, 1971.