14 Tests of Significance based on small samples
Prof. Aslam Mahmood
1) E-Text
1.0 Introduction
The sampling distribution of the means of large samples (n ≥ 30) drawn from any universe with known mean M and standard deviation SD is approximately normal, with mean equal to the universe mean and standard error SD/√n, as described in the module on tests of significance based on large samples.
However, when the sample is small (n < 30) and the standard error has to be estimated from the sample itself, the standardized sample mean can no longer be treated as normally distributed. For small samples drawn from a normal universe it follows another type of distribution, known as Student’s ‘t’ distribution.
The ‘t’ distribution is flatter than the normal distribution. Its 1% (or 5%) limit is therefore not roughly 3 times (or 2 times) the standard error, as is the case with the normal distribution; it is larger than these values, and the smaller the sample, the larger the limit becomes. There is no single fixed limit: every sample size has its own limiting value for each level of significance.
It is not possible to study the sampling distributions for all possible values of means and standard deviations. William Sealy Gosset, the noted statistician, standardized them by taking the ratio of the difference between the sample mean and the universe mean to its standard error, and worked out the limits of this ratio for different levels of significance. These values are tabulated against (n – 1), known as the degrees of freedom, where n is the size of the sample.
Since then, this ratio has been used to test whether a given sample mean is consistent with a proposed universe mean. The test is known as Student’s ‘t’ test, named after Gosset, who used to write under the pen name ‘Student’. The limiting values have also been published in Statistical Tables for Biological, Agricultural and Medical Research by R.A. Fisher and F. Yates, popularly known as the Fisher and Yates tables. These tables are available in the appendix of almost every book on statistics.
As the sample size increases, the ‘t’ distribution keeps approaching the normal distribution, and for 30 or more degrees of freedom the two are practically indistinguishable.
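The shrinking of the limiting values towards the normal values can be checked numerically. The Python sketch below is an illustration and not the method used to build the published tables: it integrates the ‘t’ density with Simpson's rule and finds the two tailed limit by bisection, using only the standard library. The function names t_pdf, two_tailed_prob and t_limit are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_prob(t, df, steps=1000):
    """P(|T| > t), from Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)

def t_limit(df, alpha):
    """Two tailed limiting value of t at level alpha, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if two_tailed_prob(mid, df) > alpha:
            lo = mid      # tail area still too large, so the limit lies higher
        else:
            hi = mid
    return (lo + hi) / 2

# Limits at the 5% and 1% levels for a few degrees of freedom
for df in (4, 9, 16, 30, 120):
    print(df, round(t_limit(df, 0.05), 3), round(t_limit(df, 0.01), 3))
```

The printed limits fall as the degrees of freedom rise, moving towards the normal values of 1.96 (5%) and 2.58 (1%).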
2.0 Student’s ‘t’ test for One Sample
If a small sample of size n (less than 30), with sample mean m, is drawn from a normally distributed universe with a given mean M and standard deviation SD, the sampling distribution of the ratio
$$t = \frac{|M - m|}{SD/\sqrt{n}}$$
will follow the ‘t’ distribution with (n – 1) degrees of freedom. Note that the numerator of ‘t’ is a modulus (absolute value), which is indicated by the two vertical bars | |.
In general, the standard deviation of the universe is not known. In such cases we estimate it from the standard deviation of the sample and use a slightly different formula:
$$t = \frac{|M - m|}{\sigma/\sqrt{n - 1}}$$
where σ is the standard deviation of the sample, computed with divisor n.
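The divisor (n – 1) under the root compensates for σ being computed with divisor n. As a minimal sketch, with hypothetical sample values chosen only for illustration, the following Python lines check that σ/√(n – 1) gives the same standard error as the other common form s/√n, where s is the standard deviation computed with divisor (n – 1).

```python
import math
import statistics

sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]  # hypothetical values
n = len(sample)

sigma = statistics.pstdev(sample)  # standard deviation with divisor n
s = statistics.stdev(sample)       # standard deviation with divisor n - 1

se_from_sigma = sigma / math.sqrt(n - 1)
se_from_s = s / math.sqrt(n)

# The two standard errors agree up to floating point rounding
print(se_from_sigma, se_from_s)
```

Either form can therefore be used, provided the divisor under the root matches the way the sample standard deviation was computed.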
Using these properties of the ‘t’ distribution we can test whether a sample with mean ‘m’ could have come from a universe with the proposed mean ‘M’.
We set up a null hypothesis, denoted by H0, that the given sample mean m is consistent with the proposed universe mean M, and test it against an alternative hypothesis, denoted by H1, that the sample mean m does not support M as the universe mean. More precisely:
Null hypothesis H0: |M – m| = 0, as against the
Alternative hypothesis H1: |M – m| ≠ 0
To test the null hypothesis |M – m| = 0 we calculate the ‘t’ statistic:
$$t = \frac{|M - m|}{\sigma/\sqrt{n - 1}}$$
If the given sample is consistent with the null hypothesis, the calculated value of ‘t’ should be less than or equal to the tabulated value of ‘t’ at the 5% or 1% level of significance for (n – 1) degrees of freedom. In that case the difference |M – m| between the proposed mean M and the observed mean m is considered insignificant at the given level of significance and the null hypothesis is accepted.
If the calculated value of the ‘t’ ratio is more than the tabulated value, the difference |M – m| is said to be significant at the given level of significance; the null hypothesis is rejected and the alternative hypothesis is accepted.
In other words, for the null hypothesis to stand, the absolute value of the ‘t’ ratio should not exceed the tabulated value of ‘t’ for (n – 1) degrees of freedom. These table values are given in the annexure for a few levels of significance. In most cases we use the values for the 1% or 5% level of significance, since a level of significance above 5% or 10% means a higher probability of error.
It is also to be noted that the rejection of the null hypothesis
H0: |M – m| = 0
at, say, the 5% level of significance means that the given sample mean ‘m’ suggests that the universe mean cannot be considered to be ‘M’. It could be more than ‘M’ or less than ‘M’.
Since this probability covers both possibilities, it is split into 0.025 for ‘less’ and 0.025 for ‘more’, and we call the test a two tailed test. If we consider only one possibility, say that the universe mean is more than ‘M’, the level of significance reduces to half and the test is known as a one tailed test. Thus the value of ‘t’ that corresponds to the 5% level of significance in a two tailed test corresponds to the 2.5% level (probability 0.025) in a one tailed test. One can verify this from the limiting values of ‘t’ given in the annexure.
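The split of the 5% level between the two tails can be illustrated by simulation. The sketch below is an illustration rather than part of the original text: it draws many samples of size 10 from a hypothetical normal universe (mean 100 and standard deviation 15, both arbitrary), computes the signed ‘t’ ratio for each sample, and counts how often the ratio falls beyond 2.262, the two tailed 5% limit for 9 degrees of freedom.

```python
import math
import random

random.seed(1)                 # fixed seed so that the run is repeatable
M, SD, n = 100.0, 15.0, 10     # hypothetical universe and sample size
LIMIT = 2.262                  # two tailed 5% limit of t for 9 d.f.
TRIALS = 20000

upper = lower = 0
for _ in range(TRIALS):
    sample = [random.gauss(M, SD) for _ in range(n)]
    m = sum(sample) / n
    sigma = math.sqrt(sum((x - m) ** 2 for x in sample) / n)  # divisor n
    t = (m - M) / (sigma / math.sqrt(n - 1))                  # signed t ratio
    if t > LIMIT:
        upper += 1
    elif t < -LIMIT:
        lower += 1

share_upper = upper / TRIALS
share_lower = lower / TRIALS
share_both = share_upper + share_lower
print(share_lower, share_upper, share_both)
```

Each tail should hold roughly 2.5% of the simulated ratios and the two tails together roughly 5%, up to sampling noise.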
To summarise: if the calculated value of ‘t’ exceeds the table value, the absolute difference |M – m| between the sample mean and the universe mean is said to be statistically significant; the null hypothesis is rejected and we conclude that the universe mean cannot be considered to be M. If the calculated ‘t’ value is less than or equal to the table value of ‘t’, the difference is considered insignificant and the null hypothesis is accepted, i.e. we conclude that the universe mean may be equal to M.
2.1 Example
The agricultural productivity of farms in a region is normally distributed over space. A random sample of 10 plots is drawn and their productivity X (Rs. 000 per acre) is found to be 150, 200, 175, 180, 200, 210, 180, 130, 205 and 190.
Do the data suggest that the mean productivity of the universe (region) could be Rs. 20500/- per acre?
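In the present example the proposed universe mean, expressed in the same units as the sample values, is M = 205; this is the figure used in the computation that follows. The value of SD is not given, so it is estimated from the sample as σ. The sample mean ‘m’ is also worked out from the sample values, as shown below.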
Table-1: Agricultural Productivity of Ten Farms.
Farm no. | Agricultural productivity X (Rs 000/acre) | (X – m) | (X – m)² |
1 | 150 | 150 – 182 = –32 | 1024 |
2 | 200 | 200 – 182 = 18 | 324 |
3 | 175 | 175 – 182 = –7 | 49 |
4 | 180 | 180 – 182 = –2 | 4 |
5 | 200 | 200 – 182 = 18 | 324 |
6 | 210 | 210 – 182 = 28 | 784 |
7 | 180 | 180 – 182 = –2 | 4 |
8 | 130 | 130 – 182 = –52 | 2704 |
9 | 205 | 205 – 182 = 23 | 529 |
10 | 190 | 190 – 182 = 8 | 64 |
Total | 1820 | | 5810 |
Sample mean m = 1820/10 = 182 (Rs. 000 per acre)
Sample standard deviation σ = √(5810/10) = √581 = 24.104
$$t = \frac{|205 - 182|}{24.104/\sqrt{10 - 1}} = \frac{23}{24.104/3} = \frac{23}{8.035} = 2.863$$
The table value of ‘t’ for 9 degrees of freedom is 2.26 at the 5% level of significance and 3.25 at the 1% level of significance. The calculated value of about 2.86 lies between the two, so we reject the null hypothesis at the 5% level of significance but cannot reject it at the 1% level.
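The arithmetic of this example can be rechecked with a few lines of Python. This is only a verification sketch using the standard library: the data are the ten sample values given above, the proposed mean 205 is the figure used in the worked computation, and the two limits are the tabulated values quoted in the text. The variable names are illustrative.

```python
import math

productivity = [150, 200, 175, 180, 200, 210, 180, 130, 205, 190]  # sample values X
M = 205               # proposed universe mean, as used in the worked computation
T_5_PERCENT = 2.26    # tabulated t for 9 d.f. at the 5% level (from the text)
T_1_PERCENT = 3.25    # tabulated t for 9 d.f. at the 1% level (from the text)

n = len(productivity)
m = sum(productivity) / n                         # sample mean
sum_sq = sum((x - m) ** 2 for x in productivity)  # sum of squared deviations
sigma = math.sqrt(sum_sq / n)                     # sample SD with divisor n
t = abs(M - m) / (sigma / math.sqrt(n - 1))       # t ratio with n - 1 = 9 d.f.

print(f"m = {m}, sum of squares = {sum_sq}, sigma = {sigma:.3f}, t = {t:.3f}")
for level, limit in (("5%", T_5_PERCENT), ("1%", T_1_PERCENT)):
    verdict = "rejected" if t > limit else "not rejected"
    print(f"H0 is {verdict} at the {level} level (limit {limit})")
```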
2.2 Explanation
The result of the ‘t’ test above shows that the null hypothesis, that the mean productivity of the region (universe) is Rs. 20500/- per acre, can be rejected, but with a 5% probability of error. If we want to be stricter and allow only a 1% probability of error, the evidence is not strong enough and the null hypothesis has to be accepted.
3.0 Student’s ‘t’ test for Two Samples
In the above case we had one universe (with a normal distribution) from which a sample was drawn. Another important situation in which the ‘t’ test is helpful is when we have two universes, both normally distributed and with a common variance, whose mean values are not known. Using the means of random samples drawn from each universe we can test the null hypothesis that the two universe means are equal.
Thus if two small random samples of sizes n1 and n2 are drawn from two normally distributed universes R1 and R2 with equal means (M1 = M2) and a common variance, the absolute difference |m1 – m2| between the two sample means gives a ratio that follows the ‘t’ distribution:
$$t = \frac{|m_1 - m_2|}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
with (n1 + n2 – 2) degrees of freedom,
where S² is the pooled estimate of the common variance of the two universes, given by
$$S^2 = \frac{\sum (X_1 - m_1)^2 + \sum (X_2 - m_2)^2}{n_1 + n_2 - 2}$$
Before the ‘t’ test is applied, the assumption of a common variance is checked with the F-ratio, the ratio of the two sample variances. The F-ratio follows a distribution that varies with the numerator degrees of freedom (n1 – 1) and the denominator degrees of freedom (n2 – 1), which are indicated in parentheses with the F value. Like the ‘t’ values, the limiting values of the F-ratio are given in the appendix of almost every book on statistics.
These values are taken from the Fisher and Yates tables referred to above.
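The two-sample procedure described above, an F-ratio of the sample variances followed by a ‘t’ ratio based on the pooled variance, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a replacement for the tables: the function name two_sample_summary and the yield figures are hypothetical, and the limiting values of F and ‘t’ still have to be looked up.

```python
import math

def two_sample_summary(x1, x2):
    """Return (F ratio, t ratio, degrees of freedom of t) for two small samples."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)   # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in x2)   # sum of squared deviations, sample 2
    var1, var2 = ss1 / (n1 - 1), ss2 / (n2 - 1)
    f_ratio = var1 / var2                  # d.f. (n1 - 1, n2 - 1)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = abs(m1 - m2) / se
    return f_ratio, t, n1 + n2 - 2

# Hypothetical yields, for illustration only
a = [2.1, 2.4, 2.0, 2.3, 2.2]
b = [1.8, 1.9, 2.0, 1.7, 1.6, 1.8]
f_ratio, t, df = two_sample_summary(a, b)
print(f_ratio, t, df)
```

The F value is compared with its table value first; only if it is insignificant is the ‘t’ value compared with the table value for the returned degrees of freedom.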
3.1 Example
The per acre yield of wheat is given to be normally distributed in two regions R1 and R2. The yields of wheat (tons per acre) for a random sample of 8 plots from region R1 (X1) and 10 plots from region R2 (X2) are given below. Test the hypothesis that the two regions R1 and R2 have equal mean yields of wheat.
Table-2: Wheat Yield in Two Regions with plot sample of 8 for Region-1 and 10 for Region-2
We can apply the ‘t’ test as given above only after ascertaining that the variances of the two regions R1 and R2 are the same, by testing the significance of the ratio of the two sample variances S1² and S2².
F-Test
The two sample variances are computed as given below:
S1² = 0.0457, with (8 – 1) = 7 degrees of freedom
S2² = 0.20/(10 – 1) = 0.0222, with (10 – 1) = 9 degrees of freedom
F = 0.0457/0.0222 = 2.0586 with (7, 9) degrees of freedom
The table value of F for (7, 9) degrees of freedom at the 1% level of significance is 5.61 (taken from the Fisher and Yates tables, not given here).
Our calculated value of F is much lower than the tabulated value; hence it is insignificant at the 1% level of significance and we conclude that the two regions may have equal variance.
‘t’- Test
Once it is verified that the two regions have equal variance we can apply the ‘t’ test for two samples as below:
Null hypothesis
H0: the average yield of wheat per acre in the two regions is equal,
against the alternative hypothesis
H1: the average yield of wheat per acre in the two regions is not equal.
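Pooled estimate of common variance, obtained from the two sample variances above: S² = [(8 – 1)(0.0457) + (10 – 1)(0.0222)] / (8 + 10 – 2) = 0.5197/16 = 0.0325.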
The tabulated value of ‘t’ for 16 degrees of freedom (8 + 10 – 2 = 16) at the 1% level of significance is 2.92. As our computed value of ‘t’ (= 7.02) is much higher than the tabulated limiting value (= 2.92), it is statistically significant at the 1% level of significance, suggesting that the two regions R1 and R2 differ in their universe means.
Note that if a ‘t’ value is significant at the 1% level of significance it is automatically significant at higher levels of significance also. So there is no need to check its significance at the 5% level.
Limiting values of Student’s ‘t’ distribution
References
- Hammond and McCullagh (1974), Quantitative Techniques in Geography: An Introduction, Clarendon Press.
- W.G. Cochran (1953), Sampling Techniques, Asia Publishing House, New Delhi.
- Aslam Mahmood (1998), Statistical Methods in Geographical Studies, Rajesh Publications, New Delhi.
- R.A. Fisher and F. Yates (ed. 1963), Statistical Tables for Biological, Agricultural and Medical Research, Oliver and Boyd, Edinburgh.