24 Statistical Inference with two populations: Hypothesis Techniques-two sample tests: σ1 and σ2 (known and unknown)
Prof. Pankaj Madan
1. Introduction
2. Test for two population means (Independent Samples)
3. Test for two population means (dependent Samples)
4. Two Sample t-test
5. Hypothesis Testing of the Difference Between Two Means
6. Hypothesis Testing For a Difference Between Means for Small Samples Using Pooled Standard Deviations (Optional)
7. Hypothesis Testing for a Difference Between Proportions
8. Paired Differences
9. Summary
10. Self-Check Exercise
Quadrant-I
Statistical Inference with two populations: Hypothesis Techniques-two sample tests: σ1 and σ2 (known and unknown)
Learning Objectives:
After the completion of this module the student will understand:
i- Test for two population means (Independent Samples)
ii- Test for two population means (Dependent Samples)
iii- Two Sample t-test
iv- Hypothesis Testing of the Difference Between Two Means
v- Hypothesis Testing For a Difference Between Means for Small Samples Using Pooled Standard Deviations (Optional)
vi- Hypothesis Testing for a Difference Between Proportions
vii- Paired Differences
Introduction
This technique tests a claim by comparing parameters from two populations. The following example illustrates the idea.
Example
Suppose an entrepreneur wants to open a new startup business and is considering either Business District-1 or Business District-2. The economic affluence of a business district is key to the entrepreneur's decision. The entrepreneur wants to use household income as the key indicator of a business district's economic condition to settle the issue. Thus, from a statistical point of view, we are dealing with two populations, namely:
Population-1: collection of households in Business District-1;
Population-2: collection of households in Business District-2.
The common variable under study (i.e., the variable applicable to both populations for comparison) is household income.
If the entrepreneur wants to compare the populations in terms of mean household income, then the parameters of interest are
µ1 = mean household income of Population-1 and µ2 = mean household income of Population-2.
First, one wants to know whether the populations are identical in terms of mean household income. To check this, one tests the null hypothesis H0: µ1 = µ2 against the alternative HA: µ1 ≠ µ2.
If H0 is not rejected, the data give no evidence that the business districts differ in mean household income, and the entrepreneur can then focus on secondary factors (such as transportation, tax structure, etc.) to make the final decision. If H0 is rejected, or there are other reasons to believe that one population has a higher mean than the other, then one might be inclined to test
H0: µ1 = µ2 (or µ1 ≤ µ2) against HA: µ1 > µ2.
Similarly, one can test
H0: µ1 = µ2 (or µ1 ≥ µ2) against HA: µ1 < µ2.
On the other hand, if the entrepreneur wants to compare the populations in terms of the proportion of households having a minimum income level (say, K units per year), then the parameters of interest are
p1 = proportion of households in Population-1 having the minimum specified income; and p2 = proportion of households in Population-2 having the minimum specified income.
One can now check whether the populations are identical (in terms of the proportion of households having the minimum specified income) by testing
H0: p1 = p2 against HA: p1 ≠ p2.
Depending on the practical question, one can also test
H0: p1 = p2 (or p1 ≤ p2) against HA: p1 > p2.
Similarly, one can test
H0: p1 = p2 (or p1 ≥ p2) against HA: p1 < p2.
Test for two population means (Independent Samples)
For two populations, say Population-1 and Population-2, let the means of a common variable be µ1 and µ2, respectively, and the standard deviations σ1 and σ2; these parameters are unknown. Our objective is to test
H0: µ1 = µ2 against one of three possible alternatives HA:
(i) µ1 > µ2
or
(ii) µ1 < µ2
or
(iii) µ1 ≠ µ2
at a significance level α.
To test the above null hypothesis, random samples of sizes n1 and n2 are drawn from the two populations, and the sample observations from one population are independent of those from the other population. Let
x̄1 = average of the sample observations from Population-1
x̄2 = average of the sample observations from Population-2
s1 = standard deviation of the sample observations from Population-1
s2 = standard deviation of the sample observations from Population-2
When the population standard deviations are unknown but assumed equal, the test statistic is the pooled two-sample t statistic
t = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2)), where sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2),
with n1 + n2 − 2 degrees of freedom.
Example
An experiment is conducted to compare the mean lengths of time required for bodily absorption of two popular insulins, say from company-X and company-Y. Ten women are randomly selected and given a dose of the company-X insulin. Similarly, another group of ten randomly selected women are administered the company-Y insulin. The length of time in minutes for the insulin to reach a specified level in the blood is recorded. The sample averages, standard deviations and sample sizes are given below in Table-1.
Table-1
Quantity | company-X | company-Y |
Sample size | n1 = 10 | n2 = 10 |
Sample mean (minutes) | x̄1 = 20.2 | x̄2 = 17.9 |
Sample standard deviation (minutes) | s1 = 8.1 | s2 = 7.3 |
Using α = 0.05, test the claim that the insulins are identical in terms of mean time required for bodily absorption.
Solution
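First, we identify the populations:
Population-1 = collection of all users represented by the group of ten individuals who received insulin from company-X; and
Population-2 = collection of all users represented by the group of ten individuals who received insulin from company-Y.
The common variable under study = time required for bodily absorption of insulin.
The parameters of interest:
µ1 = mean of the variable under study for Population-1
µ2 = mean of the variable under study for Population-2
The aim here is to test
H0: µ1 = µ2 against HA: µ1 ≠ µ2
at level α = 0.05.
Assuming equal population standard deviations, the pooled variance is sp² = [9(8.1)² + 9(7.3)²] / 18 = 59.45, so sp ≈ 7.71 and
t = (20.2 − 17.9) / (7.71 √(1/10 + 1/10)) ≈ 0.67.
The two-tailed critical value for 10 + 10 − 2 = 18 degrees of freedom is 2.101. Since 0.67 is smaller than 2.101, we fail to reject H0 and conclude that the data are consistent with the insulins being identical in terms of mean bodily absorption time.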
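The insulin comparison can be checked numerically. The following is a minimal sketch, assuming the equal-variance (pooled) two-sample t statistic is the intended test; the function name `pooled_t` is illustrative, and the critical value 2.101 is the standard two-tailed t-table entry for 18 degrees of freedom at the 0.05 level.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, df, sp) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    sp = math.sqrt(sp2)
    # Standard error of the difference between the sample means.
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, df, sp

# Summary statistics from Table-1 (company-X vs company-Y).
t, df, sp = pooled_t(20.2, 8.1, 10, 17.9, 7.3, 10)

T_CRIT = 2.101  # t-table value: two-tailed, alpha = 0.05, 18 df
reject_h0 = abs(t) > T_CRIT

print(f"sp = {sp:.2f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

Because |t| is well below the critical value, the sketch agrees with the conclusion that H0 is not rejected.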
Test for two population means (Dependent Samples)
Two samples are dependent if each member of one sample corresponds to a member of the other sample. Dependent samples are also known as matched samples or paired samples. Using dependent (paired) samples enables us to perform a more precise analysis, because pairing allows us to control for extraneous factors. With dependent samples we still follow the same basic procedure that we have followed in all our hypothesis testing.
For example, samples of water taken before (inlet) and after (outlet) the Effluent Treatment Plant (ETP) are tested to measure the effectiveness of the ETP, the expectation being that the pollution level of the water differs once it has been treated by the ETP.
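A paired analysis of the inlet/outlet idea might look like the sketch below. The readings are hypothetical numbers invented purely for illustration (they are not from the module), and the statistic used is the standard paired t statistic t = d̄ / (s_d / √n) on the differences, which is an assumption about the formula intended here. The critical value 2.015 is the one-tailed t-table entry for 5 degrees of freedom at the 0.05 level.

```python
import math
import statistics

# HYPOTHETICAL pollutant readings (mg/L) on six days; each inlet value is
# paired with the outlet value measured on the same day.
inlet = [250.0, 310.0, 275.0, 290.0, 265.0, 300.0]
outlet = [180.0, 240.0, 215.0, 205.0, 200.0, 230.0]

# Work with the paired differences rather than the two samples separately.
diffs = [before - after for before, after in zip(inlet, outlet)]
n = len(diffs)
d_bar = statistics.mean(diffs)   # mean difference
s_d = statistics.stdev(diffs)    # sample standard deviation of differences

# H0: mean difference = 0 against HA: mean difference > 0 (ETP reduces pollution).
t = d_bar / (s_d / math.sqrt(n))
df = n - 1

T_CRIT = 2.015  # t-table value: one-tailed, alpha = 0.05, 5 df
reject_h0 = t > T_CRIT

print(f"mean diff = {d_bar:.1f}, t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

Pairing removes day-to-day variation in the incoming water, which is why the test is carried out on the differences.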
Hypothesis Testing of the Difference Between Two Means (Independent Samples)
Do employees perform better at work with music playing? The music was turned on during the working hours of a business with 45 employees. Their productivity level averaged 5.2 with a standard deviation of 2.4. On a different day the music was turned off and there were 40 workers. The workers' productivity level averaged 4.8 with a standard deviation of 1.2. What can we conclude at the 0.05 level?
Solution
We first develop the hypotheses
H0: µ1 – µ2 = 0
H1: µ1 – µ2 > 0
Next we need to find the standard deviation. Recall that the difference of the sample means, x̄1 – x̄2, has mean µ1 – µ2 and standard deviation √(σ1²/n1 + σ2²/n2).
We can substitute the sample means and sample standard deviations as point estimates of the population means and standard deviations. We have x̄1 – x̄2 = 5.2 – 4.8 = 0.4 and √(2.4²/45 + 1.2²/40) = √0.164 ≈ 0.405. Now we can calculate the z-score. We have
z = (0.4 – 0) / 0.405 ≈ 0.988
Since this is a one-tailed test, the critical value is 1.645, and 0.988 does not lie in the critical region. We fail to reject the null hypothesis and conclude that there is insufficient evidence that workers perform better at work when the music is on. Using the P-value technique, we see that the P-value associated with 0.988 is
P = 1 – 0.8389 = 0.1611
which is larger than 0.05. This is another way of seeing that we fail to reject the null hypothesis.
Note: It would have been slightly more accurate had we used the t-table instead of the z-table. To calculate the degrees of freedom, we can take the smaller of the two numbers n1 – 1 and n2 – 1. So in this example, a better estimate would use 39 degrees of freedom. The t-table gives a value of about 1.685 for the t.95 value. Notice that 0.988 is still smaller than 1.685 and the result is the same. This example demonstrates that using the t-table and the z-table for large samples gives practically the same results.
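The music example can be reproduced with a short script. This is a sketch under the same large-sample assumption as the text (sample standard deviations substituted for the population ones); it uses `statistics.NormalDist` from the Python standard library in place of a printed z-table, so the P-value differs from the table-based figure in the third decimal place.

```python
import math
from statistics import NormalDist

# Summary statistics: music on (group 1) and music off (group 2).
mean1, s1, n1 = 5.2, 2.4, 45
mean2, s2, n2 = 4.8, 1.2, 40

# Standard error of the difference between two independent sample means.
se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
z = (mean1 - mean2) / se

std_normal = NormalDist()            # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(0.95)    # one-tailed critical value at alpha = 0.05
p_value = 1 - std_normal.cdf(z)      # upper-tail P-value for H1: mu1 - mu2 > 0

reject_h0 = z > z_crit
print(f"z = {z:.3f}, critical = {z_crit:.3f}, P = {p_value:.4f}, reject H0: {reject_h0}")
```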
Hypothesis Testing For a Difference between Means for Small Samples Using Pooled Standard Deviations (Optional)
Recall that for small samples we need to make the following assumptions:
1. Random unbiased samples.
2. Both population distributions are normal.
3. The two population standard deviations are equal.
Under these assumptions, the two sample standard deviations are combined into the pooled standard deviation
sp = √( [(n1 – 1)s1² + (n2 – 1)s2²] / (n1 + n2 – 2) )
Putting this together with hypothesis testing, we can find the t-statistic
t = (x̄1 – x̄2) / (sp √(1/n1 + 1/n2))
and use n1 + n2 – 2 degrees of freedom.
Example
Nine dogs and ten cats were tested to determine if there is a difference in the average number of days that the animal can survive without food. The dogs averaged 11 days with a standard deviation of 2 days while the cats averaged 12 days with a standard deviation of 3 days. What can be concluded? (Use α = 0.05.)
Solution
We write:
H0: µdog – µcat = 0
H1: µdog – µcat ≠ 0
We have:
n1 = 9, n2 = 10
x̄1 = 11, x̄2 = 12
s1 = 2, s2 = 3
The pooled standard deviation is sp = √( [8(2)² + 9(3)²] / 17 ) = √(113/17) ≈ 2.58, so
t = (11 – 12) / (2.58 √(1/9 + 1/10)) ≈ –0.84
The t-critical value corresponding to α = 0.05 with 10 + 9 – 2 = 17 degrees of freedom is 2.11, which is greater than |t| = 0.84. Hence we fail to reject the null hypothesis and conclude that there is not sufficient evidence to suggest that there is a difference between the mean starvation time for cats and dogs.
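The dogs-and-cats figures can be verified step by step. The sketch below assumes the pooled standard deviation approach named in this section's title; the variable names are illustrative, and 2.11 is the t-table critical value quoted in the solution.

```python
import math

# Group 1 = dogs, group 2 = cats (sample size, mean days, standard deviation).
n1, mean1, s1 = 9, 11.0, 2.0
n2, mean2, s2 = 10, 12.0, 3.0

df = n1 + n2 - 2

# Pooled standard deviation under the equal-variance assumption.
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)

# t statistic for H0: mu_dog - mu_cat = 0.
t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

T_CRIT = 2.11  # t-table value: two-tailed, alpha = 0.05, 17 df
reject_h0 = abs(t) > T_CRIT

print(f"sp = {sp:.2f}, t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

The sign of t is negative only because the dogs' mean is the smaller one; for a two-tailed test the comparison uses |t|.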
Hypothesis Testing for a Difference between Proportions
Inferences on the Difference between Population Proportions
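If two samples are drawn independently of each other, we use the test statistic
z = (p̂1 – p̂2) / √( p q (1/n1 + 1/n2) )
where p̂1 = x1/n1 and p̂2 = x2/n2 are the sample proportions, p = (x1 + x2) / (n1 + n2) is the pooled proportion, and q = 1 – p.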
Example
Is the severity of the drug problem in high school the same for boys and girls? 85 boys and 70 girls were questioned and 34 of the boys and 14 of the girls admitted to having tried some sort of drug. What can be concluded at the 0.05 level?
Solution
The hypotheses are
H0: p1 – p2 = 0
H1: p1 – p2 ≠ 0
We have
p̂1 = 34/85 = 0.4
p̂2 = 14/70 = 0.2
p = 48/155 = 0.31
q = 0.69
Now compute the z-score:
z = (0.4 – 0.2) / √( (0.31)(0.69)(1/85 + 1/70) ) ≈ 2.68
Since we are using a significance level of 0.05 and it is a two-tailed test, the critical value is 1.96. Clearly 2.68 is in the critical region, hence we reject the null hypothesis in favor of the alternative hypothesis and conclude that gender does make a difference for drug use. Notice that the P-value,
P = 2(1 – 0.9963) = 0.0074,
is less than 0.05. This is another way to see that we reject the null hypothesis.
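The proportion test can also be sketched in code. This assumes the pooled-proportion z statistic, consistent with the pooled value p = 48/155 used in the solution; `statistics.NormalDist` from the standard library stands in for the z-table, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts of students who admitted to having tried some sort of drug.
x1, n1 = 34, 85   # boys
x2, n2 = 14, 70   # girls

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled proportion, used because H0 states that p1 = p2.
p_pool = (x1 + x2) / (n1 + n2)
q_pool = 1 - p_pool

se = math.sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(0.975)              # two-tailed, alpha = 0.05
p_value = 2 * (1 - std_normal.cdf(abs(z)))      # two-tailed P-value

reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical = {z_crit:.2f}, P = {p_value:.4f}, reject H0: {reject_h0}")
```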
Summary
This module helps in understanding sampling distributions for two sets of samples drawn from the same or from different populations. If they are random samples from the same population, then any differences across conditions or groups can be attributed to random sampling variability. However, if the two sets of scores are random samples from different populations, then we can attribute a difference between the mean scores of the conditions to the independent variable or the treatment effect.
Self-Check Exercise
Self-check exercise questions are given on the pages above under each topic.
Learn More:
- Sharma, J K (2014), Business Statistics, S Chand & Company, N Delhi.
- Bajpai, N (2010) Business Statistics, Pearson, N Delhi.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, Springer.
- Darrell Huff (2010), How to Lie with Statistics, W. W. Norton, California.
- K.R. Gupta (2012), Practical Statistics, Atlantic Publishers & Distributors (P) Ltd., N. Delhi