20 Estimation: Point Estimation, Interval Estimation, Population Mean (σ Known or Unknown)
Prof. Pankaj Madan
Learning Objectives:
- Point Estimation
- Properties of Point Estimator
- Drawback of Point Estimates
- Confidence Interval Estimation
- Interval Estimation of Population Mean (σ known)
- Interval Estimation of Population Mean (σ unknown)
1. Introduction
Estimation statistics is a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning and meta-analysis to plan experiments, analyze data and interpret results. It is distinct from null hypothesis significance testing (NHST), which is considered to be less informative. Estimation statistics, or simply estimation, is also known as the new statistics, a distinction introduced in psychology, medical research, the life sciences and a wide range of other experimental sciences, where NHST still remains prevalent despite estimation statistics having been recommended as preferable for several decades.
The primary aim of estimation methods is to estimate the size of an effect and report an effect size along with its confidence intervals, the latter of which is related to the precision of the estimate. Estimation at its core involves analyzing data to obtain a point estimate and an interval estimate that summarizes a range of likely values of the underlying population effect.
2. Point Estimation
A sample statistic (such as $\bar{x}$, $s$, or $\bar{p}$) that is calculated from sample data to estimate the most likely value of the corresponding unknown population parameter (such as µ, σ or p) is termed a point estimator, and the numerical value of the estimator is termed a point estimate. For example, if we calculate that 10 percent of the items in a random sample taken from a day’s production are defective, then the result ‘10 percent’ is a point estimate of the percentage of defective items in the whole lot. Thus, until the next sample of items is drawn and examined, we may continue manufacturing on the assumption that any day’s production contains 10 percent defective items.
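As a rough illustration of how point estimates are obtained from sample data, the sketch below computes the sample mean, sample standard deviation and sample proportion. The data values and the variable names (`weights`, `inspection`) are hypothetical and were chosen only so that the defective rate matches the 10 percent used in the example above.

```python
import statistics

# Hypothetical sample: weights (in grams) of 8 items from a day's production.
weights = [498, 502, 501, 497, 503, 499, 500, 504]

# Hypothetical inspection results for 20 sampled items: 1 = defective, 0 = good.
inspection = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

x_bar = statistics.mean(weights)           # point estimate of the population mean
s = statistics.stdev(weights)              # point estimate of sigma (divisor n - 1)
p_bar = sum(inspection) / len(inspection)  # point estimate of the population proportion

print(f"x_bar = {x_bar}, s = {s:.3f}, p_bar = {p_bar:.2f}")
```

Each printed number is a single value with no accompanying statement of how far it may be from the parameter it estimates.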
3. Properties of Point Estimator
For a statistical point estimate, the sampling distribution of the estimator provides information about the best estimator. Before any statistical inference is drawn, it is essential to resolve the following two important issues:
(i) Selection of an appropriate statistic to serve as the best estimator of a population parameter.
(ii) The nature of the sampling distribution of this selected statistic. Since the sample statistic value varies from sample to sample, the accuracy of a given estimator also varies from sample to sample. This means that there is no certainty of the accuracy achieved for the sample one happens to draw. Although in practice only one sample is selected at any given time, we should judge the accuracy of an estimator based on its average value over all possible samples of equal size. Hence, we prefer to choose the estimator whose ‘average accuracy’ is close to the value of the population parameter being estimated. The criteria for selecting an estimator are:
- Unbiasedness
- Consistency
- Efficiency
As different sample statistics can be used as point estimators of different population parameters, the following general notations will be used:
θ = Population parameter (such as µ, σ, p) of interest being estimated
$\hat{\theta}$ = Sample statistic (such as $\bar{x}$, $s$, $\bar{p}$) or point estimator of θ
Unbiasedness: The value of a statistic measured from a given sample is likely to be above or below the actual value of the population parameter of interest due to sampling error. Thus it is desirable that the mean of the sampling distribution of sample means taken from a population is equal to the population mean. If this is true, then the sample statistic is said to be an unbiased estimator of the population parameter. Hence the sample statistic $\hat{\theta}$ is said to be an unbiased estimator of the population parameter θ provided $E(\hat{\theta}) = \theta$, where $E(\hat{\theta})$ is the expected value or mean of the sample statistic $\hat{\theta}$. If $E(\hat{\theta}) \neq \theta$ for the sampling distribution of $\hat{\theta}$, then $\hat{\theta}$ is said to be a biased estimator.
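Unbiasedness can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: a hypothetical normal population with mean 50 and standard deviation 10, samples of size 10, and a fixed random seed. It averages the sample mean over many samples, and also compares the sample variance computed with divisor n - 1 against the version with divisor n.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

# Hypothetical population and sampling design (assumptions for illustration).
MU, SIGMA, N, REPS = 50.0, 10.0, 10, 5000

means, var_unbiased, var_biased = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    var_unbiased.append(statistics.variance(sample))  # divisor n - 1
    var_biased.append(statistics.pvariance(sample))   # divisor n

avg_mean = statistics.fmean(means)
avg_var_unbiased = statistics.fmean(var_unbiased)
avg_var_biased = statistics.fmean(var_biased)

print(f"average of sample means        : {avg_mean:.2f}  (mu = {MU})")
print(f"average variance, divisor n - 1: {avg_var_unbiased:.1f}  (sigma^2 = {SIGMA**2})")
print(f"average variance, divisor n    : {avg_var_biased:.1f}")
```

The average of the sample means should settle near µ. The divisor-n variance averages below σ² by a factor of about (n - 1)/n, which is why the divisor n - 1 is used when s estimates σ.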
For any point estimator with a normal distribution, approximately 95 percent of all point estimates will lie within 2 (or more exactly 1.96) standard deviations of the mean of that distribution. This implies that for an unbiased estimator, the difference between the point estimate and the true value of the parameter will be less than 1.96 standard deviations (or standard errors) in about 95 percent of samples. This quantity is called the margin of error, and it provides an upper bound for the error of estimation at that level of confidence.
Margin of error = 1.96 × (standard error of the estimator) = $1.96\,\sigma/\sqrt{n}$
If σ is unknown and the sample size is large (n ≥ 30), the sample standard deviation s can be used to approximate σ.
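The margin of error for a sample mean can be computed directly from σ (or s for a large sample) and n. The following is a minimal sketch; the function name `margin_of_error` and the values σ = 12 and n = 36 are hypothetical.

```python
import math

def margin_of_error(sigma, n, z=1.96):
    """Margin of error z * sigma / sqrt(n) for a sample mean.

    z = 1.96 corresponds to roughly 95 percent confidence; pass s in place
    of sigma when sigma is unknown and the sample is large (n >= 30).
    """
    return z * sigma / math.sqrt(n)

# Hypothetical values: sigma = 12, n = 36.
moe = margin_of_error(12, 36)
print(round(moe, 2))  # 3.92
```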
Consistency: A point estimator $\hat{\theta}$ is said to be consistent if its value tends to become closer to the population parameter θ as the sample size increases. For example, the standard error of the sampling distribution of the mean, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, tends to become smaller as the sample size n increases. Thus the sample mean $\bar{x}$ is a consistent estimator of the population mean µ. Similarly, the sample proportion $\bar{p}$ is a consistent estimator of the population proportion p because $\sigma_{\bar{p}} = \sqrt{p(1-p)/n}$ also decreases as n increases.
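The effect of sample size on the standard error can be seen numerically. The sketch below assumes a hypothetical population standard deviation of 20 and tabulates σ/√n for increasing n; the helper name `standard_error` is illustrative.

```python
import math

SIGMA = 20.0  # hypothetical population standard deviation

def standard_error(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}   standard error = {standard_error(SIGMA, n):.2f}")
```

Each fourfold increase in n halves the standard error, so the sampling distribution of the sample mean concentrates around µ as n grows.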
Efficiency: For the same population, out of two unbiased point estimators, the desirable one is the estimator whose spread (as measured by the variance of its sampling distribution) is as small as possible. Such an unbiased estimator is said to be efficient because an individual estimate will fall close to the true value of the population parameter with high probability, since there is less variation in the sampling distribution of the statistic. For example, for a simple random sample of size n, if $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased point estimators of the population parameter θ, then the relative efficiency of $\hat{\theta}_2$ to $\hat{\theta}_1$ is given by
Relative efficiency = $\mathrm{Var}(\hat{\theta}_1) / \mathrm{Var}(\hat{\theta}_2)$
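A simulation can give a feel for efficiency. The sketch below takes relative efficiency to be the usual ratio of sampling variances and compares the sample mean (estimator 1) with the sample median (estimator 2). Both are unbiased for µ when sampling from a normal population. The standard normal population, the sample size of 25, the number of repetitions and the seed are all assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

N, REPS = 25, 4000  # hypothetical sample size and number of repeated samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    means.append(statistics.fmean(sample))     # estimator 1: sample mean
    medians.append(statistics.median(sample))  # estimator 2: sample median

var_mean = statistics.variance(means)      # variance of sampling distribution of mean
var_median = statistics.variance(medians)  # variance of sampling distribution of median

# Relative efficiency of the median to the mean: Var(mean) / Var(median).
rel_eff_median_to_mean = var_mean / var_median
print(f"Var(mean) = {var_mean:.4f}, Var(median) = {var_median:.4f}")
print(f"relative efficiency of median to mean = {rel_eff_median_to_mean:.2f}")
```

A ratio below 1 indicates that the median has the larger sampling variance, so for normal data the sample mean is the more efficient of the two.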
4. Drawback of Point Estimates
The drawback of a point estimate is that no information is available regarding its reliability, i.e. how close it is to the true population parameter. In fact, the probability that a single sample statistic actually equals the population parameter is extremely small. For this reason, point estimates are rarely used alone to estimate population parameters. It is better to offer a range of values within which the population parameter is expected to fall so that the reliability (probability) of the estimate can be measured. This is the purpose of interval estimation.
5. Confidence Interval Estimation
A point estimate does not provide information about ‘how close the estimate is’ to the population parameter unless it is accompanied by a statement of the possible sampling error, based on the sampling distribution of the statistic. It is therefore important to know the precision of an estimate before depending on it to make a decision. Thus decision makers prefer to use an interval estimate (i.e. a range of values defined around a sample statistic) that is likely to contain the population parameter value. An interval estimator is a rule for calculating two numerical values, say a and b, that create an interval expected to contain the population parameter of interest. It is also important to state ‘how confident’ one should be that the interval estimate contains the parameter value. The probability that an interval constructed in this way contains the parameter is called the confidence coefficient and is denoted by (1-α). Hence an interval estimate of the population parameter is a confidence interval with a statement of confidence that the interval contains the parameter value. In other words, a confidence interval estimate is an interval of values computed from sample data that is likely to contain the true population parameter value.
The confidence interval estimate of a population parameter is obtained by applying the formula:
Point estimate ± Margin of error
where Margin of error = $z_c$ × Standard error of the statistic
$z_c$ = critical value of the standard normal variable corresponding to the confidence level (probability of being correct), such as 0.90, 0.95, and so on.
6. Interval estimation of population mean (σ known)
Suppose the population mean µ is unknown and the true population standard deviation σ is known. Then for a large sample size (n ≥ 30), the sample mean $\bar{x}$ is the best point estimator of the population mean µ. Since the sampling distribution of $\bar{x}$ is approximately normal, it can be used to compute the confidence interval of the population mean µ as follows:
$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$ or $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
where $z_{\alpha/2}$ is the z-value representing an area α/2 in the right tail of the standard normal probability distribution, and (1-α) is the level of confidence.
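The interval for a known σ can be sketched in a few lines. This is only an illustrative sketch: the function name `z_interval` and the input values are hypothetical. The critical value is taken from the standard normal distribution with `statistics.NormalDist` (Python 3.8+) rather than from a printed table.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval x_bar +/- z_{alpha/2} * sigma / sqrt(n), sigma known."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: the value leaving an area alpha/2 in the right tail.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: x_bar = 80, sigma = 10, n = 64, 95 percent confidence.
low, high = z_interval(x_bar=80.0, sigma=10.0, n=64)
print(f"{low:.2f} <= mu <= {high:.2f}")
```

Raising the confidence level (for example to 0.99) widens the interval, while a larger n narrows it.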
7. Interval estimation of population mean (σ unknown)
If the standard deviation σ of a population is not known, then it can be approximated by the sample standard deviation s when the sample size is large (n ≥ 30). So, the interval estimator of a population mean µ for a large sample (n ≥ 30) with confidence coefficient (1-α) is given by
$\bar{x} \pm z_{\alpha/2}\,s_{\bar{x}}$ or $\bar{x} \pm z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
When the population standard deviation is not known and the sample size is small, the procedure of interval estimation of the population mean is based on a probability distribution known as the t-distribution. This distribution is very similar to the normal distribution. However, the t-distribution has more area in the tails and less in the centre than the normal distribution does. The t-distribution depends on a parameter known as the degrees of freedom. As the number of degrees of freedom increases, the t-distribution gradually approaches the normal distribution, and the sample standard deviation s becomes a better estimate of the population standard deviation σ.
The interval estimate of a population mean when the sample size is small (n < 30) with confidence coefficient (1-α) is given by
$\bar{x} \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the critical value of the t-test statistic providing an area α/2 in the right tail of the t-distribution with n-1 degrees of freedom, and
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
The critical values of t for the given degrees of freedom can be obtained from the table of the t-distribution.
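For a small sample, the same calculation uses a t critical value looked up by degrees of freedom. The sketch below is a minimal illustration. The dictionary `T_025` holds a few 95 percent two-sided critical values copied from a standard t-table, and the ten measurements are hypothetical. The Python standard library has no t-distribution, so the table lookup here stands in for the printed table.

```python
import math
import statistics

# t_{0.025} critical values from a standard t-table (degrees of freedom -> t).
T_025 = {5: 2.571, 9: 2.262, 10: 2.228, 15: 2.131, 20: 2.086, 24: 2.064, 29: 2.045}

def t_interval_95(sample):
    """95 percent interval x_bar +/- t_{alpha/2} * s / sqrt(n) for a small sample."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # uses the divisor n - 1
    t_crit = T_025[n - 1]         # degrees of freedom = n - 1
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements, n = 10, so df = 9 and t = 2.262.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
low, high = t_interval_95(sample)
print(f"{low:.3f} <= mu <= {high:.3f}")
```

With the same data, using z = 1.96 in place of t = 2.262 would give a narrower interval, which understates the uncertainty in a sample this small.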
8. Summary
In any estimation problem, we need to obtain both a point estimate and an interval estimate. The point estimate is our best guess of the true value of the parameter, while the interval estimate gives a measure of the accuracy of that point estimate by providing an interval that contains plausible values. When the variable of interest is quantitative, the sample mean $\bar{x}$ provides a point estimate of the unknown population mean µ. When the variable has a binomial distribution, the sample proportion $\bar{p}$ is a point estimate of the unknown population proportion p.
Confidence intervals are frequently used as interval estimates. Articles in the literature commonly report 95% confidence intervals (95% CI). The 95% CI is calculated in such a way that, under repeated sampling, about 95% of the intervals so constructed will contain the true population parameter.
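The repeated-sampling interpretation of a 95% CI can be checked by simulation. The sketch below assumes a hypothetical normal population (µ = 100, σ = 15, with σ treated as known), draws many samples of size 40, and counts how often the interval x̄ ± 1.96 σ/√n contains µ. The seed and the number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so that the run is repeatable

# Hypothetical population and sampling design (assumptions for illustration).
MU, SIGMA, N, REPS = 100.0, 15.0, 40, 4000
Z = 1.96  # critical value for 95 percent confidence

margin = Z * SIGMA / math.sqrt(N)  # the same for every sample, as sigma is known
covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    if x_bar - margin <= MU <= x_bar + margin:
        covered += 1

coverage = covered / REPS
print(f"proportion of intervals containing mu: {coverage:.3f}")
```

The proportion should come out close to 0.95. Any single interval either contains µ or does not; the 95 percent describes the procedure over repeated samples.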
9. Self-Check Exercise with solutions
Q.1. The average monthly electricity consumption for a sample of 100 families is 1250 units. Assuming the standard deviation of electric consumption of all families is 150 units, construct a 95 percent confidence interval estimate of the actual mean electric consumption.
Solution:
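The information given is: $\bar{x} = 1250$, σ = 150, n = 100 and confidence level (1-α) = 95 percent. Using the standard normal curve, we find that half of 0.95 (an area of 0.4750 on each side of the mean) yields the critical value $z_{\alpha/2} = 1.96$. Thus the confidence limits with $z_{\alpha/2} = \pm 1.96$ for 95% confidence are given by
$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} = 1250 \pm 1.96 \times \dfrac{150}{\sqrt{100}} = 1250 \pm 29.40$
Thus for the 95 percent level of confidence, the population mean µ is likely to fall between 1220.60 units and 1279.40 units, that is 1220.60 ≤ µ ≤ 1279.40.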
Q.2. A random sample of 64 sales invoices was taken from a large population of sales invoices. The average value was found to be Rs 2000 with a standard deviation of Rs 540. Find a 90 percent confidence interval for the true mean value of all the sales.
Solution:
The information given is: $\bar{x} = 2000$, s = 540, n = 64 and α = 10 percent. Therefore
$\sigma_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{540}{\sqrt{64}} = 67.50$
$z_{\alpha/2} = 1.64$ (more precisely 1.645)
The required confidence interval of the population mean µ is given by
$\bar{x} \pm z_{\alpha/2}\,\dfrac{s}{\sqrt{n}} = 2000 \pm 1.64(67.50) = 2000 \pm 110.70$
Thus the mean of the sales invoices for the whole population is likely to fall between Rs 1889.30 and Rs 2110.70, that is 1889.30≤µ≤2110.70.
Q.3. A survey conducted by a shopping mall group showed that a family in a metro city spends an average of Rs 500 on clothes every month. Suppose a sample of 81 families resulted in a sample mean of Rs 540 per month and a sample standard deviation of Rs 150. Develop a 95 percent confidence interval estimate of the mean amount spent per month by a family.
Solution:
The information given is: $\bar{x} = 540$, s = 150, n = 81 and $z_{\alpha/2} = 1.96$. The confidence limits are given by
$\bar{x} \pm z_{\alpha/2}\,\dfrac{s}{\sqrt{n}} = 540 \pm 1.96 \times \dfrac{150}{\sqrt{81}} = 540 \pm 32.67$, or Rs 507.33 and Rs 572.67
Hence for the 95 percent level of confidence, the population mean µ is likely to fall between Rs 507.33 and Rs 572.67, i.e. 507.33 ≤ µ ≤ 572.67.
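The arithmetic in the three exercises can be re-computed with a short script. This is a sketch for checking purposes only: the helper name `mean_interval` is hypothetical, and the critical values 1.96 and 1.64 are the ones quoted in the solutions.

```python
import math

def mean_interval(x_bar, sd, n, z):
    """Return (lower, upper) limits of x_bar +/- z * sd / sqrt(n), rounded to 2 places."""
    margin = z * sd / math.sqrt(n)
    return round(x_bar - margin, 2), round(x_bar + margin, 2)

q1 = mean_interval(1250, 150, 100, 1.96)  # Q.1: sigma known, 95 percent
q2 = mean_interval(2000, 540, 64, 1.64)   # Q.2: large sample, s used, 90 percent
q3 = mean_interval(540, 150, 81, 1.96)    # Q.3: large sample, s used, 95 percent

print("Q.1:", q1)  # (1220.6, 1279.4)
print("Q.2:", q2)  # (1889.3, 2110.7)
print("Q.3:", q3)  # (507.33, 572.67)
```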
Learn More:
- https://en.wikipedia.org/wiki/Estimation_statistics
- http://iimk.ac.in/gsdl/cgi-bin/library?e=d-000-00—0statis–00-0-0–0prompt-10—4——0-1l–1-en-50—20-about—00031-001-1-0utfZz-8-00&cl=CL2&d=HASHe00909ac46143070d8f732.3&x=1
- Sharma, J. K. (2014). Business Statistics, 2nd ed. S. Chand & Company, New Delhi.