1 Introduction to point estimation
Shirsendu Mukherjee
Introduction to statistical inference
In the present module we introduce the concept of point estimation, which is a part of statistical inference. Statistical inference is the process of using information gained from a sample to draw conclusions about the population from which the sample is taken. There are two aspects of statistical inference that we will study in this course: (i) estimation and (ii) hypothesis testing. In an estimation problem some feature of the population in which an enquirer is interested may be completely unknown to him, and he may want to make a guess about this feature entirely on the basis of a random sample from the population. There are two types of estimation problem: (i) point estimation and (ii) interval estimation. In this lecture we shall discuss some preliminary concepts of point estimation. Let us start our discussion with a brief history of the estimation problem.
Historical Perspective
The problem of estimation arose in a very natural way in problems of Astronomy and Geodesy in the first half of the 18th century. For example, in Astronomy the determination of interplanetary distances and the positions of planets and their movements in time were some of the important problems, whereas in Geodesy determining the spheroidal shape of the earth was one of the most important problems. It is known that the figure of the earth is almost a sphere except for some flatness near the poles. Observations were obtained on the measurement of the length of one degree of a certain meridian, and the problem was to determine the parameters, say $\alpha$ and $\beta$, which specified the spheroid of the earth. Indirect observations on $(\alpha, \beta)$ were given by the relation
$Y_i = \alpha + \beta x_i, \quad i = 1, 2, \ldots, n,$
where the $x_i$'s are known fixed constants. Note that $(\alpha, \beta)$ are uniquely determined if only two observations on $Y$ at different values $(x_1, x_2)$ are available. However, as is customary in science, several observations were made at different values $(x_1, x_2, \ldots, x_n)$, and this led to the theory of combination of observations with random error which directly or indirectly measured "magnitudes of interest" or parameters. To estimate $\alpha$ and $\beta$ on the basis of the given data, the first attempt was made by Roger Boscovich (1757) in the course of a geodetic study of the ellipticity (extent of flatness at the poles) of the earth. He suggested that the estimates of $(\alpha, \beta)$ be determined such that
(i) the sum of positive and negative residuals or errors should balance, i.e. $\sum_{i=1}^{n}(y_i - \alpha - \beta x_i) = 0$, and
(ii) subject to the above constraint, $(\alpha, \beta)$ are determined so that $R = \sum_{i=1}^{n} |y_i - \alpha - \beta x_i|$, the sum of absolute values of the errors $e_i = y_i - \alpha - \beta x_i$, is as small as possible.
Using a geometric argument Boscovich solved the problem for the five observations that he had. Laplace (1789) gave a general algebraic algorithm to obtain estimates of $\alpha$ and $\beta$ on the above principles for any number of observations. This problem was later solved by Gauss and Legendre using the method of least squares.
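Boscovich's two conditions can be implemented directly. Constraint (i) forces the fitted line through the centroid $(\bar{x}, \bar{y})$, so (ii) reduces to a one-dimensional minimisation whose solution is a weighted median of the centred slopes. The following Python sketch is our own illustration of this reduction, not part of the original treatment:

```python
import numpy as np

def boscovich_fit(x, y):
    # Fit y = a + b*x by Boscovich's principle: minimise sum |y_i - a - b*x_i|
    # subject to sum (y_i - a - b*x_i) = 0.  The constraint forces the line
    # through the centroid (x_bar, y_bar), so a = y_bar - b*x_bar and the
    # optimal slope b is a weighted median of the centred slopes.
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    m = xc != 0                       # points with x_i = x_bar contribute a
    slopes = yc[m] / xc[m]            # constant |yc_i| whatever b is, so skip them
    w = np.abs(xc[m])                 # weight attached to each candidate slope
    order = np.argsort(slopes)
    cum = np.cumsum(w[order])
    b = slopes[order][np.searchsorted(cum, cum[-1] / 2)]   # weighted median
    a = y.mean() - b * x.mean()
    return a, b

# Hypothetical data, echoing Boscovich's five meridian-arc observations
print(boscovich_fit([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1]))
```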
Boscovich made the assumption that the errors of overestimation and underestimation must balance out, an idea that was used by many researchers afterwards. For estimating the parameter $\theta$ in the simplest model $Y_i = \theta + e_i$, Simpson (1776) used this idea by assuming that the errors are symmetrically uniformly distributed about zero, i.e. the probability density function of the error is $f(e) = \frac{1}{2h}$, $-h < e < h$, $h > 0$. Euler (1778) proposed the arc of a parabolic curve given by $f(e) = \frac{3}{4r^3}(r^2 - e^2)$, $-r < e < r$, $r > 0$, as the pdf of the random error. Laplace suggested the probability density function
$f(e) = \frac{1}{2h} \exp\left[-\frac{|e|}{h}\right], \quad -\infty < e < \infty,$
as the model for the distribution of errors, and Gauss proposed the normal distribution with probability density function $f(e) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{e^2}{2\sigma^2}\right)$, $-\infty < e < \infty$. The Laplace distribution led to the median of the sample as the "best" estimator of the "true value" of the parameter, whereas the normal distribution used by Gauss led to the mean of the sample as the "best" estimator of the "true value".
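This dichotomy is easy to demonstrate numerically. The simulation below is a sketch of our own, not part of the original text; the scales are chosen so that both error distributions have unit variance, making the comparison fair. Under Laplace errors the sample median has smaller mean squared error than the sample mean, while under normal errors the ordering reverses:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 5.0, 25, 20_000   # true value, sample size, replications

# Laplace(b) has variance 2*b^2, so b = 1/sqrt(2) matches the N(0,1) variance.
samples = {
    "Laplace": theta + rng.laplace(0.0, 1 / np.sqrt(2), (reps, n)),
    "normal":  theta + rng.normal(0.0, 1.0, (reps, n)),
}

for name, xs in samples.items():
    mse_mean = np.mean((xs.mean(axis=1) - theta) ** 2)
    mse_median = np.mean((np.median(xs, axis=1) - theta) ** 2)
    print(f"{name}: MSE(mean) = {mse_mean:.4f}, MSE(median) = {mse_median:.4f}")
```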
Theory of Point Estimation
Background
We consider a random experiment $E$. The outcome of $E$ is represented by an observable random vector $X = (X_1, X_2, \ldots, X_n)$, $n \geq 1$. A particular value of $X$ is denoted by $x = (x_1, x_2, \ldots, x_n)$. The character $X$ could be real or vector valued, and the set of all values of $X$ is called the sample space, denoted by $\mathcal{X} \subseteq \mathbb{R}^n$.
The random vector $X$ is generated by $F(x) = P(X \leq x)$, $x \in \mathcal{X}$, the distribution function of $X$.
In a parametric point estimation problem we assume that the functional form of $F(x)$ is known except perhaps for a certain number of parameters. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ be the unknown parameter associated with $F(x)$. The parameter $\theta$ may be real valued or vector valued and is usually called a labelling or indexing parameter. The labelling parameter varies over a set of values, called the parameter space, denoted by $\Theta \subseteq \mathbb{R}^k$. So $F(x)$ can be looked upon as a function of $\theta$, and henceforth we will write it as $F_\theta(x)$. If $X$ is discrete or absolutely continuous then $F_\theta(x)$ is generated by $f_\theta(x)$, the probability mass function (p.m.f.) or probability density function (p.d.f.) of $X$. We write $\mathcal{F} = \{p(x; \theta) : \theta \in \Theta\}$ for the class of all probability mass or density functions. The object of inference is the parameter $\theta$, or a function of the parameter, say $g(\theta)$, that is of interest. Let us consider a few examples.
Example 1 Suppose a coin is tossed 50 times. The outcome of the $i$th toss can be described by a random variable $X_i$ such that $X_i = 1$ or $0$ according as the $i$th toss results in a head or a tail. Here
$\mathcal{X} = \{(x_1, x_2, \ldots, x_{50}) : x_i = 0 \text{ or } 1 \text{ for all } i\}.$
If $\theta$ is the probability of getting a head in any toss, then $\Theta = (0, 1)$ and the probability function of $X$ is $p(x; \theta) = \prod_{i=1}^{50} \theta^{x_i} (1 - \theta)^{1 - x_i}$, $x \in \mathcal{X}$, $\theta \in \Theta$. We may want to estimate $\theta$ or any function of $\theta$.
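A natural point estimator of $\theta$ here is the sample proportion of heads. A minimal Python sketch (our own illustration; the value 0.6 is assumed only to simulate data):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.6, 50)   # simulated outcomes of the 50 tosses

theta_hat = x.mean()           # sample proportion of heads estimates theta
print(f"estimate of theta: {theta_hat:.3f}")
```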
Example 2 Suppose that 100 seeds of a certain flower were planted, one in each pot, and let $X_i$ equal one or zero according as the seed in the $i$th pot germinates or not. The data consist of $(x_1, x_2, \ldots, x_{100})$, a sequence of ones and zeroes, regarded as a realization of $(X_1, X_2, \ldots, X_{100})$ whose components are i.i.d. random variables with $P[X_1 = 1] = \theta$ and $P[X_1 = 0] = 1 - \theta$, where $\theta$ represents the probability that a seed germinates. The object of estimation is $\theta$ itself or a function $g(\theta)$ that may be of interest. For example, consider $g(\theta) = \binom{10}{8} \theta^8 (1 - \theta)^2$, which is the probability that in a batch of 10 seeds exactly 8 seeds will germinate.
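Once $\theta$ is estimated by the sample proportion of germinated seeds, $g(\theta)$ can be estimated by plugging that value into the formula. A sketch of our own (the data are simulated; a germination probability of 0.7 is assumed only for illustration):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.7, 100)   # simulated germination record of the 100 seeds

theta_hat = x.mean()
# Plug-in estimate of g(theta) = C(10,8) * theta^8 * (1 - theta)^2
g_hat = comb(10, 8) * theta_hat**8 * (1 - theta_hat)**2
print(f"theta_hat = {theta_hat:.2f}, estimated P(8 of 10 germinate) = {g_hat:.4f}")
```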
Example 3 In a pathbreaking experiment, Rutherford, Chadwick and Ellis (1920) observed 2608 time intervals of 7.5 seconds each and counted the number of time intervals $N_r$ in which exactly $r$ particles hit the counter. They obtained the following table.

| $r$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | $\geq 10$ |
| $N_r$ | 57 | 203 | 383 | 525 | 532 | 408 | 273 | 139 | 45 | 27 | 16 |
It is quite well known that the Poisson distribution with p.m.f.
$f_\lambda(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots; \ \lambda > 0,$
serves as a good model for the number of times a given event $E$ occurs in a unit time interval. If $X_i$ denotes the number of particles hitting the counter in the $i$th time interval, then $(X_1, X_2, \ldots, X_n)$, where $n = 2608$, are i.i.d. Poisson random variables with parameter $\lambda$. We may want to estimate $\lambda$ on the basis of the given data.
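A natural estimate of $\lambda$ is the average number of hits per interval. The sketch below is our own; it treats the grouped "$\geq 10$" cell as exactly 10, a slight approximation:

```python
import numpy as np

r  = np.arange(11)   # counts 0,...,9; the ">= 10" cell is treated as exactly 10
Nr = np.array([57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 16])

lam_hat = (r * Nr).sum() / Nr.sum()   # sample mean over the 2608 intervals
print(f"lambda_hat = {lam_hat:.3f}")  # approximately 3.87
```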
Example 4 Consider the determination of an ideal physical constant such as gravity $g$. The usual way to estimate $g$ is through the pendulum experiment: one observes $X = \frac{4\pi^2 l}{T^2}$, where $l$ is the length of the pendulum and $T$ is the time required for a fixed number of oscillations. Due to variation which depends on several factors, such as the skill of the experimenter and measurement errors, the $i$th observation is $X_i = g + e_i$, where $e_i$ is the random error. Assuming the distribution of the error is normal with zero mean and variance $\sigma^2$, we have that $X_1, X_2, \ldots, X_n$ are i.i.d. $N(g, \sigma^2)$. Here the parameter is a two-dimensional vector, $\theta = (g, \sigma^2)$. We may view this as an estimation problem for $g$; on the other hand, one may be interested in estimating the error variance $\sigma^2$, through which we can assess the ability of the experimenter.
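Under the normal model the sample mean estimates $g$ and the sample variance estimates $\sigma^2$. A small simulated sketch (our own; the values 9.81 and 0.05 are assumed only to generate data):

```python
import numpy as np

rng = np.random.default_rng(3)
x = 9.81 + rng.normal(0.0, 0.05, 30)   # thirty simulated determinations of g

g_hat  = x.mean()        # estimates g
s2_hat = x.var(ddof=1)   # sample variance estimates the error variance sigma^2
print(f"g_hat = {g_hat:.4f}, s2_hat = {s2_hat:.5f}")
```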
Example 5 Suppose an experiment is conducted by measuring the lengths of life, in hours, of $n$ electric bulbs produced by a certain company. Let $X_i$ be the length of life of the $i$th bulb. Here
$\mathcal{X} = \{(x_1, x_2, \ldots, x_n) : x_i \geq 0 \text{ for all } i\}.$
If we assume that the distribution of each $X_i$ is exponential with mean $\theta$, then $\Theta = (0, \infty)$ and the probability function of $X$ is $p(x; \theta) = \prod_{i=1}^{n} \frac{1}{\theta} e^{-x_i/\theta}$, $x \in \mathcal{X}$, $\theta \in \Theta$. We may want to estimate the parameter $\theta$ or $g(\theta) = e^{-60/\theta}$, which represents the probability that the lifetime of a bulb will be at least 60 hours.
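Here the sample mean estimates $\theta$, and $g(\theta)$ is again estimated by plugging in. A sketch under assumed values (a true mean lifetime of 100 hours is used only to simulate data):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(100.0, 40)        # simulated lifetimes (hours) of 40 bulbs

theta_hat = x.mean()                  # sample mean estimates theta
g_hat = np.exp(-60.0 / theta_hat)     # plug-in estimate of P(lifetime >= 60)
print(f"theta_hat = {theta_hat:.1f} h, estimated P(X >= 60) = {g_hat:.3f}")
```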
Objective
The distribution of $X$ is characterized by the unknown parameter $\theta$, about which we only know that it belongs to the parameter space $\Theta$. To discuss the problem of point estimation, for the sake of simplicity, we consider the case where the parameter of interest is a real-valued function $g = g(\theta)$ of $\theta$. In point estimation we try to approximate $g(\theta)$ on the basis of the observed value $x$ of $X$. In other words, we try to put forward a particular statistic, i.e. a function of $X$, say $T = T(X)$, which would represent the unknown $g(\theta)$ very closely. Such a statistic $T$ is called an estimator or a point estimator of $g(\theta)$. Mathematically, $T$ is a measurable mapping from $\mathcal{X}$ to the space of values of $g(\theta)$, and such an estimator is called an admissible estimator. Any observed value of $T$ is called an estimate of $g(\theta)$. In a nutshell, a point estimate of a parameter $\theta$ is a single number that can be regarded as a sensible value for $\theta$. A point estimate is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called a point estimator of $\theta$. It is to be noted that for a particular estimator $T$ of a parameter $\theta$, the estimate of $\theta$ may vary from sample to sample.
Example Suppose we want to estimate $\theta$ in Example 1. We may use the sample mean $\bar{X} = \frac{1}{50} \sum_{i=1}^{50} X_i$; its values lie in $[0, 1]$, the closure of $\Theta = (0, 1)$, and it is admissible.
Learn More
- Casella, G. and Berger, R. L. (2002): Statistical Inference. Duxbury.
- Kale, B. K. (1999): A First Course on Parametric Inference. Narosa Publishing House.
- Lehmann, E. L. and Casella, G. (1998): Theory of Point Estimation. Springer. Chapter 5.
- Rohatgi, V. K. and Saleh, A. K. Md. E. (2001): An Introduction to Probability and Statistics. John Wiley.