11 Goodness of Fit Tests

Mr Taranga Mukherjee

epgp books

1 What is goodness of fit?

 

Often the underlying distribution of the data is not known. The traditional assumption of normality is weak in such cases, and the experimenter therefore wishes to know whether the distribution is of a given form. The objective, then, is to find which distribution "fits" the observed data well. Tests for determining whether a postulated distribution is a good fit to the observed data are known as goodness-of-fit tests.

 

1.1    A motivating example

 

For better understanding, consider an example where a random sample of size 9 is taken from a continuous distribution on [0, 1]: 0.01, 0.11, 0.23, 0.34, 0.55, 0.65, 0.77, 0.78, 0.89. Naturally, the sample size 9 is too small to assume normality. Moreover, the observations are taken from a distribution with bounded support, so normality would only be a forced assumption. Suppose it is known from previous experience that the distribution can be assumed to be Beta(0.5, 0.5). Then the usual methods cannot be applied to test the hypothesis that the observations come from a Beta(0.5, 0.5) distribution.

 

1.2 Another example

 

Suppose the numbers of misprints per page of a book of 350 pages are represented in the form of the following grouped distribution:

 

 

 

Here the number of observations is large enough to assume normality, but the variable is discrete with a small number of categories. It is well known that the number of misprints per page generally follows a Poisson distribution. However, the mean, i.e. the only parameter, is not specified in advance. With the usual methods, testing the hypothesis that the observations come from a Poisson distribution with unknown parameter is therefore not possible.

 

1.3  A further example

 

Now consider data on the heights of 122 students in a certain class, given in the form of the grouped distribution below:

 

For the above data, the number of observations is large enough to assume normality, but the data are presented in the form of a frequency distribution. Moreover, the distribution of height is known to be normal; yet even under the assumption of normality, the parameters are not given. Thus we need to develop further methods to check whether the observations come from a normal distribution.

 

1.4 Related observations

 

From these examples, it is observed that:

 

We often have data from an unknown continuous distribution with support other than the whole real line.

 

We can have data from a known discrete distribution but with unspecified parameter(s).

 

We can also have data from a known continuous distribution with unspecified parameter(s).

 

In these examples the data are given either in the form of a frequency distribution or in full, and the objective is to test whether a given distribution fits the data well.

 

2 The hypothesis for goodness of fit

 

Suppose Xi, i = 1, 2, ..., n are iid observations from a distribution with DF F(x). In goodness-of-fit problems the objective is to test the null hypothesis H0 : F(x) = F0(x) for all x against Ha : F(x) ≠ F0(x) for at least one x. Now, there are two possibilities: F0(x) is completely known, or F0(x) is not completely specified, i.e. some parameters remain unknown. For completely known F0(x) the null hypothesis is simple; in the other case it is composite. In each case, however, the alternative is composite. We now develop a number of tests for such hypotheses.

 

3 Pearsonian Chi-square test

 

Suppose we have data in the form of a frequency distribution. For discrete distributions the categories are natural (i.e. how many observations are 0, 1, 2, etc.). For continuous data, however, the experimenter prepares his/her own frequency distribution. Obviously, converting the data into a grouped frequency distribution incurs a loss of information, and the lower the number of categories, the higher the loss. It is therefore reasonable to classify the data into a larger number of categories.

 

Thus, writing Vk = Σi (fi − npi)²/(npi), where fi is the observed frequency of the i-th class and npi the expected frequency under H0 (with estimated parameters plugged in where needed), the large-sample test rejects the null hypothesis if Vk > χ²(α; k − r − 1), the upper-α point of the chi-square distribution with k − r − 1 degrees of freedom, r being the number of parameters estimated.

 

3.3 Chi-square test: Some facts

 

Now we shall discuss some important facts related to the above test. First of all, note that the choice of k is subjective; too small a k fails to capture the features of the underlying distribution. The experimenter therefore needs to classify the data into as many categories as feasible to gather more information about the underlying distribution. Often the data are available in the form of a grouped distribution. If npi (or its estimate) is less than 5 for some i, the corresponding class is pooled with one or more neighbouring classes so that the expected frequency of the combined class is at least 5. In such a case, however, the degrees of freedom become the number of classes after combining, minus 1, minus the number of parameters estimated.
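As a sketch of the whole procedure, the following Python fragment fits a Poisson model to misprint counts, pools tail cells with expected frequency below 5, and computes the Pearson statistic with its degrees of freedom. The frequencies below are hypothetical, since the module's original table for the 350-page book is not reproduced here.

```python
import math

# Hypothetical misprint counts for a 350-page book (illustrative only):
# pages with 0, 1, 2, 3 and 4-or-more misprints.
observed = [157, 119, 53, 15, 6]               # frequencies, sum = 350
n = sum(observed)

# Estimate the Poisson mean from the grouped data (treating "4+" as 4).
lam = sum(k * f for k, f in zip(range(5), observed)) / n

# Expected frequencies under Poisson(lam); the last cell takes P(X >= 4).
pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(4)]
expected = [n * p for p in pmf]
expected.append(n - sum(expected))

# Pool tail cells until every expected frequency is at least 5.
obs_p, exp_p = observed[:], expected[:]
while len(exp_p) > 1 and exp_p[-1] < 5:
    exp_p[-2] += exp_p.pop()
    obs_p[-2] += obs_p.pop()

chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs_p, exp_p))
df = len(obs_p) - 1 - 1                        # one parameter (the mean) estimated
print(round(chi_sq, 3), df)
```

For these made-up counts the "4+" cell gets pooled, leaving 4 classes and hence 4 − 1 − 1 = 2 degrees of freedom; the statistic is compared with χ²(0.05; 2) = 5.991.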

 

4 Kolmogorov-Smirnov tests

 

Next we discuss another test, which needs the full data, i.e. data not grouped into a frequency distribution. Assume that F0 is continuous and completely specified. Since we need to measure the discrepancy between F(x), the actual distribution, and F0(x), a postulated known distribution, we use an estimate of F(x). From the theory of U-statistics, the empirical DF Fn(x) is the U-statistic for the estimation of F(x). Then it is known that E(Fn(x)) = F(x) and, for each fixed x, nFn(x) follows a Binomial(n, F(x)) distribution, so that Var(Fn(x)) = F(x)(1 − F(x))/n.
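A minimal sketch of the empirical DF as a function, evaluated on the nine observations from the motivating example of Section 1.1:

```python
# The empirical DF: F_n(x) = (number of observations <= x) / n.
def ecdf(sample):
    xs = sorted(sample)
    n = len(xs)
    return lambda x: sum(1 for v in xs if v <= x) / n

# ecdf of the nine observations of Section 1.1
F9 = ecdf([.01, .11, .23, .34, .55, .65, .77, .78, .89])
print(F9(0.5))   # 4 of the 9 observations are <= 0.5, so 4/9
```

The returned function is a right-continuous step function, jumping by 1/n at each sample point.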

 

 

4.1 Kolmogorov-Smirnov distance metric

 

Now, to develop a measure of discrepancy, we consider the plots of the ecdf based on n iid observations from F(x) together with the DF F0(x). Naturally the ecdf is a step function whereas the DF is continuous. For demonstration, we plot the ecdf Fn(x) based on 10 observations from N(0, 1) and take F0(x) as N(−0.5, 1), N(0, 1) and N(0.5, 1), respectively. All these are given in the following plots.

Consider the figures and look at the vertical distance between Fn(x) and F0(x) for fixed x. This distance is a measure of departure from F0(x). However, the distance is smaller for some x and larger for others, so we consider the maximum of such distances as the distance from F0(x). Naturally, large values of this maximum indicate departure from F0(x). This suggests using the statistics sup_x (Fn(x) − F0(x)) or sup_x (F0(x) − Fn(x)), which are known as Kolmogorov-Smirnov (KS) statistics.

 

4.1.1 Kolmogorov-Smirnov test: Critical region

 

Suppose the alternative is Ha : F(x) > F0(x) for some x. Then our statistic is Dn+ = sup_x (Fn(x) − F0(x)) and we reject H0 if Dn+ is too large. For the alternative Ha : F(x) < F0(x) for some x we use Dn− = sup_x (F0(x) − Fn(x)) and reject H0 if Dn− is too large. For the alternative Ha : F(x) ≠ F0(x) for some x, we use Dn = max(Dn+, Dn−) = sup_x |Fn(x) − F0(x)| and reject H0 if Dn is too large. The exact values of the cut-offs can be obtained from the tables of Owen (1962) for specific choices of n and level α.
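As an illustration, Dn+, Dn− and Dn can be computed for the nine observations of Section 1.1 against F0 = Beta(0.5, 0.5), whose DF has the closed form F0(x) = (2/π) arcsin(√x) (the arcsine law). Since Fn jumps only at the sample points, the suprema are attained there:

```python
import math

data = sorted([.01, .11, .23, .34, .55, .65, .77, .78, .89])
n = len(data)

# Beta(1/2, 1/2) is the arcsine law: F0(x) = (2/pi) * arcsin(sqrt(x)).
def F0(x):
    return 2 / math.pi * math.asin(math.sqrt(x))

# At the i-th order statistic, F_n jumps from (i-1)/n to i/n, so the
# suprema over x reduce to maxima over the sample points:
Dn_plus = max(i / n - F0(x) for i, x in enumerate(data, 1))
Dn_minus = max(F0(x) - (i - 1) / n for i, x in enumerate(data, 1))
Dn = max(Dn_plus, Dn_minus)
print(Dn_plus, Dn_minus, Dn)
```

Here Dn comes out to about 0.215, well below the tabulated two-sided 5% cut-off for n = 9 (roughly 0.43), so the Beta(0.5, 0.5) hypothesis is not rejected for these data.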

 

4.1.2 Kolmogorov-Smirnov Large sample Test

 

Under H0, Dn+ can be expressed as Dn+ = max over 1 ≤ i ≤ n of max(i/n − U(i), 0), where U(i), i = 0, 1, 2, ..., n are the order statistics of a random sample of size n from the R(0,1) distribution, with U(0) = 0. Thus Dn+ is a function of U(i), i = 0, 1, ..., n alone. Since the joint distribution of U(i), i = 1, 2, ..., n is independent of F under the null hypothesis, the distribution of Dn+ does not depend on F. Thus the test based on Dn+ is exactly nonparametric. Since Dn− has the same null distribution as Dn+, the test based on Dn− is also nonparametric. Again, Dn = max(Dn+, Dn−), and hence the corresponding test is also nonparametric.
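This distribution-freeness can be seen directly by simulation: under H0 the null distribution of Dn+ involves only uniform order statistics, so Monte Carlo draws of uniforms suffice to approximate any cut-off. A sketch (the seed and number of replications are arbitrary):

```python
import random

random.seed(1)
n, reps = 9, 20000

# Under H0, U(i) = F0(X(i)) are Uniform(0,1) order statistics, so the
# null distribution of Dn+ can be simulated with no reference to F0:
def dn_plus_from_uniforms(n):
    u = sorted(random.random() for _ in range(n))
    return max(max(i / n - ui for i, ui in enumerate(u, 1)), 0.0)

draws = sorted(dn_plus_from_uniforms(n) for _ in range(reps))
# Monte Carlo upper 5% point of Dn+ under the null, for n = 9:
cutoff = draws[int(0.95 * reps)]
print(cutoff)
```

The simulated 5% point should land close to the tabulated one-sided value for n = 9 (about 0.39), whatever continuous F0 is being tested.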

 

4.1.4  Consistency of Kolmogorov-Smirnov test

 

 

5 A few other goodness-of-fit statistics

 

In this module we have discussed only two distance metrics. But there are a number of other metrics that can be used as goodness-of-fit statistics. One example is the Cramér-von Mises (CvM) statistic, defined by Wn² = n ∫ (Fn(x) − F0(x))² dF0(x).
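For full (ungrouped) data the integral reduces to the standard computing formula Wn² = 1/(12n) + Σi (F0(x(i)) − (2i − 1)/(2n))², where x(i) are the order statistics. The following sketch applies it to the nine observations of Section 1.1 with F0 = Beta(0.5, 0.5):

```python
import math

data = sorted([.01, .11, .23, .34, .55, .65, .77, .78, .89])
n = len(data)

def F0(x):  # Beta(1/2, 1/2), i.e. arcsine, DF
    return 2 / math.pi * math.asin(math.sqrt(x))

# Computing formula: W^2 = 1/(12n) + sum_i (F0(x(i)) - (2i - 1)/(2n))^2
W2 = 1 / (12 * n) + sum(
    (F0(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(data, 1)
)
print(W2)
```

Large values of Wn² indicate departure from F0; like Dn, its null distribution is free of F0 for continuous F0.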

 

6 An application: Test for normality

 

The Kolmogorov-Smirnov test can also be applied to test the normality of given data in small samples. For normality, we take F0(x) as the DF of a standard normal variable. For the purpose of comparison, we consider the standardised version of the data set (say X*). Application of the Kolmogorov-Smirnov test to X* can then reveal the underlying normality. The procedure can therefore be applied to identify non-normal data even in small samples.
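A sketch of the procedure on a hypothetical small sample: standardise the data, then compute the two-sided KS distance from the standard normal DF. One caveat: since the mean and SD are estimated from the same data, the usual KS tables are only approximate here; Lilliefors-type corrections are commonly used.

```python
import math
import statistics

# Hypothetical small sample (illustrative only)
x = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7, 4.1, 5.3]
n = len(x)

# Standardise: X* = (X - mean) / SD
m, s = statistics.mean(x), statistics.stdev(x)
z = sorted((v - m) / s for v in x)

def phi(t):
    # Standard normal DF via the error function
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

# Two-sided KS distance, evaluated at the jump points of the ecdf
Dn = max(
    max(i / n - phi(t), phi(t) - (i - 1) / n)
    for i, t in enumerate(z, 1)
)
print(Dn)
```

A small Dn is consistent with normality; a formal decision would compare it with the appropriate (Lilliefors-corrected) cut-off.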

 
