7 Nonparametric Hypothesis Testing and Confidence Intervals
Prof Rahul Bhattacharya
1 Nonparametric hypothesis testing and confidence interval: Why the need?
The central idea of parametric (or classical) inference is an assumption regarding the underlying population. The entire theory of parametric inference is developed under this assumption, and consequently the resulting procedures are valid only as long as the assumption is satisfied. For example, Student's t test is appropriate only when the underlying distribution is normal. But the normal is not the only distribution arising in real life: in survival trials, for example, the lifetime distributions are mostly exponential, gamma or Weibull, that is, non-normal. Inference based on the t test in such situations is therefore often misleading. Thus parametric tests are useful only when the experimenter is sufficiently confident about the underlying distribution, and unfortunately the available methods for identifying the underlying distribution are limited to standard distributions only.
It is therefore better if hypothesis testing procedures can be developed with minimal assumptions about the underlying distribution (say, continuity of the observations). Suppose there exists a statistic T, relevant to the testing problem, such that the exact and/or conditional and/or asymptotic distribution of T under the null hypothesis is independent of the underlying distribution. Naturally, the significance level of the test based on T does not depend on the underlying distribution; that is, such tests are level robust. Tests based on such a T are commonly termed nonparametric or distribution free.
In practice, T is often formed by taking only the signs or ranks of the actual observations. Such a T does not use the full information contained in the individual observations, and hence these tests are often less efficient than their parametric counterparts. However, nonparametric tests remain valid choices whenever the validity of the parametric assumptions is questionable.
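As a quick illustration (a minimal simulation sketch; the two populations below are illustrative assumptions, not taken from these notes), the sign statistic $T = \#\{i : X_i > 0\}$ has the same Binomial$(n, 1/2)$ null distribution whether the continuous population is normal or Cauchy, so long as its median is the hypothesised value:

```python
import numpy as np

# Sign statistic T = #{i : X_i > 0}. If 0 is the true median, then
# T ~ Binomial(n, 1/2) whatever the continuous distribution F may be.
rng = np.random.default_rng(1)
n, reps = 15, 50000

# Two very different populations, both with median 0 (illustrative choices):
samples = {
    "normal": rng.normal(0.0, 1.0, size=(reps, n)),
    "cauchy": rng.standard_cauchy(size=(reps, n)),
}

for name, x in samples.items():
    t = (x > 0).sum(axis=1)  # sign statistic in each replication
    # Mean and variance should match Bin(15, 1/2): 7.5 and 3.75 in both cases.
    print(f"{name}: mean(T) = {t.mean():.3f}, var(T) = {t.var():.3f}")
```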
As in parametric inference problems, the experimenter may be interested in constructing a confidence interval with coverage probability independent of the underlying distribution. In nonparametric inference, the unknown quantity of interest is mostly a location or scale parameter. As in the parametric counterpart, we can invert the acceptance region of a distribution free test to get a distribution free confidence interval with the desired coverage probability; this will be discussed later. However, if the unknown quantity of interest is a population quantile, we can develop a distribution free confidence interval based directly on the sample order statistics.
2 Components of a nonparametric test
In most problems of nonparametric inference, the underlying distribution is not specified except for continuity. Therefore the available methods of test construction (e.g. the Neyman-Pearson lemma, the likelihood ratio method) are of no use, and we need to develop tests with intuitive appeal. In particular, if we can identify a distribution free statistic T for the problem, we can develop a distribution free test. For a meaningful development, we maintain the following sequence:
1. Model assumption (i.e. the minimal set of assumptions about the underlying distribution)
2. Hypothesis of interest (i.e. specification of the null and alternative hypotheses)
3. Available tests for the problem (i.e. the usual parametric procedures for specific probability models)
4. Suggesting a distribution free statistic (i.e. specifying some T)
5. Justifying the form of the critical region.
6. Investigating unbiasedness and consistency of the suggested test and finally
7. Providing a large sample test corresponding to the given test.
3 Consistency of tests: Some basics
Tests based on such a T are often far from optimal, and hence optimum properties like maximising the power are not immediate. Consistency is therefore an important requirement: it measures the sensitivity of the test, in large samples, to small departures from the null hypothesis. We describe below the notion of consistency of tests in a little detail.
Suppose $X_1, X_2, \ldots, X_N$ are iid observations from an unknown distribution $G$. We are interested in testing $H_0: G \in \mathcal{G}_0$ against $H_a: G \in \mathcal{G}_a$, where $\mathcal{G}_0$ ($\mathcal{G}_a$) is the class of distributions specified by the null (alternative) hypothesis.
A sequence of tests $\{\phi_N\}$ is said to be consistent if for every $G \in \mathcal{G}_a$,
$$E_G(\phi_N) \longrightarrow 1 \quad \text{as } N \to \infty,$$
that is, if the power tends to one at every alternative.
However, asymptotic normality or an asymptotic size condition is often not immediate to verify, and hence we provide some simpler conditions. Assume that $G$ is indexed by a real parameter $\theta$, that is, $G(x) = G(x; \theta)$, and that the testing problem can be expressed as $H_0: \theta = \theta_0$ against $H_a: \theta > \theta_0$. Suppose a level $\alpha$ test rejects the null hypothesis if $S_N(\theta_0) \ge c_N(\theta_0)$, where $S_N(\theta_0)$ is so constructed that
$$P_{\theta_0}\big(S_N(\theta_0) \ge c_N(\theta_0)\big) \longrightarrow \alpha \quad \text{as } N \to \infty.$$
[Table: simulated powers of Test 1 and Test 2 for varying n.]
Suppose we fix $\theta_0 = 0$ and $\theta = 0.2$. It is easy to observe that the power of Test 1 remains very close to $0.05$ for varying $n$, whereas the power of Test 2 increases sharply. The same is observed for the other assumed choices of $\theta$, although the rate of increase of the power of Test 2 is higher for larger values of $\theta$. This is expected: the first test is based on an inconsistent estimator, one that does not concentrate around the true value for large $n$, and consequently its power increases at a very slow rate. The estimator underlying Test 2, on the other hand, is consistent and hence approaches the true parameter value for large $n$; consequently its power increases to one with increasing $n$.
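The notes do not fully specify Tests 1 and 2, so the following Python sketch reproduces the phenomenon under an assumed setup in the same spirit: $N(\theta, 1)$ data, with Test 1 rejecting on the basis of a single observation (an inconsistent "estimator" of $\theta$) and Test 2 on the basis of the sample mean (consistent), both at level $\alpha = 0.05$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def powers(n, theta, reps=20000, alpha=0.05):
    """Monte Carlo powers of two level-alpha tests of H0: theta = 0 vs
    Ha: theta > 0 for N(theta, 1) data. Test 1 rejects if X_1 > z_alpha
    (inconsistent); Test 2 rejects if sqrt(n) * Xbar > z_alpha (consistent)."""
    z = norm.ppf(1 - alpha)                        # upper alpha-point of N(0, 1)
    x = rng.normal(theta, 1.0, size=(reps, n))
    p1 = np.mean(x[:, 0] > z)                      # Test 1: single observation
    p2 = np.mean(np.sqrt(n) * x.mean(axis=1) > z)  # Test 2: sample mean
    return p1, p2

for n in (10, 50, 200, 1000):
    p1, p2 = powers(n, theta=0.2)
    print(f"n = {n:4d}: power(Test 1) = {p1:.3f}, power(Test 2) = {p2:.3f}")
```

The power of Test 1 stays near its level for every $n$, while that of Test 2 climbs towards one, mirroring the behaviour described above.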
4 Different nonparametric hypothesis testing problems
We now discuss the different types of hypotheses considered in nonparametric hypothesis testing, together with their relevance. Based on the availability of data, hypothesis testing problems are single sample, two sample or multi-sample. We therefore discuss hypotheses for each type of problem.
4.1 Single sample problems
Suppose $X_1, X_2, \ldots, X_n$ are iid observations from a distribution $F$, where $F$ is unknown but known to be continuous. Then, depending on the requirement, we have the following different hypotheses.
4.1.1 Problem of location
Suppose $p \in (0, 1)$ is a known quantity and let $\theta(F) = \theta_p(F)$ be the quantile of order $p$ of $F$, that is, $F(\theta(F)) = p$. Then the problem of location is to test $H_0: \theta(F) = \theta_0$ against one of the alternatives $H_a: \theta(F) > \theta_0$, $H_a: \theta(F) < \theta_0$ or $H_a: \theta(F) \neq \theta_0$ for some known $\theta_0$.
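For illustration (a hedged sketch: the counting statistic below is the classical sign-test idea, and `quantile_sign_test` is a helper name introduced here, not a library routine), under $H_0: \theta(F) = \theta_0$ the count $T = \#\{i : X_i > \theta_0\}$ follows a Binomial$(n, 1-p)$ law whatever the continuous $F$, which yields a distribution free test:

```python
import numpy as np
from scipy.stats import binom

def quantile_sign_test(x, theta0, p, alternative="greater"):
    """Test H0: the quantile of order p of F equals theta0 using
    T = #{X_i > theta0}; under H0, T ~ Binomial(n, 1 - p) for continuous F."""
    x = np.asarray(x)
    t, n = int((x > theta0).sum()), len(x)
    if alternative == "greater":      # Ha: quantile > theta0, so T tends large
        return binom.sf(t - 1, n, 1 - p)
    if alternative == "less":         # Ha: quantile < theta0, so T tends small
        return binom.cdf(t, n, 1 - p)
    raise ValueError("alternative must be 'greater' or 'less'")

# Example: test whether the median (p = 0.5) exceeds 0 for simulated data.
rng = np.random.default_rng(2)
print(quantile_sign_test(rng.exponential(1.0, 30) - 0.3, 0.0, 0.5))
```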
For the above hypothesis, only the continuity of $F$ is required. However, if we consider the same hypothesis with the added assumption of symmetry of $F$, it becomes the problem of location under symmetry. Since the main concern in a problem of location, or of location under symmetry, is the location itself (e.g. the median), the hypothesis is the analogue of a test for a location parameter in the parametric setting.
4.1.2 Goodness of fit problem
In a goodness of fit problem, the interest lies in investigating whether the sample comes from a specified distribution. The hypothesis of interest can thus be described as $H_0: F(x) = F_0(x)$ for all $x$, where $F_0$ is a completely known DF. The alternative hypothesis is naturally $H_a: F(x) \neq F_0(x)$ for at least one $x$. The problem of goodness of fit also arises in parametric inference after some known distribution has been fitted to the data.
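For instance, a standard distribution free statistic for this problem is the Kolmogorov-Smirnov statistic $\sup_x |F_n(x) - F_0(x)|$; the sketch below simply calls SciPy's implementation on simulated data (the choice $F_0 = N(0, 1)$ is an illustrative assumption):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(3)

# H0: F = F0 with F0 = N(0, 1); the KS statistic sup_x |F_n(x) - F0(x)|
# is distribution free under H0 for continuous F0.
print(kstest(rng.normal(0.0, 1.0, 100), "norm"))   # H0 true: large p-value
print(kstest(rng.exponential(1.0, 100), "norm"))   # H0 false: tiny p-value
```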
4.2 Two sample problems
Suppose $X_i, i = 1, 2, \ldots, n$ and $Y_j, j = 1, 2, \ldots, m$ are independent samples from unknown distributions $F$ and $G$ respectively, where we assume only the continuity of $F$ and $G$. The basic null hypothesis in any two sample problem is $H_0: F(x) = G(x)$ for all $x$, tested against the usual one sided or two sided alternatives. Defining $\Omega_0 = \{(F, G): F(x) = G(x)\ \forall x\}$, we can express the null hypothesis as $H_0: (F, G) \in \Omega_0$. Depending on different specifications of the alternative class $\Omega_a \subseteq \{(F, G): F(x) \neq G(x) \text{ for some } x\}$, we have the following possible hypotheses.
1. General/Homogeneity Alternative: Suppose the two underlying populations may differ in any manner (in location, scale or skewness). Such an alternative can be expressed as $\Omega_a = \{(F, G): F(x) \neq G(x) \text{ for some } x\}$. This general alternative is also termed the homogeneity alternative.
2. Stochastic Alternative: A stochastic alternative is a restricted alternative, where $\Omega_a = \{(F, G): G(x) \ge F(x) \text{ for all } x, \text{ with strict inequality for some } x\}$. Actually $G(x) \ge F(x)$ for all $x$ implies that the $X$ observations tend to be larger than the $Y$ observations, or, in other words, $X$ is stochastically larger than $Y$. This is still a fairly general class of alternatives.
3. Location Alternative: Suppose $G(x) = F(x + \Delta)$, where $\Delta \neq 0$; that is, the underlying distributions differ only in location. Then $F(x) > G(x)$, $F(x) = G(x)$ or $F(x) < G(x)$ according as $\Delta < 0$, $\Delta = 0$ or $\Delta > 0$. Thus the null hypothesis can be restated as $H_0: \Delta = 0$ and the alternative is $H_a: \Delta > 0$, $H_a: \Delta < 0$ or $H_a: \Delta \neq 0$. Clearly $\Delta > 0$ implies that $F$ is shifted to the right of $G$; this is a special case of the stochastic alternative, with $G(x) = F(x + \Delta)$. The stochastic alternative, in general, relates to the location alternative in a less restrictive sense: $G(x) > F(x)$ indicates larger $X$ observations and hence corresponds to a larger location for the $X$ population.
4. Scale Alternative: Suppose $G(x) = F(x/\theta)$ with $\theta > 0$; that is, the two underlying populations are assumed to differ only in scale. Since $F(x) \gtrless G(x)$ according as $\theta \gtrless 1$, the null hypothesis reduces to $H_0: \theta = 1$, and the alternative is one of $H_a: \theta > 1$, $H_a: \theta < 1$ or $H_a: \theta \neq 1$. The scale alternative can also be viewed as a stochastic alternative, with $G(x) = F(x/\theta)$.
It is worthwhile to mention that tests meant for the general or stochastic alternative can also be used against location and scale alternatives, but they will be less efficient than tests developed for the specific alternative. A similar set of alternatives arises in any multi-sample problem; these will be discussed later in the context of specific situations and hence are not treated separately here.
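To illustrate (the particular tests are named here only as standard examples, not developed in this section): SciPy provides the two sample Kolmogorov-Smirnov test, aimed at the general alternative, and the Mann-Whitney test, geared towards stochastic/location alternatives; both can be run on a simulated pure location shift.

```python
import numpy as np
from scipy.stats import ks_2samp, mannwhitneyu

rng = np.random.default_rng(4)
x = rng.normal(0.5, 1.0, 40)   # X shifted right of Y: a location alternative
y = rng.normal(0.0, 1.0, 40)

print(ks_2samp(x, y))                              # general alternative
print(mannwhitneyu(x, y, alternative="greater"))   # Ha: X stochastically larger
```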
4.3 Paired sample problems
Suppose $(X_i, Y_i), i = 1, 2, \ldots, n$ are sample observations from an unknown bivariate distribution $F(x, y)$ with $F_X(x)$ and $F_Y(y)$ as the marginal DF's. Then the following two hypotheses are of main interest:
1. Problem of association: Here the interest lies in testing $H_0: F(x, y) = F_X(x) F_Y(y)$ for all $(x, y)$, that is, the hypothesis of independence of $X$ and $Y$.
2. Problem of location: Suppose $X$ represents the response before a drug is administered and $Y$ the response after the drug is applied. Naturally $Y$ is influenced by $X$, and hence $X$ and $Y$ are correlated. The natural objective in this situation is to determine whether the drug has any effect; in statistical terms, no effect means that $X$ and $Y$ are exchangeable. Thus the null hypothesis can be expressed as
$$H_0: X \text{ and } Y \text{ are exchangeable, i.e. } (X, Y) \stackrel{D}{=} (Y, X), \text{ i.e. } F(x, y) = F(y, x)\ \forall (x, y).$$
Define $D = Y - X$. Then, under the null hypothesis, the distribution of $D$ is symmetric about the origin. If under the alternative $X$ and $Y$ differ only in location, with location difference $\theta$, then the distribution of $D$ is symmetric about $\theta$. Thus under the null hypothesis the distribution of $D$ has median at the origin, whereas under the alternative the median becomes $\theta$, and the problem reduces to testing $H_0: \theta = 0$ against the appropriate alternatives. However, the median of the distribution of the difference is not always the difference of the marginal medians: if the marginal distributions and the distribution of the difference are all symmetric, then the median of the distribution of the difference and the difference of the two medians coincide (see Gibbons and Chakraborti, 2006, for details).
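In practice (a hedged sketch: the Wilcoxon signed-rank test used below is the standard distribution free choice when the distribution of $D$ is symmetric, and the simulated drug-response data are purely illustrative):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 25)        # response before the drug
y = x + rng.normal(0.4, 0.5, 25)    # correlated response after the drug

d = y - x
# H0: distribution of D symmetric about 0 (no effect) vs Ha: theta > 0.
print(wilcoxon(d, alternative="greater"))
```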
5 Distribution free confidence interval for quantiles
Suppose $X_i, i = 1, 2, \ldots, n$ are iid observations from a continuous but unknown DF $F(x)$. The objective is to provide a confidence interval for the quantile $\xi_p$ of order $p$, satisfying $F(\xi_p) = p$. Since $\xi_p$ is a population quantile, it is natural to base the confidence interval on the sample quantiles. Thus we can start with $X_{(r)}$ (i.e. the sample quantile of order $r/n$) and $X_{(s)}$ (i.e. the sample quantile of order $s/n$); that is, we suggest the interval $[X_{(r)}, X_{(s)}]$ with $r < s$ as a confidence interval for $\xi_p$. We now show that the coverage probability of $[X_{(r)}, X_{(s)}]$ does not depend on $F$. Note that for any $k$, $X_{(k)} \le \xi_p \iff Z \ge k$, where $Z$ has a binomial distribution with parameters $n$ and success probability $F(\xi_p) = p$. Thus
$$P\big(X_{(r)} \le \xi_p \le X_{(s)}\big) = P(Z \ge r) - P(Z \ge s) = \sum_{i=r}^{s-1} \binom{n}{i} p^i (1 - p)^{n - i} = \gamma(n, r, s), \text{ say.}$$
Thus $[X_{(r)}, X_{(s)}]$ is a confidence interval for $\xi_p$ with confidence coefficient $\gamma(n, r, s)$. Clearly, $\gamma(n, r, s)$ does not depend on $F$, and hence $[X_{(r)}, X_{(s)}]$ gives a distribution free confidence interval for $\xi_p$. In practice, however, the confidence coefficient is set at least $(1 - \alpha)$, with $r$ and $s$ chosen so that $\gamma(n, r, s) \ge 1 - \alpha$.
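Concretely, $\gamma(n, r, s)$ is just a difference of binomial CDF values, so suitable $r$ and $s$ can be found by direct search; the sketch below (the function name `coverage` is introduced here for illustration) finds a symmetric interval for the median with $\gamma \ge 0.95$:

```python
from scipy.stats import binom

def coverage(n, r, s, p):
    """gamma(n, r, s) = P(r <= Z <= s - 1) with Z ~ Binomial(n, p): the
    F-free coverage probability of [X_(r), X_(s)] for the quantile of order p."""
    return binom.cdf(s - 1, n, p) - binom.cdf(r - 1, n, p)

# Example: n = 20 observations, 95% interval for the median (p = 0.5),
# using the symmetric choice s = n - r + 1 and widening until gamma >= 0.95.
n, p, alpha = 20, 0.5, 0.05
for r in range(n // 2, 0, -1):
    s = n - r + 1
    if coverage(n, r, s, p) >= 1 - alpha:
        print(f"[X_({r}), X_({s})] has coverage {coverage(n, r, s, p):.4f}")
        break
```

For $n = 20$ this returns the well-known interval $[X_{(6)}, X_{(15)}]$, with coverage about $0.959$.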