8 Sign Test I

Mr Taranga Mukherjee

epgp books

 

 

 

1 A motivating example

 

We start with a motivating example. Consider the following data on the lifetimes (in hours) of a piece of electrical equipment:

 

1046.541 1110.841 1259.690 1014.233 1156.425 1001.439 1299.962 1116.045

1022.895 1106.415 1023.236 1093.674 1103.354 1005.930 1202.124 1001.251

1129.546 1051.215 1043.066 1054.430.

 

Suppose we wish to test the null hypothesis that the mean lifetime is 1050 hours. The usual practice is to use Student’s t test for this purpose. But a natural question is: does the normality assumption hold? To check, we perform some exploratory data analysis. The histogram and Q-Q plot of the data are given below.

 

Figure 1: Histogram and Q-Q plot of the data

 

 

The histogram shows that the distribution of the data is far from symmetric, and the Q-Q plot clearly reveals the non-normality of the data. The t test is therefore not appropriate. A parametric test for these data is appropriate only if the assumed distribution is appropriate. However, choosing an appropriate distribution suffers from subjectivity, and no rule of thumb is available. Thus, we need alternative procedures to test the hypothesis.

 

2 What is Sign Test?

 

The sign test is a nonparametric analogue of Student’s t test for the mean (i.e. a location parameter) of the population. The t test is based on the assumption of normality of the underlying population; the sign test does not require this assumption. It also provides a test of location, but uses a quantile of the distribution as the location parameter. Moreover, the sign test requires only the continuity of the underlying population.

 

2.1 Assumptions & the hypothesis

 

Suppose $X_1, X_2, \ldots, X_n$ are iid observations from a population characterised by the distribution function (DF) $F$, where $F$ is unknown but assumed to be continuous. Suppose $\theta(F)$ is the quantile of order $p$, that is, $F(\theta(F)) = p$ for known $p$. In the sign test, the objective is to test $H_0: \theta(F) = \theta_0$ against one of the alternatives $H_a: \theta(F) > \theta_0$, $H_a: \theta(F) < \theta_0$ or $H_a: \theta(F) \neq \theta_0$, for some known $\theta_0$. For our discussion, we choose $p = 0.5$, so that $\theta(F)$ reduces to the median.

 

2.2 Sign test statistic: The intuitive argument

 

Note that if the observed data are consistent with $\theta(F) = \theta_0$, then one can expect about 50% of the values in the data set to lie above $\theta_0$ and about 50% below it. This suggests using the number of observations exceeding $\theta_0$ as the test statistic. Formally, this means using the statistic $S(\theta_0) = \sum_{i=1}^{n} I(X_i - \theta_0 > 0)$. Since $S$ counts the number of positive signs among $X_i - \theta_0$, $i = 1, 2, \ldots, n$, the test based on $S$ is called the sign test.
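As a small illustration, the statistic can be computed for the lifetime data of Section 1. The sketch below uses Python (our choice; the module itself contains no code), takes $\theta_0 = 1050$ as the hypothesised median, and introduces the hypothetical names `lifetimes` and `sign_statistic`.

```python
# Lifetime data (hours) from the motivating example in Section 1.
lifetimes = [
    1046.541, 1110.841, 1259.690, 1014.233, 1156.425,
    1001.439, 1299.962, 1116.045, 1022.895, 1106.415,
    1023.236, 1093.674, 1103.354, 1005.930, 1202.124,
    1001.251, 1129.546, 1051.215, 1043.066, 1054.430,
]


def sign_statistic(data, theta0):
    """Return S(theta0): the number of observations strictly above theta0."""
    return sum(1 for x in data if x - theta0 > 0)


s_obs = sign_statistic(lifetimes, 1050.0)
print(s_obs, "of", len(lifetimes), "observations exceed 1050")
```

For these data, 12 of the 20 lifetimes exceed 1050, so $S(1050) = 12$.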

 

2.3  Sign test statistic: Another look

 

Assume that $\theta_0 = 0$. Then $\theta(F) > 0 \Leftrightarrow F(0) < 0.5$, that is, $P(X_1 > 0) > 0.5$. Similarly, $\theta(F) < 0 \Leftrightarrow F(0) > 0.5$, that is, $P(X_1 < 0) > 0.5$. Thus $\theta(F) > 0$ (or $< 0$) implies more positive (negative) observations. The following figures will make the idea clear.

Consider the alternative $H_a: \theta > 0$. This suggests using the number of positive observations as our test statistic. Formally, this means using the statistic $S = \sum_{i=1}^{n} I(X_i > 0)$. Similarly, the form of the statistic for the other alternatives can also be justified.

 

3 Distribution of S

 

Assume that $P(X_i = \theta_0) = 0$ for every $i = 1, 2, \ldots, n$. Each $I(X_i - \theta_0 > 0)$ is a Bernoulli random variable with success probability $P(X_1 > \theta_0)$, and since the observations are iid, these indicators are iid as well. Thus $S$ is the sum of $n$ iid Bernoulli random variables with success probability $P(X_1 > \theta_0)$.

 

Hence $S$ has a Binomial$(n, P(X_1 > \theta_0))$ distribution. Naturally, the success probability $P(X_1 > \theta_0) = 1 - F(\theta_0)$ depends on the underlying $F$. However, under the null hypothesis $F(\theta_0) = 0.5$, so $S \sim$ Binomial$(n, 0.5)$ whatever the form of $F$, and $S$ is a distribution-free statistic. Therefore tests based on $S$ are exactly nonparametric.
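Because the null distribution is Binomial$(n, 0.5)$ whatever $F$ is, it can be tabulated once for each $n$. The following Python sketch does this with exact binomial coefficients; the function name `null_pmf` is hypothetical and $n = 20$ is chosen only to match the motivating example.

```python
from math import comb


def null_pmf(n):
    """Return [P(S = k) for k = 0..n] when S ~ Binomial(n, 1/2)."""
    return [comb(n, k) / 2 ** n for k in range(n + 1)]


pmf = null_pmf(20)
print("total probability:", sum(pmf))
print("P(S = 10):", pmf[10])
```

No feature of $F$ enters the computation, which is exactly what distribution-free means here.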

 

4 Critical region

 

Since $S \sim$ Binomial$(n, \tfrac{1}{2})$ under $H_0$, $E(S) = \tfrac{n}{2}$. However, under any $\theta(F)$, $E(S) = n(1 - F(\theta_0))$. Suppose $\theta > \theta_0$; then more than 50% of the observations are expected to exceed $\theta_0$. Thus $S(\theta_0)$ is expected to be larger under $\theta > \theta_0$ than under $\theta = \theta_0$. Therefore, larger values of $S(\theta_0)$ indicate evidence against $\theta = \theta_0$. Naturally, a right-tailed test based on $S(\theta_0)$ seems appropriate for testing $H_0: \theta = \theta_0$ against $H_a: \theta > \theta_0$.

 

Again, fewer than 50% of the observations are expected to exceed $\theta_0$ under $\theta < \theta_0$. Thus $S(\theta_0)$ is expected to be smaller under $\theta < \theta_0$ than under $\theta = \theta_0$, and a left-tailed test based on $S(\theta_0)$ seems appropriate for the alternative $H_a: \theta < \theta_0$. However, if $\theta \neq \theta_0$, then $S(\theta_0)$ is expected to be either smaller or larger than under $\theta = \theta_0$. Therefore, a two-tailed test based on $S(\theta_0)$ is appropriate for the alternative $H_a: \theta \neq \theta_0$.

 

5 Symmetry of S

 

Since the distribution of $S(\theta_0)$ is Binomial$(n, 0.5)$ under the null hypothesis, $S(\theta_0)$ has a distribution symmetric about $\tfrac{n}{2}$. We explore the implication of this symmetry. Suppose we had observed $Y_i = 2\theta_0 - X_i$ for each $i$, instead of $X_i$. Under $\theta = \theta_0$, the median of the distributions of both $Y_i$ and $X_i$ is $\theta_0$. Then tests for the two-sided alternative based on the $Y_i$'s and on the $X_i$'s are expected to give similar results. But the reflection turns $S$ into $n - S$ (in the absence of ties), so the two tests agree only if the statistic has a distribution symmetric about $\tfrac{n}{2}$. This gives the justification for the requirement of symmetry.
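The reflection argument can be checked numerically. The Python sketch below reflects the lifetime data of Section 1 about $\theta_0 = 1050$ and compares the two-sided p values computed from the original and the reflected samples. The helper names are hypothetical, and capping the p value at 1 is our own convention, not part of the text.

```python
from math import comb

lifetimes = [
    1046.541, 1110.841, 1259.690, 1014.233, 1156.425,
    1001.439, 1299.962, 1116.045, 1022.895, 1106.415,
    1023.236, 1093.674, 1103.354, 1005.930, 1202.124,
    1001.251, 1129.546, 1051.215, 1043.066, 1054.430,
]
theta0 = 1050.0


def count_above(data, centre):
    """Number of observations strictly above centre."""
    return sum(1 for x in data if x > centre)


def two_sided_p(n, s):
    """2 * min(P(S <= s), P(S >= s)) under Binomial(n, 1/2), capped at 1."""
    total = 2 ** n
    lower = sum(comb(n, k) for k in range(0, s + 1)) / total
    upper = sum(comb(n, k) for k in range(s, n + 1)) / total
    return min(1.0, 2 * min(lower, upper))


n = len(lifetimes)
reflected = [2 * theta0 - x for x in lifetimes]  # Y_i = 2*theta0 - X_i
s_x = count_above(lifetimes, theta0)
s_y = count_above(reflected, theta0)

print(s_x, s_y)                                   # the two counts add up to n
print(two_sided_p(n, s_x), two_sided_p(n, s_y))   # identical p values
```

The counts satisfy $S_Y = n - S_X$, and the symmetry of Binomial$(n, 0.5)$ about $\tfrac{n}{2}$ makes the two p values coincide.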

 

5.1 Different Tests

 

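Since $S$ has a discrete distribution, tests of exact size $\alpha$ based on it will, in general, be randomized. For the alternative $H_a: \theta > \theta_0$, a size $\alpha$ test can be expressed as $\phi_0 = I(S > S_\alpha) + a\,I(S = S_\alpha)$, where $S_\alpha$ and the randomization constant $a \in [0, 1)$ are such that $E_{H_0}\,\phi_0 = \alpha$. For the alternative $H_a: \theta < \theta_0$, a size $\alpha$ test can be expressed as $\phi_0 = I(S < S_{1-\alpha}) + a\,I(S = S_{1-\alpha})$, where $S_{1-\alpha}$ and $a$ are such that $E_{H_0}\,\phi_0 = \alpha$.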
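To make the randomized test concrete, the cut-off and the randomization constant for the right-tailed test can be found by accumulating the upper tail of Binomial$(n, 0.5)$. This is a minimal Python sketch under our own naming (`right_tailed_constants` is hypothetical); it assumes $0 < \alpha < 1$.

```python
from math import comb


def right_tailed_constants(n, alpha):
    """Find (c, a) with P(S > c) + a * P(S = c) = alpha for S ~ Binomial(n, 1/2)."""
    total = 2 ** n
    tail = 0  # number of equally likely sign patterns with S > c
    for c in range(n, -1, -1):
        at_c = comb(n, c)
        if (tail + at_c) / total > alpha:
            # P(S > c) <= alpha < P(S >= c): randomize on the event S = c.
            a = (alpha - tail / total) / (at_c / total)
            return c, a
        tail += at_c
    raise ValueError("alpha must lie strictly between 0 and 1")


c, a = right_tailed_constants(20, 0.05)
print(c, round(a, 4))
```

For $n = 20$ and $\alpha = 0.05$ this gives $S_\alpha = 14$ and $a \approx 0.79$: reject when $S > 14$, and reject with probability about 0.79 when $S = 14$. The left-tailed constants follow from the symmetry of the null distribution.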

 

5.2 Test based on p values

 

Suppose $S_{obs}$ is the observed value of $S$. For the alternative $H_a: \theta > \theta_0$, the one-sided p value is $P_{H_0}(S \geq S_{obs})$. We do not reject the null hypothesis if this p value exceeds $\alpha$. For the alternative $H_a: \theta < \theta_0$, the one-sided p value is $P_{H_0}(S \leq S_{obs})$. We do not reject the null hypothesis if this p value exceeds $\alpha$. For the two-sided alternative $H_a: \theta \neq \theta_0$, the two-sided p value is $2\min\{P_{H_0}(S \leq S_{obs}),\, P_{H_0}(S \geq S_{obs})\}$. We reject the null hypothesis if this p value does not exceed $\alpha$.
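The three p values can be computed exactly from the Binomial$(n, 0.5)$ null distribution. The Python sketch below applies them to the motivating example, where 12 of the 20 lifetimes exceed $\theta_0 = 1050$. The function name `sign_test_p_values` is hypothetical, and capping the two-sided p value at 1 is our own convention.

```python
from math import comb


def sign_test_p_values(n, s_obs):
    """Return (upper, lower, two_sided) p values for S ~ Binomial(n, 1/2)."""
    total = 2 ** n
    upper = sum(comb(n, k) for k in range(s_obs, n + 1)) / total  # P(S >= s_obs)
    lower = sum(comb(n, k) for k in range(0, s_obs + 1)) / total  # P(S <= s_obs)
    two_sided = min(1.0, 2 * min(lower, upper))
    return upper, lower, two_sided


# Motivating example: n = 20 observations, 12 of them above 1050.
upper, lower, two_sided = sign_test_p_values(20, 12)
print(round(upper, 4), round(lower, 4), round(two_sided, 4))
```

Here the right-tailed p value is about 0.25, the left-tailed about 0.87 and the two-sided about 0.50, so at $\alpha = 0.05$ the hypothesis that the median lifetime is 1050 hours is not rejected against any of the three alternatives.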

 

6 Presence of ties

 

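We have already assumed continuity of $F$, so that $P(X_i = \theta_0) = 0$ for every $i = 1, 2, \ldots, n$. But in practice, we can have observations equal to $\theta_0$. Thus we get some zeros among the differences $X_i - \theta_0$. The presence of a large number of zeros can give misleading results. The usual method in this context is to discard the observations equal to $\theta_0$. The number of positive signs among the remaining observations is taken as our new statistic, and tests based on it can be performed as earlier, with $n$ replaced by the number of non-tied observations. These tests are known as conditional sign tests.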
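A minimal Python sketch of a conditional sign test follows, assuming the usual convention of dropping observations equal to $\theta_0$ and treating the remaining count as Binomial with the reduced sample size. The data in `sample` are made up purely for illustration, and `conditional_sign_test` is a hypothetical name.

```python
from math import comb


def conditional_sign_test(data, theta0):
    """Drop ties with theta0, then return (n_eff, s, two-sided p value)."""
    kept = [x for x in data if x != theta0]     # discard tied observations
    n_eff = len(kept)                           # reduced sample size
    s = sum(1 for x in kept if x > theta0)      # positive signs among the rest
    total = 2 ** n_eff
    lower = sum(comb(n_eff, k) for k in range(0, s + 1)) / total
    upper = sum(comb(n_eff, k) for k in range(s, n_eff + 1)) / total
    return n_eff, s, min(1.0, 2 * min(lower, upper))


# Hypothetical data with three observations tied at theta0 = 5.
sample = [3, 5, 5, 7, 8, 5, 9, 2, 6, 10]
n_eff, s, p_two_sided = conditional_sign_test(sample, 5)
print(n_eff, s, p_two_sided)
```

Three of the ten values are tied with $\theta_0$, so the test is carried out on the remaining seven observations, five of which are positive signs.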

 

7 Optimality of Sign Test

 

Consider testing $H_0: \theta = \theta_0$ against $H_a: \theta > \theta_0$. Define $\Omega_0 = \{F : F(\theta_0) = \tfrac{1}{2}\}$ and $\Omega_a = \{F : F(\theta_0) < \tfrac{1}{2}\}$. Then the above testing problem can be equivalently expressed as testing $H_0: F \in \Omega_0$ against $H_a: F \in \Omega_a$. Here $H_0$ and $H_a$ are both composite. It can be shown that the UMP size $\alpha$ test for this problem is nothing but the sign test based on $S$. In a similar way, the two-sided sign test is UMPU of size $\alpha$ (see Fraser, 1957, for details).

 

8 Consistency of Sign test

 

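Consider testing $H_0: \theta = \theta_0$ against $H_a: \theta > \theta_0$. The sign test can be equivalently expressed in terms of $S/n$. For simplicity assume $\theta_0 = 0$. Since $S$ has a binomial distribution, we have under any $\theta = \theta(F)$,

$$E\left(\frac{S}{n}\right) = 1 - F(0), \qquad \mathrm{Var}\left(\frac{S}{n}\right) = \frac{F(0)\,(1 - F(0))}{n} \to 0 \quad \text{as } n \to \infty,$$

so that $S/n$ converges in probability to $1 - F(0)$. Under $\theta > 0$ this limit exceeds $\tfrac{1}{2}$, whereas the critical value of $S/n$ tends to $\tfrac{1}{2}$, so the power of the test tends to 1. Thus the sign test is consistent against the alternative $H_a: \theta > 0$. Consistency against the other alternatives can be proved similarly.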
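Consistency can also be seen numerically by computing the exact power of the right-tailed test for increasing $n$. The Python sketch below uses the non-randomized test of level at most $\alpha$ (a simplification of the randomized test of Section 5.1) and an alternative with $P(X_1 > 0) = 0.7$, a value chosen only for illustration. The function names are hypothetical.

```python
from math import comb


def critical_value(n, alpha):
    """Smallest c with P(S >= c) <= alpha under Binomial(n, 1/2); n + 1 if none."""
    total = 2 ** n
    tail = 0
    c = n + 1
    for k in range(n, -1, -1):
        if (tail + comb(n, k)) / total > alpha:
            break
        tail += comb(n, k)
        c = k
    return c


def power(n, p, alpha=0.05):
    """P(S >= c) when S ~ Binomial(n, p), with p = P(X_1 > 0) = 1 - F(0)."""
    c = critical_value(n, alpha)
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))


for n in (10, 25, 50, 100, 400):
    print(n, round(power(n, 0.7), 4))
```

The power is only about 0.15 at $n = 10$ but approaches 1 as $n$ grows, in line with the consistency argument.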
