9 Sign Test II

Mr Taranga Mukherjee

epgp books

 

 

 

 

1 Unbiasedness of Sign Test

 

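Suppose $X_1, X_2, \ldots, X_n$ are iid observations from a population characterised by the DF $F$, where $F$ is unknown but assumed to be continuous. Suppose $\theta(F)$ is the quantile of order $p$, that is, $F(\theta(F)) = p$ for known $p$. Consider testing $H_0 : \theta(F) = \theta_0$ against the alternative $H_a : \theta(F) > \theta_0$. Then the power function of the test $\phi_0$ can be expressed as

$$E_\theta \phi_0 = P_\theta(S(\theta_0) > S_\alpha) + a P_\theta(S(\theta_0) = S_\alpha) = a P_\theta(S(\theta_0) > S_\alpha - 1) + (1 - a) P_\theta(S(\theta_0) > S_\alpha).$$

Now observe that $S(\theta_0) = \sum_{i=1}^{n} I(X_i > \theta_0)$ is expected to be larger under $\theta$ than under $\theta_0$ whenever $\theta > \theta_0$. Thus $S(\theta_0)$ under $\theta$ is stochastically larger than that under $\theta_0$. Since $0 \le a \le 1$, applying this ordering to both terms of the second expression gives

$$E_\theta \phi_0 \ge P_{\theta_0}(S(\theta_0) > S_\alpha) + a P_{\theta_0}(S(\theta_0) = S_\alpha).$$

It is easy to observe that the RHS of the above equals $E_{\theta_0} \phi_0 = \alpha$. Thus we find that $E_\theta \phi_0 \ge \alpha$ for any $\theta > \theta_0$, and hence unbiasedness follows. Unbiasedness for the other tests can also be proved in a similar way.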
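The unbiasedness argument can be checked numerically. The following is a minimal sketch, assuming the median case so that $S(\theta_0)$ is Binomial$(n, q)$ with $q = P_\theta(X > \theta_0)$ and $q = 1/2$ under the null hypothesis. The function names and the choices $n = 15$, $\alpha = 0.05$ are illustrative and are not taken from the text.

```python
from math import comb

def pmf(n, k, q):
    """Binomial(n, q) probability of exactly k successes."""
    return comb(n, k) * q**k * (1 - q)**(n - k)

def randomized_cutoff(n, alpha):
    """Find (s, a) with P(S > s) + a * P(S = s) = alpha under Binomial(n, 1/2)."""
    upper = 0.0  # P(S > s) for the current s
    for s in range(n, -1, -1):
        p_s = pmf(n, s, 0.5)
        if upper + p_s > alpha:
            # Randomize on the boundary value s to make the size exactly alpha.
            return s, (alpha - upper) / p_s
        upper += p_s
    raise ValueError("alpha must lie strictly between 0 and 1")

def power(n, s, a, q):
    """Power P_q(S > s) + a * P_q(S = s) when P(X > theta_0) = q."""
    return sum(pmf(n, k, q) for k in range(s + 1, n + 1)) + a * pmf(n, s, q)

n, alpha = 15, 0.05
s, a = randomized_cutoff(n, alpha)
for q in (0.5, 0.6, 0.7, 0.8, 0.9):
    # q = 0.5 corresponds to the null hypothesis, q > 0.5 to theta > theta_0.
    print(f"q = {q:.1f}  power = {power(n, s, a, q):.4f}")
```

The printed power starts at the size $\alpha$ for $q = 0.5$ and does not fall below it for larger $q$, which is the content of the unbiasedness statement.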

 

2 Confidence interval using Sign test

 

In parametric inference, we often obtain a confidence interval from the acceptance region of a test. The same technique, though difficult, can also be adopted in nonparametric procedures. Thus we can use the cut-offs of the two sided Sign test to get a confidence interval for $\theta$ with confidence coefficient at least $1 - \alpha$. Consider the acceptance region of the non randomized level $\alpha$ Sign test for $H_0 : \theta = 0$ against $H_a : \theta \neq 0$. With the already introduced notations, the acceptance region can be expressed as $c \le S(0) \le n - c$, where $c$ is such that $P_{H_0}(c \le S(0) \le n - c) \ge 1 - \alpha$.

We have already seen that $S(\theta)$ is non increasing in $\theta$. Then using the properties of order statistics, we at once obtain

$$S(\theta) \le n - c \iff \theta \ge X_{(c)} \quad \text{and} \quad S(\theta) \ge c \iff \theta < X_{(n-c+1)}.$$

Thus $c \le S(\theta) \le n - c$ is equivalent to $X_{(c)} \le \theta < X_{(n-c+1)}$. Then

$$P_\theta\{[X_{(c)}, X_{(n-c+1)}) \ni \theta\} = P_\theta(c \le S(\theta) \le n - c).$$

Since $S(\theta)$ under $\theta$ has the same distribution as $S(0)$ under $H_0$, we have

$$P_\theta(c \le S(\theta) \le n - c) = P_{H_0}(c \le S(0) \le n - c).$$

Since the test is of level $\alpha$, the RHS probability in the above is at least $1 - \alpha$. Thus the coverage probability of the random interval $[X_{(c)}, X_{(n-c+1)})$ is at least $1 - \alpha$. Hence $[X_{(c)}, X_{(n-c+1)})$ is a confidence interval for $\theta$ with confidence probability at least $1 - \alpha$.
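The construction of the interval $[X_{(c)}, X_{(n-c+1)})$ can be sketched in code. This is an illustration only: the function name and the sample are hypothetical, and picking the largest admissible $c$ (which gives the shortest interval) is an assumption that the text does not state.

```python
from math import comb

def sign_test_interval(sample, alpha=0.05):
    """Return (lower, upper, c, coverage) for the interval [X_(c), X_(n-c+1))."""
    n = len(sample)
    xs = sorted(sample)
    chosen = None
    # Coverage P(c <= S <= n - c) under Binomial(n, 1/2) shrinks as c grows,
    # so keep the largest c whose coverage is still at least 1 - alpha.
    for c in range(1, n // 2 + 1):
        coverage = sum(comb(n, k) for k in range(c, n - c + 1)) / 2**n
        if coverage < 1 - alpha:
            break
        chosen = (c, coverage)
    if chosen is None:
        raise ValueError("sample too small for the requested confidence level")
    c, coverage = chosen
    # xs is 0-indexed: X_(c) is xs[c - 1] and X_(n-c+1) is xs[n - c].
    return xs[c - 1], xs[n - c], c, coverage

# Hypothetical sample of size 10.
data = [7, 1, 4, 10, 2, 8, 5, 3, 9, 6]
lower, upper, c, coverage = sign_test_interval(data, alpha=0.05)
print(f"c = {c}, interval = [{lower}, {upper}), coverage = {coverage:.4f}")
```

Because the binomial distribution is discrete, the attained coverage is usually strictly larger than $1 - \alpha$, which is why the text speaks of confidence "at least" $1 - \alpha$.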

 

3 Large sample test

 

Under the null hypothesis, $S \sim \text{Binomial}(n, \tfrac{1}{2})$. Now by the De Moivre-Laplace limit theorem,

$$S^* = \frac{S - n/2}{\sqrt{n}/2}$$

is asymptotically $N(0, 1)$. Therefore, different tests can be performed in large samples using $S^*$. Consider testing $H_0 : \theta = 0$ against $H_a : \theta > 0$. Then the corresponding large sample test is non-randomized and rejects the null hypothesis if the observed value of $S^*$ exceeds $\tau_\alpha$, the upper $\alpha$ point of the standard normal distribution. Similarly, the large sample tests for the other hypotheses can also be constructed.
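A minimal sketch of the one sided large sample test follows. The function name, the default level and the artificial sample are assumptions made for illustration; the critical value is taken from `statistics.NormalDist` in the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_sign_test(sample, theta0=0.0, alpha=0.05):
    """One sided large sample Sign test of H0: theta = theta0 against theta > theta0."""
    n = len(sample)
    s = sum(1 for x in sample if x > theta0)   # Sign statistic S
    s_star = (s - n / 2) / (sqrt(n) / 2)       # standardized statistic S*
    critical = NormalDist().inv_cdf(1 - alpha) # upper alpha point of N(0, 1)
    return s_star, critical, s_star > critical

# Artificial sample: 60 observations above 0 and 40 below.
sample = [1.0] * 60 + [-1.0] * 40
s_star, critical, reject = large_sample_sign_test(sample)
print(f"S* = {s_star:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```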

 

4 Sign test for quantiles

 

Suppose $p \neq 0.5$, that is, $\theta(F)$ is the quantile of order $p$. The Sign test can still be used to test $H_0 : \theta(F) = \theta_0$ against the usual alternatives. Properties like consistency and unbiasedness will be retained. However, the distribution of $S(\theta_0)$ in this case will be Binomial$(n, 1 - p)$ under the null hypothesis. Naturally, the statistic for the large sample test will be

$$S^* = \frac{S(\theta_0) - n(1 - p)}{\sqrt{np(1 - p)}},$$

which is asymptotically $N(0, 1)$ under the null hypothesis.
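The change from the median to a general quantile only alters the null success probability. The sketch below, with hypothetical function names and artificial data, computes the standardized statistic and an exact upper tail probability under Binomial$(n, 1 - p)$.

```python
from math import comb, sqrt

def quantile_sign_statistic(sample, theta0, p):
    """Standardized Sign statistic for H0: the quantile of order p equals theta0."""
    n = len(sample)
    s = sum(1 for x in sample if x > theta0)  # S(theta0)
    return (s - n * (1 - p)) / sqrt(n * p * (1 - p))

def quantile_sign_upper_tail(sample, theta0, p):
    """Exact P(S >= observed S) when S ~ Binomial(n, 1 - p) under H0."""
    n = len(sample)
    s = sum(1 for x in sample if x > theta0)
    q = 1 - p  # null probability that an observation exceeds theta0
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(s, n + 1))

# Artificial sample: 40 of 100 observations exceed theta0 = 0, tested as the third quartile.
sample = [1.0] * 40 + [-1.0] * 60
print(quantile_sign_statistic(sample, theta0=0.0, p=0.75))
print(quantile_sign_upper_tail(sample, theta0=0.0, p=0.75))
```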

 

5 Paired Sample Sign Test

 

Suppose $(X_i, Y_i)$, $i = 1, 2, \ldots, n$ are sample observations from an unknown continuous bivariate distribution $F(x, y)$. Suppose the median of the distribution of the difference $Z = X - Y$ is $\theta_Z$. The objective is to test $H_0 : \theta_Z = \theta_0$ against the usual one sided and two sided alternatives. Naturally, $\theta_Z = \theta_0$ indicates that the $X$ observations tend to be $\theta_0$ units larger than the corresponding $Y$ observations. Thus statements about $\theta_Z$ give information about the relative locations of the marginal distributions. Then the appropriate procedure is simply the Sign test based on the new set of observations $Z_i = X_i - Y_i$, $i = 1, 2, \ldots, n$.
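As an illustration, the paired procedure reduces to a few lines once the differences are formed. The function name and the data are hypothetical, and the sketch assumes no difference equals $\theta_0$ exactly, which holds with probability one under the continuity assumption.

```python
from math import comb

def paired_sign_test(x, y, theta0=0.0):
    """Sign test on Z_i = X_i - Y_i for H0: median of Z = theta0 against median > theta0."""
    z = [xi - yi for xi, yi in zip(x, y)]
    n = len(z)
    s = sum(1 for zi in z if zi > theta0)  # number of differences above theta0
    # Exact upper tail P(S >= s) under Binomial(n, 1/2).
    p_value = sum(comb(n, k) for k in range(s, n + 1)) / 2**n
    return s, p_value

# Hypothetical paired measurements.
x = [5, 7, 6, 9, 8, 10, 7, 6]
y = [3, 4, 7, 5, 6, 7, 5, 4]
s, p_value = paired_sign_test(x, y)
print(f"S = {s}, one sided p-value = {p_value:.4f}")
```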

 

6 Sample size determination

 

To perform a Sign test, we need a random sample. Suppose the objective is to detect a shift in the median. Consider testing $H_0 : \theta = 0$ against $H_a : \theta = \theta_1 (> 0)$ for specified $\theta_1$. Now the test can be based on a small sample size or a large sample size. But often the observations are subject to some cost and time constraint. Therefore, the experimenter needs to choose a sample size which will be sufficient to reach a decision. The usual technique is to determine $n$ in such a way that the test has size $\alpha$ and power $1 - \beta$ at the alternative $\theta_1$, where $\alpha$ and $\beta$ are specified in advance.

 

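Consider the non-randomized level $\alpha$ Sign test against $H_a : \theta = \theta_1$, which rejects the null hypothesis if $S \ge k$, where $k$ is such that $P_{H_0}(S \ge k) \le \alpha$. Since $S$ has a binomial distribution with parameters $n$ and success probability $0.5$ under the null hypothesis, the above condition reduces to

$$\sum_{i=k}^{n} \binom{n}{i} \left(\tfrac{1}{2}\right)^{n} \le \alpha.$$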

 

6.1  Calculation for various F

 

Suppose $\alpha$ is set at 6% and power at 80%. Then we get $n = 15$ and $k = 11$. Now we take $\theta_1 = .8$ and consider $F$ as Normal, Cauchy and Logistic. For the normal distribution, we get power .801, but for the Cauchy distribution it is .57 and for the Logistic distribution it is .47. Thus for a normal population, the required sample size is 15. However, we need more samples to achieve 80% power for the other distributions. A similar exercise with $\alpha = .01$ gives $n = 25$ and $k = 18$. Then for the normal distribution, we get power .851, but for the Cauchy distribution it is .585 and for the Logistic distribution it is .48, which supports the need for further observations.
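The exact search for $n$ and $k$ can be sketched as follows. This is an illustration under stated assumptions: the population is normal with unit scale, so that $P_{\theta_1}(X > 0) = \Phi(\theta_1)$; the function names and the search bound `n_max` are hypothetical.

```python
from math import comb
from statistics import NormalDist

def upper_tail(n, k, q):
    """P(S >= k) for S ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

def exact_sample_size(p1, alpha, target_power, n_max=500):
    """Smallest n (with its cut-off k) meeting the size and power requirements.

    p1 is the probability that an observation exceeds 0 under the alternative.
    """
    for n in range(1, n_max + 1):
        # Smallest k whose null tail probability is at most alpha (most powerful choice).
        k = next((k for k in range(n + 1) if upper_tail(n, k, 0.5) <= alpha), None)
        if k is not None and upper_tail(n, k, p1) >= target_power:
            return n, k
    raise ValueError("no sample size up to n_max meets the requirements")

# Assumed setting: normal population with unit scale and theta_1 = 0.8.
p1 = NormalDist().cdf(0.8)
print(exact_sample_size(p1, alpha=0.06, target_power=0.80))
```

For these inputs the search should return $n = 15$ and $k = 11$, in line with the values quoted above. Replacing `p1` by the corresponding probability for another distribution gives the sample size for that distribution.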

 

6.2 Approximate sample size

 

The determination of the exact sample size is not easy in practice. We can use the large sample approximations to get a simple formula for the sample size. Consider testing $H_0 : \theta = 0$ against $H_1 : \theta = \theta_1 (> 0)$ for specified $\theta_1$. The large sample test rejects the null hypothesis if $S_n > c$. Then the size and power requirements are

$$P_{H_0}(S_n > c) = \alpha \quad \text{and} \quad P_{\theta_1}(S_n > c) = 1 - \beta.$$

 

 

6.3 Comparison of approximate sample size

 

For the purpose of comparison, we consider three distributions, namely, Normal, Cauchy and Logistic. The median for each distribution is taken as $\theta$ and the scale parameter as unity. Then we consider testing $H_0 : \theta = 0$ against $H_1 : \theta > 0$. For various choices of $\theta (> 0)$, we have computed the sample size by the derived formula with $\alpha = .05$ and $\beta = .2$. The nature of the approximate sample size is plotted on the next page.
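A comparison of this kind can be sketched numerically. The formula used below is an assumption: it is the usual consequence of approximating $S_n$ by $N(n/2, n/4)$ under the null hypothesis and by $N(np_1, np_1(1 - p_1))$ under the alternative, with $p_1 = P_{\theta}(X > 0)$, which gives $n \approx \left[ (\tau_\alpha / 2 + \tau_\beta \sqrt{p_1(1 - p_1)}) / (p_1 - 1/2) \right]^2$. It may differ in form from the formula derived in the text. The function names are hypothetical.

```python
from math import atan, ceil, exp, pi, sqrt
from statistics import NormalDist

def prob_positive(theta, family):
    """P(X > 0) when X has median theta and unit scale in the given family."""
    if family == "normal":
        return NormalDist().cdf(theta)
    if family == "cauchy":
        return 0.5 + atan(theta) / pi
    if family == "logistic":
        return 1 / (1 + exp(-theta))
    raise ValueError(f"unknown family: {family}")

def approx_sample_size(p1, alpha=0.05, beta=0.2):
    """Normal approximation sample size (assumed formula, see the lead-in)."""
    tau_alpha = NormalDist().inv_cdf(1 - alpha)
    tau_beta = NormalDist().inv_cdf(1 - beta)
    root_n = (0.5 * tau_alpha + tau_beta * sqrt(p1 * (1 - p1))) / (p1 - 0.5)
    return ceil(root_n**2)

for theta in (0.4, 0.8, 1.2):
    sizes = {f: approx_sample_size(prob_positive(theta, f))
             for f in ("normal", "cauchy", "logistic")}
    print(theta, sizes)
```

For each $\theta$ the normal case needs the fewest observations and the logistic case the most, and all three sample sizes shrink as $\theta$ grows.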

 

6.4 Observations

 

The plot depicts the same fact as is observed in the exact numerical study. The Normal distribution takes the lowest number of samples to reach the desired power level among the candidates. The Logistic distribution takes the highest number of observations to reach 80% power. In addition, as we increase $\theta$, the required sample size decreases for each candidate. This is a consequence of the power function being increasing in $\theta$.

 

7 Application of Sign Test

 

7.1 Test of Trend: Cox-Stuart test

 

For data coming from some measurement processes, it is often desirable to check the presence of trend. That is, we are interested in knowing whether the observations depend on time.

 

Suppose $X_i$, $i = 1, 2, \ldots, n$ are iid observations from a continuous population $F$. The hypotheses for such tests are expressed as $H_0$ : Absence of trend in data against $H_1$ : Presence of upward trend in data.

 

Assume that the observations are sequentially observed and $X_1, X_2, \ldots, X_n$ are ordered as they are observed. Also assume that $n$ is even and define $c = n/2$ (if $n$ is odd, the middlemost observation, that is, the $\frac{n+1}{2}$-th observation, is removed). Then the whole set of observations is grouped into $X_i$, $i = 1, 2, \ldots, c$ and $X_i$, $i = c + 1, c + 2, \ldots, n$. We club the observations into pairs $(X_i, X_{i+c})$, $i = 1, 2, \ldots, c$. If an upward trend is present, the event $X_{i+c} > X_i$ is more probable than the event $X_{i+c} < X_i$ for every $i$. If $p = P(X_{i+c} > X_i)$, then the testing problem can be expressed as testing $H_0 : p = \tfrac{1}{2}$ against $H_1 : p > \tfrac{1}{2}$. Thus the problem can be looked upon as that for the Sign test.

 

Then the test statistic $T$ is the number of pairs $(X_i, X_{i+c})$, $i = 1, 2, \ldots, c$ for which $X_i < X_{i+c}$. That is,

$$T = \sum_{i=1}^{c} I(X_i < X_{i+c}).$$

Thus the test statistic is nothing but the Sign test statistic based on $X_{i+c} - X_i$, $i = 1, 2, \ldots, c$. Naturally, higher values of $T$ indicate the presence of an upward trend. Under the null hypothesis, $T$ has a binomial distribution with parameters $c$ and success probability $\tfrac{1}{2}$. Then, as usual, depending on the given level $\alpha$, the test can be constructed.
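The pairing step and the statistic $T$ can be sketched as below. The function name and the series are hypothetical; for odd $n$ the middlemost observation is dropped, as described above.

```python
from math import comb

def cox_stuart_upward(series):
    """Cox-Stuart test for an upward trend: returns (T, c, one sided p-value)."""
    n = len(series)
    c = n // 2
    first = series[:c]        # X_1, ..., X_c
    second = series[n - c:]   # last c observations; skips the middle one when n is odd
    t = sum(1 for a, b in zip(first, second) if a < b)
    # Exact upper tail P(T >= t) under Binomial(c, 1/2).
    p_value = sum(comb(c, k) for k in range(t, c + 1)) / 2**c
    return t, c, p_value

# Hypothetical measurements in the order they were observed.
series = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
t, c, p_value = cox_stuart_upward(series)
print(f"T = {t} out of c = {c} pairs, p-value = {p_value:.5f}")
```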

 

7.2 Test of correlation

 

Cox-Stuart test as discussed above can also be used to test for possible correlation. Suppose patients are given two drugs, one after another. Since the drugs are applied on the same patient, responses are correlated. Assume that a higher response indicates a favourable condition. Also assume that the paired response has a continuous bivariate distribution. Then the hypotheses in such a case can be expressed as H0 : Absence of positive correlation against H1 : Presence of positive correlation.

 

For details, assume that the response of the $i$-th patient for the first drug (second drug) is $X_i$ ($Y_i$), $i = 1, 2, \ldots, n$. Order the pairs according to the increasing values of the $X$ observations. For example, if three pairs of observations are (5,3), (3,2) and (7,8), then the ordered pairs, ordered according to the magnitude of the first elements, are (3,2), (5,3) and (7,8). Then testing the existence of positive (negative) correlation is equivalent to testing the presence of an upward (downward) trend in the ordered $Y$ observations. Thus the test becomes the same as the Cox-Stuart test of trend on the ordered $Y$ observations.
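The ordering step and the resulting trend test can be sketched together. The function names are hypothetical, and the trend test is restated so that the block runs on its own. The first call uses the three pairs from the example above.

```python
from math import comb

def cox_stuart_upward(series):
    """Cox-Stuart test for an upward trend: returns (T, c, one sided p-value)."""
    n = len(series)
    c = n // 2
    # Pair the first c values with the last c values (middle one dropped if n is odd).
    t = sum(1 for a, b in zip(series[:c], series[n - c:]) if a < b)
    p_value = sum(comb(c, k) for k in range(t, c + 1)) / 2**c
    return t, c, p_value

def positive_correlation_test(pairs):
    """Order the pairs by X, then test the ordered Y values for an upward trend."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    ordered_y = [y for _, y in ordered]
    return ordered_y, cox_stuart_upward(ordered_y)

# The three pairs used in the text; far too few for a meaningful test.
ordered_y, result = positive_correlation_test([(5, 3), (3, 2), (7, 8)])
print(ordered_y, result)
```

With only three pairs a single comparison remains, so the example shows the ordering rather than a usable test; in practice the number of pairs $c$ has to be large enough for the binomial tail to fall below the chosen level.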

 

8 Why use a Sign test?

 

The Sign test does not use the full information contained in the observations. Thus the Sign test is less powerful than the t test when the latter is applicable, which would suggest using the t test. But the t test is based on normality. If the underlying distribution is non-normal, an optimal test rarely exists. In addition, the assumption about the underlying distribution is often instrumental. Thus the Sign test is a safe option when there is any doubt about the normality of the underlying population, though at some sacrifice in power.
