42 Non-parametric test II: counts or percent, chi-square, McNemar test, Cochran Q test, Spearman correlation

P. Deivanai

epgp books

 

 

 

 

Introduction

 

In real-life situations, there are many occasions when the population distribution is not normal. To test hypotheses relating to non-normal populations, a different group of tests is needed, one that does not rest on overly restrictive assumptions. Such distribution-free tests are called non-parametric tests. Among them, the chi-square test enables one to find out the significance of the difference between two or more population proportions. When the dependent samples are continuous in nature, the sign test and the Wilcoxon test are appropriate for studies with two dependent samples. Counts record the total number of items collected in each category. Sometimes the investigator may be interested in finding out whether two qualitative characters tend to occur together in a given population.

 

Non-Parametric Test

 

Non-parametric tests can address situations where the data are nominal in nature, such as increase, decrease or no change, or ordinal, such as ranks. In that sense, they strengthen the hands of statisticians, who can analyze and explain any type of situation, whether distribution-based or distribution-free. As regards the historical background, the chi-square test developed by Karl Pearson in 1900 was the earliest of the non-parametric tests. This was followed by the contributions of several statisticians such as Spearman, with rank correlation tests, which became popular after Harold Hotelling used rank correlation coefficients in 1936. The real growth of non-parametric tests began in 1945, when Wilcoxon developed a test for two-sample cases.

 

Advantages in using the non-parametric tests are:

  1. They are not dependent on the nature of distribution of the population.
  2. They are simple to understand and easy to apply.
  3. They are less time consuming, and once the result is statistically significant, no further work is necessary.
  4. They are very useful to researchers where the sample size is very small and adequate data from samples cannot be compiled.
  5. They are amenable to application to any type of data: qualitative, quantitative or ranked.
  6. They are not based on very restrictive assumptions, as parametric tests are.

Chi square test

 

In 1900, Karl Pearson developed a statistical procedure for testing the significance of the discrepancy between experimental values and the theoretical values obtained under some theory or hypothesis. This test is known as the χ² (chi-square) test. It is a test of goodness of fit and is used to find out whether the deviation between observation and theory may be attributed to chance or to some other factor. The chi-square test is applicable to enumeration data (counts) and not to data collected as measurements on a continuous scale.

 

Data applicable to Chi square Test

 

i) Qualitative or categorical variables

 

There are a number of qualitative variables in human populations, such as colour of the skin, eyes and hair. The individuals possessing these characters are counted, and the number of occurrences in the sample is given in the form of a frequency. These characters are expected to occur with certain frequencies, called expected frequencies. Similarly, there are a number of categorical variables such as blood group, education level and employment level. These variables are categorized, the individuals falling in each category are counted, and the number of occurrences in the sample is given as a frequency against each category.

 

For example, the number of persons with the same blood group is counted and given against each category: A, B, AB and O.

 

ii) Qualitative or categorical variables given in contingency table

 

Sometimes the investigator may be interested in finding out whether two qualitative characters tend to occur together in a given population. In such a case, the individuals are counted for the presence or absence of the two specific characters, and the counts are given in the form of a contingency table (Table 1).

 

Table 1

 

Table 1 presents the data with one variable in the rows and the other in the columns, and the observations for each variable are placed in two categories. This is a 2×2 contingency table, since the data are presented in two rows and two columns. The number of rows and columns varies with the number of categories for each variable.

iii) Qualitative or categorical variables in the form of proportions

 

A demographer may be interested in finding out whether the proportions of men by marital status are the same in four different cities. The demographer counts the number of persons, married and single, and presents them in the form of proportions such as married/total and single/total against each city.

 

Conditions for validity of Chi-square test

Chi-square test can be used only if the following conditions are satisfied:

  1. N, the total frequency, should be reasonably large, say greater than 50.
  2. The observations in the sample should be independent. This means no individual item should be included twice or more in the sample.
  3. No theoretical frequency should be small. Preferably, each theoretical frequency should be larger than 10, and none should be less than 5.
  4. The data should be given in original units, that is, as actual frequencies rather than percentages or ratios.

Application of Chi-square Test

 

i) Chi-square test of goodness of fit, or chi-square test with a priori hypothesis.

 

Goodness of fit is a generic term used to indicate how well an observed frequency distribution fits the expected frequency distribution based on some theory or expectation. Since the expected frequencies are calculated from some theory or hypothesis, it is also called the chi-square test with a priori hypothesis.

 

In this case the null hypothesis (H0) is “There is no difference between the observed frequency and the expected frequency”. The alternate hypothesis (HA) is “There is a difference between the observed frequency and the expected frequency”.
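
As a minimal sketch of the goodness-of-fit calculation, the Python snippet below compares observed counts with the counts expected under a hypothesis. The blood-group counts, the expected proportions and the function name chi_square_gof are hypothetical and chosen only for illustration; 7.815 is the tabulated chi-square value for 3 degrees of freedom at the 0.05 level.

```python
def chi_square_gof(observed, expected_props):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    # Expected frequency of each category under the null hypothesis
    expected = [n * p for p in expected_props]
    # Sum of (O - E)^2 / E over all categories
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Hypothetical counts for blood groups A, B, AB and O in a sample of 200
observed = [58, 62, 14, 66]
# Hypothetical proportions stated by the null hypothesis
expected_props = [0.30, 0.30, 0.10, 0.30]

statistic, df = chi_square_gof(observed, expected_props)
CRITICAL_0_05_DF3 = 7.815  # tabulated value: 3 degrees of freedom, 0.05 level

print(f"chi-square = {statistic:.3f}, df = {df}")
print("reject H0" if statistic > CRITICAL_0_05_DF3 else "do not reject H0")
```

With these invented counts the calculated value is well below the tabulated value, so the null hypothesis of no difference between observed and expected frequencies would not be rejected.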

 

ii) Chi-square test for association between attributes, or independence of attributes, or chi-square test without a priori hypothesis.

It is used to find out whether two characters tend to occur together or occur independently of each other. For example, is a patient suffering from a disease cured because of the drug, or is recovery from the illness independent of the drug? In this case, the frequencies are given in the form of a contingency table.

 

Hence, the null hypothesis (H0) is “There is no association between the two characters”, i.e., the drug is not effective in curing the disease. The alternate hypothesis (HA) is “There is an association between the two characters”, i.e., the drug is effective in curing the disease.
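
The test of association can be sketched in Python as follows. The 2×2 counts (drug or no drug against cured or not cured) and the function name chi_square_independence are hypothetical; each expected count is computed as row total × column total / grand total, and 3.841 is the tabulated chi-square value for 1 degree of freedom at the 0.05 level.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count of a cell = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical counts: rows = drug / no drug, columns = cured / not cured
table = [[40, 10],
         [25, 25]]

statistic, df, expected = chi_square_independence(table)
CRITICAL_0_05_DF1 = 3.841  # tabulated value: 1 degree of freedom, 0.05 level

print(f"chi-square = {statistic:.3f}, df = {df}")
print("smallest expected count:", min(min(row) for row in expected))
print("reject H0" if statistic > CRITICAL_0_05_DF1 else "do not reject H0")
```

The smallest expected count is printed because the validity conditions require that no expected frequency be less than 5.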

 

iii)   Chi-square test to test the equality of proportions.

 

As stated earlier, an investigator may be interested in finding out whether the proportion of married men is the same in four major cities. The null hypothesis (H0) is “The proportion of married men is the same in the four major cities”. The alternate hypothesis (HA) is “The proportion of married men is not the same in the four major cities”.

 

iv) Chi-square test of homogeneity

 

Sometimes we need to compare samples taken from two populations differing in some character. For example, samples are drawn from two populations, namely “normal” persons and “liver disease patients”; these two samples are then classified into two categories:

  • Persons with hepatitis and
  • Persons without hepatitis

Here the chi-square test is used to find out whether the two populations from which the samples were taken are homogeneous or not. In this case the null hypothesis (H0) is “The two populations from which the samples are drawn are homogeneous”. The alternate hypothesis (HA) is “The two populations from which the samples are drawn are not homogeneous”.

 

Though four different applications are given for the χ² test, all these tests are basically the same: each finds out whether the observed frequencies fit well with the expected frequencies, i.e., the null hypothesis is H0: Oi − Ei = 0, where Oi and Ei are the observed and expected frequencies. Only the statement of the null hypothesis differs.

 

Procedure to carry out Chi-square Test

  1. Propose the null and alternate hypotheses.
  2. Calculate the expected frequencies.
  3. Find the deviations between the expected and observed frequencies.
  4. Square the deviations and divide by the respective expected frequencies.
  5. Substitute in the formula.
  6. Write the degrees of freedom.
  7. Refer to the tabulated χ² values for the specified degrees of freedom at the 0.05 (or 0.01) probability level.
  8. Compare the calculated chi-square value with the tabulated chi-square value.

The above procedure is routinely followed to work out any problem manually or with a calculator. When we use SPSS, there is no need to carry out steps ii to viii by hand; we only need to propose the hypotheses, enter the data correctly, run the analysis and interpret the results.
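
The formula referred to in step 5 is not reproduced in the text. Assuming it is the usual Pearson chi-square statistic, it can be written as below. The symbols are notation introduced here for illustration: O_i and E_i are the observed and expected frequencies, R_i and C_j are the row and column totals of a contingency table, N is the grand total, k is the number of categories, and r and c are the numbers of rows and columns.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
\qquad
\mathrm{df} =
\begin{cases}
k - 1 & \text{goodness of fit with } k \text{ categories} \\
(r - 1)(c - 1) & \text{contingency table with } r \text{ rows and } c \text{ columns}
\end{cases}
```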

 

Table 2

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 7.50.

 

Output

 

Table 2 gives the frequencies of blood groups in males and females in tabular form. For 3 degrees of freedom, i.e., (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3, the P-value of 0.852 is greater than 0.05, so the difference is considered insignificant. The null hypothesis is accepted; therefore, there is no association between sex and blood group. In other words, gender and blood group are independent in humans.

Non-parametric significance tests are also available for two dependent samples, where the researcher wishes to study correlated, or matched, samples. These include before-and-after studies and matched-pair studies:

 

McNemar’s Test

 

McNemar’s test was first published in a Psychometrika article in 1947. It was created by Quinn McNemar, who was a professor in the Psychology and Statistics departments at Stanford University. When the variable has more than two categories, marginal homogeneity tests are appropriate; they are essentially an extension of the McNemar test for dependent samples.

 

In medical research, if a researcher wants to determine whether or not a particular drug has an effect on a disease (e.g., yes vs. no), a count of the individuals is recorded (as + and − signs, or 0 and 1) in a table before and after the drug is given. McNemar’s test is then applied, which helps in making a statistical decision (using the chi-square test statistic) as to whether or not the drug has an effect on the disease.

 

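McNemar’s test works as follows:

Let A, B, C and D be the four cell counts of the 2×2 before-and-after table, with A and B in the first row and C and D in the second. Under the null hypothesis, we assume that each row total is equal to the corresponding column total. In other words,

(A+B) = (A+C)

(C+D) = (B+D)

Cancelling A from the first equation and D from the second implies that B = C. The test is therefore based only on the discordant cells B and C, and the test statistic is calculated as χ² = (B − C)² / (B + C), which is compared with the chi-square distribution with 1 degree of freedom.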

 

Questions Answered:

 

Is there a change in the proportion of voters prior to and following the press conference?

 

Does the proportion of success vs. failure significantly change after treatment?

 

Hypothesis:

 

Null hypothesis: Assumes that the row totals are equal to the corresponding column totals, that is, the paired sample proportions are equal and no (significant) change has occurred. In medical research, for example, the null hypothesis assumes that the drug has no impact on the disease.

 

Alternative hypothesis: Assumes that the row totals are not equal to the corresponding column totals, that is, the paired sample proportions are not equal. In medical research, for example, the alternative hypothesis assumes that the drug has an impact on the disease.
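
A minimal Python sketch of the McNemar calculation is given below. It assumes the uncorrected statistic (B − C)² / (B + C) with 1 degree of freedom; the before-and-after counts and the function name mcnemar_test are hypothetical. For 1 degree of freedom the upper-tail chi-square probability can be obtained from the complementary error function in the standard library.

```python
import math


def mcnemar_test(b, c):
    """McNemar chi-square (no continuity correction) from the discordant counts.

    b = pairs that changed from 'yes' to 'no',
    c = pairs that changed from 'no' to 'yes'.
    """
    statistic = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical before/after table for 100 patients:
#                 after: yes   after: no
# before: yes        A = 30      B = 5
# before: no         C = 20      D = 45
statistic, p_value = mcnemar_test(b=5, c=20)

print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

Only the discordant pairs enter the calculation; the concordant counts A and D do not affect the statistic.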

 

COCHRAN’S Q TEST

 

The name “Cochran test” can refer to two different statistical tests; the one considered here is Cochran’s Q test, a non-parametric test used in the analysis of two-way randomized block designs with a binary response variable. Cochran’s Q test is an extension of the McNemar test for related samples that provides a method for testing for differences between three or more matched sets of frequencies or proportions. Cochran’s Q tests whether the probability of a target response is equal across all conditions, that is, it verifies whether k treatments have identical effects. Example: 12 subjects are asked to perform 3 tasks. The outcome of each task is a dichotomous value, success or failure.

Cochran’s Q test is based on the following assumptions:

  1. A large-sample approximation is used; in particular, it is assumed that b, the number of blocks, is “large”.
  2. The blocks (rows) were randomly selected from the population of all possible blocks.
  3. The outcomes of the treatments can be coded as binary responses (i.e., 0 or 1) in a way that is common to all treatments within each block.

For example, a researcher who had collected pet shop data wanted to examine whether pet stores displayed different types of reptiles at different times of the year. The researcher visited each of 12 stores four times during the following year, at times chosen because of their proximity to holidays: Valentine’s Day, July 4, Halloween and Christmas. During each visit, the researcher recorded whether the shop displayed only snakes (coded = 0) or both types of reptiles (coded = 1). In this analysis, one variable is the time of the year and the response variable is the type of reptiles displayed.

 

Research Hypothesis

 

The researcher hypothesized that pet shops would be more likely to display both types of reptiles at Christmas than during the other times of the year. H0: stores are equally likely to display both types of reptiles during all parts of the year.
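
The pet shop data themselves are not given in the text, so the sketch below uses an invented 12 × 4 table of 0/1 responses purely to illustrate the calculation. It assumes the standard form of the statistic, Q = (k − 1)[k ΣCj² − N²] / [kN − ΣRi²], where Cj are the column (treatment) totals, Ri the row (block) totals, N the grand total and k the number of treatments; Q is compared with the chi-square distribution with k − 1 degrees of freedom. The function name cochran_q is hypothetical.

```python
def cochran_q(data):
    """Return (Q, df) for a blocks x treatments table of 0/1 responses."""
    k = len(data[0])                               # number of treatments
    row_totals = [sum(row) for row in data]        # 1s per block (store)
    col_totals = [sum(col) for col in zip(*data)]  # 1s per treatment (visit)
    n = sum(row_totals)                            # grand total of 1s
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1


# Invented data: 12 stores (rows) x 4 visits (columns):
# Valentine's Day, July 4, Halloween, Christmas
# 0 = only snakes displayed, 1 = both types of reptiles displayed
visits = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]

q, df = cochran_q(visits)
CRITICAL_0_05_DF3 = 7.815  # tabulated chi-square value: 3 df, 0.05 level

print(f"Q = {q:.3f}, df = {df}")
print("reject H0" if q > CRITICAL_0_05_DF3 else "do not reject H0")
```

Stores that give the same response at every visit (all 0s or all 1s) do not change the value of Q, which depends only on the blocks in which the response varies.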

 

SPEARMAN’S RANK CORRELATION

 

This is one of the most popular non-parametric tests. In this test, the difference between the ranks of each item in one sample and in the other is computed, the differences are squared, and the correlation is computed from the sum of the squared differences. The standard error of the rank correlation coefficient is then calculated.

This process is followed when the number of items in a sample is less than 30. When the sample size is greater than 30, a different process has to be followed to test the results of rank correlation.

 

Example 1. A health organization studies the relationship between air quality and the incidence of pulmonary-related diseases. The details are given below. Test the hypothesis that there is no correlation between air quality and pulmonary-related diseases.

Now the rank correlation is found using the formula r = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each item and n is the number of pairs.

The rank correlation coefficient of 0.736 suggests a substantial positive association between average air quality and the occurrence of pulmonary diseases in the eleven selected cities. To test the hypothesis set for this problem, we use the table value for a sample size of 11: the critical values for r are +0.6091 and −0.6091. Since the calculated value of the rank correlation lies outside these limits, we reject the null hypothesis; hence, there is a relationship between air quality and the occurrence of pulmonary diseases.
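
The rank correlation calculation can be sketched in Python as follows. The data for the eleven cities are not reproduced in the text, so the figures below are invented and do not yield the coefficient of 0.736 reported above. The sketch assumes the usual formula 1 − 6Σd² / (n(n² − 1)) with no tied values; the function names ranks and spearman_rho are hypothetical, and 0.6091 is the critical value quoted in the example.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented figures for 11 cities (not the data of the example above)
pollution_index = [31, 45, 52, 38, 60, 74, 49, 66, 81, 57, 70]
disease_rate = [12, 20, 15, 18, 25, 30, 14, 33, 28, 22, 35]

rho = spearman_rho(pollution_index, disease_rate)
CRITICAL_N11 = 0.6091  # critical value quoted in the text for n = 11

print(f"rank correlation = {rho:.3f}")
print("reject H0" if abs(rho) > CRITICAL_N11 else "do not reject H0")
```

When tied values are present, the tied items are usually given the average of the ranks they occupy, which this simple ranking function does not do.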

 

Conclusion

 

Non-parametric significance tests can be used to analyze two dependent samples, and they assess whether a statistically significant change in proportions has occurred. The McNemar test is the best test for dichotomous variables in studies with two dependent samples. In Cochran’s Q test, the outcomes of the treatments are coded as binary responses (i.e., 0 or 1) in a way that is common to all treatments within each block. The Q test can be carried out with the use of software, which is used by small as well as large-scale organisations. Adopting a research methodology based on hypothesis testing enables a research team to measure health-related quality efficiently. In the rank correlation example, the calculated value lay outside the critical limits, so the null hypothesis was rejected; hence, there is a relationship between air quality and the occurrence of pulmonary diseases. Thus, the application of non-parametric tests is useful in finding solutions to such problems.


 

Web links

  • https://www.investopedia.com/terms/c/chi-square-statistic.asp
  • https://www.medcalc.org/manual/cochranq.php
  • https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php
  • http://www.biostathandbook.com/spearman.html
  • http://study.com/academy/lesson/chi-square-definition-analysis.html
  • http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/cochran.htm
  • https://ncss-wpengine.netdnassl.com/wpcontent/themes/ncss/pdf/Procedures/NCSS/Cochrans_Q_Test.pdf
  • https://www.wikihow.com/Calculate-Spearman%27s-Rank-Correlation-Coefficient