41 Non-parametric Tests I – Mean Ranks, Mann-Whitney Test, Wilcoxon Test, Kruskal-Wallis Test, Friedman Test
K. Ramya
1 Introduction
Parametric tests make assumptions about one or more parameters of the population distribution from which the sample is drawn. Typically they assume that the population data are normally distributed, a distribution defined by parameters such as the mean and standard deviation. In real-world scenarios, however, we often come across data that do not follow any such distribution. To analyse such data we use non-parametric tests, which can be applied to non-normal variables. Non-parametric statistics refers to statistical methods that do not require the data to follow a normal distribution. These tests are also called distribution-free tests because they do not assume any particular distribution.
Non-parametric tests can be used under the following circumstances:
- Variables are distribution free;
- The data may not be put into metric form appropriately (Nominal data);
- Data may be rank ordered;
- Data may be from small samples;
- There may be non-normal distribution of the variables (Skewed data);
- Outliers may be present.
The equivalent non-parametric tests for various parametric tests are discussed below:
1.2 Non-Parametric Tests
The non-parametric tests discussed in this session are:
- Mann Whitney U Test
- Wilcoxon Test
- Kruskal-Wallis Tests
- Friedman Test
Mean Ranks
Mean ranks are a method of handling tied observations: when the same value occurs at two or more consecutive rank positions, each tied observation is assigned the average of those ranks.
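The averaging of tied ranks can be illustrated with a minimal Python sketch. The function name `average_ranks` and the sample scores are illustrative assumptions and do not come from the sample data file.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1   # first rank position held by this value
        ties = sorted_vals.count(v)        # number of observations sharing the value
        ranks.append(first + (ties - 1) / 2)  # mean of positions first..first+ties-1
    return ranks


scores = [12, 15, 15, 18, 15, 20]  # hypothetical scores
print(average_ranks(scores))       # [1.0, 3.0, 3.0, 5.0, 3.0, 6.0]
```

The three observations with the value 15 occupy rank positions 2, 3 and 4, so each of them receives the mean rank 3.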
1.2.1 Mann Whitney U Test
The Mann-Whitney U test is the non-parametric alternative to the independent-samples t-test. It compares two independent samples and tests whether they come from the same population, that is, whether the mean ranks of the two samples are equal.
Example:
Imagine that a researcher wants to know the difference between the effects of two fertilisers on yield. If the data are not normally distributed, the Mann-Whitney U test is used.
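Formula:
$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2, \qquad U = \min(U_1, U_2)$$
Where:
U = Mann-Whitney U statistic
n1 = Size of sample one
n2 = Size of sample two
Ri = Sum of the ranks in sample i (i = 1, 2)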
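As a rough illustration, the U statistic can be computed by hand from the pooled ranks. The sketch below assumes the standard form U_i = n1·n2 + n_i(n_i + 1)/2 − R_i, with R_i the sum of ranks in sample i, and reports the smaller of U1 and U2. The function names and the fertiliser yields are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + 1 + (sorted_vals.count(v) - 1) / 2 for v in values]


def mann_whitney_u(sample1, sample2):
    """Return (U, U1, U2) computed from the rank sums of the pooled data."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank both samples together
    r1 = sum(ranks[:n1])   # rank sum of sample one
    r2 = sum(ranks[n1:])   # rank sum of sample two
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2), u1, u2


yield_a = [3, 5, 8]       # hypothetical yields under fertiliser A
yield_b = [4, 6, 9, 10]   # hypothetical yields under fertiliser B
print(mann_whitney_u(yield_a, yield_b))  # (3.0, 9.0, 3.0)
```

A useful check is that U1 + U2 always equals n1·n2 (here 9 + 3 = 12). The sketch stops at the statistic itself; the significance value reported by SPSS additionally requires the exact distribution or a normal approximation.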
Conditions/Assumptions:
- The sample drawn from the population is random.
- Independence within the samples and mutual independence is assumed. That means that an observation belonging to one group cannot be found in another group.
- Ordinal measurement scale is assumed.
Data Required:
Ordinal and numerical data for the variable to be tested
Procedure using SPSS:
- From the menu choose: Analyze – nonparametric tests – legacy dialogs – 2 independent samples test…
- Select test variables (variables with numerical scale) – select grouping variable (variables with categorical scale) and define the groups.
- Select test type as Mann-Whitney U test
- Then click Ok
Let us practically see Mann Whitney U Test using SPSS:
Example: Imagine that a researcher wants to know the difference between the effects of drug A and drug B administered to patients suffering from cancer, measured during the 6th week. The dependent variable is “cancer condition” and the independent variable is “drug”, which is split into two groups: “Drug A” and “Drug B”.
Here cancer condition is the test variable and drugs is the grouping variable.
Research Question: Is there a significant mean difference between the effects of drug A and drug B?
Hypotheses:
H0: There is no significant mean difference between the effects of drug A and drug B
H1: There is significant mean difference between the effects of drug A and drug B
The data sheet consists of responses of 20 respondents: The data includes name, drug type, age, age group, weights in pounds, cancer stage, initial condition, cancer condition after 2 weeks, after 4 weeks, and after 6 weeks.
Let us use the sample file Non-Parametric Tests.sav which includes the data as given above.
From the menu choose: Analyze – nonparametric tests – legacy dialogs – 2 independent samples test…
Select test variables (numerical scale) as condition after 6 weeks
Select grouping variable (categorical scale) as drugs and
Define the groups as 1 and 2, as coded in the sample data file
1 refers to Drug A and 2 refers to Drug B
In test type, check that Mann-Whitney U test is selected, then click Ok.
The result will be displayed in the output screen of SPSS which is as follows:
Results and Inference
The Ranks table shows the mean rank and the sum of ranks for the two groups tested. In the Ranks table, it is observed that the difference between the mean ranks of the two groups is negligible (0.62).
The next table, the Test Statistics table, provides the U statistic as well as the asymptotic significance (2-tailed) p-value. From the table, it can be inferred that the result is not statistically significant, as the p-value of 0.837 is greater than 0.05. Therefore, we fail to reject the null hypothesis and conclude that there is no significant difference between the effects of drug A and drug B on cancer condition (during the 6th week).
1.2.2 Wilcoxon Test
The Wilcoxon test or Wilcoxon signed-rank test is a non-parametric test used to compare two related samples or matched samples to assess whether their population mean ranks differ. It can be used as an alternative to the paired t-test when the population cannot be assumed to be normally distributed.
Example:
A researcher wants to know the difference in the yield of a plant at the end of the 1st month and the 2nd month after treatment with fertiliser.
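Formula:
$$z = \frac{W_s - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
Where
WS = Smaller of the absolute values of the sums of the positive and negative signed ranks
n = Number of pairs where the difference is not zero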
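The following Python sketch shows one way to obtain WS and a z value for paired data. It assumes the usual large-sample normal approximation, with mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24 and no correction for ties. The function names and the before/after values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + 1 + (sorted_vals.count(v) - 1) / 2 for v in values]


def wilcoxon_signed_rank(before, after):
    """Return (W_s, n, z) for paired observations."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])            # rank the absolute differences
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)    # rank sum of positive differences
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)   # rank sum of negative differences
    w_s = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)            # no tie correction applied
    return w_s, n, (w_s - mean) / sd


before = [10, 12, 9, 14, 11, 13]  # hypothetical yields after the 1st month
after = [12, 15, 9, 13, 16, 17]   # hypothetical yields after the 2nd month
w_s, n, z = wilcoxon_signed_rank(before, after)
print(w_s, n, round(z, 3))        # 1.0 5 -1.753
```

One pair has a zero difference and is dropped, which is why n is 5 rather than 6. With so few pairs the normal approximation is only indicative; an exact table would normally be used.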
Conditions/Assumptions:
It is assumed that the two samples are dependent observations of the same cases. The Wilcoxon signed-rank test assesses differences between a before and an after measurement, while accounting for individual differences in the baseline.
The data are assumed to be numerical in nature.
Data Required:
Numerical data for the variables to be tested
Procedure using SPSS:
- From the menu choose: Analyze – nonparametric tests – legacy dialogs – 2 related Samples…
- Select the test pairs (variables with numerical scale).
- Select test type as Wilcoxon
- Then click Ok
Let us now practically see Wilcoxon Test using SPSS:
Example: A research team wants to test whether there is any difference between the condition of cancer patients at the end of 2nd week and at the end of 4th week after taking drugs.
Research Question: Is there a difference between the condition of cancer patients at the end of 2nd week and at the end of 4th week after taking drugs?
Hypotheses:
H0: There is no significant difference between the condition of cancer patients at the end of 2nd week and at the end of 4th week after taking drugs
Ha: There is a significant difference between the condition of cancer patients at the end of 2nd week and at the end of 4th week after taking drugs
Let us use the sample file Non-Parametric Tests.sav.
Go to Analyze – Nonparametric tests – Legacy Dialogs – 2 Related Samples
Select test pairs (variable 1 as condition after 2nd week and variable 2 as condition after 4th week) and select test type as Wilcoxon.
Then click Ok.
The result will be displayed in the output screen of SPSS which is as follows:
Output and Inference:
The Ranks table compares the condition of cancer patients in the 2nd week with their condition in the 4th week after taking drugs. We can see from the table that 3 patients showed an improvement in the 2nd week, the condition of 17 patients deteriorated in the 4th week after taking drugs, and the condition of 6 patients remained the same in both the 2nd week and the 4th week.
By examining the final Test Statistics table, we can discover whether these changes led overall to a statistically significant difference in the cancer condition of the patients. The Asymp. Sig. (2-tailed) value is 0.005, which is less than 0.05, so the result is statistically significant. Therefore, H0 can be rejected and it can be concluded that there is a significant difference between the condition of cancer patients at the end of 2nd week and at the end of 4th week after taking drugs.
1.2.3 Kruskal-Wallis Test
The Kruskal-Wallis test is a nonparametric test, and is used when the assumptions of one-way ANOVA are not met. Both the Kruskal-Wallis test and one-way ANOVA assess for significant differences on a continuous dependent variable by a categorical independent variable (with two or more groups). In the ANOVA, we assume that the dependent variable is normally distributed and there is approximately equal variance on the scores across groups. However, when using the Kruskal-Wallis Test, we do not have to make any of these assumptions. Therefore, the Kruskal-Wallis test can be used for both continuous and ordinal-level dependent variables. However, like most non-parametric tests, the Kruskal-Wallis Test is not as powerful as the ANOVA.
Example:
A research team wants to test whether there is any difference in yield of plant based on seed varieties.
Formula:
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{K} \frac{R_i^2}{n_i} - 3(n+1)$$
Where
n = Total number of observations across all groups
K = Number of samples (groups)
Ri = Sum of ranks in the ith group
ni = Size of the ith group
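A small Python sketch of the H statistic follows, assuming the standard form H = 12/(n(n + 1)) · Σ R_i²/n_i − 3(n + 1) without a correction for ties. The function names and the yields for three seed varieties are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + 1 + (sorted_vals.count(v) - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic for a list of independent groups."""
    pooled = [x for g in groups for x in g]  # all observations, in group order
    n = len(pooled)
    ranks = average_ranks(pooled)            # rank all observations together
    total, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r_i ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical yields for three seed varieties
variety_yields = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(kruskal_wallis_h(variety_yields), 3))  # 7.2
```

Here the rank sums are 6, 15 and 24, so H = (12/90)(12 + 75 + 192) − 30 = 7.2. H is compared with a chi-square distribution with K − 1 degrees of freedom.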
Conditions/Assumptions:
- Samples drawn from the population are random.
- We also assume that the observations are independent of each other.
- The measurement scale for the dependent variable should be at least ordinal.
Data Required:
Ordinal scale, ratio scale or interval scale variables can be used.
Procedure using SPSS:
- From the menu choose: Analyze – nonparametric tests – legacy dialogs – K independent Samples…
- Select the test variables (variables with numerical scale) and select the grouping variable
- Define the grouping variable
- Select test type as Kruskal-Wallis H
- Then click Ok
Let us now practice Kruskal-Wallis Test using SPSS:
Example: A research team wants to test whether there is any difference in cancer condition of patients after the 6th week based on their cancer stages.
We have considered 4 stages in our sample data (Initial, serious, critical, and hopeless)
Research Question: Is there any difference in cancer condition of patients after the 6th week based on their cancer stages?
Hypotheses:
H0: There is no significant difference in cancer condition of patients after 6 weeks based on cancer stages
Ha: There is significant difference in cancer condition of patients after 6 weeks based on cancer stages
Let us use the sample file Non-Parametric Tests.sav.
Go to Analyze – Nonparametric tests – Legacy Dialogs – K Independent Samples
Select test variable as condition after 6 weeks, grouping variable as stage, define the grouping range and select test type Kruskal-Wallis H.
Then click Ok.
The result will be displayed in the output screen of SPSS which is as follows:
Output and Inference:
The Ranks table shows that there is a difference in the mean ranks of the cancer condition of patients after the 6th week based on their cancer stages. The mean rank is highest for patients in the 3rd stage and lowest for patients in the 1st stage.
A Kruskal-Wallis test showed that there was a statistically significant difference in cancer condition of patients after 6 weeks based on cancer stages. The chi-square value of 10.765 is found to be significant as the p-value < 0.05. Therefore, it is inferred that H0 is rejected which means that there is significant difference in cancer condition of patients after 6 weeks based on cancer stages.
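As a cross-check on the reported significance, the p-value for a chi-square statistic can be computed with the Python standard library. This is a sketch that assumes K − 1 = 3 degrees of freedom, since four cancer stages are compared. The function name `chi_square_sf` is illustrative, and the closed form used is valid for integer degrees of freedom only.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    half = x / 2
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms
        total = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum
    total = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


p_value = chi_square_sf(10.765, 3)  # statistic from the text, df = 3 assumed
print(round(p_value, 3))            # 0.013
print(p_value < 0.05)               # True
```

Under this assumption the p-value is about 0.013, which agrees with the conclusion that H0 is rejected at the 5% level.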
The last test discussed in this session is the Friedman test.
1.2.4 Friedman Test
Friedman’s test is a non-parametric test for finding differences in treatments across multiple attempts. It is used in place of the one-way repeated-measures ANOVA when the distribution of the data is not known.
Example: A researcher wants to test whether there is any difference in the yield of various seed varieties.
Formula:
$$\chi_r^2 = \frac{12}{b\,c\,(c+1)} \sum_{j=1}^{c} R_j^2 - 3b(c+1)$$
Where
c = Number of treatment levels (columns)
b = Number of blocks (rows)
Rj = Total of the ranks for treatment level j
j = Particular treatment level
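The Friedman statistic can be sketched in Python as follows, assuming the standard form χ²_r = 12/(b·c·(c + 1)) · Σ R_j² − 3b(c + 1), with observations ranked within each block and no correction for ties. The function names and the data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + 1 + (sorted_vals.count(v) - 1) / 2 for v in values]


def friedman_chi_square(blocks):
    """Return the Friedman chi-square; rows are blocks, columns are treatment levels."""
    b = len(blocks)      # number of blocks (rows)
    c = len(blocks[0])   # number of treatment levels (columns)
    rank_totals = [0.0] * c
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):  # rank within the block
            rank_totals[j] += rank
    return 12 / (b * c * (c + 1)) * sum(r ** 2 for r in rank_totals) - 3 * b * (c + 1)


# Hypothetical scores: 4 blocks (rows) measured under 3 treatment levels (columns)
scores = [
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
    [2, 1, 3],
]
print(friedman_chi_square(scores))  # 4.5
```

The rank totals for the three treatment levels are 5, 8 and 11, giving 0.25 × 210 − 48 = 4.5. The statistic is compared with a chi-square distribution with c − 1 degrees of freedom.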
Conditions/Assumptions:
- The samples drawn from the population are random.
- Data should be ordinal (e.g. the Likert scale) or continuous.
- Data comes from a single group, measured on at least three different occasions.
- Blocks are mutually independent (i.e. all of the pairs are independent – one doesn’t affect the other).
- Observations are ranked within blocks with no ties.
Data Required:
- Data should be ordinal (e.g. the Likert scale) or continuous.
Procedure using SPSS:
- From the menu choose: Analyze – nonparametric tests – legacy dialogs – K Related Samples…
- Select the test variables
- Select test type as Friedman
- Then click Ok
Let us practice Friedman Test using SPSS:
Example: A research team wants to test whether there is any difference in the 4 groups of cancer conditions i.e. initial condition, condition after 2 weeks, condition after 4 weeks, and condition after 6 weeks.
Research Question: Is there any difference in the 4 groups of cancer conditions i.e. initial condition, condition after 2 weeks, condition after 4 weeks, and condition after 6 weeks?
Hypotheses:
H0: There is no significant difference in the 4 groups of cancer conditions i.e. initial condition, condition after 2 weeks, condition after 4 weeks, and condition after 6 weeks
Ha: There is significant difference in the 4 groups of cancer conditions i.e. initial condition, condition after 2 weeks, condition after 4 weeks, and condition after 6 weeks
Let us use the sample file Non-Parametric Tests.sav.
Go to Analyze – Nonparametric tests – Legacy Dialogs – K Related Samples
Select test variables as initial condition, condition after 2 weeks, condition after 4 weeks, and condition after 6 weeks and select test type as Friedman.
Then click Ok.
The result will be displayed in the output screen of SPSS which is as follows:
Output and Inference:
The Ranks table shows the mean rank for each of the related groups. The Friedman test compares the mean ranks between the related groups and indicates how the groups differed. The mean rank of cancer condition is 1.42 at the initial condition, 2.40 after 2 weeks, 3.29 after 4 weeks and 2.88 after 6 weeks. Based on the observed mean ranks, it is inferred that there is a difference among the groups.
The Test Statistics table reports the actual result of the Friedman test, that is, whether there is a significant difference between the mean ranks of the related groups. From our example, we can see that the chi-square statistic (34.95) is statistically significant at the 5% level (p-value < 0.05). Therefore, H0 is rejected, which means that there is a significant difference in the 4 groups of cancer conditions i.e. initial condition, condition after 2 weeks, condition after 4 weeks, and condition after 6 weeks.
1.3 Conclusion
We have seen that non-parametric tests are applicable when data are not normally distributed.
However, they can be applied whether the data are normal or non-normal.
Certain non-parametric tests examine differences in mean ranks, such as the Mann-Whitney test, Kruskal-Wallis test, Wilcoxon signed-rank test and Friedman test. Certain other non-parametric tests, such as the sign test, McNemar test, Cochran Q test and others, examine differences in medians or proportions.
We have discussed the non-parametric tests based on mean ranks in this session.
Non-parametric tests are advantageous for the following reasons:
- No assumptions about the population distribution are required.
- Not much statistical knowledge is required.
- Test results are less affected by outliers (extreme values), as the tests rely on signs or mean ranks.
- They can be used for small or large samples.
However, they are less favoured by researchers for the following reasons:
- Results do not convey much about actual differences in a population, as the tests are distribution-free.
- In the case of large samples, computations are complicated.
- It is practically difficult to do modelling such as multiple regression.
Therefore, researchers should be cautious before using non-parametric tests, as they have less statistical power than parametric tests. They should be applied only when the situation demands the use of a non-parametric test.