30 Parametric and non-parametric tests – meaning, importance, application, merits and demerits

M. Shanthi


INTRODUCTION:

 

In today’s world, development in education relies on empirical research. Research is a scientific and systematic search for pertinent information on a specific topic. The term ‘research’ refers to a systematic method consisting of enunciating the problem, formulating a hypothesis, collecting the facts or data, analysing the facts and reaching certain conclusions, either in the form of solutions to the concerned problem or in certain generalizations for some theoretical formulation.

 

Statistics has played a significant role in the analysis and interpretation of data. Statistics is the science and practice of developing human knowledge through the use of empirical data expressed in quantitative form. Statistical research can analyse data from the entire population or from only a sample. The population is the set of all objects about which we want to infer information or relations. When data on the whole population are available, the data set is complete and statistical research simply describes the situation, without going on to any other objective and without using any statistical test.

 

When data are instead available only on a sample, a subset of the population, statistical research analyses whether the information and relations found in the sample can be extended to the entire population from which the sample comes, or whether they are valid only for that particular sample.

 

Thus, a population is the entire set of individuals or objects the researcher is studying. A sample is a smaller group within the population that is studied in order to make inferences about the larger population. Measures describing the characteristics of a sample are called statistics, whereas measures describing the characteristics of a population are called parameters.

 

OBJECTIVES

 

At the end of this module you will be able to:

  • Understand the meaning of, and the difference between, parametric and non-parametric tests
  • Become familiar with the advantages and disadvantages of parametric and non-parametric tests
  • Describe the various methods involved in parametric and non-parametric tests

A hypothesis is generally considered the principal instrument that enables us to make probability statements about population parameter(s). A hypothesis is a testable statement; it simply means an assumption to be proved or disproved. Basically, there are two types of hypotheses: the null hypothesis and the alternative hypothesis.

 

A null hypothesis is a statement about the parameters, usually a hypothesis of no difference, and is denoted by H0. On the other hand, any hypothesis that is complementary to the null hypothesis is called an alternative hypothesis, usually denoted by H1.

 

Statisticians have developed several tests of hypotheses, which are also known as tests of significance. In testing hypotheses, we have parametric and nonparametric tests.

 

Statistical tests that make assumptions about the parameters of the population are known as parametric tests, while the alternative techniques, in which no assumption about the distribution of the population is made, are known as non-parametric tests.

 

The major assumptions of parametric tests are:

  • Normality of data, that is, the data have a normal distribution, which is a bell-shaped curve.
  • Homogeneity of variances, which means the data from multiple groups have the same variance.
  • Linearity, which means there is a linear relationship among the variables, and
  • Independence, which means the observations are independent of each other.

If the data deviate from these assumptions, a parametric test could lead to incorrect conclusions.
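As a minimal illustration, the normality and equal-variance assumptions can be checked before choosing a parametric test. This sketch assumes SciPy and NumPy are available; the two groups below are simulated data, not figures from the module.

```python
import numpy as np
from scipy import stats

# Simulated scores for two groups (illustrative data only)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=40)
group_b = rng.normal(loc=52, scale=5, size=40)

# Normality: Shapiro-Wilk test (H0: the data come from a normal distribution)
_, p_norm = stats.shapiro(group_a)

# Homogeneity of variances: Levene's test (H0: the groups have equal variances)
_, p_var = stats.levene(group_a, group_b)

# p-values above 0.05 mean the assumptions are not rejected at the 5% level
print(f"normality p = {p_norm:.3f}, equal-variance p = {p_var:.3f}")
```

If either assumption is rejected, the non-parametric alternatives discussed later in the module are the safer choice.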

 

The major difference between parametric and non-parametric tests lies in the assumptions about the data to be analysed. When the data are interval- or ratio-scaled and the sample size is large, parametric statistical procedures are appropriate; these are based on the assumption that the data in the study are drawn from populations with normal distributions.

 

On the other hand, when the data are either ordinal- or nominal-scaled, it is generally inappropriate to assume that the sampling distribution is normal; in this case non-parametric statistical procedures are appropriate. Thus, non-parametric statistics are also referred to as distribution-free tests.

 

The basic assumptions for nonparametric statistics are:

  • The sample(s) are randomly selected, and
  • If two or more samples are used, they must be independent of each other.

The major advantages of nonparametric methods are:

  • First, they can be used to test population parameters when the variable is not normally distributed.
  • Second, they can be used when we have nominal- or ordinal-scaled data.
  • In some cases, the computations are easier than for parametric tests.
  • Non-parametric tests are easy to understand, and
  • There are fewer assumptions to be met, and the assumptions are easier to verify.

However, nonparametric methods have certain disadvantages: they are less sensitive than parametric tests, they use less of the information in the data, and they are less efficient than parametric tests.

 

Key Differences between Parametric and Nonparametric Tests:

 

A statistical test in which specific assumptions are made about the population parameters is known as a parametric test, whereas a statistical test that is used in the case of non-metric independent variables is called a nonparametric test.

  • In the parametric test, the test statistic is based on a known distribution. On the other hand, the test statistic is arbitrary in the case of the nonparametric test.
  • In the parametric test, it is assumed that the variables of interest are measured on an interval or ratio scale, whereas the variables of interest in a nonparametric test are measured on a nominal or ordinal scale.
  • In the parametric test, we have complete information about the population parameters. Conversely, in the nonparametric test, there is no information about the population parameters.
  • Parametric tests are applicable only to variables, but nonparametric tests are applicable to both variables and attributes.
  • In the parametric test we normally use Pearson’s coefficient of correlation, while in the nonparametric test we generally use Spearman’s rank correlation for measuring the degree of association between two variables.

Important parametric tests and their application in statistical analysis:

 

The important parametric tests are:

  • t-test
  • Z-test and
  • F-test.

All these tests are based on the assumption of normality, that is, the source of data is considered to be normally distributed. Since these tests require certain assumptions about the population from which the samples are drawn, they are known as parametric tests.

 

The most important parametric test is the t-test. A t-test is a statistical hypothesis test introduced by W. S. Gosset under the pen name “Student”; it is therefore also referred to as the “Student’s t-test”. This test is used for judging the significance of the difference between the means of two samples in the case of small samples, that is, when the number of observations is less than 30, and it is used when the population variance is not known. The test statistic follows a t-distribution with n – 1 degrees of freedom (d.f.).

 

To get the critical value of t, we refer to the table of the t-distribution against (n – 1) degrees of freedom at the chosen level of significance. By comparing the calculated value of t with the critical value, we can accept or reject the null hypothesis.
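Where no printed table is at hand, the critical value can also be computed directly. A sketch assuming SciPy is available; the sample size and significance level are illustrative:

```python
from scipy import stats

n = 15          # illustrative sample size
alpha = 0.05    # 5 percent level of significance

# Two-tailed critical value of t with n - 1 degrees of freedom
t_critical = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Decision rule: reject H0 if |calculated t| > t_critical
print(round(t_critical, 3))  # 2.145, matching the t-table for 14 d.f. at 5%
```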

 

There are various t-tests; the most commonly applied are the one-sample and two-sample t-tests. One-sample t-tests are used to compare a sample mean with a known population mean, while two-sample t-tests are used to compare either independent samples or dependent samples. When the two samples are related, we use the paired t-test for judging the significance of the mean difference between the two related samples. Generally, the paired t-test is used to compare the means before and after something is done to the samples.

 

For example, to determine whether there is a significant difference in blood pressure before and after administration of an experimental blood-pressure drug, we use the paired t-test.
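A sketch of such a paired comparison, assuming SciPy is available; the blood-pressure readings below are invented for illustration:

```python
from scipy import stats

# Systolic blood pressure for eight subjects (illustrative figures)
before = [142, 138, 150, 145, 156, 139, 148, 152]
after = [135, 136, 144, 140, 149, 137, 141, 147]

# Paired (dependent) t-test: H0 says the mean before/after difference is zero
t_stat, p_value = stats.ttest_rel(before, after)

# A small p-value indicates a significant before/after difference
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```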

 

The Z-test is generally used for comparing the mean of a sample to some hypothesized mean for the population in the case of a large sample, that is, when the sample size is more than 30 (n > 30), or when the population variance is known. It is also used for judging the significance of the difference between the means of two independent samples in the case of large samples.

 

For example, when comparing the average salaries of men and women in an organisation, we may use the z-test.
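A one-sample z-test can be computed directly from its formula. A minimal sketch, assuming SciPy is available; all figures below are invented:

```python
import math

from scipy import stats

sample_mean = 52.0   # observed sample mean (illustrative)
pop_mean = 50.0      # hypothesized population mean
pop_sd = 8.0         # known population standard deviation
n = 64               # large sample (n > 30)

# z = (sample mean - population mean) / (sd / sqrt(n))
z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

# Two-tailed p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.2f}, p = {p_value:.4f}")
```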

 

The other important parametric test is the F-test. It is used to compare the variances of two independent samples, that is, for comparing one sample variance with another. It determines whether there is more variability in the scores of one sample than in the scores of another. For example, we can compare the variability of bolt diameters from two different machines. In the F-test, the samples can be of any size. This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at the same time. To obtain the F-statistic or F-ratio, we divide the larger sample variance by the smaller sample variance; to test the null hypothesis of no difference between the sample variances, a table of the F-distribution is necessary.
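The variance-ratio calculation can be sketched as follows, assuming SciPy and NumPy are available; the bolt-diameter readings are invented for illustration:

```python
import numpy as np
from scipy import stats

# Bolt diameters (mm) from two machines (illustrative data)
machine_1 = np.array([10.1, 10.3, 9.8, 10.2, 10.0, 10.4, 9.9, 10.1])
machine_2 = np.array([10.0, 10.1, 10.0, 10.2, 10.1, 9.9, 10.0, 10.1])

s1 = machine_1.var(ddof=1)   # sample variance, machine 1
s2 = machine_2.var(ddof=1)   # sample variance, machine 2

# Larger sample variance divided by the smaller one
f_ratio = max(s1, s2) / min(s1, s2)

# Two-tailed p-value from the F-distribution with (n-1, n-1) d.f.
df1 = df2 = len(machine_1) - 1
p_value = 2 * stats.f.sf(f_ratio, df1, df2)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")
```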

 

Important non parametric tests:

 

The Chi-square test is one of the important non-parametric tests and is used when categorical variables are involved, as well as for comparing a sample variance to a theoretical population variance. The test is applicable to a large number of problems, such as:

  • For testing the independence of attributes
  • To test the goodness of fit
  • For testing of linkage in genetic problems
  • To make comparison of sample variance with population variance
  • For testing the homogeneity of variances and
  • For testing the homogeneity of correlation coefficient

The calculation of the Chi-square test:

 

Calculating the chi-square statistic allows us to determine whether the difference between the observed frequency distribution and the expected frequency distribution can be attributed to sampling variation.

 

The steps involved in the calculation of the Chi-square test are:

  • Formulate the null and alternative hypotheses.
  • Determine the appropriate significance level, basically the 5 percent or 1 percent level of significance.
  • Calculate the expected frequencies on the basis of the observed frequencies.
  • Make the statistical decision by comparing the calculated chi-square value with the critical value. If the calculated value of chi-square is less than the table value of chi-square at the chosen level of significance, the fit is considered to be a good one, but if the calculated value of chi-square is greater than its table value, the fit is usually not considered to be a good one.

 

For example, to analyse website awareness among a sample of 100 students, we start with the null hypothesis (H0) that the number of students aware of the website will equal the number unaware of it. Thus, 50 students would be expected to respond yes (aware) and 50 to respond no (unaware). After calculating the chi-square statistic, we compare it with the critical value at the chosen level of significance, 5 percent or 1 percent, and on that basis accept or reject the null hypothesis.
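A sketch of that goodness-of-fit calculation, assuming SciPy is available; the observed 60/40 split is invented for illustration:

```python
from scipy import stats

observed = [60, 40]   # aware / unaware (illustrative observed counts)
expected = [50, 50]   # equal split of 100 students under H0

# Chi-square goodness-of-fit test
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

# Compare p with the chosen significance level (0.05 or 0.01)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```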

 

1. Sign test:

 

The sign test is the simplest of all the nonparametric methods. Usually it is used to compare a single sample with some hypothesized value. The sign test is so called because it allocates a sign, either positive (+) or negative (-), to each observation according to whether it is greater or less than some hypothesized value, and considers whether this is substantially different from what we would expect by chance. If any observations are exactly equal to the hypothesized value they are ignored and dropped from the sample size.

 

The sign test is basically classified as one sample sign test and two sample sign test.

 

The one-sample sign test is a very simple non-parametric test applicable when we sample a continuous symmetrical population. The two-sample sign test, on the other hand, is applicable to problems dealing with paired samples. In that case, each pair of values is replaced with a positive (+) sign if the value from the first sample (say X) is greater than the corresponding value from the second sample (say Y), and with a negative (-) sign if the value from X is less than the corresponding value from Y. When the two values are equal, the pair concerned is discarded.
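The two-sample sign test reduces to a binomial test on the number of positive signs (H0: P(+) = 1/2). A sketch with invented paired data, assuming SciPy is available:

```python
from scipy import stats

# Paired observations (illustrative); the tied pair (13, 13) is discarded
x = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
y = [10, 14, 12, 11, 13, 13, 10, 12, 11, 12]

# Keep a +1/-1 sign per pair, dropping ties
signs = [1 if a > b else -1 for a, b in zip(x, y) if a != b]
n_plus = sum(1 for s in signs if s > 0)

# Under H0 the number of + signs follows a Binomial(n, 0.5) distribution
result = stats.binomtest(n_plus, n=len(signs), p=0.5)
print(f"{n_plus} plus signs out of {len(signs)}, p = {result.pvalue:.4f}")
```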

 

2. Wilcoxon signed rank test:

 

The sign test is extremely simple to perform. However, its major disadvantage is that it simply allocates a sign to each observation, according to whether it lies above or below some hypothesized value, and does not take the magnitude of the observation into account. An alternative that does account for the magnitude of the observations is the Wilcoxon signed-rank test. This test is used when we have matched pairs, such as in a study comparing the output of two similar machines, or where subjects are studied before and after an experiment.
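A sketch of the matched-pairs case, assuming SciPy is available; the machine-output figures are invented:

```python
from scipy import stats

# Hourly output of two similar machines over eight matched runs (illustrative)
machine_a = [85, 90, 78, 92, 88, 81, 95, 87]
machine_b = [80, 91, 75, 89, 84, 79, 90, 85]

# Wilcoxon signed-rank test on the paired differences
stat, p_value = stats.wilcoxon(machine_a, machine_b)
print(f"W = {stat}, p = {p_value:.4f}")
```

Unlike the sign test, the ranking of the absolute differences lets the larger differences carry more weight in the decision.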

 

3. Mann–Whitney test:

 

A nonparametric alternative to the unpaired t-test is the Wilcoxon rank sum test, which is also known as the Mann–Whitney test. This is used when comparison is made between two independent groups with ordinal level variables.
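A sketch, assuming SciPy is available; the ordinal ratings of the two independent groups are invented:

```python
from scipy import stats

# Satisfaction ratings (ordinal) from two independent groups (illustrative)
group_1 = [3, 4, 2, 5, 4, 3, 5, 4]
group_2 = [2, 3, 1, 2, 3, 2, 4, 1]

# Mann-Whitney U test (Wilcoxon rank sum) for two independent samples
u_stat, p_value = stats.mannwhitneyu(group_1, group_2, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```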

 

4. Kruskal–Wallis test (H test):

 

The Kruskal–Wallis test is used to find out whether there is a significant difference between sample means when more than two independent samples are involved. It is applicable to ordinal-level variables as well as to data based on an interval or ratio scale, and it is basically used to test whether the independent random samples come from identical populations or not.
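A sketch with three invented independent samples, assuming SciPy is available:

```python
from scipy import stats

# Test scores from three independent groups (illustrative data)
sample_1 = [68, 72, 75, 70, 74]
sample_2 = [78, 82, 80, 85, 79]
sample_3 = [90, 88, 92, 95, 91]

# Kruskal-Wallis H test: H0 says all samples come from identical populations
h_stat, p_value = stats.kruskal(sample_1, sample_2, sample_3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```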

 

5. One-sample runs test:

 

The one-sample runs test is used to judge the randomness of a sample on the basis of the order in which the observations are taken.
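SciPy has no built-in runs test, so the calculation can be done directly. This sketch counts runs above and below the median; the sequence is invented, and the simple median choice is an assumption of this sketch:

```python
import math


def runs_test(values):
    """Runs test about the median: returns (number of runs, z-statistic)."""
    med = sorted(values)[len(values) // 2]          # simple sample median
    signs = [v > med for v in values if v != med]   # drop values equal to it
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n1 = sum(signs)                 # observations above the median
    n2 = len(signs) - n1            # observations below the median
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1)
    )
    return runs, (runs - mean) / math.sqrt(var)


# A strictly alternating sequence gives many runs (suggesting non-randomness)
runs, z = runs_test([1, 9, 2, 8, 3, 7, 4, 6])
print(f"runs = {runs}, z = {z:.2f}")
```

A |z| value beyond about 1.96 leads to rejecting the hypothesis of randomness at the 5 percent level.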

 

6. Spearman’s rank correlation:

 

This test is applicable when the data are not available in numerical form for correlation analysis but the information is sufficient to rank the data as first, second, third and so on; we then work out the coefficient of rank correlation. In fact, the rank correlation coefficient is a measure of the correlation that exists between the two sets of ranks, and not of the numerical values of the data.
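A sketch of the rank-correlation calculation, assuming SciPy is available; the two judges’ rankings are invented:

```python
from scipy import stats

# Ranks awarded to six contestants by two judges (illustrative)
judge_1 = [1, 2, 3, 4, 5, 6]
judge_2 = [2, 1, 4, 3, 6, 5]

# Spearman's rank correlation coefficient and its p-value
rho, p_value = stats.spearmanr(judge_1, judge_2)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```

With no ties, this agrees with the textbook formula rho = 1 - 6Σd²/(n(n² - 1)); here Σd² = 6 and n = 6, giving rho ≈ 0.829.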

 

Summary:

 

By the end of this module, you will have learnt the concept of a hypothesis, the types of hypotheses, the meaning of parametric and non-parametric tests, their basic assumptions, the key differences between them, their advantages and disadvantages, and the important parametric and non-parametric tests. By learning the key differences between parametric and non-parametric tests, you will have understood the pros and cons of each. With this knowledge of their applications, you will be able to make an informed choice between a parametric and a non-parametric test.

