8 Measures of Dispersion I

Dr. Harmanpreet Singh Kapoor


 

 

Learning Objectives

  • Introduction.
  • Summary
  • Suggested Readings

    1. Learning Objectives

 

This module introduces measures of dispersion and discusses the different measures with examples. The properties of each measure are discussed along with its merits and demerits, so that one can understand which measure to use under which conditions. The topic of measures of dispersion is covered in two modules. This module covers the absolute measures of dispersion; the relative measures are covered in the module “Measures of Dispersion II with Skewness and Kurtosis”. Questions with answers are included to give in-depth knowledge of the topic.

 

2.    Introduction

 

This module discusses an important class of measures that show how the observations vary from the mean value. The measures used to detect this variation in the observations are called measures of dispersion.

 

In the modules “Central Tendency Measures –I” and “Central Tendency Measures –II” we discussed different measures of central tendency, namely the mathematical averages and the positional averages. The basic purpose of these measures is to find a single value that represents the whole dataset, and they describe the concentration of the observations about the central part of the data. However, these measures do not take into account the fact that two or more datasets can have the same mean value even though their observations are not the same.

 

For example, the two datasets given below have the same mean value, but their observations are not the same.

 

Dataset I: 8, 9, 10, 11, 13, 15

Dataset II: 3, 4, 6, 14, 16, 23

 

The sum of both datasets is the same, i.e. 66, and the mean is also the same, i.e. 11. Now the question is: how far can we say that the mean represents the data? A measure of central tendency gives only an idea of the concentration of the observations around a central value; it says nothing about the variation of these observations from the central part. In dataset I, the values are very close to each other and to the mean value, so we can say that 11 is a good representation of the dataset. In dataset II, the observations are widely scattered and far away from the mean value. Hence 11, as a mean value, does not represent dataset II well.
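The comparison of the two datasets can be checked with a minimal Python sketch. The variable and function names are illustrative only and are not part of the module.

```python
# Dataset I and Dataset II from the example: same sum and mean, different spread.
dataset_1 = [8, 9, 10, 11, 13, 15]
dataset_2 = [3, 4, 6, 14, 16, 23]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def deviations_from_mean(values):
    """Signed deviation of each observation from the mean."""
    m = mean(values)
    return [x - m for x in values]


for name, data in [("Dataset I", dataset_1), ("Dataset II", dataset_2)]:
    print(name, "sum =", sum(data), "mean =", mean(data))
    print("  deviations from mean:", deviations_from_mean(data))
```

Both datasets report a sum of 66 and a mean of 11, but the deviations in Dataset II are much larger in size than those in Dataset I.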

 

Hence, one should not rely on a measure of central tendency alone to form an opinion about the observations; one should also consider the dispersion or variation among the observations. In this module, different measures are discussed that capture the variation of the observations in different ways, such as the difference between the highest and the smallest value or the absolute differences of the observations from a particular quantity. The variation of the observations from their mean is the one considered most often in the literature.

 

Definitions

 

Some authors have defined the measures of dispersion as:

“Dispersion is the measure of variation of the items” by A.L. Bowley.

“Dispersion or spread is the degree of the scatter or the variation of the variable about a central value” by B.C. Brooks and W. F. L. Dicks.

 

In the literature, dispersion is also treated as a synonym for heterogeneity in the data, since heterogeneity describes the extent of variation among the observations. Dispersion is zero only if all the observations have the same value, and it increases as the differences between the observations grow. So one can say that if the variation is small, as in dataset I, it is considered insignificant, but if the variation is large, as in dataset II, it is considered significant.

 

Measures of dispersion are also termed averages of the second order, whereas measures of central tendency are averages of the first order that show the tendency of the values around the middle of the data. Measures of dispersion describe the variation, or scatter, of the observations about a measure of central tendency. Together, the measures of central tendency and the measures of dispersion describe the characteristics of the dataset, but they cannot show how the observations are distributed around the central value, i.e. whether equal numbers of observations lie on either side of it, or whether the data are symmetrical about the mean. To know this, and to get an idea of how many observations lie below or above the mean value, it is essential to study two more measures: skewness and kurtosis. These two measures are considered supporting measures for a better understanding of the characteristics of the data. Skewness and kurtosis are discussed in the module “Measures of Dispersion II with Skewness and Kurtosis”.

 

Importance of measures of dispersions

(a)   The main motive behind using these measures is to check the reliability of the central tendency measures, i.e. to see whether the value of a central tendency measure is representative of the data or not.

(b)   Measures of dispersion are also used to compare two datasets through relative values.

(c)    These measures are also useful in identifying variation so that it can be controlled, although they do not give the exact reason behind the variation in the observations. For example, in the electronics industry the quality of an item cannot be judged only by whether it is produced within defined limits, but also by the variation in the characteristics of the items.

(d)   Measures of dispersion are also used in further analysis of the data, such as correlation, regression, testing of hypotheses and ANOVA.

 

As there are many measures of dispersion, an ideal measure should possess the following characteristics:

(a)   It should be rigidly defined.

(b)   It should be based on all the observations.

(c)   It should be easy to calculate and easy to understand for a person with a non-mathematical background.

(d)   It should be suitable for further algebraic treatment.

(e)   It should not be affected much by extreme values or by fluctuations in the observations.

 

The measures of dispersion are further categorized into two types, as shown in the flowchart of Figure 1.

As Figure 1 shows, measures of dispersion are categorized into absolute measures and relative measures. If a measure expresses the dispersion of the observations in the original units, it is called an absolute measure of dispersion. With absolute measures, one can compare the variation of two series only if both series are in the same units; otherwise the comparison is not meaningful. To overcome this problem, relative measures express the dispersion as a ratio or a percentage.

 

A relative measure of dispersion is the ratio of an absolute measure of dispersion to its appropriate average. These measures are independent of units and are termed coefficients of dispersion. One important point when calculating relative measures is that the absolute measure and the appropriate average must be in the same units.
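The unit-independence of a relative measure can be illustrated with a minimal Python sketch. It assumes the mean deviation about the mean as the absolute measure and the mean as its appropriate average; the heights are hypothetical data chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def mean_deviation(values):
    """Absolute measure: average absolute deviation from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def relative_mean_deviation(values):
    """Relative measure: the absolute measure divided by its average (the mean)."""
    return mean_deviation(values) / mean(values)


heights_cm = [150, 160, 170, 180]          # hypothetical data in centimetres
heights_m = [h / 100 for h in heights_cm]  # the same data in metres

# The absolute measure changes with the unit of measurement ...
print(mean_deviation(heights_cm), mean_deviation(heights_m))
# ... while the relative measure is (up to rounding) the same in both units.
print(relative_mean_deviation(heights_cm), relative_mean_deviation(heights_m))
```

The absolute measure differs by the same factor of 100 as the units, while the ratio stays the same, which is why relative measures allow comparison between series recorded in different units.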

 

In this module, absolute measures will be discussed with examples. Relative measures will be discussed in the module “Measures of Dispersion II with Skewness and Kurtosis”.

 

Range

 

Range is the simplest absolute measure of dispersion. It is the difference between the maximum value and the minimum value in the dataset.

Range = Maximum value - Minimum value

 

Ques 1: Calculate the range of the following dataset:

 

23, 45, 56, 52, 64, 35, 42, 51, 76, 65.

 

Ans: Maximum value is 76 and minimum value is 23. Hence

 

Range = 76-23 = 53.
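The same calculation can be written as a short Python sketch; the function name value_range is an illustrative choice.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


observations = [23, 45, 56, 52, 64, 35, 42, 51, 76, 65]
print(value_range(observations))  # 76 - 23 = 53
```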

 

Merits of Range

 

These are the merits of the range measure

(i)  Range is the simplest among all the absolute measures of dispersion.

(ii) Range is rigidly defined and easily calculated. Because of this, it is widely used in industry as a quality control tool.

(iii) Range can be computed within seconds, so one gets a quick picture of the variability in the dataset.

 

Demerits of Range

 

There are a few demerits of range also. These are:

(i) Range is evaluated from just the maximum and the minimum value of the dataset, so it does not depend on all the observations. Two datasets with the same maximum and minimum values but very different intermediate values have the same range, which makes the comparison uninformative.

(ii) As already discussed, range depends on just two values, and both are extreme values. Hence range is strongly affected by extreme values.

(iii) Range does not take into account the shape of the distribution. The range may be the same whether the distribution is symmetric or skewed.

(iv) Range is strongly affected by changes in the observations, since a change in either extreme value changes it directly.

(v) Range cannot be evaluated for a grouped frequency distribution with open-end classes.
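Two of these demerits can be seen in a small Python sketch. All three datasets below are hypothetical and were made up only to illustrate the points.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


# Same minimum and maximum, very different intermediate values.
clustered = [10, 49, 50, 50, 51, 90]
spread_out = [10, 20, 40, 60, 80, 90]
print(value_range(clustered), value_range(spread_out))  # 80 80

# A single extreme value dominates the range.
without_extreme = [48, 49, 50, 51, 52]
with_extreme = without_extreme + [500]
print(value_range(without_extreme), value_range(with_extreme))  # 4 452
```

The first pair shows that the range ignores the intermediate observations; the second shows how one extreme value changes the range from 4 to 452.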

 

Quartile Deviation

 

This measure of dispersion is related to the range and removes its drawbacks to some extent. It is defined as half the difference between the third quartile and the first quartile. In other words,

Quartile Deviation = (Q3 - Q1) / 2

where Q1 is the first quartile and Q3 is the third quartile.

Merits of Quartile Deviation

 

There are some merits of quartile deviation. These are

(i)   Quartile deviation is easy to understand like range.

(ii)  As quartile deviation is based on the first and third quartiles, it removes some of the limitations of the range.

(iii) Quartile deviation considers only the first and third quartile values, which are obtained by ignoring the lowest 25% and the highest 25% of the observations respectively. Hence it is not affected by extreme values.

(iv) Quartile deviation is more useful in the case of skewed data.

(v)  Quartile deviation can be evaluated for open-end class intervals, as it does not depend on the extreme values.

 

Demerits of Quartile deviation

 

There are some demerits of quartile deviation. These are

(i) Quartile deviation is based on just the middle half of the observations, as it neglects the first and last 25% of the observations. Hence it is not considered a fully appropriate measure of the variation among observations.

(ii) Quartile deviation is based on quartiles, which are positional averages. It represents only a distance on a scale between the quartiles, not the scatter of the observations around an average.

(iii) Quartile deviation is very much influenced by changes in the observations. A change in a single observation can change the value of the quartile deviation.

(iv) Like range, quartile deviation gives only a rough idea of the deviation. So it is not the best measure of dispersion.
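The quartile deviation, half the difference between the third and first quartiles, can be sketched in Python as follows. The sketch assumes the quartile convention of the standard library function statistics.quantiles (default "exclusive" method); textbooks use several conventions for locating quartiles, so hand calculations may differ slightly. The data are the observations from Ques 1, and the modified dataset with an extreme value is hypothetical.

```python
import statistics


def quartile_deviation(values):
    """Half the distance between the third and the first quartile."""
    # statistics.quantiles with n=4 returns the three cut points [Q1, Q2, Q3].
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


observations = [23, 45, 56, 52, 64, 35, 42, 51, 76, 65]
print(quartile_deviation(observations))  # Q1 = 40.25, Q3 = 64.25, so 12.0

# Replace the largest value by an extreme one (hypothetical change).
with_extreme = [760 if x == 76 else x for x in observations]
print(value_range(observations), value_range(with_extreme))
print(quartile_deviation(with_extreme))  # unchanged, still 12.0
```

The range jumps from 53 to 737 when the largest value becomes extreme, while the quartile deviation stays the same because neither quartile depends on the largest observation.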

 

Mean deviation

 

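Mean deviation is the average of the absolute values of the differences of the observations from a measure of central tendency, i.e. the mean, median or mode. Mostly, mean deviation is calculated by taking the deviations of the observations from the mean or the median. For n observations x1, x2, ..., xn and a central value A,

Mean Deviation about A = (|x1 - A| + |x2 - A| + ... + |xn - A|) / n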

Ques 3. Calculate the mean deviation about mean for the following observations.

30, 40, 50, 55, 60, 65, 70, 80, 90, 100

 

Ans: First find the A.M. of the observations. It is 64. Now take the absolute deviation of each observation from 64. The values are 34, 24, 14, 9, 4, 1, 6, 16, 26, 36. Take the sum of these deviations, which is 170, and divide it by 10, i.e. the number of observations.

The answer is 170/10 = 17.
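The calculation in Ques 3 can be checked with a minimal Python sketch; the function name and its optional center argument are illustrative choices. With all ten absolute deviations included (the deviation of 100 from 64 is 36), the deviations sum to 170.

```python
import statistics


def mean_deviation(values, center=None):
    """Average absolute deviation from a central value.

    If no central value is given, the arithmetic mean is used.
    """
    if center is None:
        center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)


observations = [30, 40, 50, 55, 60, 65, 70, 80, 90, 100]

print(sum(observations) / len(observations))  # arithmetic mean, 64.0
print(mean_deviation(observations))           # 170 / 10 = 17.0

# Mean deviation about the median uses the same function.
print(mean_deviation(observations, center=statistics.median(observations)))
```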

 

Ques 4. Calculate the mean deviation about mean for the following dataset.

 

    Merits and Demerits of Mean deviation

 

Merits

 

There are some merits of mean deviation measure. These are

(i)   Mean deviation is rigidly defined.

(ii)  Mean deviation is easy to calculate and understand for a person with a non-mathematical background.

(iii) Mean deviation is based on all the observations, so it is better than the range and the quartile deviation.

(iv) Mean deviation is less influenced by extreme values than the range and the standard deviation.

(v)  Mean deviation is appropriate for comparison purposes because it is derived from the deviations of the observations from a central value, i.e. mean, median or mode.

(vi) Mean deviation is basically an average of the absolute deviations from a central value. Hence it reduces, to some extent, the bias that individual observations introduce and gives a more accurate value for the dispersion.

    Demerits

 

There are few demerits of mean deviation measure. These are

(i)  The major drawback of mean deviation is that, while computing it, we ignore the signs of the deviations and take their absolute values. So one cannot get any information about the concentration of the observations above or below the mean, and because absolute values are awkward to handle algebraically, the measure is not useful for further mathematical treatment.

(ii)  Mean deviation cannot be computed for distributions with open-end classes.

(iii) Mean deviation is not an accurate measure of dispersion for highly skewed data.

 

Although mean deviation has some limitations, its simplicity and ease of understanding give it great importance in accountancy, economics, forecasting and business.

 

Standard Deviation

 

Karl Pearson introduced this measure in 1893, and since then it has been the most widely used measure of dispersion. Standard deviation possesses the features that the other measures lack, so it is considered close to an ideal measure of dispersion. It is also known as the root mean square deviation, is the square root of the variance, and is denoted by the Greek letter σ (sigma). The value of the standard deviation is high when there is more variation among the observations and low when there is less. For n observations x1, x2, ..., xn with arithmetic mean x̄,

σ = sqrt( ((x1 - x̄)^2 + (x2 - x̄)^2 + ... + (xn - x̄)^2) / n )

    Ques 8. Calculate the standard deviation for the following dataset.

 

 

 

Merits and Demerits of Standard deviation

 

Merits

Standard deviation is considered the best absolute measure of dispersion. Its merits are:

(i)   Standard deviation is a rigidly defined measure.

(ii)  Standard deviation is based on all the observations.

(iii) Standard deviation is evaluated by squaring the deviations of the observations from the mean, so the signs of the deviations do not have to be ignored. This makes the standard deviation suitable for further mathematical treatment.

(iv) Standard deviation is less affected by changes in the sample observations than the other measures.

(v)  It is possible to compute the combined standard deviation of two or more datasets from the individual standard deviations, means and sizes.

(vi)  Standard deviation is used in further statistical analysis, such as skewness and correlation.
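The standard deviation as the root mean square deviation from the mean, and the combined standard deviation mentioned in merit (v), can be sketched in Python as follows. The sketch assumes the divisor n (the whole dataset is treated as the population), and the function names are illustrative. The combined variance uses the usual identity in which each group contributes its own variance plus the squared distance of its mean from the combined mean. The two groups reuse Dataset I from the introduction and the observations from Ques 3.

```python
import math


def std_dev(values):
    """Standard deviation with divisor n: root of the mean squared deviation."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))


def combined_std_dev(n1, mean1, sd1, n2, mean2, sd2):
    """Standard deviation of two groups pooled together, from their summaries."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean  # distance of group 1 mean from combined mean
    d2 = mean2 - combined_mean  # distance of group 2 mean from combined mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return math.sqrt(variance)


group_a = [8, 9, 10, 11, 13, 15]                      # Dataset I
group_b = [30, 40, 50, 55, 60, 65, 70, 80, 90, 100]   # observations of Ques 3

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

from_summaries = combined_std_dev(len(group_a), mean_a, std_dev(group_a),
                                  len(group_b), mean_b, std_dev(group_b))
from_raw_data = std_dev(group_a + group_b)

print(std_dev(group_a), std_dev(group_b))
print(from_summaries, from_raw_data)  # the two values agree up to rounding
```

The last line compares the combined value computed from the group summaries with the standard deviation computed directly on the pooled observations.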

    Demerits

 

Although standard deviation is considered the best absolute measure of dispersion, it is not free from demerits. These are

(i) Standard deviation is difficult to understand for a person with a non-mathematical background.

(ii) Standard deviation gives more weight to extreme values and less weight to values close to the mean, because squaring the deviations magnifies the large ones.
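The extra weight that squaring gives to extreme values can be seen with Dataset II from the introduction. The following minimal Python sketch compares the share of the largest observation in the total of the absolute deviations with its share in the total of the squared deviations; the variable names are illustrative.

```python
dataset_2 = [3, 4, 6, 14, 16, 23]  # Dataset II, mean 11
m = sum(dataset_2) / len(dataset_2)

abs_dev = [abs(x - m) for x in dataset_2]   # used by the mean deviation
sq_dev = [(x - m) ** 2 for x in dataset_2]  # used by the standard deviation

# Share of the most extreme observation (23) in each total.
share_abs = abs_dev[-1] / sum(abs_dev)  # 12 out of 40
share_sq = sq_dev[-1] / sum(sq_dev)     # 144 out of 316

print(round(share_abs, 3), round(share_sq, 3))
```

The observation 23 accounts for 30% of the total absolute deviation but for roughly 46% of the total squared deviation, so it influences the standard deviation more than it influences the mean deviation.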

 

Although standard deviation has some demerits or limitations, it is the most widely used absolute measure of dispersion.

3.    Summary

In this module, one category of measures of dispersion, i.e. the absolute measures of dispersion, is discussed in detail. These are range, quartile deviation, mean deviation and standard deviation. We first give an introduction to these measures. Questions and answers are included for better understanding of the topic, and the merits and demerits of each measure are discussed for comparison. The other category, the relative measures of dispersion, is discussed in the module “Measures of Dispersion II with Skewness and Kurtosis”.

4.    Suggested Readings

Agresti, A. and B. Finlay, Statistical Methods for the Social Sciences, 3rd Edition, Prentice Hall, 1997.

 

Daniel, W. W. and C. L. Cross, Biostatistics: A Foundation for Analysis in the Health Sciences, 10th Edition, John Wiley & Sons, 2013.

 

Hogg, R. V., J. Mckean and A. Craig, Introduction to Mathematical Statistics, Macmillan Pub. Co. Inc., 1978.

 

Meyer, P. L., Introductory Probability and Statistical Applications, Oxford & IBH Pub, 1975.

 

Triola, M. F., Elementary Statistics, 13th Edition, Pearson, 2017.

 

Weiss, N. A., Introductory Statistics, 10th Edition, Pearson, 2017.


One can refer to the following links for further understanding of the statistics terms.

 

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ClinStat/glossary.pdf

 

http://www.stats.gla.ac.uk/steps/glossary/alphabet.html

 

http://www.reading.ac.uk/ssc/resources/Docs/Statistical_Glossary.pdf

 

https://stats.oecd.org/glossary/

 

http://www.statsoft.com/Textbook/Statistics-Glossary

 

https://www.stat.berkeley.edu/~stark/SticiGui/Text/gloss.htm

 

https://stats.oecd.org/glossary/alpha.asp?Let=A