9 Measures of Dispersion II with Skewness and Kurtosis
Dr. Harmanpreet Singh Kapoor
- Learning Objectives
- Introduction
- Skewness
- Kurtosis
- Summary
- Suggested Readings
1. Learning Objectives
This module continues the module “Measures of Dispersion- I”. It introduces the relative measures of dispersion, such as the coefficient of range, coefficient of mean deviation and coefficient of variation, and discusses each with examples, properties, merits and demerits. After working through this module, one should be able to decide which measure to use under which conditions. Two further important measures, skewness and kurtosis, are also discussed, and moments are introduced for the derivation of these measures. Questions with detailed solutions are included to give in-depth knowledge of the topic.
2. Introduction
Measures of dispersion describe the variation, or scatter, of the observations around a measure of central tendency. The previous module covered one category, the absolute measures of dispersion. This module covers the second category, the relative measures of dispersion. A relative measure of dispersion is the ratio of an absolute measure of dispersion to an appropriate average. The quantity used in the denominator must be in the same units as the absolute measure, so that the units cancel; in most cases it is an average. The purpose of using relative measures rather than absolute measures is that they allow different datasets to be compared, which is not possible with absolute measures because they depend on the units. Relative measures of dispersion are therefore of great importance in statistics because they are independent of units.
In the modules “Central Tendency Measures I”, “Central Tendency Measures –II” and “Measures of Dispersion –I”, we discussed different measures of central tendency and of dispersion. Together these describe important characteristics of a dataset, but they do not show how the observations are distributed around the central value, i.e. whether the data are symmetrical about the mean or not. They therefore do not tell us whether more observations lie below the mean or above it. To study the concentration of the observations around a central tendency measure, two more measures are needed: skewness and kurtosis. These two measures are regarded as supporting measures that give a better understanding of the characteristics of the data.
Skewness describes the shape of the data, i.e. whether the data are symmetric or skewed. Its value helps in determining whether the observations are concentrated below or above the average value. If the observations are distributed evenly on both sides of the centre, the distribution is called symmetric. If they are not, there are two possibilities: the observations are concentrated either above the average value or below it. Hence there are two types of skewness for asymmetric data, positive and negative, depending on where the observations are concentrated. These are discussed later.
Another important measure is kurtosis, which refers to the peakedness or flatness of the curve that can be drawn from the dataset. It is used to study whether the concentration of the observations in the central part is high or low. If the concentration of observations in the central part is very high, the curve is leptokurtic. If the concentration of the observations at the centre is low, the curve is platykurtic.
Hence central tendency measures, measures of dispersion, skewness and kurtosis together form a complete package for understanding the data in depth. In other words, together they describe the distribution of the data.
Relative Measures
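(a) Coefficient of Range: This measure of dispersion is evaluated from the range of the data set. First the range of the data set is calculated, and then it is divided by the sum of the largest and smallest observations:
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
where L and S are the largest and smallest observations.
(b) Coefficient of Quartile Deviation: This measure is evaluated from the first quartile Q1 and the third quartile Q3:
$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$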
Ques 1. Find the coefficient of range, the quartile deviation and the coefficient of quartile deviation for the following data set of the marks obtained by 15 students in statistics.
60, 67, 56, 78, 92, 55, 72, 54, 49, 59, 37, 84, 83, 69, 62
Ans: Arrange the observations in ascending order:
37, 49, 54, 55, 56, 59, 60, 62, 67, 69, 72, 78, 83, 84, 92
The range is 92 − 37 = 55.
Coefficient of range = 55/(92 + 37) = 55/129 = 42.64%.
The first quartile is the (N+1)/4 th term, i.e. the 4th term, which is 55.
Similarly, the third quartile is the 3(N+1)/4 th term, i.e. the 12th term, which is 78.
Quartile deviation = (78 − 55)/2 = 23/2 = 11.5.
Coefficient of quartile deviation = (78 − 55)/(78 + 55) = 23/133 = 17.29%.
Hence the coefficient of range and the coefficient of quartile deviation are 42.64% and 17.29% respectively.
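The arithmetic of Ques 1 can be checked with a short Python sketch. The helper name `quartile` and the linear interpolation between neighbouring terms are assumptions made for illustration; the position rule (N + 1)k/4 is the one stated in the answer, which places Q1 at the 4th term (55) and Q3 at the 12th term (78).

```python
def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3), located at the (N + 1) * k / 4 th term."""
    n = len(sorted_values)
    position = (n + 1) * k / 4               # 1-based position, may be fractional
    position = min(max(position, 1), n)      # keep the position inside the data
    lower = int(position)                    # term at or just below the position
    upper = min(lower + 1, n)                # next term (or the last one)
    fraction = position - lower              # share of the gap to add
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[upper - 1]
    return low_value + fraction * (high_value - low_value)


# Marks of the 15 students from Ques 1, in ascending order.
marks = sorted([60, 67, 56, 78, 92, 55, 72, 54, 49, 59, 37, 84, 83, 69, 62])

largest, smallest = marks[-1], marks[0]
data_range = largest - smallest
coefficient_of_range = data_range / (largest + smallest)

q1 = quartile(marks, 1)
q3 = quartile(marks, 3)
quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(f"Range = {data_range}")
print(f"Coefficient of range = {coefficient_of_range:.2%}")
print(f"Q1 = {q1}, Q3 = {q3}")
print(f"Quartile deviation = {quartile_deviation}")
print(f"Coefficient of quartile deviation = {coefficient_of_qd:.2%}")
```

Because both coefficients are ratios of quantities in the same units, multiplying every mark by a constant (for example, rescaling marks out of 100 to marks out of 50) leaves them unchanged.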
(c) Coefficient of Mean Deviation: This relative measure of dispersion is derived from the mean deviation. The mean deviation is the average of the absolute differences of the observations from a measure of central tendency, i.e. the mean, median or mode. In most cases it is calculated by taking deviations from the mean or the median. To make the mean deviation independent of units, the coefficient of mean deviation is computed:
$$\text{Coefficient of Mean Deviation} = \frac{\text{Mean deviation about } A}{A}$$
Here A can be the mean or the median, but the divisor must be the same measure from which the mean deviation was computed.
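As an illustration, the coefficient of mean deviation can be sketched in Python. The function name `coefficient_of_mean_deviation` is hypothetical, and the marks from Ques 1 are borrowed here purely as example data.

```python
from statistics import mean, median


def coefficient_of_mean_deviation(values, centre):
    """Mean absolute deviation about `centre`, divided by the same `centre`."""
    mean_deviation = sum(abs(x - centre) for x in values) / len(values)
    return mean_deviation / centre


# Illustrative data: the marks of 15 students from Ques 1.
marks = [60, 67, 56, 78, 92, 55, 72, 54, 49, 59, 37, 84, 83, 69, 62]

about_mean = coefficient_of_mean_deviation(marks, mean(marks))
about_median = coefficient_of_mean_deviation(marks, median(marks))

print(f"Coefficient of mean deviation about mean   = {about_mean:.4f}")
print(f"Coefficient of mean deviation about median = {about_median:.4f}")
```

The same central value is passed for both the deviations and the divisor, which is the requirement that the divisor match the measure from which the mean deviation was derived.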
Ques 2. Calculate the coefficient of mean deviation about mean for the following dataset.
(d) Coefficient of Standard Deviation
The standard deviation is expressed in the units of the observations and is therefore an absolute measure of dispersion. For comparison purposes the measure must be independent of units. The relative measure based on the standard deviation that is independent of units is called the coefficient of standard deviation. It is defined as
$$\text{Coefficient of S.D.} = \frac{\sigma}{\bar{x}}$$
where σ is the standard deviation and x̄ is the mean of the data.
The coefficient of standard deviation is a fraction. Multiplying it by 100 expresses it as a percentage, and the resulting relative measure is called the coefficient of variation (C.V.). It is defined as
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$$
The coefficient of variation is among the most popular relative measures of dispersion. It is used to compare the variability of two or more datasets. The dataset with the larger coefficient of variation is said to be more variable, and the one with the smaller value less variable.
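A comparison of this kind can be sketched in Python. The example below uses the two sets of marks that appear in Ques 1 and Ques 4 of this module; the function name `coefficient_of_variation` is hypothetical, and the population standard deviation (divisor N) is assumed.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation expressed as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100


marks_ques_1 = [60, 67, 56, 78, 92, 55, 72, 54, 49, 59, 37, 84, 83, 69, 62]
marks_ques_4 = [54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93]

cv_1 = coefficient_of_variation(marks_ques_1)
cv_4 = coefficient_of_variation(marks_ques_4)

# The dataset with the larger C.V. is the more variable one.
more_variable = "Ques 1" if cv_1 > cv_4 else "Ques 4"

print(f"C.V. of Ques 1 marks = {cv_1:.2f}%")
print(f"C.V. of Ques 4 marks = {cv_4:.2f}%")
print(f"More variable dataset: {more_variable}")
```

Using `pstdev` rather than `stdev` matches a standard deviation computed by dividing by the total number of observations; with the sample formula the percentages would be slightly larger.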
Ques 3. Calculate the coefficient of standard deviation and coefficient of variation for the following dataset.
Ans. Calculate the mean of the observations as shown in Table 3. Subtract the mean from each observation, as shown in the 4th column. Square these deviations, as shown in the 5th column in Table 4. Column 6 shows the product of the frequencies with the values from column 5. Now take the sum of the values in column 6 and divide it by N, i.e. the total frequency. This gives 125.22; its square root, 11.190, is the standard deviation.
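The column-by-column procedure described above can be sketched in Python. The frequency table below is hypothetical and is not the dataset of Ques 3; it only illustrates the steps of computing the mean, the deviations, the squared deviations, their products with the frequencies, and finally the standard deviation and the two coefficients.

```python
from math import sqrt

# Hypothetical frequency distribution (NOT the data of Ques 3).
values = [10, 20, 30, 40, 50]        # observations x
frequencies = [2, 3, 5, 3, 2]        # frequencies f

total_frequency = sum(frequencies)   # N

# Mean of the frequency distribution: sum(f * x) / N.
mean = sum(f * x for x, f in zip(values, frequencies)) / total_frequency

# "Column 4": deviations from the mean; "column 5": squared deviations.
deviations = [x - mean for x in values]
squared_deviations = [d ** 2 for d in deviations]

# "Column 6": frequency times squared deviation.
weighted_squares = [f * s for f, s in zip(frequencies, squared_deviations)]

variance = sum(weighted_squares) / total_frequency
standard_deviation = sqrt(variance)

coefficient_of_sd = standard_deviation / mean
coefficient_of_variation = coefficient_of_sd * 100

print(f"Mean = {mean}, S.D. = {standard_deviation:.3f}")
print(f"Coefficient of S.D. = {coefficient_of_sd:.4f}")
print(f"C.V. = {coefficient_of_variation:.2f}%")
```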
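Now for the coefficient of standard deviation, the formula is
$$\text{Coefficient of S.D.} = \frac{\sigma}{\bar{x}}, \qquad \text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$$
where σ = 11.190 is the standard deviation obtained above and x̄ is the mean of the observations.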
We have thus discussed different relative measures of dispersion. Because these measures are derived from the absolute measures to make them independent of units, one should know the absolute measures of dispersion and their properties in depth before applying these measures to a dataset.
In the next section, we discuss skewness and kurtosis.
3. Skewness
Skewness describes the tendency of the shape of a distribution. If the frequency distribution of the data is not distributed equally about the mean, i.e. the frequency distribution is not symmetric, the situation is referred to as skewness. Synonyms for skewness include asymmetry and lack of symmetry. Some authors define skewness as follows:
“When a series is not symmetrical, it is said to be asymmetrical or skewed” by Croxton and Cowden.
“Measure of skewness tell us the direction and the extent of skewness. In symmetrical distribution, the mean, median and mode are identical. The more the mean moves away from the mode, the larger the asymmetry or skewness” by Simpson and Kafka.
Hence skewness means that the data are not symmetrical about the mean. It can also be defined with reference to the normal distribution, a distribution in which the mean, median and mode are all equal. The frequency curve of this distribution is therefore bell-shaped.
Figure 1 (image source: https://www.kullabs.com/classes/subjects/units/lessons/notes/note-detail/9958)
From Figure 1, one can observe the shape of the frequency distribution in each case.
A frequency distribution is said to be positively skewed when Mean (μ) > Median > Mode. In this case, the mean is greater than both the median and the mode, and the median is greater than the mode.
A frequency distribution is said to be symmetric when Mean (μ) = Median = Mode. In this case, the mean, median and mode all have the same value.
A frequency distribution is said to be negatively skewed when Mean (μ) < Median < Mode. In this case, the mode is greater than both the median and the mean, and the median is greater than the mean.
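These three orderings can be turned into a simple check. The sketch below is illustrative only: the function name `skew_direction` is hypothetical, the example marks are those of Ques 4 in this module, and any ordering not covered by the three rules is reported as inconclusive rather than forced into a category.

```python
from statistics import mean, median, mode


def skew_direction(values):
    """Classify skewness from the ordering of mean, median and mode."""
    avg, med, mod = mean(values), median(values), mode(values)
    if avg > med > mod:
        return "positively skewed"
    if avg < med < mod:
        return "negatively skewed"
    if avg == med == mod:
        return "symmetric"
    return "inconclusive"   # the simple ordering rule does not apply


marks = [54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93]

print(skew_direction(marks))            # mean 69.33 > median 69 > mode 63
print(skew_direction([1, 2, 2, 3]))     # mean = median = mode = 2
print(skew_direction([1, 5, 6, 7, 7]))  # mean 5.2 < median 6 < mode 7
```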
Difference between skewness and measures of dispersion
The measures of dispersion discussed above describe the variation in the dataset, while skewness is concerned with how the observations are concentrated around the central part of the data.
There are some important differences between measures of dispersion and skewness. These are:
(i) Skewness is concerned with the shape of the frequency distribution, while measures of dispersion are concerned with the amount of variation.
(ii) Skewness shows the nature of the data about its central value, while dispersion measures the extent to which the central tendency value represents the whole data set.
(iii) Data can be widely dispersed and still have a symmetric frequency distribution. Hence symmetry does not imply that the variation is small.
(iv) Measures of dispersion are based on first and second order moments, while skewness is based on first, second and third order moments.
For this reason skewness and measures of dispersion are studied together in the literature, as both help in understanding the features of the frequency distribution in depth.
There are many methods available in the literature to find out the skewness. Some of them are discussed here.
Measures of skewness are used to detect whether the frequency distribution is symmetric or skewed. Depending on whether their values depend on the units of the observations, there are two categories of measures of skewness. These are:
(a) Absolute measure of skewness
(b) Relative measure of skewness
Absolute measures of skewness
These measures are used to check the asymmetry of the data. They are zero when the data are symmetric, so a non-zero value indicates skewness. Some of these measures are given below:
(a) Mean(μ) − Mode (Md)
(b) Mean (μ) – Median (M)
(c) Q3 + Q1 − 2M
Any of the above three measures can be used to check the skewness of the distribution. As already discussed, in a skewed distribution either the mean is greater than the median and the mode, or the mode is greater than the median and the mean. Hence these measures only give an indication of the presence of skewness in the data; when their value is zero, the data are symmetric.
However, these measures of skewness have limited use in practice, for the following reasons:
(i) Most importantly, these measures are expressed in the units of the observations. Hence their values cannot be used for comparison purposes.
(ii) If the absolute measures of skewness of two data sets are equal, it does not follow that the data sets are alike, since the distributions may still differ in mean and dispersion.
To overcome these limitations, a measure that is independent of units is used, called the relative measure of skewness or coefficient of skewness.
Relative Measure of Skewness
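In these measures, the limitations of the absolute measures are removed by dividing the absolute measure by a suitable quantity. The following coefficients of skewness are commonly used.
(a) Karl Pearson Coefficient of Skewness
$$S_k = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
(b) Bowley Coefficient of Skewness
$$S_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$$
(c) Kelly Coefficient of Skewness
$$S_{Kelly} = \frac{D_9 + D_1 - 2M}{D_9 - D_1}$$
where D1 and D9 are the first and ninth deciles and M is the median.
(d) Coefficient of Skewness Based on Moments
$$\gamma_1 = \frac{\mu_3}{\sigma^3}$$
where μ3 and σ are the third order central moment and the standard deviation of the distribution.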
Ques 4. Calculate the absolute measures of skewness and the coefficients of skewness (a) Karl Pearson, (b) Bowley, (c) Kelly and (d) based on moments for the marks of 15 students in statistics given below:
54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93
Ans. Arrange the series in ascending order:
46, 51, 54, 58, 59, 63, 63, 69, 73, 74, 78, 85, 86, 88, 93
Now compute arithmetic mean, median and mode from the series.
A.M. is 69.33, Median is 69, Mode is 63
Now Q1 is 58, Q3 is 85, D1 is 46 + 0.6 (51 − 46) = 49 and D9 is 88 + 0.4 (93 − 88) = 90.
Absolute measures values are
Mean(μ) − Mode (Md) = 69.33 − 63 = 6.33
Mean (μ)– Median (M) = 69.33 − 69 = 0.33
Q3 + Q1 − 2M = 85 + 58 − 2 × 69 = 5
Hence all absolute measures show that the data are positively skewed.
Now calculate the relative measures of skewness. For this we need the standard deviation of the data.
Let x1, x2, …, xn be the observations, where n is the number of observations. The S.D. is evaluated by using the formula
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
Hence, from all the absolute and relative measures of skewness, we conclude that the data are positively skewed. Note, however, that different measures give different values for the same dataset, which is considered a limitation of the measures of skewness.
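The relative measures for the marks in Ques 4 can be worked out with the following Python sketch. It assumes the usual textbook forms of the four coefficients: (Mean − Mode)/σ for Karl Pearson, (Q3 + Q1 − 2M)/(Q3 − Q1) for Bowley, (D9 + D1 − 2M)/(D9 − D1) for Kelly, and μ3/σ³ for the moment-based coefficient, with σ computed using divisor n. The helper `value_at_position` is a hypothetical name; it reproduces the interpolation used in the answer for D1 and D9.

```python
from math import sqrt
from statistics import mean, median, mode


def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    n = len(sorted_values)
    position = min(max(position, 1), n)   # keep the position inside the data
    lower = int(position)
    upper = min(lower + 1, n)
    fraction = position - lower
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[upper - 1]
    return low_value + fraction * (high_value - low_value)


marks = sorted([54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93])
n = len(marks)

avg, med, mod = mean(marks), median(marks), mode(marks)

# Quartiles and deciles located with the (n + 1) position rule.
q1 = value_at_position(marks, (n + 1) / 4)
q3 = value_at_position(marks, 3 * (n + 1) / 4)
d1 = value_at_position(marks, (n + 1) / 10)
d9 = value_at_position(marks, 9 * (n + 1) / 10)

# Second and third central moments with divisor n.
mu2 = sum((x - avg) ** 2 for x in marks) / n
mu3 = sum((x - avg) ** 3 for x in marks) / n
sigma = sqrt(mu2)

karl_pearson = (avg - mod) / sigma
bowley = (q3 + q1 - 2 * med) / (q3 - q1)
kelly = (d9 + d1 - 2 * med) / (d9 - d1)
moment_based = mu3 / sigma ** 3

print(f"S.D. = {sigma:.3f}")
print(f"Karl Pearson = {karl_pearson:.4f}")
print(f"Bowley       = {bowley:.4f}")
print(f"Kelly        = {kelly:.4f}")
print(f"Moment based = {moment_based:.4f}")
```

All four coefficients come out positive, in line with the conclusion that the data are positively skewed, while their magnitudes differ from one measure to another.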
4. Kurtosis
The word kurtosis comes from the Greek language and means curved or arching. Kurtosis is used to measure the peakedness of the frequency distribution. Two data sets can have the same arithmetic mean, standard deviation and coefficient of skewness and still differ in the concentration of values near the mode. A distribution can be more peaked than the normal distribution, less peaked than it, or equally peaked. Kurtosis is thus a measure that compares the peakedness of a curve with the peakedness of the normal curve, i.e. it measures the extent to which a distribution is more or less peaked than the normal distribution curve.
Many authors give the definitions of the kurtosis as
“A measure of kurtosis indicated the degree to which a curve of a frequency distribution is peaked or flat topped” by Croxton and Cowden
“Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution” by Spiegel.
Figure 2 (image courtesy: whatilearned.wikia.com)
If the distribution curve is more peaked than the normal distribution, as shown by the blue curve in Figure 2, the distribution is called leptokurtic. If the distribution curve is flatter than the normal distribution curve, as shown by the red curve, the distribution is called platykurtic. The black curve represents the normal curve, which is also known as mesokurtic.
Measure of Kurtosis
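Kurtosis is defined as
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
where μ2 and μ4 are the second and fourth order central moments. For the normal distribution β2 = 3, so a distribution is leptokurtic when β2 > 3, mesokurtic when β2 = 3 and platykurtic when β2 < 3.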
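As an illustration, the moment-based coefficient of kurtosis β2 = μ4/μ2², which equals 3 for a normal distribution, can be computed with the Python sketch below. The function names are hypothetical, central moments are taken with divisor n, and the marks from Ques 4 serve as example data.

```python
def central_moment(values, order):
    """Central moment of the given order, using divisor n."""
    n = len(values)
    avg = sum(values) / n
    return sum((x - avg) ** order for x in values) / n


def beta2(values):
    """Moment-based coefficient of kurtosis: mu4 / mu2 squared."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def kurtosis_type(b2, tolerance=1e-9):
    """Compare beta2 with 3, the value for a normal distribution."""
    if b2 > 3 + tolerance:
        return "leptokurtic"
    if b2 < 3 - tolerance:
        return "platykurtic"
    return "mesokurtic"


marks = [54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93]

b2 = beta2(marks)
print(f"beta2 = {b2:.4f}, gamma2 = {b2 - 3:.4f}")
print(kurtosis_type(b2))
```

For these marks β2 is below 3, so by this measure the distribution is flatter than the normal curve.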
5. Summary
In this module, we first introduced the relative measures of dispersion, such as the coefficient of range, coefficient of mean deviation and coefficient of variation. The properties of the different measures were discussed along with their merits and demerits, so that one can decide which measure to use under which conditions. The differences between dispersion and skewness were also discussed. Finally, two further important measures, skewness and kurtosis, were discussed.
6. Suggested Readings
Agresti, A. and B. Finlay, Statistical Methods for the Social Sciences, 3rd Edition, Prentice Hall, 1997.
Daniel, W. W. and C. L. Cross, Biostatistics: A Foundation for Analysis in the Health Sciences, 10th Edition, John Wiley & Sons, 2013.
Hogg, R. V., J. McKean and A. Craig, Introduction to Mathematical Statistics, Macmillan Pub. Co. Inc., 1978.
Meyer, P. L., Introductory Probability and Statistical Applications, Oxford & IBH Pub, 1975.
Triola, M. F., Elementary Statistics, 13th Edition, Pearson, 2017.
Weiss, N. A., Introductory Statistics, 10th Edition, Pearson, 2017.
One can refer to the following links for further understanding of the statistics terms.
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ClinStat/glossary.pdf
http://www.stats.gla.ac.uk/steps/glossary/alphabet.html
http://www.reading.ac.uk/ssc/resources/Docs/Statistical_Glossary.pdf
https://stats.oecd.org/glossary/
http://www.statsoft.com/Textbook/Statistics-Glossary
https://www.stat.berkeley.edu/~stark/SticiGui/Text/gloss.htm