4 Measures of Dispersion and Moments
Prof Surendra Singh
Introduction:
Earlier modules noted several times that two or more distributions may have the same mean while the ‘spread’ of their values differs. Spread is an important characteristic of a distribution. It is useful for predicting fluctuations and variations, and for analysing the extremes of variability of geographic phenomena. The statistical term for this variability is ‘dispersion’, which is measured in two ways:
(i) Distance of a particular value from another observed value in a distribution, and
(ii) Differences of the observed values from a central point of the distribution, called ‘deviations’, which are summarised statistically as the ‘mean deviation’ and the ‘standard deviation’.
The simplest measure of dispersion is the range, the difference between the maximum and minimum values of a variable. It is a crude measure of dispersion. The coefficient of range, R, is the ratio of the difference between the highest value, H, and the lowest value, L, to their sum:
$$R = \frac{H - L}{H + L} \qquad (1)$$
For positive values R is always less than unity, and it approaches unity as the gap between the highest and lowest observations widens (Fig.-1). Inter-quartile and inter-decile ranges may be calculated in the same way from the highest and lowest values of a section of the distribution.
Fig-1: Range of distribution of Rainfall at Cherrapunji
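As a minimal sketch of equation (1), the coefficient of range can be computed in Python as shown here. The function name `coefficient_of_range` is an assumption made for illustration. Only the extremes 0.2 mm and 793 mm are quoted from this module; the other values in the list are made up.

```python
def coefficient_of_range(values):
    """Return (H - L) / (H + L) for a sequence of non-negative values."""
    highest = max(values)
    lowest = min(values)
    if highest + lowest == 0:
        raise ValueError("H + L is zero, so the coefficient is undefined")
    return (highest - lowest) / (highest + lowest)


# Illustrative daily rainfall values in mm; only the extremes come from the text.
sample_rainfall = [0.2, 12.5, 48.0, 130.4, 793.0]
print(round(coefficient_of_range(sample_rainfall), 4))
```

Because the lowest value is close to zero, the coefficient comes out very close to unity, which is the behaviour described above for widely varying observations.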
Measures of Deviation: Mean Deviation and Standard Deviation
The degree of variability in a distribution is usually measured from the deviations of the observed values from a specific point of the distribution. If dispersion is measured as the average of the absolute deviations from the mean, it is called the mean deviation. If the average of the absolute deviations from the median is used instead, it is called the median deviation. In location–allocation modelling a related term, the p-median, is used when a solution over space is obtained from the distances of locations to a median centroid.
Standard deviation (denoted by the Greek letter sigma, σ) is the root-mean-square form of the deviation from the mean and has several useful mathematical properties. That is why it is regarded as a more reliable measure of variation and is used frequently in detailed statistical analyses. By definition, the mean deviation (MD) is simply ‘the mean of the absolute deviations from the mean’ of the series, while the standard deviation (SD) is the square root of the mean of the squared deviations from the mean.
Thus the formulas for the mean deviation from the mean and for the standard deviation are:
$$MD = \frac{1}{n} \sum |X - \bar{X}| \qquad (2)$$
$$SD = \left[ \frac{1}{n} \sum (X - \bar{X})^2 \right]^{1/2} \qquad (3)$$
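Equations (2) and (3) can be sketched in a few lines of Python. The function names and the small data set are illustrative assumptions, not part of the module's rainfall data.

```python
import math


def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean (equation 2)."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def standard_deviation(values):
    """Square root of the mean squared deviation from the mean (equation 3)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


# Hypothetical observations chosen so that the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_deviation(data), standard_deviation(data))
```

For this data set the mean is 5, the absolute deviations sum to 12 and the squared deviations sum to 32, so MD = 12/8 and SD = (32/8)^½.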
Variance is the square of the standard deviation (σ²), so it is written as
$$\sigma^2 = \frac{1}{n} \sum (X - \bar{X})^2 \qquad (4)$$
The mean of the sum of squared deviations is therefore called the variance. In statistics it is the second moment of a frequency distribution about the mean. The first moment uses the first power of the deviation, ∑(X – X̄)/n (which sums to zero about the mean unless absolute values are taken), the second moment is the variance (equation 4), and the third, fourth and higher moments are expressed as
$$\mu_m = \frac{1}{n} \sum (X - \bar{X})^m \qquad (5)$$
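The general moment of equation (5) can be sketched as a single function. The name `central_moment` and the data are illustrative assumptions. Note that the signed first moment about the mean works out to zero, because positive and negative deviations cancel.

```python
def central_moment(values, order):
    """Mean of the deviations from the arithmetic mean raised to `order`."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
for order in (1, 2, 3, 4):
    print(order, central_moment(data, order))
```

The second moment printed here is the variance of the same data, and the third and fourth are the quantities used later for skewness and kurtosis.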
In fact, the moments of a series are calculated to understand the shape of its frequency distribution. The shape of a distribution has two dimensions:
(i) the horizontal shape (asymmetry), measured by skewness, which is based on µ3, and
(ii) the vertical shape (peakedness), measured by kurtosis, which is determined from µ4 of the distribution.
These measures are detailed in a later section. Here we describe the variance (second moment), including the case of a variable given as grouped data with class frequencies.
Procedure for Simplification of Variance Calculation:
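Writing the variance (σ²) of equation 4 in terms of ‘the sum of squares’, the simplification proceeds as follows:
$$\sigma^2 = \frac{1}{n} \sum (X_i - \bar{X})^2$$
$$= \frac{\sum X_i^2}{n} - 2\bar{X}\,\frac{\sum X_i}{n} + \bar{X}^2$$
$$= \frac{\sum X_i^2}{n} - 2\bar{X}^2 + \bar{X}^2$$
$$= \frac{\sum X_i^2}{n} - \bar{X}^2$$
$$= \frac{\sum X_i^2}{n} - \left( \frac{\sum X_i}{n} \right)^2 \qquad (6)$$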
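In order to get the sum of squares of deviations, ∑(Xi – X̄)², which equals n times the variance, the above equation can be written as
$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$$
$$= \sum X_i^2 - n \cdot \frac{\sum X_i}{n} \cdot \frac{\sum X_i}{n}$$
$$= \sum X_i^2 - \frac{(\sum X_i)^2}{n} \qquad (6a)$$
So equation 4 for the variance becomes
$$\sigma^2 = \frac{1}{n} \left[ \sum X_i^2 - \frac{(\sum X_i)^2}{n} \right] \qquad (7)$$
This is a direct method of calculating the variance (or the standard deviation, σ) without first computing the mean of the series.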
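The equivalence of the deviation form (equation 4) and the direct form (equation 7) can be checked numerically. This is a sketch with assumed function names and hypothetical data. With very large values and a small spread, the direct form can lose floating-point precision, so the agreement shown here should not be taken for granted on every data set.

```python
def variance_from_deviations(values):
    """Equation 4: mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_direct(values):
    """Equation 7: uses only the sum of X and the sum of X squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / n


# Hypothetical observations: sum of X = 40, sum of X squared = 232, n = 8.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_from_deviations(data), variance_direct(data))
```

Both calls return the same variance here, since (232 - 40²/8)/8 = 32/8.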
For grouped data with class frequencies, the variance is
$$\sigma^2 = \frac{1}{n} \sum f (X_m - \bar{X})^2 \qquad (8)$$
where Xm is the mid-point of each class, f is its frequency and n = ∑f.
The above expression can be simplified for convenience of computation as
$$\sigma^2 = \frac{1}{n} \left( \sum f X_m^2 - n\bar{X}^2 \right) \qquad (9)$$
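The grouped-data formulas (8) and (9) can be sketched as follows. The class mid-points and frequencies are hypothetical (four classes of width 25), not the Cherrapunji frequency table, and the function name is an assumption.

```python
def grouped_mean_and_variance(midpoints, frequencies):
    """Return the grouped mean and the variance by equations 8 and 9."""
    n = sum(frequencies)
    pairs = list(zip(midpoints, frequencies))
    mean = sum(f * xm for xm, f in pairs) / n
    # Equation 8: frequency-weighted squared deviations from the grouped mean.
    var_eq8 = sum(f * (xm - mean) ** 2 for xm, f in pairs) / n
    # Equation 9: sum of f * Xm squared, minus n times the squared mean.
    var_eq9 = (sum(f * xm ** 2 for xm, f in pairs) - n * mean ** 2) / n
    return mean, var_eq8, var_eq9


# Hypothetical classes 0-25, 25-50, 50-75, 75-100 with their frequencies.
midpoints = [12.5, 37.5, 62.5, 87.5]
frequencies = [4, 3, 2, 1]
print(grouped_mean_and_variance(midpoints, frequencies))
```

Both forms give the same variance; equation 9 only avoids computing a deviation for every class.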
Note that for a sample the divisor is n-1 instead of n. Therefore the variance of a sample of n observations is
$$s^2 = \frac{1}{n-1} \sum f (X_m - \bar{X})^2$$
or
$$s^2 = \frac{1}{n-1} \left( \sum f X_m^2 - n\bar{X}^2 \right)$$
Note that the VAR (VAR.S) and STDEV (STDEV.S) functions of Excel compute the sample statistics, in which n-1 is used as the divisor. The population forms, with divisor n, are VARP (VAR.P) and STDEVP (STDEV.P).
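The difference between the divisors n and n-1 can be illustrated with Python's standard `statistics` module, whose `pvariance` and `variance` functions use n and n-1 respectively. The grouped functions and the frequency table here are hypothetical and serve only as a sketch. Each mid-point is repeated f times so that the library result can be compared with the grouped formula.

```python
import statistics


def grouped_sum_of_squares(midpoints, frequencies):
    """Frequency-weighted sum of squared deviations from the grouped mean."""
    n = sum(frequencies)
    mean = sum(f * xm for xm, f in zip(midpoints, frequencies)) / n
    return sum(f * (xm - mean) ** 2 for xm, f in zip(midpoints, frequencies))


def grouped_population_variance(midpoints, frequencies):
    return grouped_sum_of_squares(midpoints, frequencies) / sum(frequencies)


def grouped_sample_variance(midpoints, frequencies):
    return grouped_sum_of_squares(midpoints, frequencies) / (sum(frequencies) - 1)


# Hypothetical frequency table with n = 10.
midpoints = [12.5, 37.5, 62.5, 87.5]
frequencies = [4, 3, 2, 1]

# Expand the table: every class mid-point appears as many times as its frequency.
expanded = [xm for xm, f in zip(midpoints, frequencies) for _ in range(f)]

print(grouped_population_variance(midpoints, frequencies), statistics.pvariance(expanded))
print(grouped_sample_variance(midpoints, frequencies), statistics.variance(expanded))
```

The sample variance is always the larger of the two, by the factor n/(n-1), which matters little for 173 observations but noticeably for small samples.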
Short Cut Method of Computing Variance:
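It is often used when a frequency distribution has a large number of classes. The method saves considerable time because ∑fd² is used instead of ∑fXm²: each class mid-value Xm is converted into a step-deviation d = (Xm – Xa)/h, where Xa is an assumed mid-value chosen from the Xm series and h is the class interval. Because d is measured in units of h, the variance of d must be multiplied by h² to return to the original units. Thus the variance formula given in equation (9) is written as
$$\sigma^2 = h^2 \left[ \frac{\sum f d^2}{n} - \left( \frac{\sum f d}{n} \right)^2 \right] \qquad (10)$$
and
$$\sigma = h \left[ \frac{\sum f d^2}{n} - \left( \frac{\sum f d}{n} \right)^2 \right]^{1/2} \qquad (11)$$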
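A minimal sketch of the step-deviation method is given here with a hypothetical frequency table. The function name and the choice of assumed mid-value are illustrative. Since d is expressed in units of the class interval h, the sketch scales the variance of d by h² (and the standard deviation by h) to get back to the original units.

```python
import math


def step_deviation_stats(midpoints, frequencies, assumed_mid, h):
    """Return (mean, variance, SD) using step-deviations d = (Xm - Xa) / h."""
    n = sum(frequencies)
    d = [(xm - assumed_mid) / h for xm in midpoints]
    sum_fd = sum(f * di for f, di in zip(frequencies, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(frequencies, d))
    variance_d = sum_fd2 / n - (sum_fd / n) ** 2  # variance in step units
    mean = assumed_mid + h * sum_fd / n
    variance = h ** 2 * variance_d  # back to the original units
    return mean, variance, math.sqrt(variance)


# Hypothetical classes of width 25; 62.5 is taken as the assumed mid-value.
midpoints = [12.5, 37.5, 62.5, 87.5]
frequencies = [4, 3, 2, 1]
print(step_deviation_stats(midpoints, frequencies, assumed_mid=62.5, h=25.0))
```

Here d takes the values -2, -1, 0 and 1, so only small integers are squared and summed; the result does not depend on which mid-value is assumed.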
Example:
Calculate the mean deviation, variance and standard deviation of the daily rainfall recorded on 173 days at Cherrapunji in 2004 (Table-1). Discuss the resulting parameters of dispersion.
The observed daily rainfall of 211 days, from 20th March to 16th November 2004, of which 38 were dry days (the remaining 173 were wet), is used to build a frequency distribution and to calculate dispersion by different methods. Microsoft Excel is used for accurate and fast computation of the results. Variation in the rainfall distribution is shown by arranging daily rainfall in ascending order (Fig-2). Mean deviation, variance and standard deviation are computed by the different methods, and the results are compared to show the merits and demerits of the methods.
Table-1: Collected daily Rainfall Data of 211 days (20 March to 18 November 2004) at Cherrapunji
Source: data were collected from AWS installed by the Department of Geography, NEHU, Shillong under DST sponsored Project on Runoff Rainfall Relationship in Humid areas of Meghalaya Plateau.
Fig.-2: Variation in Daily Rainfall at Cherrapunji shown by arranging it in its ascending order
Results :
(a) Calculation of Mean Deviation (MD), Standard Deviation (SD) and Variance of the given ungrouped rainfall data (Table-1) using equations 2, 3 and 4:
1 | Total rainfall | = 14,720.6 mm |
2 | Total number of wet days (observations = n) | = 173 |
3 | Arithmetic mean | = 85.090 mm |
4 | Mean Deviation | = 87.895 mm |
5 | Standard Deviation (σ) | = 129.122 mm |
6 | Variance (σ²) | = 16672.550 mm² |
(b) Calculation of parameters of dispersion through converting same data set into frequency distribution:
Table-2: Frequency Distribution of Daily Rainfall Data of Cherrapunji Station (total 173 days of 2004) following Equi-interval Classes, and Computation of Parameters of Dispersion using the Mid-point Method
Results:
1. | Total rainfall | = 14,720.6 mm |
3. | Total observations / frequency (∑f = n) | = 173 |
4. | Mean daily rainfall (X̄) | = 105.2023 mm |
5. | Standard Deviation (σ) | = 160.382 mm |
6. | Variance (σ²) | = 25722.57 mm² |
(c) Calculation of parameters of dispersion through the use of short cut deviation method of the same frequency data of Daily Rainfall:
Table – 3: Computation of Parameters of Dispersion through short cut (deviation) method of the same frequency data
Interpretation:
A significant variation in the distribution of daily rainfall at Cherrapunji during the year 2004 is observed. It ranges from a minimum daily rainfall of 0.2 mm to a maximum of 793 mm. Owing to this vast variation, a total of nine classes are made to convert the ungrouped data into a frequency distribution, taking a class interval of 25 mm of daily rainfall and considering only the rainy season, which started on 20th March in 2004. A total rainfall of 14,720.6 mm was observed in the 173 wet days. Undoubtedly, in such a heavy rainfall area a high degree of variation in rainfall distribution is to be expected, as has been experienced in other humid areas also. Applying the three methods to the same set of rainfall data (given in Table-1), it is found that there are significant differences in the resulting parameters of dispersion. The result inferred from the ungrouped data is the most accurate, because the individual value of each observation (i.e. each day) enters the calculation. The standard deviation in this case, from the simple method for ungrouped data using variance = 1/n ∑(Xi – X̄)², is 129.122 mm.
When the mean and standard deviation are calculated from the frequency formula, the results differ significantly: the SD is 160.38 mm, more than 30.0 mm higher than the value calculated by the simple method for ungrouped data. The mean daily rainfall also deviates significantly, at 105.2 mm by the frequency formula instead of 85.1 mm. However, the frequency formula and the short cut method give the same results. Thus the discussion points to the fallacy of relying on the grouped-data formulas and to their demerits.
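The effect of grouping can be reproduced on a small scale. In this sketch every observation is replaced by the mid-point of its class, as the frequency formulas do, and the mean and SD are compared with the ungrouped results. The six values and the function names are entirely hypothetical; the sketch does not reproduce the Cherrapunji figures, and the direction and size of the difference depend on how the values fall within their classes.

```python
import math
from collections import Counter


def ungrouped_mean_sd(values):
    """Mean and SD computed from the individual observations."""
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def grouped_mean_sd(values, width):
    """Mean and SD after replacing each value by its class mid-point."""
    # Class k covers the interval [k * width, (k + 1) * width).
    counts = Counter(int(x // width) for x in values)
    n = len(values)
    mids = {k: (k + 0.5) * width for k in counts}
    mean = sum(f * mids[k] for k, f in counts.items()) / n
    var = sum(f * (mids[k] - mean) ** 2 for k, f in counts.items()) / n
    return mean, math.sqrt(var)


# Hypothetical skewed observations: most values sit near the bottom of a class.
values = [1, 2, 3, 24, 26, 49]
print(ungrouped_mean_sd(values))
print(grouped_mean_sd(values, width=25))
```

Four of the six values fall in the class 0 to 25 but lie well away from its mid-point of 12.5, which is why the grouped mean and SD drift away from the ungrouped ones.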
Uses of the parameters of Dispersion
As standard deviation is a commonly and universally used measure of variation, it is applied in a variety of ways in different fields of research. The common uses are described in the following paragraphs.
(a) Standard deviation is an absolute measure of the variation of a distribution. To compare the variations of two or more distributions, a measure of relative dispersion that depends on the SD is used. It is called the coefficient of variation (CV), expressed per unit or as a percentage. It is simply defined as the SD per unit of the mean, X̄, of a distribution, written as
$$CV\,(\%) = \frac{\sigma}{\bar{X}} \times 100 \qquad (13)$$
With the help of the CV, the variations of different distributions become comparable, whereas the SDs alone do not give correctly comparable figures of variability.
(b) Standard deviation and mean are used to standardize the values of a distribution. The Z-transformation of a distribution depends on the mean and the SD because
$$Z = \frac{X - \bar{X}}{\sigma} \qquad (14)$$
A Z-transformed distribution is comparable with other distributions. It gives a statistical basis for the classification of observations. Therefore it is used for mapping distributions and for genuine comparison of geographical attributes.
(c) To study the probability of events occurring in a large distribution, the SD is used to normalize the distribution and to predict the values of specific observations. For example, if the mean and SD are given and the distribution is considered normal, then about 68% of the occurrences of an event are expected within the range (mean ± 1.0σ) of the distribution. It is therefore used to develop significance tests for regression analysis and for various other statistical measures such as skewness and kurtosis.
(d) A most important technique, namely ‘the analysis of variance’, has been developed taking the SD as its basis; it is described separately in a different module.
(e) It is important to note that the variations of different sets of data containing different numbers of observations may be combined to analyse their overall variation. For example, if two sets of data have means X̄1 and X̄2, standard deviations σ1 and σ2, and n1 and n2 observations, the combined standard deviation is
$$\sigma_{12} = \left[ \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2} \right]^{1/2} \qquad (15)$$
where d1 = (X̄1 – X**) and d2 = (X̄2 – X**), with X** the combined mean, equal to (n1X̄1 + n2X̄2)/(n1 + n2).
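The uses in (a), (b), (c) and (e) can be sketched together in Python. All function names are assumptions made for illustration. The CV example uses the mean and SD reported for the ungrouped rainfall data in this module; the two small data sets are hypothetical. `NormalDist` comes from Python's standard `statistics` module (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def mean_sd(values):
    """Arithmetic mean and population SD (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def cv_percent(mean, sd):
    """Equation 13: SD per unit of mean, as a percentage."""
    return sd / mean * 100


def z_scores(values):
    """Equation 14: deviations from the mean in units of SD."""
    mean, sd = mean_sd(values)
    return [(x - mean) / sd for x in values]


def share_within_one_sd():
    """Share of a normal distribution lying within mean +/- 1 SD."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Equation 15: mean and SD of two data sets pooled together."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return combined_mean, math.sqrt(variance)


# (a) CV from the ungrouped rainfall results reported above.
print(cv_percent(mean=85.090, sd=129.122))

# (b) and (e) on two hypothetical data sets.
a = [2, 4, 4, 4, 5, 5, 7, 9]
b = [1, 3]
print(z_scores(a))
print(combined_mean_sd(len(a), *mean_sd(a), len(b), *mean_sd(b)))
print(mean_sd(a + b))  # pooling the raw values gives the same mean and SD

# (c) Normal-distribution coverage of one SD either side of the mean.
print(share_within_one_sd())
```

A CV well above 100% for the rainfall data says that the SD exceeds the mean, which is consistent with the strongly skewed daily totals described in the interpretation.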
Measures of Moments (Skewness and Kurtosis):
Measures of Skewness:
As described earlier, a statistical moment is the mean of a power of the deviations, that is, of the distances of the size classes or individual values of the distribution from the mean. The first-order measure is simply the mean deviation, because of its unit power. It is the first moment about the mean. Likewise, higher moments are defined by raising the power up to r. Note that the second moment is the variance, while the horizontal shape about the mean (known as skewness) and the vertical shape (kurtosis) are related to the 3rd and 4th order moments, as
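$$\mu_1 = \frac{\sum |X - \bar{X}|}{n} \qquad (16)$$
$$\mu_2 = \frac{\sum (X - \bar{X})^2}{n} \quad \text{(the variance)} \qquad (17)$$
$$\mu_3 = \frac{\sum (X - \bar{X})^3}{n} \qquad (18)$$
$$\mu_4 = \frac{\sum (X - \bar{X})^4}{n} \qquad (19)$$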
Skewness can be calculated from a frequency distribution as three times the difference between the mean and the median per unit of SD:
$$\text{Skewness} = \frac{3(\text{mean} - \text{median})}{SD} \qquad (20)$$
In fact its value lies between -3 and +3, and in skewed distributions these limits are rarely approached. If the mean is higher than the median, the skewness is positive, because the average value lies above the value at the middle position. The inverse holds for negative skewness (Fig.-3A).
Further, the skewness coefficient that Pearson termed β1 is
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \qquad (21)$$
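Both skewness measures, equations (20) and (21), can be sketched as follows. The function names and the data are illustrative assumptions; `statistics.median` is from the Python standard library.

```python
import math
import statistics


def central_moment(values, order):
    """Mean of the deviations from the arithmetic mean raised to `order`."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def median_skewness(values):
    """Equation 20: 3 * (mean - median) / SD."""
    mean = sum(values) / len(values)
    median = statistics.median(values)
    sd = math.sqrt(central_moment(values, 2))
    return 3 * (mean - median) / sd


def beta1(values):
    """Equation 21: squared third moment over cubed second moment."""
    return central_moment(values, 3) ** 2 / central_moment(values, 2) ** 3


# Hypothetical observations: mean 5, median 4.5, SD 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(median_skewness(data), beta1(data))
```

The mean exceeds the median in this data set, so equation (20) gives a positive value. β1 is a squared quantity and is never negative, so the sign of the skew has to be read from µ3 or from equation (20).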
Measures of Kurtosis:
The kurtosis coefficient, β2, is
$$\beta_2 = \frac{\mu_4}{\sigma^4} \qquad (22)$$
It is the ratio of the 4th moment to the fourth power of the standard deviation of a distribution. For a normal distribution β2 equals 3, so kurtosis is measured relative to that value as K = (β2 – 3), which has three conditions:
1. If β2 > 3, K > 0, the curve is more peaked than the normal curve and is called leptokurtic.
2. If β2 < 3, K < 0, the curve is platykurtic; it is flatter, with a lower peak than the normal curve.
3. If β2 = 3, K = 0, the curve coincides with the normal distribution (mesokurtic); there is no abnormality in the vertical shape of the distribution (Fig.-3B).
Fig.-3: A) Skewed distributions (blue line shows positive and red line negative skewness); B) Kurtosis (blue line shows a leptokurtic and red line a platykurtic distribution).
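The kurtosis coefficient of equation (22) and the three conditions above can be sketched as a small classifier. The function names, the tolerance used to decide when K counts as zero, and the data are assumptions made for illustration. The fourth power of σ is computed as the square of the second moment.

```python
def central_moment(values, order):
    """Mean of the deviations from the arithmetic mean raised to `order`."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def beta2(values):
    """Equation 22: fourth moment over the fourth power of the SD."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def classify_kurtosis(values, tolerance=1e-9):
    """Return (K, label) with K = beta2 - 3."""
    k = beta2(values) - 3
    if k > tolerance:
        return k, "leptokurtic"
    if k < -tolerance:
        return k, "platykurtic"
    return k, "mesokurtic"


# Hypothetical observations with second moment 4 and fourth moment 44.5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(beta2(data), classify_kurtosis(data))
```

For this data set β2 = 44.5/16, which is below 3, so the distribution is classed as platykurtic.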