8 Measures of Dispersion: Mean Absolute Deviation, Standard Deviation, Variance, Coefficient of Variation

Prof. Pankaj Madan

 


 

Learning Objectives:

 

After the completion of this module the student will understand:

 

➢ Range
➢ Mean Absolute Deviation
➢ Computation of Mean Deviation
➢ Characteristics of mean deviation
➢ Uses of mean deviation
➢ Standard Deviation
➢ Computation of Standard Deviation
➢ Characteristics of Standard Deviation
➢ Uses of Standard Deviation
➢ Quartile deviation or Semi Inter Quartile range
➢ Variance
➢ Relative measures of dispersion
➢ Coefficient of dispersion
➢ Coefficient of variation
➢ Standard error
➢ Expression for the standard error of mean
➢ Probable error

 

 

1.    Introduction

 

Any measure of central tendency, or average, has its limitations. It identifies only the central value around which the observations tend to lie; it says nothing about how the observations are distributed around that value. Several series can have the same mean and yet differ in the pattern in which their observations are distributed. To see this, consider the following series.

Series A 9 9 9 9 9 9 9
Series B 6 7 8 9 10 11 12
Series C 1 2 4 5 11 13 27
Series D 3 15

 

In each of the above series the arithmetic mean is 9, but the pattern in which the observations are distributed differs from series to series. In series A all the observations are equal to 9. In series B the observations are scattered, but only moderately, ranging from 6 to 12. In series C the observations are widely scattered, ranging from 1 to 27. In series D there are only two observations, 3 and 15, whose mean is 9.

 

From the above example it is clear that the study of a series requires, along with its central tendency, a study of the extent of scattering, or dispersion, of the observations. The following measures of dispersion are in common use.

 

2.   Range

 

2.1. Definition:

 

The range is the simplest measure of dispersion. It is the difference between the highest and lowest terms of a series of observations.

 

2.2. Computation of Range:

$$\text{Range} = X_H - X_L$$

where $X_H$ = highest variate value

and $X_L$ = lowest variate value

 

2.3. Merits and demerits of range

(i) Its value usually increases with the size of the sample.

(ii) It is usually unstable in repeated sampling: samples of the same size, even large ones, can give very different ranges.

(iii) It is a very rough measure of dispersion and is entirely unsuitable for precise and accurate studies.

(iv) The only merits of the range are that it is (a) simple, (b) easy to understand and (c) quickly calculated. It is often used in certain industrial work.
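As a small illustration, the range of the four series from the Introduction can be computed as sketched below. The function name `value_range` is chosen here for illustration only and is not part of the module.

```python
def value_range(values):
    """Range = highest variate value minus lowest variate value."""
    return max(values) - min(values)

# The four series from the Introduction; each has arithmetic mean 9.
series = {
    "A": [9, 9, 9, 9, 9, 9, 9],
    "B": [6, 7, 8, 9, 10, 11, 12],
    "C": [1, 2, 4, 5, 11, 13, 27],
    "D": [3, 15],
}

ranges = {name: value_range(values) for name, values in series.items()}
for name, r in ranges.items():
    print(f"Series {name}: range = {r}")
```

Although the means are identical, the ranges differ, which is the kind of difference in scatter that a measure of dispersion is meant to capture.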

 

3.      Mean Deviation

 

3.1.  Definition:

 

If the deviations of all the observations from their mean are calculated, their algebraic sum is always zero, so the average of the signed deviations tells us nothing about scatter. To overcome this difficulty, the deviations are added irrespective of their plus or minus signs and then averaged. Deviations taken without sign are known as absolute deviations, and the mean of these absolute deviations is called the mean deviation. If the deviations are measured from the mean, the measure is called the mean deviation about the mean. In fact, mean deviation can be calculated about any average, using the absolute deviations from that average.

 

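3.2. Computation of mean deviation:

Mean deviation about the mean:

$$\text{M.D. about } \bar{X} = \frac{1}{N} \sum |X - \bar{X}|$$

Mean deviation about other averages:

$$\text{M.D. about } A = \frac{1}{N} \sum |X - A|$$

$$\text{M.D. about Md.} = \frac{1}{N} \sum |X - \text{Md.}|$$

$$\text{M.D. about Mo.} = \frac{1}{N} \sum |X - \text{Mo.}|$$

where $N$ is the number of observations, $A$ is any chosen average, Md. is the median and Mo. is the mode.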

3.3.Characteristics of mean deviation:

 

(i)  A notable characteristic of mean deviation is that it is the least when calculated about the median.

 

(ii) Standard deviation is not less than the mean deviation in a discrete series i.e. it is either equal to or greater than the M.D. about mean.

 

(iii) When an average other than the A.M. is calculated as a measure of central tendency, M.D. about that average is the only suitable measure of dispersion.
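The first two characteristics can be checked numerically on series C from the Introduction. The following is only a sketch: the helper name `mean_deviation` is an illustrative choice, and the S.D. is taken with divisor N.

```python
from statistics import mean, median, pstdev

def mean_deviation(values, about):
    """Average of the absolute deviations from the chosen central value."""
    return sum(abs(x - about) for x in values) / len(values)

series_c = [1, 2, 4, 5, 11, 13, 27]

md_mean = mean_deviation(series_c, mean(series_c))      # about the A.M.
md_median = mean_deviation(series_c, median(series_c))  # about the median
sd = pstdev(series_c)                                   # S.D. with divisor N

print(f"M.D. about mean   = {md_mean:.3f}")
print(f"M.D. about median = {md_median:.3f}")
print(f"S.D.              = {sd:.3f}")
```

For this series the mean deviation about the median is the smaller of the two mean deviations, and the S.D. is not less than the mean deviation about the mean.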

 

4.      Standard Deviation

 

4.1. Definition:

The calculation of standard deviation is also based on the deviations from the arithmetic mean. In the case of mean deviation, the difficulty that the sum of the deviations from the arithmetic mean is always zero is solved by taking the deviations irrespective of their plus or minus signs. Here, the difficulty is solved by squaring the deviations and taking the square root of their average. The standard deviation is thus defined by the following expression.

 

$$\text{Standard Deviation (S.D.)} = \sqrt{\frac{\sum (X - \mu)^2}{N}} \qquad (1)$$

where $X$ = an observation or variate value,

$\mu$ = arithmetic mean of the population,

$N$ = number of given observations.

 

According to expression (1), the population mean $\mu$ is required for finding the standard deviation (S.D.) of a given set of observations. Generally, $\mu$ is not known. It is therefore replaced by $\bar{X}$, the mean of the given set of observations, and the S.D. of the given data is then given by

$$\text{Standard Deviation (S.D.)} = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \qquad (2)$$

where $(X - \bar{X})^2$ is the squared deviation of an observation from the mean.

 

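Note that formula (2) gives the S.D. of the given set of data, which is itself treated as the population with $\mu = \bar{X}$. We shall therefore refer to this S.D. as the 'population S.D.' Thus,

$$\text{Population S.D. } (= \sigma) = \sqrt{\frac{1}{N} \sum (X - \bar{X})^2} \qquad (3)$$

In the case of a frequency distribution,

$$\text{Population S.D. } (= \sigma) = \sqrt{\frac{1}{N} \sum f (X - \bar{X})^2} \qquad (4)$$

where $f$ is the frequency of the value $X$ and $N = \sum f$.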

Sample S.D.: When the given set of data is not a population but a sample drawn from a large population, the population mean $\mu$ is not known. In its place we use $\bar{X}$, the estimate of $\mu$ obtained from the sample observations. As a result we cannot calculate the population S.D. ($\sigma$); instead we calculate its estimate ($S$). The estimates of the population parameters $\mu$ and $\sigma$ are represented as follows:

$\bar{X}$ = estimate of $\mu$

$S$ = estimate of $\sigma$

The best estimate ($S$) of the population S.D. ($\sigma$) is given by

$$S \text{ (sample S.D.)} = \sqrt{\frac{1}{N - 1} \sum f (X - \bar{X})^2} \qquad (5)$$
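The difference between the two divisors can be seen on a small ungrouped series. The sketch below uses series B from the Introduction; the function names are illustrative, and the results are compared with `pstdev` and `stdev` from Python's standard `statistics` module, which use divisors N and N - 1 respectively.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_sd(values):
    """S.D. with divisor N (data treated as the whole population)."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / n)

def sample_sd(values):
    """S.D. with divisor N - 1 (data treated as a sample)."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

series_b = [6, 7, 8, 9, 10, 11, 12]
sigma = population_sd(series_b)
s = sample_sd(series_b)

print(f"population S.D. = {sigma:.4f} (library: {pstdev(series_b):.4f})")
print(f"sample S.D.     = {s:.4f} (library: {stdev(series_b):.4f})")
```

The sample S.D. is always the larger of the two, and the gap narrows as N grows.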

 

4.2. Computation of Standard Deviation

Computing the S.D. directly requires the arithmetic mean in every case, which adds to the labor of calculation. To avoid this, the short-cut method can be used, in which any convenient value of the variate is chosen as an arbitrary mean and the standard deviation is calculated as follows.

Suppose $A$ is the arbitrary mean and $d$ is the deviation of a variate value from $A$, i.e. $d = X - A$. Then we have

$$\sum f (X - \bar{X})^2 = \sum f d^2 - \frac{(\sum f d)^2}{N}$$

For this we require the columns of $d$, $fd$ and $fd^2$. In a frequency distribution with equal class intervals, every figure in the column of $d$ contains a common factor equal to the width of the class interval, $i$. After taking out this common factor, the columns become $d/i$, $fd/i$ and $fd^2/i^2$. In terms of these symbols, $\sum f (X - \bar{X})^2$ and the S.D. are calculated as given below.

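$$\sum f d = i \times \sum f \frac{d}{i}$$

$$\sum f d^2 = i^2 \times \sum f \frac{d^2}{i^2}$$

$$\sum f (X - \bar{X})^2 = i^2 \times \left\{ \sum f \frac{d^2}{i^2} - \frac{\left( \sum f \frac{d}{i} \right)^2}{N} \right\}$$

$$\text{S.D.} = i \times \left[ \sqrt{\frac{1}{N} \left\{ \sum f \frac{d^2}{i^2} - \frac{\left( \sum f \frac{d}{i} \right)^2}{N} \right\}} \right]$$

If we use the symbol $D$ for $d/i$, the above expressions are written as

$$\sum f d = i \times \sum f D$$

$$\sum f d^2 = i^2 \times \sum f D^2$$

$$\sum f (X - \bar{X})^2 = i^2 \times \left\{ \sum f D^2 - \frac{(\sum f D)^2}{N} \right\}$$

$$\text{S.D.} = i \times \left[ \sqrt{\frac{1}{N} \left\{ \sum f D^2 - \frac{(\sum f D)^2}{N} \right\}} \right]$$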

 

Example:

Calculation of S.D. (arbitrary mean A = 25, class width i = 10)

Class    Frequency f   Mid value X   d = X - A   D = d/i    fD    fD²
0-10          1             5           -20         -2      -2     4
10-20         3            15           -10         -1      -3     3
20-30         5            25             0          0       0     0
30-40         4            35           +10         +1      +4     4
40-50         2            45           +20         +2      +4     8
Total        15                                             +3    19

 

 

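$$\sum f (X - \bar{X})^2 = i^2 \times \left\{ \sum f D^2 - \frac{(\sum f D)^2}{N} \right\}$$

$$= 10^2 \times \left[ 19 - \frac{3^2}{15} \right]$$

$$= 100 \times \left[ \frac{276}{15} \right]$$

$$= 1840$$

Population S.D. (here, $\mu = \bar{X}$):

$$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}} = \sqrt{\frac{1840}{15}} \approx 11.08$$

Sample S.D.:

$$S = \sqrt{\frac{\sum f (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{1840}{14}} \approx 11.46$$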
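The short-cut calculation in the example can be checked against a direct calculation from the actual mean. The following sketch assumes the arbitrary mean A = 25 (the mid value whose d is 0 in the table) and class width i = 10; the variable names are illustrative.

```python
from math import sqrt

# (mid value X, frequency f) for each class, copied from the example table
classes = [(5, 1), (15, 3), (25, 5), (35, 4), (45, 2)]
A = 25   # arbitrary mean (assumed from the row with d = 0)
i = 10   # width of the class interval

N = sum(f for _, f in classes)
sum_fD = sum(f * (x - A) / i for x, f in classes)
sum_fD2 = sum(f * ((x - A) / i) ** 2 for x, f in classes)

# Short-cut formula: sum f (X - mean)^2 = i^2 * (sum fD^2 - (sum fD)^2 / N)
ss_shortcut = i ** 2 * (sum_fD2 - sum_fD ** 2 / N)

# Direct calculation from the actual mean, for comparison
x_bar = sum(f * x for x, f in classes) / N
ss_direct = sum(f * (x - x_bar) ** 2 for x, f in classes)

sigma = sqrt(ss_shortcut / N)    # population S.D. (divisor N)
s = sqrt(ss_shortcut / (N - 1))  # sample S.D. (divisor N - 1)

print("N, sum fD, sum fD^2:", N, sum_fD, sum_fD2)
print("sum of squares (short-cut, direct):", round(ss_shortcut, 2), round(ss_direct, 2))
print("population S.D., sample S.D.:", round(sigma, 2), round(s, 2))
```

Both routes give the same sum of squared deviations, which is why the choice of arbitrary mean does not affect the final S.D.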

 

4.3. Characteristics of Standard Deviation

➢ It is rigidly defined.
➢ Its computation is based on all the observations.
➢ If all the variate values are the same, S.D. = 0.
➢ S.D. is least affected by fluctuations of sampling.
➢ It is affected by a change of scale, but not by a change of origin.
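The effect of a change of origin and a change of scale can be verified with a short sketch. The shift of 100 and the scale factor of 3 are arbitrary values chosen for illustration.

```python
from statistics import pstdev

data = [6, 7, 8, 9, 10, 11, 12]
shifted = [x + 100 for x in data]  # change of origin: add a constant
scaled = [x * 3 for x in data]     # change of scale: multiply by a constant
constant = [9] * 7                 # all variate values the same

sd_original = pstdev(data)
sd_shifted = pstdev(shifted)
sd_scaled = pstdev(scaled)
sd_constant = pstdev(constant)

print("original:", sd_original)
print("shifted :", sd_shifted)   # unchanged by the shift
print("scaled  :", sd_scaled)    # multiplied by the scale factor
print("constant:", sd_constant)  # zero when all values are equal
```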

 

4.4. Uses of Standard Deviation

➢ It is used in computing different statistical quantities such as the regression coefficient and the correlation coefficient, and in connection with business cycle analysis.
➢ It is also used in testing the reliability of certain statistical measures.

 

4.5. Quartile Deviation or Semi-Interquartile Range

This measure of dispersion is expressed in terms of quartiles and is known as the quartile deviation or semi-interquartile range.

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

where $Q_1$ = lower quartile

and $Q_3$ = upper quartile.

It is not a measure of the deviation from any particular average. For symmetrical and moderately skew distributions the quartile deviation is usually about two-thirds of the standard deviation:

$$\text{Q.D.} \approx \frac{2}{3} \times (\text{S.D.})$$
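A minimal sketch of the quartile deviation is given below. Quartiles can be located by several conventions that give slightly different values on small data sets; this sketch assumes the default ("exclusive") method of `statistics.quantiles` in Python's standard library, which is not necessarily the convention intended in this module.

```python
from statistics import quantiles

def quartile_deviation(values):
    """Q.D. = (Q3 - Q1) / 2, the semi-interquartile range."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    return (q3 - q1) / 2

series_b = [6, 7, 8, 9, 10, 11, 12]
q1, q2, q3 = quantiles(series_b, n=4)
qd = quartile_deviation(series_b)

print("Q1 =", q1, " Q3 =", q3, " Q.D. =", qd)
```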

 

5. Variance

 

Variance is the square of the standard deviation.

 

$$\text{Variance} = (\text{S.D.})^2$$

The variance of a population is generally represented by the symbol $\sigma^2$, and its unbiased estimate calculated from the sample by the symbol $S^2$.

 

6.      Relative Measures of Dispersion

 

The measures of dispersion studied so far are absolute measures of dispersion. They are expressed in the same units as the observations, e.g. grams, centimetres, metres, hectares, etc. When we have to compare the dispersions of two or more distributions, it is not proper to compare their absolute measures of dispersion, because the distributions or the data may differ from one another

(i) with respect to their averages,

(ii) with respect to their dispersions,

(iii) with respect to both their averages and dispersions,

(iv) with respect to their units.

They are therefore not directly comparable. In such circumstances, comparison is possible with the help of relative measures of dispersion.

 

6.1.Coefficient of Dispersion

 

It is computed by the following expression:

 

Coefficient of Dispersion =

 


 

6.2. Coefficient of Variation (C.V.)

This is also a relative measure of dispersion. It is especially important because it is based on the most widely used measures of central tendency and dispersion, i.e. the arithmetic mean and the standard deviation. It is given by

$$\text{C.V.} = \frac{\text{S.D.}}{\text{A.M.}} \times 100$$

It is expressed as a percentage and is used to compare the variability of two or more series. A smaller coefficient of variation indicates greater consistency.
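As an illustration, the C.V. of series B and series C from the Introduction can be compared as sketched below. The sketch assumes the population S.D. (divisor N); the function name is illustrative.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = S.D. / A.M. x 100, using the S.D. with divisor N."""
    return pstdev(values) / mean(values) * 100

series_b = [6, 7, 8, 9, 10, 11, 12]
series_c = [1, 2, 4, 5, 11, 13, 27]

cv_b = coefficient_of_variation(series_b)
cv_c = coefficient_of_variation(series_c)

print(f"C.V. of series B = {cv_b:.2f}%")
print(f"C.V. of series C = {cv_c:.2f}%")
```

Both series have the same mean, so the series with the smaller C.V. is the more consistent of the two.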

 

7. Standard Error

7.1. Definition

The standard deviation of the sampling distribution of a statistic (estimate) is known as the standard error of that statistic (estimate).

If we take all possible samples of the same size from the population and form the sampling distribution of their means, it can be proved that the mean of this sampling distribution is the population mean and that its standard deviation is the standard error of the mean.

As it is not possible to draw and study all possible samples, we have to estimate the standard error from a single sample. If $S$ is the standard deviation of a sample of size $N$, the estimate of the standard error of the mean is given by $S/\sqrt{N}$.

 

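7.2. Expression for the standard error of mean

Let $X_1, X_2, X_3, \ldots, X_N$ be a sample of $N$ observations drawn at random from a population whose variance is $\sigma^2$.

Now, the mean is

$$\bar{X} = \frac{1}{N} (X_1 + X_2 + X_3 + \cdots + X_N)$$

The variance of the mean is

$$V(\bar{X}) = \frac{1}{N^2} \{ V(X_1 + X_2 + X_3 + \cdots + X_N) \}$$

$$= \frac{1}{N^2} [ V(X_1) + V(X_2) + \cdots + V(X_N) ]$$

where the second step holds because observations drawn at random are independent. Since $V(X_1) = V(X_2) = V(X_3) = \cdots = V(X_N) = \sigma^2$,

$$V(\bar{X}) = \frac{N \sigma^2}{N^2} = \frac{\sigma^2}{N}$$

$$\text{S.E. of } \bar{X} = \frac{\sigma}{\sqrt{N}}$$

But since, in practice, $\sigma$ is not known, it is replaced by its estimate $S$:

$$\text{S.E. of } \bar{X} = \frac{S}{\sqrt{N}}$$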
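The estimate of the standard error from a single sample can be sketched as follows. The data are the passenger counts from the self-check exercise at the end of this module, treated here as a random sample purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

passengers = [115, 122, 129, 113, 119, 124, 132, 120, 110, 116]

n = len(passengers)
x_bar = mean(passengers)
s = stdev(passengers)    # sample S.D., divisor N - 1
se_mean = s / sqrt(n)    # estimated standard error of the mean

print(f"N = {n}, mean = {x_bar}, S = {s:.3f}, S.E. of mean = {se_mean:.3f}")
```

Because the divisor is the square root of N, quadrupling the sample size only halves the standard error.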

 

8.      Probable Error

 

The quartile deviation of the sampling distribution of means is known as the probable error and is 0.67449 times the standard error.

 

P.E. = 0.67449 (S.E.)

 

Three times the probable error is roughly twice the standard error. This measure of dispersion has no particular advantage and moreover involves a troublesome factor 0.67449. This is why it has gone out of use and has given place to standard error.
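The factor 0.67449 can be recovered numerically under the assumption that the sampling distribution of the mean is normal: it is then the upper quartile of the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the function name `probable_error` is illustrative.

```python
from statistics import NormalDist

# Upper quartile (75th percentile) of the standard normal distribution
factor = NormalDist().inv_cdf(0.75)

def probable_error(standard_error):
    """P.E. = 0.67449 x S.E., assuming a normal sampling distribution."""
    return factor * standard_error

print(f"factor = {factor:.5f}")
print(f"3 x P.E. for S.E. = 1: {3 * probable_error(1.0):.4f}")
```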

 

9.      Summary

 

This module gives students an overview of the techniques used to measure the extent of variation, or deviation (also called the degree of variation), of each value in a data set from a measure of central tendency, usually the mean or median. Such statistical techniques are called measures of dispersion (or variation). A small dispersion among the values in a data set indicates that the data are clustered closely around the mean. The mean is then considered representative of the data, i.e. the mean is a reliable average. Conversely, a large dispersion among the values indicates that the mean is not reliable, i.e. it is not representative of the data. Two or more sets of data may have the same variation but differ greatly in their A.M. On the other hand, two or more sets of data may have the same A.M. but differ in variation.

10. Self-check exercise with solution

 

Q.1. The following data give the number of passengers traveling by airplane from one city to another in one week.

115, 122, 129, 113, 119, 124, 132, 120, 110, 116

Calculate the mean and standard deviation and determine the percentage of cases that lie within (i) µ±σ, (ii) µ±2σ and (iii) µ±3σ. What percentage of cases lie outside these limits?

 

Calculation of Mean and Standard Deviation

X        X - X̄    (X - X̄)²
115       -5        25
122        2         4
129        9        81
113       -7        49
119       -1         1
124        4        16
132       12       144
120        0         0
110      -10       100
116       -4        16
Total 1200  0      436

 

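Solution:

  • $\mu = \frac{\sum X}{N} = \frac{1200}{10} = 120$ and $\sigma^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{436}{10} = 43.6$, so that $\sigma = \sqrt{43.6} \approx 6.60$.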

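The percentages of cases that lie within the given limits are as follows:

Interval                                   Values within interval                               Percentage within   Percentage outside
µ±σ = 120 ± 6.60 = 113.40 to 126.60        115, 116, 119, 120, 122, 124                         60%                 40%
µ±2σ = 120 ± 13.20 = 106.80 to 133.20      110, 113, 115, 116, 119, 120, 122, 124, 129, 132     100%                nil
µ±3σ = 120 ± 19.80 = 100.20 to 139.80      110, 113, 115, 116, 119, 120, 122, 124, 129, 132     100%                nil

The value 113 lies just below the lower limit of 113.40 and therefore falls outside µ±σ.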
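The counts for Q.1 can be reproduced with a short script. The sketch below treats the ten values as the population (S.D. with divisor N), as in the solution; the helper name `percent_within` is illustrative.

```python
from statistics import mean, pstdev

passengers = [115, 122, 129, 113, 119, 124, 132, 120, 110, 116]

mu = mean(passengers)
sigma = pstdev(passengers)  # S.D. with divisor N

def percent_within(values, centre, half_width):
    """Percentage of values lying in [centre - half_width, centre + half_width]."""
    inside = [x for x in values if centre - half_width <= x <= centre + half_width]
    return 100 * len(inside) / len(values)

shares = {k: percent_within(passengers, mu, k * sigma) for k in (1, 2, 3)}

print(f"mean = {mu}, S.D. = {sigma:.2f}")
for k, share in shares.items():
    low, high = mu - k * sigma, mu + k * sigma
    print(f"mean ± {k} S.D. = {low:.2f} to {high:.2f}: "
          f"{share:.0f}% within, {100 - share:.0f}% outside")
```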

 

Q.2. What do you understand by dispersion?

 

A.2. A measure of dispersion is designed to state numerically the extent to which individual observations vary on the average.

 

Q.3. What are the different measures of dispersion?

 

A.3. (1) Absolute measures: (i) Mean Deviation (ii) Standard Deviation (iii) Quartile Deviation (iv) Range.

 

(2) Relative measures: (i) Coefficient of variation (ii) Coefficient of Mean deviation

 
