28 Frequency Distribution
S. Gandhimathi
Introduction
Forming a frequency distribution means arranging raw data in a frequency table. It involves counting how many times each value (or each class of values) occurs; this count is called the frequency of that value or class. A frequency distribution may be discrete or continuous. In a discrete frequency distribution, the variable X takes specific, separate values (for example, the number of children in a family), and the frequency of each value is found with tally marks. In a continuous frequency distribution, the values of X are grouped into ranges (class intervals), and the frequency of each class is found with tally marks in the same way.
To facilitate counting, list all possible values of the variable in one column, from the lowest to the highest. Then go through the data item by item and, in a second column of “tallies”, put a bar (vertical line) opposite the value to which each item relates. To make counting easier, the bars are grouped in blocks of five, the fifth bar being drawn across the preceding four, and some space is left between blocks. Finally, count the bars against each value to get its frequency.
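The tally convention can be expressed as a small routine. The Python sketch below is an illustration only; the function name `tally` and the use of “IIII/” to stand for four bars crossed by a fifth are assumptions made for plain-text output.

```python
def tally(count: int) -> str:
    """Render a frequency as tally marks.

    Each complete block of five is written as "IIII/" (four bars crossed by
    a fifth); the remaining bars follow as single "I" marks. A count of
    zero gives an empty string.
    """
    blocks, rest = divmod(count, 5)
    parts = ["IIII/"] * blocks
    if rest:
        parts.append("I" * rest)
    return " ".join(parts)


for n in (1, 4, 7, 12):
    print(n, tally(n))
# 1 I
# 4 IIII
# 7 IIII/ II
# 12 IIII/ IIII/ II
```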
Objectives
1. To construct a univariate frequency distribution
2. To construct a bivariate frequency distribution
1. Univariate frequency distribution
Univariate frequency distributions are explained through the following examples.
Example: In a survey of 35 families in a village, the number of children per family was recorded and the following data were obtained.
0 | 1 | 2 | 3 | 4 | 6 | 5 |
2 | 7 | 3 | 4 | 2 | 3 | 5 |
8 | 4 | 5 | 12 | 6 | 3 | 2 |
7 | 6 | 5 | 3 | 3 | 7 | 8 |
9 | 7 | 9 | 4 | 5 | 4 | 3 |
Represent the data in the form of a discrete frequency distribution.
Solution:
FREQUENCY DISTRIBUTION OF THE NUMBER OF CHILDREN
No. of children | Tallies | Frequency |
0 | I | 1 |
1 | I | 1 |
2 | IIII | 4 |
3 | IIII/ II | 7 |
4 | IIII/ | 5 |
5 | IIII/ | 5 |
6 | III | 3 |
7 | IIII | 4 |
8 | II | 2 |
9 | II | 2 |
10 | - | 0 |
11 | - | 0 |
12 | I | 1 |
Total | | 35 |
It is clear from the table that the number of children per family varied from 0 to 12. There was one family with no child, there were five families with 4 children each, and only one family had 12 children.
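The same counting can be checked with a few lines of Python. This is a minimal sketch, not part of the original method: it uses the standard `collections.Counter` to count each value and lists every value from 0 to 12, including 10 and 11, which occur zero times.

```python
from collections import Counter

# Number of children in each of the 35 surveyed families (data from the example).
children = [
    0, 1, 2, 3, 4, 6, 5,
    2, 7, 3, 4, 2, 3, 5,
    8, 4, 5, 12, 6, 3, 2,
    7, 6, 5, 3, 3, 7, 8,
    9, 7, 9, 4, 5, 4, 3,
]

counts = Counter(children)
# List every value from the smallest to the largest, including values
# that never occur (10 and 11), as in the tally table.
table = [(x, counts.get(x, 0)) for x in range(min(children), max(children) + 1)]

for x, f in table:
    print(f"{x:>2} | {f}")
print("Total |", sum(f for _, f in table))
```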
FORMATION OF CONTINUOUS FREQUENCY DISTRIBUTION
This type of classification is the most popular in practice. The following technical terms are important when a continuous frequency distribution is formed, i.e., when data are classified according to class intervals:
1. Class Limits: The lowest value of a class is its lower class limit and the highest value is its upper class limit. For example, in the class 20-40 the lower limit is 20 and the upper limit is 40. The lower limit of a class is the value below which no item can belong to that class, and the upper limit is the value above which no item can belong to it. Thus, in the class 70-89, 70 is the lower limit and 89 the upper limit, i.e., this class can contain no value less than 70 or more than 89. Similarly, the class 90-109 can contain no value less than 90 or more than 109. The way in which class limits are stated depends upon the nature of the data.
2. Class Interval: The difference between the upper and lower limits of a class is known as the class interval of that class. For example, in the class 100-200 the class interval is 100 (i.e., 200 minus 100). An important decision while constructing a frequency distribution is the width of the class interval, i.e., whether it should be 10, 20, 50, 100, 500, etc. The decision depends upon a number of factors, such as the range of the data (the difference between the smallest and the largest item), the detail required and the number of classes to be formed. A simple formula for estimating an appropriate class interval, i, is:
$i = \frac{L - S}{k}$
where L is the largest value, S the smallest value and k the number of classes.
For example, if the salaries of 100 employees in a commercial undertaking varied between Rs.500 and Rs.5,500 and we want to form 10 classes, the class interval would be:
$i = \frac{5500 - 500}{10} = 500$
The starting class would be 500 – 1000, the next 1000 – 1500, and so on.
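The formula and the salary example can be reproduced as follows. This is a sketch under stated assumptions: the helper name `class_interval` is hypothetical, and the classes are built by stepping up from the smallest value in steps of i.

```python
def class_interval(largest: float, smallest: float, k: int) -> float:
    """i = (L - S) / k: the width needed to cover the range in k classes."""
    return (largest - smallest) / k


# Salary example from the text: Rs.500 to Rs.5,500, 10 classes.
i = class_interval(5500, 500, 10)
print(i)            # 500.0

lower = 500
classes = [(lower + j * i, lower + (j + 1) * i) for j in range(10)]
print(classes[:2])  # [(500.0, 1000.0), (1000.0, 1500.0)]
```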
The question now is how to fix the number of classes, k. The number can either be fixed arbitrarily, keeping in view the nature of the problem under study, or be decided with the formula:
$k = 1 + 3.322 \log_{10} N$
where N is the total number of observations and log denotes the common (base-10) logarithm.
Thus, if 10 observations are being studied (log 10 = 1), the number of classes shall be:
$k = 1 + (3.322 \times 1) = 4.322 \approx 4$
and if 100 observations are being studied (log 100 = 2), the number of classes shall be:
$k = 1 + (3.322 \times 2) = 1 + 6.644 = 7.644 \approx 8$
Since the formula uses the logarithm of N, k grows slowly as the number of observations increases (about 11 classes for 1,000 observations), so in practice it generally gives between 4 and 20 classes; it gives fewer than 4 classes only when N is less than 8.
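The rule for k is easy to evaluate directly. In this sketch the helper `sturges_k` is a hypothetical name, and rounding to the nearest whole number is assumed, as in the worked figures above.

```python
import math


def sturges_k(n: int) -> float:
    """Number of classes suggested by k = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)


for n in (10, 100, 1000):
    k = sturges_k(n)
    print(n, round(k, 3), round(k))
# 10 4.322 4
# 100 7.644 8
# 1000 10.966 11
```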
Sturges suggested the following formula for determining the magnitude of the class interval:
$i = \frac{\text{Range}}{1 + 3.322 \log_{10} N}$
where the range is the difference between the largest and the smallest items.
For example, if we apply this formula to the salary illustration above (range = 5,500 − 500 = 5,000; N = 100), the magnitude of the class interval shall be:
$i = \frac{5000}{1 + 3.322 \log_{10} 100} = \frac{5000}{7.644} \approx 654$
If we take a class interval of 650, the number of classes formed would be 5000/650, i.e., 7.69 or 8.
The formula may give a value involving fractions or an odd interval, for example i = 6.541. In such cases a suitable approximation should be made by rounding to a convenient value, as 654 was rounded to 650 above.
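Sturges' class-interval formula and the rounding step can be sketched the same way, using the salary illustration (range 5,000, N = 100). The function name `sturges_interval` is hypothetical.

```python
import math


def sturges_interval(largest: float, smallest: float, n: int) -> float:
    """i = Range / (1 + 3.322 log10 N)."""
    return (largest - smallest) / (1 + 3.322 * math.log10(n))


i = sturges_interval(5500, 500, 100)
print(round(i, 1))              # 654.1

width = 650                     # rounded to a convenient value, as in the text
print(round(5000 / width, 2))   # 7.69
print(math.ceil(5000 / width))  # 8 classes are needed to cover the range
```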
3. Class Frequency: The number of observations corresponding to a particular class is known as the frequency of that class, or the class frequency. In the example given below under the exclusive method, the frequency of the class 1000-1100 is 50, which implies that there are 50 persons with income between Rs.1000 and Rs.1100. Adding together the frequencies of all the individual classes gives the total frequency. Thus, in the same example, the total frequency of the six classes is 550, which means that the incomes of 550 persons in all have been studied.
4. Class Mid-point or Class Mark: It is the value lying half-way between the lower and upper limits of a class interval. The mid-point of a class is ascertained as follows:
$\text{Mid-point of a class} = \frac{\text{Upper limit of the class} + \text{Lower limit of the class}}{2}$
For the purpose of further calculation in statistical work the mid-point of each class is taken to represent that class.
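As an illustration, the mid-points of the income classes used in the exclusive-method table below can be computed as follows (a minimal sketch; the function name `mid_point` is hypothetical).

```python
def mid_point(lower: float, upper: float) -> float:
    """Class mark: the value half-way between the lower and upper limits."""
    return (lower + upper) / 2


income_classes = [(1000, 1100), (1100, 1200), (1200, 1300),
                  (1300, 1400), (1400, 1500), (1500, 1600)]
print([mid_point(lo, hi) for lo, hi in income_classes])
# [1050.0, 1150.0, 1250.0, 1350.0, 1450.0, 1550.0]
```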
There are two methods of classifying the data according to class-intervals, namely (i) ‘exclusive’ method and (ii) ‘inclusive’ method.
(i) ‘Exclusive’ method: When the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, the method is known as the exclusive method of classification. The following data are classified on this basis:
Income (Rs.) | No. of persons |
1000 – 1100 | 50 |
1100 – 1200 | 100 |
1200 – 1300 | 200 |
1300 – 1400 | 150 |
1400 – 1500 | 40 |
1500 – 1600 | 10 |
Total | 550 |
Thus, in the above example, there are 50 persons whose income is between Rs.1000 and Rs.1099.99. A person whose income is Rs.1100 would be included in the class 1100-1200. This method is widely followed in practice.
(ii) ‘Inclusive’ method: Under the ‘inclusive’ method of classification, the upper limit of one class is included in that class itself. The following example illustrates the method.
Income (Rs.) | No. of persons |
1000 – 1099 | 50 |
1100 – 1199 | 100 |
1200 – 1299 | 200 |
1300 – 1399 | 150 |
1400 – 1499 | 40 |
1500 – 1599 | 10 |
Total | 550 |
In the class 1000 – 1099 we include persons whose income is between Rs.1000 and Rs.1099. If the income of a person is exactly Rs.1100, he is included in the next class. The above example makes it clear that there is no doubt here about the class to which a value equal to a class limit belongs, a point that needs care under the exclusive method. Depending on the precision of the data, we may also have classes like 1000 – 1099.5 or 1000 – 1099.99 and so on.
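The difference between the two methods lies only in where a value equal to a class limit is placed. The hypothetical helper `find_class` below is a sketch of that rule, assuming the classes are given as (lower, upper) pairs in ascending order.

```python
def find_class(x, classes, inclusive=False):
    """Return the (lower, upper) class that contains x, or None.

    Exclusive method: lower <= x < upper (the upper limit belongs to the next class).
    Inclusive method: lower <= x <= upper (both limits belong to the class).
    """
    for lower, upper in classes:
        if lower <= x < upper or (inclusive and x == upper):
            return (lower, upper)
    return None  # x lies outside every class


exclusive = [(1000, 1100), (1100, 1200), (1200, 1300)]
inclusive = [(1000, 1099), (1100, 1199), (1200, 1299)]

print(find_class(1100, exclusive))                  # (1100, 1200)
print(find_class(1099, inclusive, inclusive=True))  # (1000, 1099)
print(find_class(1100, inclusive, inclusive=True))  # (1100, 1199)
```

Under the inclusive limits a fractional value such as 1099.5 falls into no class at all, which is one reason for the adjustment of inclusive classes described later in this section.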
Considerations in the Construction of Frequency Distributions
It is difficult to lay down hard and fast rules for constructing a frequency distribution, as much depends on the nature of the given data and the object of classification.
However, the following general considerations may be kept in mind for ensuring meaningful classification of data:
The number of classes should preferably be between 5 and 20. However, there is no rigidity about it. The classes can be more than 20 depending upon the total number of items in the series and the details required, but they should not be less than five because in that case the classification may not reveal the essential characteristics. The choice of number of classes basically depends upon:
(a) the number of figures to be classified
(b) the magnitude of the figures
(c) the details required and
(d) ease of calculation for further statistical work.
As far as possible, one should avoid class intervals such as 3, 7, 11, 26, 39, etc. Preferably, class intervals should be 5 or a multiple of 5, such as 10, 20, 25, 100, etc.
The starting point, i.e., the lower limit of the first class, should be zero, 5 or a multiple of 5. For example, if the lowest value of the data is 63 and we have taken a class interval of 10, the first class should be 60-70 instead of 63-73. Similarly, if the lowest value is 76 and the class interval is 5, the first class should be 75 to 80 rather than 76 to 81.
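One way to apply this rule mechanically is to round the lowest value down to the nearest multiple of the chosen class interval. The sketch below assumes the interval itself is 5 or a multiple of 5, as recommended above; the function name `first_class` is hypothetical.

```python
import math


def first_class(lowest: float, width: int) -> tuple:
    """Start the first class at the largest multiple of `width`
    that does not exceed the lowest observation."""
    start = math.floor(lowest / width) * width
    return (start, start + width)


print(first_class(63, 10))  # (60, 70)
print(first_class(76, 5))   # (75, 80)
```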
To ensure continuity and to get the correct class interval, we should adopt the ‘exclusive’ method of classification. However, where the ‘inclusive’ method has been adopted, it is necessary to make an adjustment. The adjustment consists of finding the difference between the lower limit of the second class and the upper limit of the first class, dividing the difference by two, subtracting the value so obtained from all lower limits and adding it to all upper limits. This can be expressed as a formula as follows:
$\text{Correction factor} = \frac{\text{Lower limit of the 2nd class} - \text{Upper limit of the 1st class}}{2}$
How the adjustment is made when data are given by the inclusive method can be seen from the following example:
Weekly wages | No. of workers |
800 – 899 | 5 |
900 – 999 | 10 |
1000 – 1099 | 15 |
1100 – 1199 | 8 |
1200 – 1299 | 2 |
To adjust the class limits, we take the difference between 900 (the lower limit of the second class) and 899 (the upper limit of the first class), which is 1. Dividing it by two gives ½, or 0.5. This value (0.5) is called the correction factor. Deduct 0.5 from the lower limits of all classes and add 0.5 to their upper limits. The adjusted classes would then be as follows:
Weekly wages | No. of workers |
799.5 – 899.5 | 5 |
899.5 – 999.5 | 10 |
999.5 – 1099.5 | 15 |
1099.5 – 1199.5 | 8 |
1199.5 – 1299.5 | 2 |
It should be noted that before adjustment the class interval was 99 but after adjustment, it is 100.
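The adjustment can be sketched in a few lines, assuming the inclusive classes are listed in ascending order as (lower, upper) pairs; `adjust_inclusive` is a hypothetical name.

```python
def adjust_inclusive(classes):
    """Convert inclusive class limits to continuous limits.

    Correction factor = (lower limit of 2nd class - upper limit of 1st class) / 2;
    it is subtracted from every lower limit and added to every upper limit.
    """
    cf = (classes[1][0] - classes[0][1]) / 2
    return [(lower - cf, upper + cf) for lower, upper in classes]


wages = [(800, 899), (900, 999), (1000, 1099), (1100, 1199), (1200, 1299)]
adjusted = adjust_inclusive(wages)
print(adjusted[0], adjusted[-1])  # (799.5, 899.5) (1199.5, 1299.5)
```

After the adjustment every class has a width of 100 and the upper limit of each class coincides with the lower limit of the next.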
An open-end distribution is one in which the lower limit of the first class or the upper limit of the last class is not specified (for example, a first class of ‘below Rs.500’ or a last class of ‘Rs.5,000 and above’). Such a distribution presents problems of graphing and further analysis. When the frequency distribution is used only as a means of presentation, open-end classes do not seriously reduce its usefulness as long as only a few items fall in these classes. However, using the distribution for further mathematical computation is difficult, because a mid-point value, which is used to represent the class, cannot be determined for an open-end class.
In any frequency distribution, the sizes or values of the items are shown on the left-hand side, and the number of times items of each size or value are repeated, i.e., the frequencies, are shown on the right-hand side against the respective size or value.
Example: Prepare a frequency table for the following data, taking the width of each class interval as 10. Use the exclusive method of classification:
57 | 44 | 80 | 75 | 00 | 18 | 45 | 14 | 04 | 64 |
72 | 51 | 69 | 34 | 22 | 83 | 70 | 20 | 57 | 28 |
96 | 56 | 50 | 47 | 10 | 34 | 61 | 66 | 80 | 46 |
22 | 10 | 84 | 50 | 47 | 73 | 42 | 33 | 48 | 65 |
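A solution to this exercise can be obtained by counting each value into its class [lower, lower + 10). The Python sketch below is one way to do it; the values written 00 and 04 in the data are entered as 0 and 4, because Python does not accept an integer literal such as 04.

```python
data = [
    57, 44, 80, 75, 0, 18, 45, 14, 4, 64,
    72, 51, 69, 34, 22, 83, 70, 20, 57, 28,
    96, 56, 50, 47, 10, 34, 61, 66, 80, 46,
    22, 10, 84, 50, 47, 73, 42, 33, 48, 65,
]

width = 10
# Exclusive method: the class [lower, lower + 10) contains its lower limit
# but not its upper limit, so 10 goes into 10-20, not 0-10.
freq = {lower: 0 for lower in range(0, 100, width)}
for x in data:
    freq[(x // width) * width] += 1

for lower, f in freq.items():
    print(f"{lower} - {lower + width} | {f}")
print("Total |", sum(freq.values()))
```

Running it gives the frequencies 2, 4, 4, 3, 7, 6, 5, 4, 4 and 1 for the ten classes from 0-10 to 90-100, a total of 40.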
Relative Frequency Distribution
At times it may be desirable to convert class frequencies to relative class frequencies to show the percentage of the total number of observations in each class. For example, we may be interested in knowing the percentage of employees earning less than Rs.2000 per month.
In order to convert a frequency distribution to a relative frequency distribution, each class frequency is divided by the total frequency, so that the relative frequencies always total 1 (or 100 per cent).
The relative frequency distribution would be as follows:
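As an illustration, the income distribution from the exclusive-method example in this section (total frequency 550) can be converted into relative frequencies as below. The choice of this distribution is an assumption made for the sake of the example.

```python
income = {
    "1000-1100": 50, "1100-1200": 100, "1200-1300": 200,
    "1300-1400": 150, "1400-1500": 40, "1500-1600": 10,
}
total = sum(income.values())  # 550

# Relative frequency = class frequency / total frequency.
relative = {cls: f / total for cls, f in income.items()}

for cls, r in relative.items():
    print(f"{cls} | {r:.4f} | {100 * r:.2f}%")
print("Total |", round(sum(relative.values()), 10))
```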
Bivariate or Two-way Frequency Distribution
In the previous few pages of this module, we described frequency distributions involving one variable only. Such frequency distributions are called univariate frequency distributions. In many situations, a simultaneous study of two variables becomes necessary: for example, we may want to classify data relating to the ages of husbands and wives, marks in statistics and in accountancy, or the heights and weights of students. Data so classified on the basis of two variables form a bivariate frequency distribution. While preparing a bivariate frequency distribution, the same considerations of classification apply to the values of each variable as for a univariate frequency distribution. If the data of one variable, say X, are grouped into m classes and the data of the other variable, say Y, into n classes, the bivariate table will consist of m × n cells. By going through the different pairs of values (X, Y) and using tally marks, we can find the frequency of each cell and thus form the bivariate frequency distribution.
The classes of X together with their total frequencies, obtained by adding the cell frequencies for each class of X, form the marginal frequency distribution of X. Similarly, the classes of Y together with their total frequencies form the marginal frequency distribution of Y.
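A bivariate table with its marginal totals can be sketched as follows. The (X, Y) pairs and the class limits here are hypothetical, chosen only to show the mechanics: each pair is placed in one of the m × n cells, row totals give the marginal distribution of X and column totals give that of Y.

```python
from collections import Counter

# Hypothetical (X, Y) pairs, e.g. marks in statistics (X) and accountancy (Y).
pairs = [(12, 25), (18, 31), (25, 22), (27, 38), (33, 35),
         (35, 28), (14, 21), (22, 36), (38, 39), (29, 24)]

x_classes = [(10, 20), (20, 30), (30, 40)]  # m = 3 classes for X
y_classes = [(20, 30), (30, 40)]            # n = 2 classes for Y


def locate(value, classes):
    """Index of the exclusive class [lower, upper) that contains value."""
    for idx, (lower, upper) in enumerate(classes):
        if lower <= value < upper:
            return idx
    raise ValueError(f"{value} lies outside all classes")


# Count the pairs falling in each (X class, Y class) cell.
cells = Counter((locate(x, x_classes), locate(y, y_classes)) for x, y in pairs)

# m x n table: one row per X class, one column per Y class.
table = [[cells[(i, j)] for j in range(len(y_classes))]
         for i in range(len(x_classes))]

marginal_x = [sum(row) for row in table]        # row totals: marginal of X
marginal_y = [sum(col) for col in zip(*table)]  # column totals: marginal of Y

for (lo, hi), row, total in zip(x_classes, table, marginal_x):
    print(f"X {lo}-{hi} |", " | ".join(map(str, row)), "|", total)
print("Total    |", " | ".join(map(str, marginal_y)), "|", len(pairs))
```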
Conclusion
To summarise: statistical inferences cannot readily be drawn from raw data, so the raw data have to be summarised. One way of summarising the data is to form a frequency distribution. There are two types of frequency distribution: the univariate frequency distribution, which is based on only one variable, and the bivariate frequency distribution, which is based on two variables. The above discussion should give an idea of how to group statistical data and construct a frequency table.