16 Application of Chi Square Analysis in Geographical Studies

Dr. Madhushree Das

epgp books

 

 

 

Introduction

 

Chi-square (χ²) analysis is a non-parametric statistical test with no rigid assumptions about the underlying distribution. It is commonly used for testing independence and goodness-of-fit to a given frequency distribution. Testing for goodness-of-fit determines whether an observed frequency distribution matches a theoretically generated one. The chi-square (χ²) test is used mainly to test the significance of a null hypothesis under which a set of expected frequencies is generated and compared against a given set of observed frequencies.

 

The basis of a chi-square test is to test the validity of an assumption regarding the process which theoretically generates a given distribution. If the theory is valid, it will generate a frequency distribution very close to the given distribution, and the theory is validated. However, if the generated values are not close to the given frequency distribution, the theory (the hypothesis) is rejected. The main question resolved with the help of the chi-square test is how to test the closeness of the generated, or expected, frequencies to the given frequencies.

 

The chi-square statistic is defined as the sum, over all classes, of the values obtained by dividing the square of the difference between the observed and expected frequencies by the expected frequency. It has been shown that this statistic follows a specific distribution, known as the chi-square distribution, with a certain number of degrees of freedom. The degrees of freedom will be (n-1) if there are n classes. However, if we are dealing with a contingency table of "r" rows and "c" columns, the degrees of freedom will be (r-1)(c-1). Tables of chi-square values are generated for degrees of freedom from 1 up to 100 or more, and the values are tabulated for different levels of significance. A sample of the chi-square table for d.f. up to 25 at the 5%, 1% and 0.1% levels of significance is given below in Table 1.

 

If a computed value of chi-square is found to be less than the corresponding value given in the table, it is said to be statistically insignificant. This means that the hypothesis under which the expected frequencies have been generated, known as the null hypothesis, may be accepted. On the other hand, if the calculated value of chi-square is found to be more than the corresponding tabulated value for some level of significance, the null hypothesis is rejected, i.e. the assumption under which the expected frequencies are generated does not hold good, and we accept the alternative hypothesis.

 

To summarize the above we can say the following:

 

The basis of Chi-Square analysis is the comparison of frequencies expected under precisely defined conditions with frequencies observed in the actual pattern under investigation. It is represented in the following formula:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

 

 

Where χ² = chi-square value, Oi = observed frequencies, Ei = expected frequencies of the distribution, and the sum is taken over all n classes. After calculating the chi-square value, it is compared with the table values of chi-square to examine the validity of the formulated null hypothesis. In this regard, two main aspects of the analysis must be remembered: a) the degrees of freedom and b) the level of significance.
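The formula can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original module: the function name chi_square and the three-class frequencies are assumptions chosen only to show the arithmetic.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies for three classes (not taken from the text).
example_value = chi_square([12, 18, 30], [20, 20, 20])
print(example_value)  # 64/20 + 4/20 + 100/20, i.e. about 8.4
```

Each class contributes its own squared difference divided by its own expected frequency, so a class with a small expected frequency can dominate the total.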

 

 

1.1 Degrees of Freedom (d.f.):

 

The degrees of freedom are, in general, the number of independent pieces of information provided by a given set of data. For example, if there are four scores and their mean is fifty, the sum of the scores must be 200. For the first score we are free to choose any value, e.g. 70; in the same way we can freely select 20 for the second score and 45 for the third, which brings the total to 135. At this point our freedom stops: we have no choice over the last number, which has to be 65 because the sum has to come to 200. So when the mean of four scores is fixed, we have three degrees of freedom: we are totally free to choose the first three numbers, but the fourth is forced on us, 200 - (70 + 20 + 45) = 200 - 135 = 65. Accordingly, the degrees of freedom (d.f.) used in chi-square analysis are n-1, where n is the number of classes (frequencies). The shape of the chi-square distribution varies with the (n-1) d.f., which therefore determine the critical value against which the calculated value is tested.
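The same reasoning can be written as a short Python sketch. The helper name forced_last_score is hypothetical; it only restates the arithmetic of the four-score example.

```python
def forced_last_score(free_scores, mean, n):
    """Return the one score that is fixed once n - 1 scores and the mean are known."""
    if len(free_scores) != n - 1:
        raise ValueError("exactly n - 1 scores can be chosen freely")
    return mean * n - sum(free_scores)


# Three scores are chosen freely; the fourth follows from the mean of 50.
print(forced_last_score([70, 20, 45], mean=50, n=4))  # 65
```

Whatever three values are chosen, the function returns exactly one admissible fourth value, which is why only n-1 of the n frequencies are free.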

 

Table 1: Critical values of the Chi-Square (χ²) distribution

 

1.2  The Level of Significance:

 

The calculated value of chi-square is compared with the critical values given in the table to find out whether the overall difference between the observed (Oi) and expected (Ei) frequencies is statistically significant or not. This is always tested in terms of certain probabilities, which indicate the level of confidence with which we can accept or reject the null hypothesis. The table values of chi-square are given for various probability levels, each of which is called a level of significance. Usually the values of chi-square at the 0.05 and 0.01 levels of significance for the given degrees of freedom are taken from the table for the purpose of analysis, which signifies that we are 95% and 99% confident, respectively, when rejecting or accepting the null hypothesis. An online chi-square calculator is available for finding the critical value for a given number of degrees of freedom (http://graphpad.com/quickcalcs/chisquared1.cfm). In this calculator, select the calculation, enter the d.f. and the appropriate value, and you will find the critical table value as given in Table-1.
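The link between a tabulated critical value and its level of significance can also be checked numerically. The sketch below is not part of the original module: it assumes integer degrees of freedom and uses only Python's standard library, and the function name chi2_tail_probability is an illustrative choice. It returns the probability that a chi-square variable exceeds a given value, so a critical value from the table should return roughly its own level of significance.

```python
import math


def chi2_tail_probability(x, df):
    """Probability that a chi-square variable with df degrees of freedom exceeds x."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be >= 0 and df must be a positive integer")
    if x == 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # exact tail for df = 1
    # Step up two degrees of freedom at a time using the recurrence
    # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


print(chi2_tail_probability(3.841, 1))   # close to 0.05
print(chi2_tail_probability(11.345, 3))  # close to 0.01
```

A calculated chi-square value is significant at a given level when its tail probability is smaller than that level, which is the same decision as comparing it with the table value.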

 

 

Example

 

Consider a hypothetical set of scores, numbered 1, 2, 3 and 4, with the frequencies given in Table 2 below. Suppose we have to decide whether these frequencies were generated with equal probabilities. We can state the null hypothesis as below:

 

H0: The scores are generated with equal probabilities, against the alternative hypothesis, H1: The scores are not generated with equal probabilities.

 

Accepting or rejecting the null hypothesis will help us draw a conclusion about the process which has generated the above scores.

 

The above hypothesis can be tested with the help of the chi-square test, as shown below in Table 2. Column 2 of the table gives the observed frequencies of the scores. Column 3 gives the expected frequencies under the assumption of equal probabilities. Column 4 gives the difference, the next column gives the square of the difference, and the last column gives the value to be added for each score. The sum at the bottom of the last column is the calculated value of chi-square for 3 d.f., which is 31.0.

 

Table-2: Calculation of Chi-Square values

Score   Observed frequency (O)   Expected frequency (E)   (O-E)   (O-E)²   (O-E)²/E
1       70                       50                       20      400      400/50 = 8
2       20                       50                       -30     900      900/50 = 18
3       45                       50                       -5      25       25/50 = 0.5
4       65                       50                       15      225      225/50 = 4.5
Total   200                      200                                       Σ = 31.0

 

 

Here, at d.f. = 3, the calculated value of χ² = 31.0

 

This calculated value of chi-square is to be tested against the tabulated values of chi-square for 3 d.f. given in Table 1, for different levels of significance.

 

The table value of chi-square is found to be 11.345 at the 0.01 level of significance (see Table-1).

 

Since our calculated value is much higher than the tabulated value, even at the 0.001 level of significance (16.266), it is statistically significant and the null hypothesis is rejected.
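The arithmetic of this example can be reproduced with a short Python sketch. The variable names are illustrative; the observed frequencies and the critical value of 11.345 are those given in the text. The third score contributes 25/50 = 0.5, so the four contributions add up to 31.0.

```python
observed = [70, 20, 45, 65]

# Equal probabilities: every score is expected 200 / 4 = 50 times.
expected = [sum(observed) / len(observed)] * len(observed)

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(observed) - 1

critical_at_001 = 11.345  # d.f. = 3, 0.01 level of significance

print(contributions)           # [8.0, 18.0, 0.5, 4.5]
print(chi2, df)                # 31.0 3
print(chi2 > critical_at_001)  # True, so the null hypothesis is rejected
```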

 

2. Application of Chi-Square test:

 

The chi-square test is applicable when the expected frequencies are derived from a theoretical distribution, such as the normal distribution. The goodness of fit of a sample selected from a given population may therefore be tested with this analysis.

 

The chi-square test is applicable when our expectations are based on predetermined results.

 

The chi-square test is used to test the degree of divergence of observed values from expected results when our expectations are based on the hypothesis of equal probability of occurrence of the events in a specific geographic environment.

 

 

3. Uses of Chi-square test:

  1. It is conceptually useful for testing the agreement between the observed and expected values of an event.
  2. It is used in testing hypotheses but is not useful for estimation.
  3. The chi-square test can be applied to a complex contingency table with several classes of different attributes.
  4. The chi-square test has a very useful property, i.e. 'the additive property'. If a number of sample studies are conducted in the same field, the results can be pooled together. This means that χ² values can be added.
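The additive property can be illustrated with a minimal Python sketch. The three (χ², d.f.) pairs are hypothetical, and the sketch assumes that the sample studies are independent, in which case the degrees of freedom are added along with the χ² values.

```python
def pool_chi_square(results):
    """Add chi-square values and degrees of freedom across independent samples."""
    total_chi2 = sum(chi2 for chi2, _ in results)
    total_df = sum(df for _, df in results)
    return total_chi2, total_df


# Hypothetical results of three independent sample studies, each with 1 d.f.
studies = [(2.0, 1), (3.1, 1), (2.5, 1)]
pooled_chi2, pooled_df = pool_chi_square(studies)
print(pooled_chi2, pooled_df)  # about 7.6 with 3 d.f.
```

The pooled value is then tested against the table value for the pooled degrees of freedom, not against the value for a single study.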

     

4. Examples of Different Situations

 

Example 1:

 

In 50 tosses of a coin we have observed 20 heads and 30 tails. The null hypothesis in this regard states that there is no real difference between the number of heads and the number of tails, and thus that the coin is unbiased (uniform).

 

 

Analysis:

 

Since there are only two outcomes, i.e. heads and tails, the expected frequency of each will be

 

50 ÷ 2 = 25

 

Therefore, the expected frequencies for both outcomes are 25 each, while the observed frequencies are 20 heads and 30 tails.

 

 

As we know,

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(20 - 25)^2}{25} + \frac{(30 - 25)^2}{25} = 1 + 1 = 2$$

Since the number of categories (head, tail) is 2, the degrees of freedom (n-1) will be 2 - 1 = 1.

 

At 1 degree of freedom, the tabulated value of χ² at the .05 level of significance is 3.84 and at the .01 level of significance is 6.64. The calculated value of chi-square is 2, which is smaller than the tabulated value at both the 0.05 and the 0.01 levels of significance. Hence, the null hypothesis that the coin is uniform on both of its sides, with no statistically significant difference between the number of heads and the number of tails, is accepted.
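The coin example can be checked with a few lines of Python. This is only a sketch with illustrative variable names; the counts and the critical values 3.84 and 6.64 are those quoted in the text.

```python
observed = {"heads": 20, "tails": 30}

n_tosses = sum(observed.values())
expected = n_tosses / len(observed)  # 50 / 2 = 25 for each outcome

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

critical = {0.05: 3.84, 0.01: 6.64}  # table values for 1 d.f.

print(chi2, df)  # 2.0 1
for level, value in critical.items():
    # chi2 is below both table values, so the null hypothesis is not rejected.
    print(level, chi2 < value)
```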

 

Example 2:

 

The value of chi-square can be calculated under the assumption of independence of two attributes when the two attributes are cross-classified in a contingency table. If the row and column attributes are independent, the expected frequency of the cell in the ith row and jth column of a contingency table is equal to (Ri × Cj) / N, where Ri and Cj are the totals of the ith row and the jth column and N is the total number of frequencies. The degrees of freedom for a table of r rows and c columns are (r-1)(c-1) (Mahmood, 1986).

 

It is demonstrated in the following hypothetical situation:-

 

Two field surveyors classified some houses in a village on the basis of type, from the samples they collected. Their results are as follows:

 

 

Chi-square analysis is used here to test whether the sampling technique of one of the field surveyors is defective.

 

The null hypothesis in this case states that the sampling techniques adopted by the two field surveyors are similar. The expected frequencies for investigator A in classifying the house types are as follows:

 

 

We can now calculate the value of χ² as follows:

 

Table-3: Observed and Expected Values of Different House Types

 

 

Degrees of freedom = (c-1) (r-1) = (3-1) (2-1) = 2

 

The tabulated value of chi-square for two degrees of freedom at the 0.05 level of significance is 5.991. The calculated value is higher than the tabulated value, which means that the null hypothesis, that the sampling techniques of the two field surveyors are similar, is rejected. Thus, the classifications made by the two surveyors differ significantly, which indicates that the technique of one of them is defective.
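The contingency-table calculation can be sketched in Python as follows. The counts below are hypothetical and are not the surveyors' data from the original table; the function names are also illustrative. The sketch applies the rule that the expected frequency of a cell is (row total × column total) / N and that the degrees of freedom are (r-1)(c-1).

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def contingency_chi_square(table):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are surveyors A and B, columns are three house types.
table = [
    [30, 50, 20],
    [50, 30, 20],
]
chi2, df = contingency_chi_square(table)
print(expected_frequencies(table))  # [[40.0, 40.0, 20.0], [40.0, 40.0, 20.0]]
print(chi2, df)                     # 10.0 2
print(chi2 > 5.991)                 # True: significant at the 0.05 level
```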

 

Example 3:

 

In order to determine the distribution pattern of points in an area, the chi-square test can be used, although it is a "space-based" measure rather than a "distance-based" one such as Nearest Neighbour Analysis. The entire space is divided into grid squares of equal size, and the number of points in each grid square is counted to calculate their density. To apply this measure, we proceed as follows.

  1. Count the total number of points in the pattern under consideration.
  2. Construct grid squares of equal size to cover the study area completely.
  3. Count the number of points actually located within each grid square for the pattern in question, and record each total as an observed frequency (Oi).
  4. Calculate the expected frequency of points in each grid square, working on the assumption that the pattern is perfectly ordered as in figure 2(a): Expected Frequency (Ei) = (Total no. of points in the area) / (Total no. of grid squares).
  5. Calculate the value of χ² from the formula

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

 

 

This is best done by tabulating the observed and expected frequencies for each square, as exemplified below.

 

An example has been worked out to demonstrate the χ² test, using toposheet No. 45H3 (R.F. 1:50,000) to analyse the distribution of settlements in a part of Gujarat state (73°0′ E to 73°3′ E and 24°15′ N to 24°18′ N). The entire area is divided into 25 equal grid squares, so the degrees of freedom (n-1) in this case are 24. The tabulated value of χ² at the .05 level of significance is 36.42 and at the .01 level of significance is 42.98. The calculated value of χ² is 196.01, which is much higher than the critical table value. Hence, the null hypothesis that the settlements are uniformly distributed is rejected.

 

 

(B: Observed Number of Points/Settlements)

10 15 25 20 3
3 6 5 25 17
8 8 14 28 18
39 12 8 46 30
10 0 18 27 12

 

(C: Expected Number of Points/Settlements)

 

16.28 16.28 16.28 16.28 16.28
16.28 16.28 16.28 16.28 16.28
16.28 16.28 16.28 16.28 16.28
16.28 16.28 16.28 16.28 16.28
16.28 16.28 16.28 16.28 16.28

 

Grid square   Oi   Ei      (Oi-Ei)²   (Oi-Ei)²/Ei
1             10   16.28   39.44      2.42
2             3    16.28   176.35     10.83
3             8    16.28   68.56      4.21
4             39   16.28   516.19     31.71
5             10   16.28   39.44      2.42
6             15   16.28   1.64       0.10
7             6    16.28   105.67     6.49
8             8    16.28   68.56      4.21
9             12   16.28   18.32      1.13
10            0    16.28   265.04     16.28
11            25   16.28   76.04      4.67
12            5    16.28   127.24     7.82
13            14   16.28   5.19       0.32

Table-4: Calculation of the Chi-Square value
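The full calculation for all 25 grid squares can be sketched in Python using the observed counts printed in grid B. The variable names are illustrative, and the critical values are those quoted in the text for 24 d.f. With these counts the total comes to about 196.01.

```python
# Observed number of settlements per grid square (grid B in the text).
observed_grid = [
    [10, 15, 25, 20, 3],
    [3, 6, 5, 25, 17],
    [8, 8, 14, 28, 18],
    [39, 12, 8, 46, 30],
    [10, 0, 18, 27, 12],
]

counts = [c for row in observed_grid for c in row]
total_points = sum(counts)           # 407 settlements
n_squares = len(counts)              # 25 grid squares
expected = total_points / n_squares  # 407 / 25 = 16.28 per square

chi2 = sum((o - expected) ** 2 / expected for o in counts)
df = n_squares - 1

print(total_points, n_squares, df)  # 407 25 24
print(round(chi2, 2))               # 196.01
print(chi2 > 36.42, chi2 > 42.98)   # True True: the uniform pattern is rejected
```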

 

 

The value of χ² obtained in this way can be interpreted in a simple manner. If the pattern being studied is itself perfectly uniform, then χ² = 0, since the expected and observed frequencies are identical. In general, a low total of χ² indicates a fairly uniform distribution of points, while a higher value suggests a greater degree of clustering. The maximum value of χ² is obtained when all the points in a pattern lie within one grid square. An increase in either the total number of points in the pattern or the number of grid squares used would give a higher maximum value for χ².
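The two extreme cases can be verified with a small Python sketch. The function names are hypothetical. When all N points fall in a single one of n grid squares, the formula works out algebraically to χ² = N(n-1), which is why the maximum rises with both the number of points and the number of grid squares.

```python
def quadrat_chi_square(counts):
    """Chi-square of grid-square counts against a perfectly uniform pattern."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def max_chi_square(total_points, n_squares):
    """Chi-square when every point lies in a single grid square."""
    counts = [total_points] + [0] * (n_squares - 1)
    return quadrat_chi_square(counts)


print(quadrat_chi_square([4, 4, 4, 4]))  # perfectly uniform pattern: 0.0
print(max_chi_square(407, 25))           # close to 407 * 24 = 9768
print(max_chi_square(407, 100))          # more grid squares: close to 407 * 99
```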

 

 

Summary

 

The main drawback of this method is the large variation in results that can be obtained by using different grids. Changes in the size and orientation of the individual squares in the grid can have a considerable effect upon the value of χ², even though the number and pattern of points in the distribution remain the same. Peter Davis (1974) says, "The size of the grid squares used determines the effectiveness of the technique in recognizing elements of clustering which may be present in a distribution. As grid squares become smaller, the discrimination of clustering improves, and the value of χ² therefore increases. On the other hand, the use of larger grid squares reduces the value of χ² and gives the impression of greater uniformity. In the extreme case, the use of a single large grid square to cover the entire study area would indicate a perfectly ordered pattern, regardless of the degree of clustering which actually existed. Thus, valid comparison between distributions is only possible if a standard size of grid square is used." In geographical studies, therefore, the grid size determines the observed frequencies of the distribution, and it must be chosen carefully and kept constant if the results are to be accurate and comparable.
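The effect of grid size can be demonstrated with a minimal Python sketch. The point pattern is hypothetical: 16 points clustered tightly in one corner of a unit square, and the function name grid_chi_square is illustrative. The same points are counted on grids of 1, 4, 16 and 64 squares.

```python
def grid_chi_square(points, k):
    """Chi-square for points in the unit square counted on a k x k grid."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x * k), k - 1)  # clamp points lying on the right edge
        row = min(int(y * k), k - 1)  # clamp points lying on the top edge
        counts[row * k + col] += 1
    expected = len(points) / (k * k)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical pattern: 16 points packed into the lower-left corner.
coords = [0.01, 0.03, 0.05, 0.07]
points = [(x, y) for x in coords for y in coords]

for k in (1, 2, 4, 8):
    print(k * k, grid_chi_square(points, k))
```

For this pattern the values are 0, 48, 240 and 1008: a single grid square reports a perfectly ordered pattern, and the value rises as the squares become smaller, although the points themselves never change.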

 

 

 


 

References

  • Davis, P. (1988): Science in Geography 3, Data description and presentation, Oxford University Press, Hong Kong.
  • Mahmood, A. (1986): Statistical Methods in Geographical Studies, Rajesh Publications, New Delhi- 110002.
  • Kothari, C.R. (2013): Research Methodology, Methods and Techniques, New Age International Publishers, New Delhi-110002.
  • Sahu, B.K. (2004): Statistics in Psychology and Education, Kalyani Publishers, New Delhi-110002.
  • Spiegel, M.R. (1972): Schaum's Outline of Theory and Problems of Statistics, McGraw-Hill Book Company, Singapore.
  • Web: http://www. Wikipedia.org/wiki/Nomogram#chi-squired_test.html