20 Chi Square (χ2) Distribution and tests of significance based on χ2
Felix Bast
1. Introduction
Chi Square (χ2) distribution is an important probability distribution used in a number of statistical tests of significance, the most famous of which are the χ2 test of independence and the χ2 test of the goodness of fit. As non-parametric methods, tests based on the χ2 distribution make no assumptions about the normality of the populations from which the samples came, and they are therefore suitable for the analysis of nominal (categorical) data. For non-Gaussian numerical data, however, better non-parametric tests are available (for example, the Mann-Whitney U test to compare two unpaired groups, and the Kruskal-Wallis test to compare three or more unpaired groups). When the outcome is binomial, especially in the analysis of 2×2 contingency tables, Fisher's exact test is preferred. To compare three or more unpaired groups, the χ2 test of independence is still the best method. When we want to find the fit of an observed distribution (data) to a theoretically expected distribution (model), the χ2 test of the goodness of fit is performed. The main difference between the two tests is that while the test of independence calculates expected frequencies automatically from the input of observed frequencies, the goodness-of-fit test requires expected frequencies derived from an explicit model (for example, Mendel's dihybrid cross ratio, or the Fisherian sex ratio).
2. Learning Outcomes:
a. To learn about the properties of χ2-distribution and statistical tests of significance based upon χ2 distribution
b. To learn assumptions for χ2-tests
c. To learn how χ2 test of independence is performed
d. To learn how Fisher's exact test is performed, a much better alternative to the χ2 test of independence, especially with small sample sizes
e. To learn how χ2 test of the goodness of fit is performed
3. χ2 distribution
The χ2 distribution (Chi-Square distribution) is an asymmetric continuous probability distribution that gives the probability of every possible χ2 statistic under the assumption of the null hypothesis. It was first described and used in 1900 by Karl Pearson, one of the founding fathers of statistics and population genetics. This distribution enables us to calculate a P value from a given χ2 statistic.
The distribution of χ2 statistics under the assumption of the null hypothesis, plotted as a probability histogram, is called the χ2 distribution. Like the lognormal distribution and the F-distribution, the χ2 distribution is right-skewed (with a long tail towards the right). The shape of the χ2 distribution depends only on a single shape parameter, k (the degrees of freedom, df). The χ2 distribution is a special case of the more generalized gamma distribution.
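As a sketch of how a P value comes out of this distribution (assuming Python with SciPy is available, which this chapter does not otherwise require): the P value for a χ2 statistic is the right-tail area under the curve for the appropriate df.

```python
# Illustrative sketch: P values and shape of the chi-square distribution.
# The statistic 3.84 and the df values below are illustrative, not from data.
from scipy.stats import chi2

# P value = area to the right of the statistic (survival function, sf = 1 - cdf).
p = chi2.sf(3.84, df=1)
print(round(p, 3))  # close to 0.05: 3.84 is the familiar critical value for df = 1

# The shape depends only on df (the shape parameter k); the distribution's
# mean equals its df, so the curve shifts right as df grows.
for df in (1, 4, 10):
    print(df, chi2.mean(df))
```

The same `chi2.sf` call replaces a printed χ2 table lookup in the worked examples that follow.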
4. Tests of significance based on χ2 distribution
The χ2 distribution is used for two main tests: the χ2 test of independence for the analysis of categorical data (to test whether two categorical variables are associated), and the χ2 test of the goodness of fit of an observed distribution (data) to a theoretically expected distribution (model). The distribution is also used for the likelihood ratio test (and its variant, the hierarchical LRT) employed for model selection in molecular phylogenetics.
5. Assumptions for χ2 tests
1. Random observations
2. Independent measurements
3. Accurate data
Note that there are no explicit assumptions about the distributions of the populations from which the samples are drawn, as the χ2 test is considered nonparametric.
6. χ2 test of independence
This test analyses two sets of information (two variables) for any association between them; therefore, sensu stricto, this test falls under multivariate statistics. In effect, this test compares proportions, like other methods for comparing proportions such as Relative Risk, Attributable Risk, Odds Ratio and so on. The null hypothesis is that there is no association between the variables (they are statistically independent), while the alternative hypothesis is that there is an association (the variables are statistically dependent).
The χ2 test statistic is computed as
χ2 = ∑[(fo – fe)2 / fe]
Where fo is the observed frequency and fe is the expected frequency. fe can be calculated as (row total x column total)/n.
Let us consider an example. Is there any association between income level and happiness level? To study, imagine we have done a questionnaire survey and obtained the results as given below:
This is the result of a questionnaire survey with 2955 participants in total, indicated in the table as the overall total. The first cell, 272, means that out of 615 rich participants, 272 responded that their happiness level is high. These numbers (highlighted in italics) are the observed frequencies (fo in our equation). Note that these numbers are only frequencies; the measurement merely classifies responses into three categories (high, middle and low), so the level of measurement here is nominal (categorical). Tables like these, where exact measured values are entered, are called contingency tables.
Let us first define our null hypothesis and alternative hypotheses:
H0: Income level and happiness level are independent
Ha: Income level and happiness level are dependent
First step in χ2 test of independence is to calculate fe, the expected frequencies of all these cells. This is computed as:
fe= (row total x column total)/ overall total
For the first cell (where fo=272), row total is 615 and column total is 911. Plugging into the above equation,
fe= (615 x 911)/2955 =189.6
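This arithmetic, and the expected frequencies for every cell at once, can be sketched in Python (assuming NumPy is available). The 3 x 3 observed table below is hypothetical, since only the totals and the first cell of the survey table appear in the text; the single-cell check uses the totals given above.

```python
import numpy as np

# First cell of the worked example: row total 615, column total 911, n = 2955.
fe_first = (615 * 911) / 2955
print(round(fe_first, 1))  # 189.6, as computed above

# General case: fe for every cell at once, via an outer product of the
# marginal totals. 'observed' is a HYPOTHETICAL 3x3 table, not the survey data.
observed = np.array([[20, 30, 10],
                     [25, 15, 20],
                     [10, 10, 30]])
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
n = observed.sum()
expected = np.outer(row_totals, col_totals) / n  # fe = row total x col total / n
print(expected.round(1))
```

A useful sanity check on any fe table: its row and column totals must match those of the observed table.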
It is better to calculate these values in a tabular format:
χ2 test statistic = 172.28
The next step is to look up a χ2 table to find the χ2 critical value, for which we need the degrees of freedom and the significance level (here 0.05). For χ2 tests, df can be calculated by the following equation:
df = (No. of rows - 1) x (No. of columns - 1)
Remember that these counts refer only to rows and columns of actual data; totals and labels are excluded.
df = (3-1) x (3-1) = 2 x 2 = 4
As the table value (critical χ2 = 9.488) is far less than our obtained χ2 test statistic (172.28), we conclude that P < 0.05; we reject the null hypothesis of independence and conclude that the two variables are dependent, or associated. Moving towards the right in the table, we see that even at significance level 0.005 the critical χ2 (14.86) is still far less than our obtained test statistic, so the P value must be < 0.005.
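The whole procedure (expected frequencies, χ2 statistic, df, P value) can be sketched with SciPy's `chi2_contingency`; the 3 x 3 counts below are hypothetical, since the full survey table is not reproduced here, but the critical-value lookup reproduces the 9.488 used above (df = 4, significance level 0.05).

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# HYPOTHETICAL 3x3 contingency table
# (rows: income levels, columns: happiness levels).
observed = np.array([[120, 90, 40],
                     [ 80, 95, 60],
                     [ 40, 70, 95]])

# chi2_contingency computes fe from the table, the statistic, df and P value.
stat, p, df, expected = chi2_contingency(observed)
print(f"chi2 = {stat:.2f}, df = {df}, P = {p:.4g}")

# Critical value for df = 4 at the 0.05 level, as read from a chi2 table.
critical = chi2.ppf(0.95, df=4)
print(round(critical, 3))  # 9.488
```

If the computed statistic exceeds the critical value, P < 0.05 and the null hypothesis of independence is rejected, exactly as in the table-lookup procedure above.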
Excel has no dedicated χ2 test of independence tool (its CHISQ.TEST function returns a P value, but only after the expected frequencies have been computed separately). An online calculator like the following can be used instead:
http://turner.faculty.swau.edu/mathematics/math241/materials/contablecalc/
The χ2 statistic is sensitive to small cell sizes. Whenever any of your cell counts is <5, a slightly modified formula (Yates' correction for continuity) should be used to calculate chi square:
χ2 = ∑[(|fo – fe| – 0.5)2 / fe]
(0.5 is deducted from the absolute value of fo – fe before squaring.)
However, most statisticians agree that the Yates correction overcorrects, making the test overly conservative.
When there are only two categories (like heads and tails in coin flipping, or male and female in gender), the best option for making inferences about one population is the binomial test, which calculates exact probabilities using the binomial equation. The binomial test is available at https://www.graphpad.com/quickcalcs/binomial1/. P values inferred from the χ2 test are only approximations, not exact.
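The exact binomial test is also available in SciPy as `binomtest`; the 14-heads-in-20-flips coin example below is made up for illustration.

```python
from scipy.stats import binomtest

# Exact two-sided binomial test: is a coin that showed 14 heads in 20 flips fair?
result = binomtest(k=14, n=20, p=0.5)
print(round(result.pvalue, 3))  # exact P value from the binomial equation
```

Because the P value is computed from exact binomial probabilities rather than a χ2 approximation, it is trustworthy even at small sample sizes.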
7. Fisher’s exact test
For 2 x 2 contingency tables, used frequently in case-control studies, the best test is Fisher's exact test (an online calculator can be found at https://www.graphpad.com/quickcalcs/contingency1.cfm). As an example, consider a trial of the anticoagulant apixaban for preventing recurrent thromboembolism:
H0: Apixaban does not alter the risk of a recurrent thromboembolism
Ha: Apixaban alters the risk of a recurrent thromboembolism
The values in the table indicate the number of patients (treated with placebo or apixaban, the two rows) who, having already had a thromboembolism, went on to have another during the study period (in the column "Recurrent") or did not have a second episode (in the column "No Recurrence"). In 2 x 2 contingency tables like this, it is customary to enter groups as rows and outcomes as columns. Fisher's exact test uses the following formula, which is simple and straightforward to understand:
p = [(a+b)! (c+d)! (a+c)! (b+d)!] / [a! b! c! d! n!]
Where a, b, c and d are the four cell values of the 2 x 2 contingency table and n is the total number of observations in the table. When the numbers become very large, the calculation of factorials becomes mathematically unwieldy, so the χ2 test is preferred.
Fisher's exact test for our example above returns a P value less than 0.0001, so the difference is highly significant.
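SciPy implements the test as `fisher_exact`. The sketch below also evaluates the single-table probability from the factorial formula above (rewritten with binomial coefficients) to show the connection; the 2 x 2 counts are hypothetical, not the apixaban trial data.

```python
from math import comb
from scipy.stats import fisher_exact

a, b, c, d = 8, 2, 1, 5        # hypothetical 2x2 table
n = a + b + c + d

# Probability of this exact table under the null hypothesis. This equals the
# factorial formula (a+b)!(c+d)!(a+c)!(b+d)! / (a!b!c!d!n!).
p_table = comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# Two-sided Fisher's exact test: sums the probabilities of all tables with the
# same margins that are at least as extreme as the observed one.
oddsratio, p_value = fisher_exact([[a, b], [c, d]])
print(round(p_table, 4), round(p_value, 4))
```

The two-sided P value always includes the observed table itself, so it is at least as large as `p_table`.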
8. χ2 test of the goodness of fit
The χ2 test of the goodness of fit is used when we want to find the fit of an observed distribution (data) to a theoretically expected distribution (model). The main difference from the earlier test (independence) is that for goodness of fit the expected frequencies are derived from a theory or a mathematical model, while in the test of independence they are calculated from the observed frequencies themselves. Therefore, for the test of independence the input data are only the observed (empirical) frequencies; for the test of goodness of fit, the input encompasses expected frequencies derived from theory in addition to the observed frequencies.
The χ2 test statistic for the test of goodness of fit is computed exactly as in the χ2 test of independence:
χ2 = ∑[(fo – fe)2 / fe]
Where fo is observed frequency and fe is expected frequency. The only difference from the χ2 test of independence is that fe is not computed from fo but from a model.
Let us consider an example. Out of a total of 556 pea plants from a dihybrid cross, the famous geneticist Gregor Mendel observed four seed phenotypes in the frequencies given below:

Round, Yellow      315
Round, Green       108
Wrinkled, Yellow   101
Wrinkled, Green     32
Total              556

According to his famous law of independent assortment, Mendel expected a certain ratio (9:3:3:1) of these phenotypes. This ratio is a model: a set of theoretically expected proportions (9/16, 3/16, 3/16 and 1/16). Let us add those expected proportions to the table as well:
To get the expected frequencies, all we have to do is multiply each expected proportion by the total number of plants (556). Note that the expected frequencies also add up to the total (556).
The question is whether the observed frequencies deviate significantly from the expected frequencies.
Let us first define our null hypothesis and alternative hypotheses:
H0: fo = fe (i.e., our data fit the model well)
Ha: fo ≠ fe (i.e., our data do not fit the model)
Now let us complete the χ2 table
χ2 test statistic = 0.47
The next step is to look up a χ2 table to find the χ2 critical value, for which we need the degrees of freedom and the significance level (here 0.05). For the χ2 test of goodness of fit, the data are grouped only in rows, not in columns, so df = (no. of rows - 1):
df = 4 - 1 = 3
As the table value (critical χ2 = 7.815) is far higher than our obtained χ2 test statistic (0.47), we conclude that P > 0.05 and the result is not significant; we fail to reject the null hypothesis of equal frequencies. It means that our data do not deviate significantly from the model. A high P value means that the data fit the model well, which is desirable in a goodness-of-fit analysis. Moving towards the left in the table, we see that even at significance level 0.9 the critical χ2 (0.58) is still greater than our obtained test statistic, so the P value must be > 0.9. Almost all of Mendel's data have such unrealistically high P values, which led Fisher to doubt whether the values were real!
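The whole goodness-of-fit calculation can be reproduced with SciPy's `chisquare`, using Mendel's classic dihybrid counts (315, 108, 101, 32) and the 9:3:3:1 expectation:

```python
from scipy.stats import chisquare

observed = [315, 108, 101, 32]           # Mendel's dihybrid seed phenotype counts
total = sum(observed)                    # 556
proportions = [9/16, 3/16, 3/16, 1/16]   # expected under independent assortment
expected = [p * total for p in proportions]

# chisquare computes sum((fo - fe)^2 / fe) and the P value with df = 4 - 1 = 3.
stat, p_value = chisquare(observed, f_exp=expected)
print(round(stat, 2))   # 0.47, as in the worked example
print(p_value > 0.9)    # True: the data fit the model extremely well
```

Note that `chisquare` requires the expected frequencies to sum to the same total as the observed frequencies, which is why the proportions are multiplied by 556 first.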
Microsoft Excel has no dedicated χ2 goodness-of-fit tool (its CHISQ.TEST function returns a P value only once expected frequencies are supplied); online calculators like the one below can do the job well.
http://www.socscistatistics.com/tests/goodnessoffit/Default2.aspx
9. Summary
- The χ2 test statistic can be computed by this formula: χ2 = ∑[(fo – fe)2/fe], where fo is observed frequency and fe is expected frequency. Both variants of the χ2 test use the same formula.
- For the χ2 test of independence, fe is calculated from the table itself. For the χ2 test of the goodness of fit, fe is derived from a theoretical model, and the user has to explicitly input the values into the table.
- The χ2 test of independence is used to find association between two categorical variables. However, whenever a cell value is less than 5, this test should not be used. For paired (dependent) values, this test should not be used either (McNemar's test is used instead).
- The χ2 test of independence is also used to compare three or more unpaired groups of binomial data. For the analysis of a 2 x 2 contingency table of binomial data, a better alternative is Fisher's Exact Test. For the fit of observed data to a theoretical model involving only two categories, the binomial test is preferred, as it returns an exact P value.
- For the fit of observed data to a theoretical model involving more than two categories, the χ2 test of the goodness of fit can be used.
- A web-based Chi-Square calculator for the test of independence is available at http://turner.faculty.swau.edu/mathematics/math241/materials/contablecalc/
- For chi square test of goodness of fit, visit https://www.graphpad.com/quickcalcs/chisquared1.cfm
- For binomial test, visit https://www.graphpad.com/quickcalcs/binomial1/
- For Fisher's exact test, visit https://www.graphpad.com/quickcalcs/contingency1.cfm