Chi-square Goodness of Fit Test

Menu location: Analysis_Nonparametric_Chi-Square Goodness of Fit.

This function enables you to compare the distribution of classes of observations with an expected distribution.

Your data must consist of a random sample of independent observations, the expected distribution of which is specified (Armitage and Berry, 1994; Conover, 1999).

Pearson's chi-square goodness of fit test statistic is:

$$\chi^2 = \sum_{j=1}^{c} \frac{(O_j - E_j)^2}{E_j}$$

- where O_j are observed counts, E_j are the corresponding expected counts and c is the number of classes for which counts/frequencies are being analysed.

The test statistic is distributed approximately as a chi-square random variable with c-1 degrees of freedom. The test has relatively low power (chance of detecting a real effect) unless the sample is large or the deviations from the null hypothesis are big. The null hypothesis is that the observed counts in every class are consistent with the expected distribution, differing from it only by chance.
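As a rough illustration of the statistic and its default degrees of freedom, the calculation can be sketched in a few lines of Python. The function name `pearson_chi_square` and the three-class counts are hypothetical and chosen only for this sketch; this is not StatsDirect's own code.

```python
def pearson_chi_square(observed, expected):
    """Return (statistic, degrees of freedom) for Pearson's goodness of fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    # Sum of (O_j - E_j)^2 / E_j over the c classes.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Default degrees of freedom: number of classes minus one.
    df = len(observed) - 1
    return statistic, df


# Hypothetical counts for three classes with equal expected counts.
stat, df = pearson_chi_square([10, 20, 30], [20, 20, 20])
print(stat, df)
```

Each class contributes its squared deviation scaled by its expected count, so a class with a small expected count can dominate the total.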

The handling of small expected frequencies is controversial. Koehler and Larntz (1980) assert that the chi-square approximation is adequate provided all of the following are true:

total of observed counts (N) ≥ 10
number of classes (c) ≥ 3
all expected values ≥ 0.25
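The three conditions listed above can be checked mechanically before running the test. The following is a minimal sketch that encodes only the conditions as stated here; the function name `approximation_conditions` and the example counts are hypothetical.

```python
def approximation_conditions(observed, expected):
    """Check the three conditions listed above and report each one separately."""
    checks = {
        "N >= 10": sum(observed) >= 10,
        "c >= 3": len(observed) >= 3,
        "all expected >= 0.25": min(expected) >= 0.25,
    }
    # The approximation is treated as adequate only if every condition holds.
    return all(checks.values()), checks


# Hypothetical data: four classes, 20 observations, one small expected count.
adequate, checks = approximation_conditions([9, 7, 3, 1], [8.0, 8.0, 3.8, 0.2])
print(adequate)
for name, passed in checks.items():
    print(name, passed)
```

Reporting each condition separately shows which one failed, which is more useful than a single yes/no answer when deciding whether to merge classes.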

Some statistical software offers exact methods for dealing with small frequencies, but these methods are not appropriate for all expected distributions and can therefore give misleading results. You can try reducing the number of classes, but expert statistical guidance is advisable for this (Conover, 1999).

Example

Suppose we suspected an unusual distribution of blood groups in patients undergoing one type of surgical procedure. We know that the expected distribution for the population served by the hospital which performs this surgery is 44% group O, 45% group A, 8% group B and 3% group AB. We can take a random sample of routine pre-operative blood grouping results and compare these with the expected distribution.

Results for 187 consecutive patients:

Blood Group:	O	67
	A	83
	B	29
	AB	8
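For orientation, the expected counts implied by the population proportions quoted above are simply the sample size multiplied by each proportion. A short Python sketch follows; the variable names are illustrative only.

```python
# Observed counts and population proportions taken from the example above.
observed = {"O": 67, "A": 83, "B": 29, "AB": 8}
proportions = {"O": 0.44, "A": 0.45, "B": 0.08, "AB": 0.03}

n = sum(observed.values())  # total number of patients
# Expected count for each group = sample size * expected proportion.
expected = {group: n * p for group, p in proportions.items()}

for group in observed:
    print(f"{group}\t{observed[group]}\t{expected[group]:.2f}")
```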

To analyse these data using StatsDirect you must first enter the observed frequencies into the workbook. You can enter the grouped frequencies, as above, or the individual observations (187 rows coded 1 to 4 in this case). If you enter individual observations, StatsDirect collects them into groups/bins/classes of frequencies, which you can inspect before proceeding with the analysis. Next, select the observed frequencies and choose Chi-square Goodness of Fit from the Nonparametric section of the analysis menu; you then enter the expected frequencies directly on screen. For this example you can enter the expected proportions, and the expected frequencies will be calculated and displayed automatically. You can also alter the number of degrees of freedom, but this is intended for expert statistical use; you would normally accept the default value of the number of categories minus one. The results for our example are:

N = 187

Value	Observed frequency	Expected frequency
1	67	82.28
2	83	84.15
3	29	14.96
4	8	5.61

Chi-square = 17.0481 df = 3

P = 0.0007

Here we may report a statistically highly significant difference between the distribution of blood groups from patients undergoing this surgical procedure and that which would be expected from a random sample of the general population.
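The figures above can be cross-checked outside StatsDirect. The sketch below recomputes the statistic and the upper tail probability using only the Python standard library. It relies on the standard closed-form expressions for the chi-square tail with whole-number degrees of freedom, which is an assumption of this sketch and not necessarily the method StatsDirect uses; the function name `chi_square_upper_tail` is hypothetical.

```python
import math


def chi_square_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1 (closed forms)."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    if x == 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x.
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range((df - 1) // 2):
        tail += term
        term *= x / (2 * k + 3)
    return tail


# Observed counts and expected proportions from the blood group example.
observed = [67, 83, 29, 8]
proportions = [0.44, 0.45, 0.08, 0.03]
n = sum(observed)
expected = [n * p for p in proportions]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # default: number of categories minus one
p_value = chi_square_upper_tail(statistic, df)

print(f"Chi-square = {statistic:.4f} df = {df}")
print(f"P = {p_value:.4f}")
```

Most of the statistic comes from group B, where 29 patients were observed against 14.96 expected.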

P values