R by C Contingency Table Analysis

Menu location: Analysis_Chi-square_R by C.

The r by c chi-square test in StatsDirect uses a number of methods to investigate two-way contingency tables that consist of any number of independent categories forming r rows and c columns.

Tests of independence of the categories in a table are the chi-square test, the G-square (likelihood-ratio chi-square) test and the generalised Fisher exact (Fisher-Freeman-Halton) test. All three tests indicate the degree of independence between the variables that make up the table.

The generalised Fisher exact test is difficult to compute (Mehta and Patel, 1983, 1986a); it may take a long time and it may not be computable for the table that you enter. If the Fisher exact method cannot be computed practically then a hybrid method based upon Cochran rules is used (Mehta and Patel, 1986b); this may also fail with large tables and/or numbers. The Fisher-Freeman-Halton result is quoted with just one P value as it is implicitly two-sided.

Relating the Fisher-Freeman-Halton statistic to the Pearson Chi-square statistic:

The null hypothesis is independence between row and column categories.
Let t denote a table from the set of all tables with the same row and column margins.
Let D(t) be the measure of discrepancy.
The exact two-sided P value = P [D(t) ≥ D(t_observed)] = sum of hypergeometric probabilities of those tables where D(t) is larger than or equal to that of the observed table.
In large samples the distribution of D(t) conditional on fixed row and column margins converges to the chi-square distribution with (r-1)(c-1) degrees of freedom.

The G-square statistic is less reliable than the chi-square statistic when you have small numbers. In general, you should use the chi-square statistic if the Fisher exact test is not computable. If you consult a statistician then it would be useful to provide the G-square statistic also.

These tests of independence are suitable for nominal data. If your data are ordinal then you should use the more powerful tests for trend (Armitage and Berry, 1994; Agresti, 2002, 1996).

Assumptions of the tests of independence:

the sample is random
each observation may be classified into one cell (in the table) only

Chi-square = Σ Σ (O − E)² / E
G-square = 2 Σ Σ O ln(O / E)

- where, for r rows and c columns of n observations, O is an observed frequency, E is an estimated expected frequency and the sums run over all r × c cells. The expected frequency for any cell is estimated as the row total times the column total, divided by the grand total (n).
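As a rough illustration of the two statistics described above, the following Python sketch computes Pearson's chi-square and the likelihood-ratio G-square for the grief and support table used in the Example section of this page. The function name `chi_square_and_g_square` is hypothetical and this is not StatsDirect's own code. P values are left out because they need a chi-square distribution function.

```python
from math import log

def chi_square_and_g_square(table):
    """Return (chi_square, g_square, df, expected) for an r by c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = 0.0
    g2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            chi2 += (o - e) ** 2 / e
            if o > 0:  # an empty cell contributes nothing to G-square
                g2 += 2.0 * o * log(o / e)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, g2, df, expected

# Grief state (rows I to IV) by support (columns Good, Adequate, Poor)
grief = [[17, 9, 8], [6, 5, 1], [3, 5, 4], [1, 2, 5]]
chi2, g2, df, expected = chi_square_and_g_square(grief)
print(f"Chi-square = {chi2:.4f}, G-square = {g2:.4f}, DF = {df}")
```

The values agree with the chi-square and G-square figures quoted in the example output on this page, to the precision shown there.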

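P = Σ P_f, summed over all tables with the observed row and column totals that are at least as discrepant as the observed table
P_f = (Π f_i.! × Π f_.j!) / (f_..! × Π f_ij!)

- where P is the two-sided Fisher probability, P_f is the conditional probability for a table given fixed row and column totals (f_i. and f_.j respectively), f_ij is the count in row i and column j, f_.. is the total count and ! represents factorial.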
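The exact calculation can be sketched by brute force: list every table that shares the observed margins, compute each table's conditional probability, and add up those that are no more probable than the observed table. Ordering tables by probability is the usual Fisher-Freeman-Halton choice of discrepancy and is an assumption here, as are all the function names. StatsDirect uses the far more efficient methods of Mehta and Patel; this enumeration is practical only for small tables and counts.

```python
from math import lgamma, exp

def log_table_probability(table):
    """Log of the conditional probability of a table given its margins."""
    log_fact = lambda k: lgamma(k + 1)
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return (sum(log_fact(x) for x in rows) + sum(log_fact(x) for x in cols)
            - log_fact(sum(rows)) - sum(log_fact(x) for r in table for x in r))

def compositions(total, caps):
    """Yield each way to split total into len(caps) counts with counts[j] <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest

def tables_with_margins(row_totals, col_totals):
    """Yield every table of non-negative counts with the given margins."""
    if len(row_totals) == 1:
        yield (tuple(col_totals),)  # the last row is forced by what remains
        return
    for row in compositions(row_totals[0], col_totals):
        remaining = [c - x for c, x in zip(col_totals, row)]
        for rest in tables_with_margins(row_totals[1:], remaining):
            yield (row,) + rest

def fisher_freeman_halton(table):
    """Exact two-sided P by full enumeration (small tables only)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    p_observed = exp(log_table_probability(table))
    p_value = 0.0
    for t in tables_with_margins(rows, cols):
        p = exp(log_table_probability(t))
        if p <= p_observed * (1 + 1e-7):  # tolerance so ties are not lost to rounding
            p_value += p
    return p_value

small = [[3, 1], [1, 3]]
print(f"{fisher_freeman_halton(small):.4f}")  # prints 0.4857 (34/70)
```

For a 2 by 2 table this reduces to the ordinary two-sided Fisher exact test, which makes the small example easy to check by hand.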

Analysis of trend in r by c tables indicates how much of the overall association between the row and column categories is accounted for by linear trend in their scores. StatsDirect uses equally spaced scores for this purpose unless you specify otherwise. If you wish to experiment with other scoring systems then expert statistical guidance is advisable. Armitage and Berry (1994) quote an example where extent of grief of mothers suffering a perinatal death, graded I to IV, is compared with the degree of support received by these women. In this example the overall statistic is non-significant but a significant trend is demonstrated.

The sample correlation coefficient r reflects the direction and closeness of linear trend in your table. r may vary between -1 and 1 just like Pearson's product moment correlation coefficient. Total independence of the categories in your table would mean that r = 0. The test for linear trend is related to r by M² = (n-1)r² and this is numerically identical to Armitage's chi-square for linear trend (Armitage and Berry, 1994; Agresti, 1996). If you interchange the rows and columns in your table then the value of M² will be the same.
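The relationship M² = (n-1)r² can be checked with a short Python sketch. It assumes equally spaced scores 1, 2, 3, ... for both rows and columns, and the function name `linear_trend` is hypothetical. The data are the grief and support counts from the Example section of this page.

```python
from math import sqrt

def linear_trend(table, row_scores=None, col_scores=None):
    """Return (r, m_squared) for an r by c table using row and column scores."""
    n_rows, n_cols = len(table), len(table[0])
    u = row_scores or list(range(1, n_rows + 1))  # default: equally spaced scores
    v = col_scores or list(range(1, n_cols + 1))
    n = sum(sum(row) for row in table)
    cells = [(u[i], v[j], table[i][j]) for i in range(n_rows) for j in range(n_cols)]
    sum_u = sum(ui * f for ui, _, f in cells)
    sum_v = sum(vj * f for _, vj, f in cells)
    # Corrected sums of squares and cross-products, weighted by cell counts
    s_uu = sum(ui * ui * f for ui, _, f in cells) - sum_u ** 2 / n
    s_vv = sum(vj * vj * f for _, vj, f in cells) - sum_v ** 2 / n
    s_uv = sum(ui * vj * f for ui, vj, f in cells) - sum_u * sum_v / n
    r = s_uv / sqrt(s_uu * s_vv)
    return r, (n - 1) * r ** 2

grief = [[17, 9, 8], [6, 5, 1], [3, 5, 4], [1, 2, 5]]
r, m_squared = linear_trend(grief)
print(f"r = {r:.6f}, M-squared = {m_squared:.4f}")
```

Because r is symmetric in the two sets of scores, transposing the table leaves M² unchanged, as stated above.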

The ANOVA output applies techniques similar to analysis of variance to an r by c table. Here the equality of mean column and row scores is tested. StatsDirect uses equally spaced scores for this purpose unless you specify otherwise. See Armitage for more information (Armitage and Berry, 1994).
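One way to picture the test of equal mean column scores is as a between-column sum of squares of the row scores, scaled by the total variance of those scores. The Python sketch below is an assumption about the form of the statistic, not a statement of StatsDirect's internal method, although it does reproduce the value quoted in the example on this page. The name `mean_score_chi_square` is hypothetical and equally spaced row scores are assumed.

```python
def mean_score_chi_square(table, row_scores=None):
    """Chi-square for equality of mean row score across columns, with its DF."""
    n_rows, n_cols = len(table), len(table[0])
    u = row_scores or list(range(1, n_rows + 1))  # default: equally spaced scores
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Sum of scores within each column, and over the whole table
    col_score_sums = [sum(u[i] * table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total_score = sum(col_score_sums)
    correction = total_score ** 2 / n
    ss_total = sum(u[i] ** 2 * row_totals[i] for i in range(n_rows)) - correction
    ss_between = sum(t ** 2 / m for t, m in zip(col_score_sums, col_totals)) - correction
    statistic = (n - 1) * ss_between / ss_total
    return statistic, n_cols - 1

grief = [[17, 9, 8], [6, 5, 1], [3, 5, 4], [1, 2, 5]]
statistic, df = mean_score_chi_square(grief)
print(f"Chi-square for equality of mean column scores = {statistic:.6f}, DF = {df}")
```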

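Pearson's and Cramér's (V) coefficients of contingency and the phi (φ, correlation) coefficient reflect the strength of the association in a contingency table (Agresti, 1996; Fleiss, 1981; Stuart and Ord, 1994):

φ = √(χ² / n)
Pearson's contingency coefficient = √(χ² / (χ² + n))
Cramér's V = √(χ² / (n × min(r − 1, c − 1)))

- where χ² is the Pearson chi-square statistic for the table and n is the total count.

For 2 by 2 tables, Cramér's V is calculated alternatively as a signed value:

V = (f_11 f_22 − f_12 f_21) / √(f_1. f_2. f_.1 f_.2)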
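As a rough illustration, the usual textbook definitions of these coefficients can be written in a few lines of Python. The function names are hypothetical, and the signed 2 by 2 form shown is the standard cross-product version, assumed here to be the one intended.

```python
from math import sqrt

def pearson_chi_square(table):
    """Pearson chi-square statistic for an r by c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((table[i][j] - rt * ct / n) ** 2 / (rt * ct / n)
               for i, rt in enumerate(row_totals)
               for j, ct in enumerate(col_totals))

def association_coefficients(table):
    """Return phi, Pearson's contingency coefficient and Cramer's V."""
    chi2 = pearson_chi_square(table)
    n = sum(sum(row) for row in table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return {
        "phi": sqrt(chi2 / n),
        "contingency": sqrt(chi2 / (chi2 + n)),
        "cramers_v": sqrt(chi2 / (n * k)),
    }

def signed_v_2x2(table):
    """Signed Cramer's V for a 2 by 2 table (cross-product form)."""
    (a, b), (c, d) = table
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

grief = [[17, 9, 8], [6, 5, 1], [3, 5, 4], [1, 2, 5]]
coefficients = association_coefficients(grief)
print({name: round(value, 6) for name, value in coefficients.items()})
print(signed_v_2x2([[3, 1], [1, 3]]))  # prints 0.5
```

For the grief and support data the three values match the NOMINAL ASSOCIATION figures in the example output on this page.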

Observed values, expected values and totals are given for the table when c ≤ 8 and r ≤ 10.

If your data categories are both ordered then you will gain more power in tests of independence by using the ordinal methods due to Goodman and Kruskal (gamma) and Kendall (tau-b). Large sample, asymptotically normal variance estimates are used; the simple form is used for independence testing (Agresti, 1984; Conover, 1999; Goodman and Kruskal, 1963, 1972). Tau-b tends to be less sensitive than gamma to the choice of response categories.
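Both ordinal measures are built from counts of concordant and discordant pairs of observations. The Python sketch below computes the point estimates only; the asymptotic standard errors, P values and confidence intervals are not attempted. The function name `gamma_and_tau_b` is hypothetical and the data are the grief and support counts from the Example section of this page.

```python
from math import sqrt

def gamma_and_tau_b(table):
    """Return (gamma, tau_b, concordant, discordant) for an ordered r by c table."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # only rows below the current cell
                for l in range(n_cols):
                    if l > j:                   # below and to the right: concordant
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                 # below and to the left: discordant
                        discordant += table[i][j] * table[k][l]
    n = sum(sum(row) for row in table)
    total_pairs = n * (n - 1) // 2
    # Pairs tied on the row variable and on the column variable
    tied_rows = sum(t * (t - 1) // 2 for t in (sum(row) for row in table))
    tied_cols = sum(t * (t - 1) // 2 for t in (sum(col) for col in zip(*table)))
    gamma = (concordant - discordant) / (concordant + discordant)
    tau_b = (concordant - discordant) / sqrt(
        (total_pairs - tied_rows) * (total_pairs - tied_cols))
    return gamma, tau_b, concordant, discordant

grief = [[17, 9, 8], [6, 5, 1], [3, 5, 4], [1, 2, 5]]
gamma, tau_b, concordant, discordant = gamma_and_tau_b(grief)
print(f"gamma = {gamma:.6f}, tau-b = {tau_b:.6f}")
```

Gamma ignores tied pairs altogether, whereas tau-b adjusts its denominator for ties, which is why tau-b is the smaller of the two in absolute value here.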

Example

From Armitage and Berry (1994, p. 408).

The following data (as above) describe the state of grief of 66 mothers who had suffered a neonatal death. The table relates this to the amount of support given to these women:

                    Support
Grief state    Good    Adequate    Poor
I              17      9           8
II             6       5           1
III            3       5           4
IV             1       2           5

To analyse these data in StatsDirect you must select r by c from the chi-square section of the analysis menu. Choose the default 95% confidence interval. Check the boxes marked "Show expected counts" and "Show cell chi-square". Then enter the above data as directed by the screen.

For this example:

Observed      17       9       8      34
Expected   13.91   10.82    9.27
ΔChi²       0.69    0.31    0.17
Observed       6       5       1      12
Expected    4.91    3.82    3.27
ΔChi²       0.24    0.37    1.58
Observed       3       5       4      12
Expected    4.91    3.82    3.27
ΔChi²       0.74    0.37    0.16
Observed       1       2       5       8
Expected    3.27    2.55    2.18
ΔChi²       1.58    0.12    3.64
Totals:       27      21      18      66

TOTAL number of cells = 12

WARNING: 9 out of 12 cells have 1 ≤ EXPECTATION < 5

NOMINAL INDEPENDENCE

Chi-square = 9.9588, DF = 6, P = 0.1264

G-square = 10.186039, DF = 6, P = 0.117

Fisher-Freeman-Halton exact P = 0.1426

ANOVA

Chi-square for equality of mean column scores = 5.696401

DF = 2, P = 0.0579

LINEAR TREND

Sample correlation (r) = 0.295083

Chi-square for linear trend (M²) = 5.6598

DF = 1, P = 0.0174

NOMINAL ASSOCIATION

Phi = 0.388447

Pearson's contingency = 0.362088

Cramér's V = 0.274673

ORDINAL

Goodman-Kruskal gamma = 0.349223

Approximate test of gamma = 0: SE = 0.15333, P = 0.0228, 95% CI = 0.048701 to 0.649744

Approximate test of independence: SE = 0.163609, P = 0.0328, 95% CI = 0.028554 to 0.669891

Kendall tau-b = 0.236078

Approximate test of tau-b = 0: SE = 0.108929, P = 0.0302, 95% CI = 0.02258 to 0.449575

Approximate test of independence: SE = 0.110601, P = 0.0328, 95% CI = 0.019303 to 0.452852

Here we see that although the overall test was not significant we did show a statistically significant trend in mean scores. This suggests that supporting these mothers did help lessen their burden of grief.
