------------------------------------------------------
Assumptions/restrictions for use of chi-square tests
------------------------------------------------------
The chi-square distribution with the appropriate degrees of freedom
provides a good approximation to the sampling distribution of Pearson's
chi-square when the null hypothesis is true and the following conditions
are met:
1. Each observation is independent of all the others (i.e., one observation
per subject)*;
2. "No more than 20% of the expected counts are less than 5 and all individual
expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, p. 734);
3. For 2x2 tables:
a) All expected frequencies should be 10 or greater.
b) If any expected frequencies are less than 10, but greater than or
equal to 5, some authors suggest that Yates' correction for continuity
should be applied. This is done by subtracting 0.5 from the absolute
value of (O-E), the difference between the observed and expected
frequency in each cell, before squaring. However, the use of Yates'
correction is controversial, and is not recommended by all authors.
c) If any expected frequencies are smaller than 5, then some other test
should be used (e.g., Fisher's exact test for 2x2 contingency tables)**.
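
The correction described in 3b can be sketched in a few lines of Python.
This is only an illustration: the function name and the counts are made
up, and clamping the corrected difference at zero is an added safeguard
that the text above does not mention.

```python
def pearson_chi_square_2x2(table, yates=False):
    """Pearson's chi-square for a 2x2 table of observed counts.

    With yates=True, 0.5 is subtracted from |O - E| in every cell
    before squaring (Yates' correction for continuity).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                # Assumption: never let the corrected difference go below zero.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    return statistic


# Hypothetical counts; the expected frequencies are 9, 11, 9 and 11,
# i.e. some are below 10 but none are below 5 (the situation in 3b).
observed = [[12, 8], [6, 14]]
uncorrected = pearson_chi_square_2x2(observed)
corrected = pearson_chi_square_2x2(observed, yates=True)
print(round(uncorrected, 3), round(corrected, 3))
```

The corrected statistic is always the smaller of the two, which is why
the correction makes the test more conservative.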
Notice that point number 2 differs from the common advice that all expected
counts must be 5 or greater. That advice applies to 2x2 tables; for larger
tables, one or more expected counts < 5 may be acceptable, as long as the
conditions in point 2 are met.
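
As a rough illustration of point 2, the sketch below computes the expected
counts for an r x c table and checks both parts of the rule. The function
names and the example tables are hypothetical; only the rule itself comes
from the quotation above.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_condition_2(table):
    """True if no more than 20% of the expected counts are below 5
    and every expected count is 1 or greater."""
    flat = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in flat) / len(flat)
    return share_below_5 <= 0.20 and min(flat) >= 1


# Hypothetical 3x3 table: one expected count of 4 (1 cell in 9, about 11%).
table_ok = [[30, 12, 8], [12, 12, 6], [8, 6, 6]]
# Hypothetical 3x3 table: two expected counts below 5 (2 cells in 9, about 22%).
table_bad = [[10, 20, 30], [10, 20, 30], [2, 3, 5]]

print(meets_condition_2(table_ok))
print(meets_condition_2(table_bad))
```

The first table would fail the "all expected counts 5 or greater" advice
but satisfies point 2; the second fails both.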
* For matched pairs of subjects, or 2 observations per person,
McNemar's Change Test (or McNemar's chi-square) may be appropriate.
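
A minimal sketch of McNemar's chi-square, assuming the uncorrected form
of the statistic (a continuity-corrected version also exists). Only the
two discordant cells of the paired 2x2 table enter the calculation; the
function name and the counts are made up for illustration.

```python
def mcnemar_chi_square(b, c):
    """Uncorrected McNemar chi-square (1 degree of freedom).

    b and c are the counts of discordant pairs, i.e. pairs whose two
    observations fall in different categories.
    """
    return (b - c) ** 2 / (b + c)


# Hypothetical data: 15 pairs changed in one direction, 5 in the other.
print(mcnemar_chi_square(15, 5))
```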
** Fisher's exact test is a "conditional" test--it is conditional on the
observed marginal totals (i.e., row and column totals). Unconditional
exact tests are also available. For example:
http://www4.stat.ncsu.edu/~berger/tables.html
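
To show what "conditional on the observed marginal totals" means in
practice, here is a small sketch of Fisher's exact test for a 2x2 table.
It holds the row and column totals fixed, enumerates every table with
those totals, and sums the hypergeometric probabilities of the tables
that are no more probable than the observed one. That is one common
definition of the two-sided p-value, not the only one; the function name
and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    # Range of top-left values that are possible with these margins.
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    p_observed = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical 2x2 table with all margins equal to 4.
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(round(p_value, 4))
```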
--------------------------------------------------------
Pearson's formula versus the likelihood ratio chi-square
--------------------------------------------------------
The following is from Alan Agresti's book, "Categorical Data Analysis".
"It is not simple to describe the sample size needed for the chi-squared
distribution to approximate well the exact distributions of X^2 and G^2
[also called L^2 by some authors]. For a fixed number of cells, X^2
usually converges more quickly than G^2. The chi-squared approximation
is usually poor for G^2 when n/IJ < 5 [where n = the grand total and
IJ = rc = the number of cells in the table]. When I or J [i.e., r or c]
is large, it can be decent for X^2 for n/IJ as small as 1, if the table
does not contain both very small and moderately large expected frequencies."
(Agresti, 1990, p. 49)
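
For readers who want to compare the two statistics on their own data,
the sketch below computes X^2, G^2 and the ratio n/IJ for an I x J table.
It uses the usual definitions, X^2 = sum of (O-E)^2/E and
G^2 = 2 * sum of O*ln(O/E), with cells where O = 0 contributing nothing
to G^2. The function name and the counts are made up for illustration.

```python
from math import log


def x2_g2_and_ratio(table):
    """Return (X^2, G^2, n/IJ) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:
                # Empty cells are skipped: O*ln(O/E) is taken as 0 when O = 0.
                g2 += 2 * observed * log(observed / expected)

    cells = len(table) * len(table[0])  # IJ, the number of cells
    return x2, g2, n / cells


# Hypothetical 2x2 table with n = 40, so n/IJ = 10.
x2, g2, ratio = x2_g2_and_ratio([[12, 8], [6, 14]])
print(round(x2, 3), round(g2, 3), ratio)
```

Both statistics are referred to the same chi-square distribution; the
ratio is printed only so it can be compared with the guideline in the
quotation.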
References
Agresti, A. (1990). Categorical Data Analysis. New York: Wiley.
Yates, D., Moore, D., & McCabe, G. (1999). The Practice of Statistics
(1st Ed.). New York: W.H. Freeman.
--
Bruce Weaver
bweaver@lakeheadu.ca
2-Feb-2006