------------------------------------------------------
Assumptions/restrictions for use of chi-square tests
------------------------------------------------------

The chi-square distribution with the appropriate degrees of freedom
provides a good approximation to the sampling distribution of Pearson's
chi-square when the null hypothesis is true and the following
conditions are met:

1. Each observation is independent of all the others (i.e., one
   observation per subject)*;

2. "No more than 20% of the expected counts are less than 5 and all
   individual expected counts are 1 or greater" (Yates, Moore & McCabe,
   1999, p. 734);

3. For 2x2 tables:
   a) All expected frequencies should be 10 or greater.
   b) If any expected frequencies are less than 10, but greater than
      or equal to 5, some authors suggest that Yates' correction for
      continuity should be applied. This is done by subtracting 0.5
      from the absolute value of (O-E) before squaring. However, the
      use of Yates' correction is controversial, and is not
      recommended by all authors.
   c) If any expected frequencies are smaller than 5, then some other
      test should be used (e.g., Fisher's exact test for 2x2
      contingency tables)**.

Notice that point number 2 differs from the common advice that all
expected counts must be 5 or greater. That advice applies to 2x2
tables; for larger tables, it may be okay to have one or more
expected counts < 5.

*  For matched pairs of subjects, or 2 observations per person,
   McNemar's Change Test (or McNemar's chi-square) may be appropriate.

** Fisher's exact test is a "conditional" test: it is conditional on
   the observed marginal totals (i.e., row and column totals).
   Unconditional exact tests are also available.
   For example: http://www4.stat.ncsu.edu/~berger/tables.html

--------------------------------------------------------
Pearson's formula versus the likelihood ratio chi-square
--------------------------------------------------------

The following is from Alan Agresti's book, "Categorical Data Analysis".

"It is not simple to describe the sample size needed for the
chi-squared distribution to approximate well the exact distributions
of X^2 and G^2 [also called L^2 by some authors]. For a fixed number
of cells, X^2 usually converges more quickly than G^2. The chi-squared
approximation is usually poor for G^2 when n/IJ < 5 [where n = the
grand total and IJ = rc = the number of cells in the table]. When I or
J [i.e., r or c] is large, it can be decent for X^2 for n/IJ as small
as 1, if the table does not contain both very small and moderately
large expected frequencies." (Agresti, 1990, p. 49)

References

Agresti, A. (1990). Categorical Data Analysis. New York: Wiley.

Yates, D., Moore, D., & McCabe, G. (1999). The Practice of Statistics
(1st ed.). New York: W.H. Freeman.

--
Bruce Weaver
bweaver@lakeheadu.ca
2-Feb-2006
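
Illustrative sketch (an addition, not part of the quoted sources): the
expected-count rules and the two statistics discussed above can be tried
out with a few lines of plain Python. The function names, the example
table, and the wording of the returned messages are all hypothetical.
Clamping the Yates-adjusted difference at zero is an extra safeguard
assumed here; the description above only says to subtract 0.5 from
|O-E| before squaring. G^2 is computed as 2 * sum(O * ln(O/E)), with
empty cells contributing zero.

```python
import math


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_expected(table):
    """Apply the rules of thumb listed above to the expected counts."""
    exp = [e for row in expected_counts(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        if min(exp) >= 10:
            return "chi-square approximation OK"
        if min(exp) >= 5:
            return "some authors suggest Yates' correction"
        return "use another test (e.g., Fisher's exact test)"
    share_below_5 = sum(e < 5 for e in exp) / len(exp)
    if min(exp) >= 1 and share_below_5 <= 0.20:
        return "chi-square approximation OK"
    return "expected counts too small for the chi-square approximation"


def pearson_x2(table, yates=False):
    """Pearson's X^2, optionally with Yates' continuity correction."""
    total = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                # Assumption: never let the corrected difference go negative.
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / e
    return total


def likelihood_ratio_g2(table):
    """Likelihood ratio chi-square, G^2 = 2 * sum(O * ln(O / E))."""
    total = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # a cell with O = 0 contributes nothing
                total += o * math.log(o / e)
    return 2.0 * total


if __name__ == "__main__":
    table = [[12, 8], [6, 14]]  # made-up 2x2 counts, for illustration only
    print(expected_counts(table))
    print(check_expected(table))
    print(pearson_x2(table), pearson_x2(table, yates=True))
    print(likelihood_ratio_g2(table))
```

For this made-up table the expected counts are 9 and 11 in each row, so
it falls under point 3b above (smallest expected count between 5 and
10). Turning any of these statistics into a p-value still requires the
chi-square distribution with (r-1)(c-1) degrees of freedom, which this
sketch does not compute.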