From: nichols@spss.com (David Nichols)
Subject: Expected mean squares and error terms in GLM
Date: 1996/11/05
Message-ID: <55oa9t$1tj@netsrv2.spss.com>
organization: SPSS, Inc.
newsgroups: comp.soft-sys.stat.spss
I've had a few questions from users about expected mean squares and error terms in GLM. In particular, for a two-way design with A fixed and B random, many people expect to see A tested against A*B and B tested against the within-cells term. In the model used by GLM, the interaction term is automatically assumed to be random and expected mean squares are calculated using Hartley's method of synthesis, so the results differ from what many people are used to seeing: both A and B are tested against A*B. Here's some information that people may find useful.
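To make the difference concrete, here is a minimal Python sketch (not SPSS syntax, and not GLM's actual algorithm) that computes the mean squares for a balanced two-way layout and then forms the F ratios under either convention. The function names, the `interaction_iid` switch, and the data are made up for illustration; the sketch assumes equal cell sizes, in which case the error terms follow directly from the expected mean squares and no synthesis is needed.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_mean_squares(y):
    """Mean squares for a balanced layout; y[i][j] holds the n replicates
    for level i of A (fixed) and level j of B (random)."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[_mean(y[i][j]) for j in range(b)] for i in range(a)]
    a_mean = [_mean(cell[i]) for i in range(a)]
    b_mean = [_mean(cell[i][j] for i in range(a)) for j in range(b)]
    grand = _mean(a_mean)

    ss = {
        "A": n * b * sum((m - grand) ** 2 for m in a_mean),
        "B": n * a * sum((m - grand) ** 2 for m in b_mean),
        "A*B": n * sum((cell[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                       for i in range(a) for j in range(b)),
        "Error": sum((v - cell[i][j]) ** 2
                     for i in range(a) for j in range(b) for v in y[i][j]),
    }
    df = {"A": a - 1, "B": b - 1,
          "A*B": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    ms = {term: ss[term] / df[term] for term in ss}
    return ms, df


def f_tests(ms, df, interaction_iid=True):
    """F ratios for A fixed, B random.

    interaction_iid=True : interactions are i.i.d. random variables, so
                           B is tested against A*B (what GLM reports).
    interaction_iid=False: interactions sum to zero over the levels of A,
                           so B is tested against the within-cells term.
    """
    b_error = "A*B" if interaction_iid else "Error"
    tests = {}
    for effect, error in (("A", "A*B"), ("B", b_error), ("A*B", "Error")):
        tests[effect] = {"F": ms[effect] / ms[error],
                         "df": (df[effect], df[error]),
                         "error_term": error}
    return tests


# Made-up data: 2 levels of A, 3 levels of B, 2 replicates per cell.
y = [
    [[10, 12], [14, 15], [20, 19]],
    [[11, 13], [18, 17], [21, 24]],
]
ms, df = two_way_mean_squares(y)
glm_style = f_tests(ms, df, interaction_iid=True)
traditional = f_tests(ms, df, interaction_iid=False)
```

Only the test of B changes between the two calls; A is tested against A*B and A*B against the within-cells term either way.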
It would appear that there's something of a split among statisticians in how to handle models with random effects. Quoting from page 12 of the SYSTAT DESIGN module documentation (1987):
There are two sets of distributional assumptions used to analyze a two factor mixed model, differing in the way interactions are handled. The first, used by SAS (1985, p. 469-470), can be traced to Mood (1950). Interaction terms are assumed to be a set of i.i.d. normal random variables. The second, used by DESIGN, is due to Anderson and Bancroft (1952). They impose the constraint that the interactions sum to zero over the levels of fixed factor within each level of the random factor.
According to Miller (1986, p. 144): "The matter was more or less resolved by Cornfield and Tukey (1956)." Cornfield and Tukey derive expected mean squares under a finite population model and obtain results in agreement with Anderson and Bancroft.
On the other side, Searle (1971) states: "The model that leads to [Mood's results] is the one customarily used for unbalanced data."
Statisticians have divided themselves along the following lines:

Mood (1950, p. 344)            | Anderson and Bancroft (1952)
Hartley and Searle (1969)      | Cornfield and Tukey (1956)
Hocking (1985, p. 330)         | Graybill (1961, p. 398)
Milliken and Johnson (1984)    | Miller (1986, p. 144)
Searle (1971, sec. 9.7)        | Scheffe (1959, p. 269)
SAS                            | Snedecor and Cochran (1967, p. 367)
SPSS GLM*                      | DESIGN
The references are:
Cornfield, J., & Tukey, J. W. (1956). Average values of mean squares in factorials. Annals of Mathematical Statistics, 27, 907-949.
Graybill, F. A. (1961). An introduction to linear statistical models (Vol. 1). New York: McGraw-Hill.
Hartley, H. O., & Searle, S. R. (1969). On interaction variance components in mixed models. Biometrics, 25, 573-576.
Hocking, R. R. (1985). The analysis of linear models. Monterey, CA: Brooks/Cole.
Miller, R. G., Jr. (1986). Beyond ANOVA, basics of applied statistics. New York: Wiley.
Milliken, G. A., & Johnson, D. E. (1984). Analysis of messy data, Volume 1: Designed experiments. New York: Van Nostrand Reinhold.
Mood, A. M. (1950). Introduction to the theory of statistics. New York: McGraw-Hill.
Scheffe, H. (1959). The analysis of variance. New York: Wiley.
Searle, S. R. (1971). Linear models. New York: Wiley.
Snedecor, G. W., & Cochran, W. G. (1967). Statistical methods (6th ed.). Ames, IA: Iowa State University Press.
SPSS can be added to the left-hand column. We're assuming i.i.d. normally distributed random variables for any interaction terms containing random factors.
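For the two-way case with A fixed and B random, that assumption can be written out as follows. This is a sketch of the standard textbook result for a balanced design, not output from GLM; the notation (a levels of A, b levels of B, n observations per cell, and the symbols below) is introduced here for illustration and does not come from the SPSS documentation.

```latex
% Model: A fixed, B random, interaction i.i.d. normal, all random terms independent
y_{ijk} = \mu + \alpha_i + b_j + (\alpha b)_{ij} + e_{ijk},
\qquad
b_j \sim N(0, \sigma_B^2), \quad
(\alpha b)_{ij} \sim N(0, \sigma_{AB}^2), \quad
e_{ijk} \sim N(0, \sigma^2)

% Expected mean squares, balanced case
\begin{aligned}
E(MS_A)    &= \sigma^2 + n\,\sigma_{AB}^2 + \frac{n b}{a-1} \sum_{i=1}^{a} (\alpha_i - \bar{\alpha})^2 \\
E(MS_B)    &= \sigma^2 + n\,\sigma_{AB}^2 + n a\,\sigma_B^2 \\
E(MS_{AB}) &= \sigma^2 + n\,\sigma_{AB}^2 \\
E(MS_E)    &= \sigma^2
\end{aligned}
```

E(MS_A) and E(MS_B) each differ from E(MS_AB) only by the term for the effect being tested, which is why both A and B are tested against A*B. Under the sum-to-zero constraint in the right-hand column, the interaction component drops out of E(MS_B), and B is tested against the within-cells term instead.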
-----------------------------------------------------------------------------
David Nichols Senior Support Statistician SPSS, Inc.
Phone: (312) 329-3684 Internet: nichols@spss.com Fax: (312) 329-3668
-----------------------------------------------------------------------------