Subject: How to check for departure from linearity in complex models
From: Bruce Weaver
Date: Wed, 09 Feb 2005 13:21:29 -0500
Newsgroups: sci.stat.edu

In another thread, Herman Rubin wrote:

------ start of quote from HR -------
There is a huge amount of converting to normality in the applied area. Essentially ALL of this produces incorrect results. Yet the applied people ignore linearity, which is important, and go on using regressions on inappropriate data.
------- end of quote from HR --------

I think most of us can see how to check for linearity in the case of simple linear regression: i.e., we look at the scatterplot and see if a straight line fits nicely through the cloud of points. Even in the case of (multiple) linear regression with 2 predictors, we can look at the 3D scatterplot to see how well a 2D plane fits through the cloud of points. But in regression models with 3 or more variables, it is not so clear (to me, at least) how to check for linearity. Ditto for models with nominal predictor variables (e.g., a factorial ANOVA model).

So my question for Professor Rubin, or anyone else who cares to jump in, is this: Is there an analog to looking at the scatterplot that one can use to check for departure from linearity in more complex linear models? Or is there some other reasonably straightforward method one can use?

Thanks,
Bruce

--------------------------------------------------------------------------

Subject: Re: How to check for departure from linearity in complex models
From: daniel_stahl@gmx.de
Date: 10 Feb 2005 06:56:42 -0800
Newsgroups: sci.stat.edu

You can check linearity by looking at the residuals of partial regression plots:

"The Partial Regression Plots display scatter plots of residuals of each independent variable and the residuals of the dependent variable when both variables are regressed separately on the rest of the independent variables. Partial regression plots exhibit the 'net' relationship between each independent variable and the dependent variable ('net' because the influence of the other variables is partialed out). These plots are important to identify possible nonlinear relationships or groups of influential cases that are not easily identified by the statistics mentioned above. If you add a regression line in the plot, you can check whether each independent variable fits the model (linear relationship) and detect which variable may cause violations of the assumptions of linear regression." (copied from SPSS)

--------------------------------------------------------------------------

Subject: Re: How to check for departure from linearity in complex models
From: hrubin@odds.stat.purdue.edu (Herman Rubin)
Date: 10 Feb 2005 12:58:50 -0500
Newsgroups: sci.stat.edu

The method suggested by Stahl in his post (at least I think that is his name), taken from SPSS, is certainly one to consider, and it is also the one that one of my colleagues came up with. Another possibility is to look at the large residuals and see if there is any pattern in the independent variables that goes with them. Another is to see whether large conditional variances of residuals occur at the extremes.

In any case, be careful, as lots of tests are being made. Without replication, which seems unlikely in your problem, it is hard to test infinite-dimensional alternatives.
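Here is a minimal Python sketch of the partial regression (added-variable) plot that Stahl describes above: regress y on the other predictors, regress the predictor of interest on the other predictors, and plot the two sets of residuals against each other. It assumes numpy and matplotlib are available; the arrays X and y and the index j are illustrative placeholders, not anything from the thread.

import numpy as np
import matplotlib.pyplot as plt

# Placeholder data (assumption): three predictors, with a quadratic term in
# x0 deliberately left out of the fitted model so the plot has something to show.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + 0.3 * X[:, 0] ** 2 + rng.normal(size=n)

def residuals(target, Z):
    # Residuals from an OLS regression of target on Z (with an intercept).
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(Z1, target, rcond=None)
    return target - Z1 @ beta

j = 0                              # predictor being examined
others = np.delete(X, j, axis=1)   # the remaining predictors
e_y = residuals(y, others)         # y with the other predictors partialed out
e_x = residuals(X[:, j], others)   # x_j with the other predictors partialed out

plt.scatter(e_x, e_y, s=10)
plt.xlabel("residuals of x%d given other predictors" % j)
plt.ylabel("residuals of y given other predictors")
plt.title("Partial regression (added-variable) plot")
plt.show()

A roughly linear band of points is consistent with the assumed linear term for x_j; systematic curvature in the band is the kind of departure from linearity that Stahl's description is meant to reveal.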
--------------------------------------------------------------------------

Subject: Re: How to check for departure from linearity in complex models
From: Bruce Weaver
Date: Thu, 10 Feb 2005 16:18:51 -0500
Newsgroups: sci.stat.edu

Thanks to those who have responded. Here's another response I got from Dan Ward that was not posted to the newsgroup.

-----------------------------------------------
Sorry to respond off list; feel free to repost sans my email.

I also use, among others, the method suggested by Stahl. I think that no one method is likely to be adequate for all data sets. You may wish to take a look at Ray Myers' textbook: Classical and Modern Regression with Applications, 2nd ed., 1990, Duxbury Press. He suggests four methods to diagnose a single regressor variable in a model with many regressors:

1. Residual against predictor plots
2. Partial regression plots (added-variable plots)
3. Component-plus-residual plots (partial residual plots)
4. Augmented partial residual plots

He provides an extremely readable treatment of the topic (and regression in general).

Dan
--------------------------------------------------------------------------
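As a companion to the list above, here is a minimal Python sketch of method 3, the component-plus-residual (partial residual) plot: fit the full linear model, then plot e + b_j * x_j against x_j for the predictor of interest. Again, numpy and matplotlib are assumed, and X, y, and j are illustrative placeholders rather than anything from the thread or from Myers' book.

import numpy as np
import matplotlib.pyplot as plt

# Placeholder data (assumption): a quadratic effect of x1 is omitted from the
# fitted model so that the plot shows visible curvature.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + 0.3 * X[:, 1] ** 2 + rng.normal(size=n)

# Fit the full linear model y = b0 + b1*x1 + b2*x2 + b3*x3 by least squares.
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
e = y - X1 @ beta                           # ordinary residuals

j = 1                                       # predictor being examined
partial_resid = e + beta[j + 1] * X[:, j]   # add back the fitted component for x_j

plt.scatter(X[:, j], partial_resid, s=10)
plt.xlabel("x%d" % j)
plt.ylabel("partial residual for x%d" % j)
plt.title("Component-plus-residual (partial residual) plot")
plt.show()

If the points curve away from a straight line with slope b_j, the fitted linear term for x_j is probably not adequate for that predictor.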