* Display syntax commands in the Output Viewer .
SET Printback=On Length=59 Width=80.

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* File:   linreg.SPS .
* Date:   13-March-2001 .
* Author: Bruce Weaver, weaverb@mcmaster.ca .
* Notes:  Demonstration of simple linear regression and correlation
          using the dataset described in my notes .
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* NOTE: I am deliberately using a very small dataset so that it is
  easier for us to see what is happening in the analysis .

* First, read in the 6 X-Y pairs .
DATA LIST LIST /x(f2.0) y(f2.0) .
BEGIN DATA.
20 20
30 50
45 35
60 60
78 45
88 90
END DATA.

var lab x 'Spelling score' y 'Writing score' .

* List the data, and show descriptive stats on X and Y .
* To produce the following syntax:  ANALYZE-->REPORTS-->Case Summaries .
SUMMARIZE
  /TABLES=x y
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT=100
  /TITLE='Case Summaries'
  /MISSING=VARIABLE
  /CELLS=COUNT MEAN STDDEV VAR .

* To get the following syntax:  ANALYZE-->CORRELATE-->BIVARIATE .
CORRELATIONS
  /VARIABLES=x y
  /PRINT=TWOTAIL SIG
  /MISSING=PAIRWISE .

* Note that the mean of the Y-scores = 50 .
* Create a variable YBAR, and set it = 50 .
* To produce the following syntax:  TRANSFORM-->COMPUTE .
compute ybar = 50.
exe.

* Compute the regression equation for predicting Writing score from
  Spelling score; ask SPSS to SAVE the predicted Y-score for each
  person, as well as the "residual" score; also generate a scatterplot .
* ANALYZE-->REGRESSION-->LINEAR .
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF CI ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT y
  /METHOD=ENTER x
  /scatterplot=(y,x)
  /SAVE PRED RESID .

* Note that the p-value for the F-test in the ANOVA table of the
  REGRESSION output is identical to the 2-tailed p-value for the
  correlation between X and Y we saw earlier .
* SS(Total)      = 2850.000 .
* SS(Regression) = 1683.762 .
* SS(Residual)   = 1166.238 .
* The regression equation is:  Y-prime = 0.686(X) + 13.307 .

* The /SAVE PRED RESID you see in the REGRESSION syntax shown above
  asked SPSS to save each subject's predicted score, as well as their
  residual score (i.e., the error in prediction) .
* Let's compute our own predicted Y-scores using this equation, and
  compare them to the predicted scores SPSS saved as variable pre_1 .
* TRANSFORM-->COMPUTE .
compute my_pred = 0.686*x + 13.307 .
compute my_res  = y - my_pred.
exe.
var lab my_pred 'My predicted Y-score' my_res 'My residual score' .

* Now compute the differences between the predicted and residual scores
  generated by SPSS and those we computed ourselves; the differences
  should be 0, or very close to it (there may be some rounding error
  in our predictions) .
compute diff1 = pre_1 - my_pred.
compute diff2 = res_1 - my_res.
exe.
var lab diff1 '(SPSS Y-prime) - (My Y-prime)'
        diff2 '(SPSS residual) - (My residual)'.

* ANALYZE-->DESCRIPTIVE STATISTICS-->DESCRIPTIVES .
descrip pre_1 my_pred diff1 res_1 my_res diff2.

* As you can see, there is a bit of rounding error in our estimates .
* The funny notation you see in some cells is scientific notation;
  e.g., 8.000E-03 means 8.000 times 10 to the minus 3, or 0.008 .

* Another way to look at the agreement of our computations with those
  of SPSS is to crosstabulate the variables as follows .
* ANALYZE-->DESCRIPTIVE STATISTICS-->CROSSTABS .
crosstabs /tables pre_1 by my_pred /tables res_1 by my_res.

* Show that SS(Y) = SS(regression) + SS(residual) .
compute sqdevy   = (y - ybar)**2 .     /* **2 means 'squared' */
compute sqdevreg = (pre_1 - ybar)**2 .
compute sqdevres = (y - pre_1)**2 .
exe.
var lab sqdevy   '(Y - Ybar)**2'
        sqdevreg '(Yprime - Ybar)**2'
        sqdevres '(Y - Yprime)**2' .

* ANALYZE-->COMPARE MEANS-->MEANS .
means sqdevy sqdevreg sqdevres /cells = count sum mean.

* Compare the SUMS shown above to the Sums of Squares in the ANOVA
  table generated by the REGRESSION command; they are identical .
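* (A minimal sketch, not part of the original handout:  the slope and
  intercept can also be verified by hand, because b = SP(XY)/SS(X) and
  a = Ybar - b(Xbar).  The variables XBAR, SPXY, and SSX below are my
  own additions; the value 53.5 is the mean of X taken from the Case
  Summaries output above.) .
compute xbar = 53.5.
compute spxy = (x - xbar)*(y - ybar).  /* cross-products */
compute ssx  = (x - xbar)**2 .         /* squared deviations of X */
exe.

* The SUMs below should give SP(XY) = 2455 and SS(X) = 3579.5, so
  b = 2455/3579.5 = 0.686 and a = 50 - 0.686(53.5) = 13.3, matching
  the coefficients in the REGRESSION output (within rounding) .
means spxy ssx /cells = count sum.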
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* REGRESSION OF X ON Y .
* If you wished to predict X from Y, you would have to compute a
  different equation, because the errors in prediction would be
  measured differently (i.e., errors would be measured on the X-axis
  rather than the Y-axis) .

* Regression of X on Y (i.e., predicting X from Y) .
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF CI ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT x
  /METHOD=ENTER y .

* SS(Total)      = 3579.500 .
* SS(Regression) = 2114.746 .
* SS(Residual)   = 1464.754 .
* NOTE that the sum of the squared errors in prediction for the
  regression of X on Y, or SS(residual), is not the same as for the
  regression of Y on X; this is because we are now partitioning SS(X)
  rather than SS(Y), and so we have a completely different set of
  error scores (or residuals) .
* The regression equation is:  X-prime = 0.861(Y) + 10.430 .

* Finished.

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* ADDENDUM:  USING INTERACTIVE GRAPHICS TO MAKE SCATTERPLOTS .
* You can use Interactive Graphics to produce a scatterplot WITH the
  regression line included; here's some syntax I generated by going
  to Graphs-->Interactive .
* GRAPHS-->INTERACTIVE-->SCATTERPLOT .
IGRAPH
  /VIEWNAME='Scatterplot'
  /X1 = VAR(x) TYPE = SCALE
  /Y = VAR(y) TYPE = SCALE
  /COORDINATE = VERTICAL
  /FITLINE METHOD = REGRESSION LINEAR INTERVAL(95.0) = MEAN INDIVIDUAL
   LINE = TOTAL SPIKE=OFF
  /TITLE='Writing Ability' + ' as a Function of Spelling Competence'
  /X1LENGTH = 3.0
  /YLENGTH = 3.0
  /X2LENGTH = 3.0
  /CHARTLOOK = 'Grayscale.clo'
  /SCATTER COINCIDENT = NONE.
EXE.

* The preceding figure includes 2 intervals around the regression line .
* The WIDER interval is the 95% confidence interval for making a
  prediction about a SINGLE CASE with a given value of X .
* The NARROWER interval is the 95% confidence interval for the MEAN
  of all cases with a given value of X .
* Which interval is appropriate depends on what you are using it for.
  You don't need to know about this for this particular course.  But
  if anyone needs to know more about it at a later date, here's a
  website with some more information:
  http://courses.washington.edu/qsci483/lab3/ .
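* (A further sketch, again my addition rather than part of the original
  handout:  if Interactive Graphics is not available, a plain
  scatterplot, without the fitted line, can be drawn with the standard
  GRAPH command.) .
* GRAPHS-->SCATTER .
GRAPH
  /SCATTERPLOT(BIVAR)=x WITH y
  /TITLE='Writing Ability as a Function of Spelling Competence'.

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .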