Homework 4
The SAS data set SASUSER.AIR contains information regarding CO levels in the air. For this example, we will use a modified version of that data set called NEWAIR.
The dependent (y) variable in this data set is CO – the carbon monoxide level in the air. In this lab, we will build a predictive model for CO. We have 4 potential predictors:
NO – the nitrogen oxide level in the air
SO2 – the sulfur dioxide level in the air
DUST – the amount of dust in the air
WIND – the wind speed
1. Create plots of the dependent variable versus each of the independent variables. Do the relationships appear to be linear? What do you notice about the relationship between CO and WIND? Based on the SAS/Insight demo from lab, what polynomial term(s) should also be included in the model?
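One possible way to produce these plots outside SAS/Insight is sketched below. It assumes NEWAIR sits in the WORK library and that your installation includes the ODS statistical graphics procedures (SAS 9.2 or later); on older releases PROC GPLOT would be the usual substitute.

```sas
/* Sketch: CO plotted against each candidate predictor in one panel.   */
/* Assumption: NEWAIR is in the WORK library; add a libref if not.     */
proc sgscatter data=newair;
  compare y=co x=(no so2 dust wind);
run;
```

The COMPARE statement shares the CO axis across the four panels, which makes it easier to see which relationship bends rather than following a straight line.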
2. Use PROC CORR to examine the correlations between the variables. Which variables are the most strongly related to CO? Is there strong correlation between any of the independent variables?
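A minimal sketch of the PROC CORR call, again assuming NEWAIR is in the WORK library:

```sas
/* Sketch: Pearson correlations among CO and the four predictors.      */
/* Assumption: NEWAIR is in the WORK library.                          */
proc corr data=newair;
  var co no so2 dust wind;
run;
```

The first row (or column) of the resulting matrix gives the correlations with CO; the remaining off-diagonal entries show how strongly the predictors are related to one another.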
3. Use PROC REG to generate partial regression plots for each of the independent variables. Why is it necessary to create partial regression plots? For this data set, what do these plots tell us?
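Partial regression plots can be requested with the PARTIAL option on the MODEL statement. The sketch below assumes the main-effects model with all four predictors is the one intended here:

```sas
/* Sketch: partial regression leverage plots for each predictor.       */
/* Assumption: the model of interest contains the four main effects.   */
proc reg data=newair;
  model co = no so2 dust wind / partial;
run;
quit;
```

Each plot shows CO against one predictor after both have been adjusted for the other predictors in the model.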
4. We may also need to include interaction terms in our multiple regression model for CO. Using the output from the lab demo, record the value of the slope coefficient for each dust category. Do this for the SO2 model and the NO model. Is there evidence of potential interaction between DUST and SO2? Between DUST and NO? Explain.
5. In order to include interaction and polynomial terms in your regression model, you will need to create those terms using a data step. PROC REG does not allow you to directly specify higher-order terms in the MODEL statement. Run the SAS code provided in lab to create the higher-order terms. Then, fit a regression model to the data using the following MODEL statement in PROC REG:
MODEL co = so2 no dust wind so2no so2dust so2wind nodust nowind dustwind wind2 ;
Are there any terms which appear to be insignificant in this regression model? Which ones? Give evidence from the SAS output to support your answer.
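For reference, a data step of the following shape would create the terms named in the MODEL statement. This is only a sketch and not the code handed out in lab: it assumes each two-variable name is the product of the two variables, that wind2 is WIND squared, and it writes to a hypothetical data set called NEWAIR2.

```sas
/* Sketch: build interaction and quadratic terms in a data step.       */
/* Assumptions: product terms are simple products, wind2 = wind**2,    */
/* and NEWAIR2 is a hypothetical name for the output data set.         */
data newair2;
  set newair;
  so2no    = so2 * no;
  so2dust  = so2 * dust;
  so2wind  = so2 * wind;
  nodust   = no * dust;
  nowind   = no * wind;
  dustwind = dust * wind;
  wind2    = wind ** 2;
run;

proc reg data=newair2;
  model co = so2 no dust wind so2no so2dust
             so2wind nodust nowind dustwind wind2;
run;
quit;
```

The t-statistics and p-values in the Parameter Estimates table are the natural place to look for evidence about individual terms.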
6. Under stepwise model selection, wind and wind2 do not enter the model. So, we’ll drop wind, wind2, and the interactions with wind from the model. Now consider the following model:
MODEL CO = SO2 NO DUST SO2NO SO2DUST NODUST ;
Run PROC REG for this model using SELECTION = FORWARD, SELECTION = BACKWARD, and SELECTION = STEPWISE. Report the final model for each of these selection methods. Are they the same? Now, try changing the SLENTRY= and SLSTAY= criteria to 0.01 for the STEPWISE selection method. Does your final model change? Why?
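The four runs can be placed in a single PROC REG step by labeling each MODEL statement. The sketch below assumes the interaction terms already exist in a data set, here given the hypothetical name NEWAIR2; the labels are also illustrative.

```sas
/* Sketch: compare selection methods for the reduced model.            */
/* Assumption: NEWAIR2 (hypothetical name) already holds the           */
/* interaction terms so2no, so2dust, and nodust.                       */
proc reg data=newair2;
  Forward:  model co = so2 no dust so2no so2dust nodust
                  / selection=forward;
  Backward: model co = so2 no dust so2no so2dust nodust
                  / selection=backward;
  Stepwise: model co = so2 no dust so2no so2dust nodust
                  / selection=stepwise;
  Strict:   model co = so2 no dust so2no so2dust nodust
                  / selection=stepwise slentry=0.01 slstay=0.01;
run;
quit;
```

When SLENTRY= and SLSTAY= are not specified, PROC REG uses 0.50 to enter for FORWARD, 0.10 to stay for BACKWARD, and 0.15 for both in STEPWISE, so the last run applies a much stricter threshold than the first three.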
7. Using the MODEL statement in question 6, run PROC REG again and include the INFLUENCE, R, VIF, SS1, and SS2 options in the MODEL statement. Does it appear that there are any outliers in the data set? Give evidence from the output to support your answer. Do the variance inflation factors indicate multicollinearity? Verify that the sequential sums of squares partition the model sum of squares into components associated with sequentially adding each variable to the model.
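A sketch of the requested run, assuming the interaction terms are stored in a data set with the hypothetical name NEWAIR2:

```sas
/* Sketch: diagnostics and sums of squares for the question 6 model.   */
/* Assumption: NEWAIR2 (hypothetical name) holds the interaction terms.*/
proc reg data=newair2;
  model co = so2 no dust so2no so2dust nodust
        / influence r vif ss1 ss2;
run;
quit;
```

To check the partition, add the Type I SS values for the six predictors (excluding the intercept row); the total should match the Model sum of squares in the Analysis of Variance table.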
8.
Use PROC GLM to compute the F-statistics and p-values for the
full vs. reduced model tests based on the partial sums of squares. State the null and alternative hypotheses
associated with the F-test for NO. Is NO
a significant predictor of CO in the presence of the other variables in this
model? Now, report the t-statistic and
associated p-value that could also be used to test this hypothesis. Verify that the t-test and F-test are
equivalent by showing that F = t2.
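One way to get both sets of statistics from a single run is sketched below. It assumes "this model" refers to the question 6 model and that the interaction terms are stored in a data set with the hypothetical name NEWAIR2.

```sas
/* Sketch: partial (Type III) F-tests plus parameter t-tests.          */
/* Assumptions: the question 6 model is intended; NEWAIR2 is a         */
/* hypothetical name for the data set holding the interaction terms.   */
proc glm data=newair2;
  model co = so2 no dust so2no so2dust nodust / ss3 solution;
run;
quit;
```

The F value for NO in the Type III table and the t value for NO in the parameter estimates test the same hypothesis, so squaring the t value should reproduce the F value up to rounding.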
9. In question 4, we saw evidence of potential interactions between SO2 and DUST and between NO and DUST. Using either a t-test or an F-test, determine whether the interaction between SO2 and DUST is significant in this model. Do the same thing for the interaction between NO and DUST.
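If you choose the F-test route, the TEST statement in PROC REG is one option. The sketch below assumes the question 6 model and a data set with the hypothetical name NEWAIR2 that contains the interaction terms; the statement labels are illustrative.

```sas
/* Sketch: single-coefficient F-tests for the two DUST interactions.   */
/* Assumptions: the question 6 model is intended; NEWAIR2 is a         */
/* hypothetical name for the data set holding the interaction terms.   */
proc reg data=newair2;
  model co = so2 no dust so2no so2dust nodust;
  SO2xDust: test so2dust = 0;
  NOxDust:  test nodust = 0;
run;
quit;
```

Each TEST statement reports an F-statistic with one numerator degree of freedom, which corresponds to the t-test for the same coefficient in the Parameter Estimates table.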