This homework is a modified version of Homework 1 from Lab 3.
After Lab 3, you should be able to plot the data (always a good idea
before doing anything else), evaluate how the variables are correlated
with one another, and run a regression to determine which variables
are important. Note that SAS produces a LOT of output. Please
label it carefully or (preferably) cut out and attach only the relevant
parts (or save the output to a file, then pull it into the program
editor to delete the unnecessary parts).
A moving company gives you the following dataset:
Weight          Distance Moved    Damage
(1,000 lbs.)    (1,000 miles)     (dollars)
4               1.5               160
3               2.2               112
1.6             1.0               69
1.2             2.0               90
3.4             0.8               123
4.8             1.6               186
3.2             0.9               120
1. Create a SAS data set called hw3.
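One way to build the data set is an in-stream DATA step. This is only a minimal sketch: the variable names weight, distance, and damage are assumptions chosen to match the wording of the later questions, and the labels are optional.

```sas
* Sketch only. Variable names and labels are assumed, not prescribed ;
DATA hw3 ;
  INPUT weight distance damage ;
  LABEL weight   = 'Weight (1,000 lbs.)'
        distance = 'Distance moved (1,000 miles)'
        damage   = 'Damage (dollars)' ;
  DATALINES ;
4   1.5 160
3   2.2 112
1.6 1.0 69
1.2 2.0 90
3.4 0.8 123
4.8 1.6 186
3.2 0.9 120
;
RUN ;

* Print the data set to confirm that all 7 observations were read ;
PROC PRINT DATA=hw3 ;
RUN ;
```

Printing the data set before any analysis is a cheap way to catch a mistyped value.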
2. Plot damage*weight. Use appropriate titles for your plot.
Does there appear to be a relationship?
Plot damage*distance. Again, does there appear to be a
relationship? Based on these results what do you expect to
find from your correlation analysis and regression analysis?
Run PROC CORR and comment on how the correlation values relate
to what you see in the plots.
There appears to be a strong linear relationship between damage
and weight. The relationship between damage and distance appears
to be fairly weak. These results are consistent with the results
from PROC CORR. The correlation between damage and weight is 0.94,
indicating that there is a very strong, positive, linear
association between damage and weight. The correlation of 0.08
between damage and distance is very close to 0, indicating that
there is almost no linear relationship between damage and distance.
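The plots and correlations discussed above could be produced along the following lines. This is a sketch that assumes the data set hw3 holds variables named weight, distance, and damage; the title text is illustrative only.

```sas
* Sketch: assumes hw3 contains weight, distance, and damage ;
TITLE 'Damage versus weight of load' ;
PROC PLOT DATA=hw3 ;
  PLOT damage*weight ;
RUN ;
QUIT ;

TITLE 'Damage versus distance moved' ;
PROC PLOT DATA=hw3 ;
  PLOT damage*distance ;
RUN ;
QUIT ;

* Pearson correlations for all three pairs of variables ;
TITLE 'Correlations among damage, weight, and distance' ;
PROC CORR DATA=hw3 ;
  VAR damage weight distance ;
RUN ;
```

In each PLOT request the variable before the asterisk goes on the vertical axis, so damage is plotted against the predictor.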
3. Regress damage on weight and distance. Assume the following
equation: y=b0 + b1x1 + b2x2 + e. Report the prediction
equation (based on the SAS output) and report the MSE.
Damage = 13.55 + 30.1*weight + 12.7*distance
The MSE for the model is 190.58.
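A regression of this form can be requested with PROC REG. The sketch below assumes hw3 contains variables named weight, distance, and damage. The coefficients b0, b1, and b2 appear in the Parameter Estimates table, and the MSE is the mean square on the Error line of the Analysis of Variance table (Root MSE is its square root).

```sas
* Sketch: assumes hw3 contains weight, distance, and damage ;
TITLE 'Regression of damage on weight and distance' ;
PROC REG DATA=hw3 ;
  MODEL damage = weight distance ;
RUN ;
QUIT ;
```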
4. You are planning to move from Raleigh to Kansas City
(about 1,100 miles) and the weight of the load is about
2,000 lbs. How much damage (in dollars) do you
expect to incur? Construct a 95% prediction interval
for your answer. (Hint: You can answer both of these
questions by creating a new data set using the following
data step code and then running a regression (with the CLI
option) on the new data set.)
Data Step Code for Question 4 (Assumes that your original
data set is called hw3):
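* Damage is set to missing so the new row is predicted but not used in the fit ;
DATA NEWVAL ;
INPUT WEIGHT DISTANCE DAMAGE ;
DATALINES ;
2 1.1 .
;
DATA BOTH ;
SET HW3 NEWVAL ;
RUN ;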
Now run the regression on the data set BOTH.
From the SAS output for observation 8, the predicted value is
87.73. That is, we would expect about $87.73 worth of
damage when moving from Raleigh to Kansas City. The upper limit
for the 95% prediction interval is 131.9 and the lower limit for
the 95% prediction interval is 43.5.
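The regression with the CLI option might be run as sketched below, assuming the combined data set is named BOTH as in the hint. Because damage is missing for the appended row, that row does not enter the fit, but PROC REG still reports a predicted value and individual prediction limits for it.

```sas
* Sketch: assumes BOTH = the 7 rows of hw3 plus the new row with damage missing ;
TITLE 'Predicted damage with 95% prediction limits' ;
PROC REG DATA=both ;
  * CLI prints limits for an individual predicted value (default is 95%) ;
  MODEL damage = weight distance / CLI ;
RUN ;
QUIT ;
```

As a hand check, plugging the new values into the rounded prediction equation gives 13.55 + 30.1*2 + 12.7*1.1 = 13.55 + 60.2 + 13.97 = 87.72, which agrees with the reported 87.73 up to rounding of the coefficients.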
5. Do either weight or distance or both significantly affect damage?
Support your answer with the appropriate test results from your
SAS output.
To test the hypothesis
Ho: Bweight = 0
Ha: Bweight not = 0
The t-statistic = 6.74 and the p-value = 0.0025
Therefore, at the 5% significance level, 0.0025 < 0.05, so we
reject the null hypothesis and conclude that, in the presence
of distance, weight is a significant predictor of damage.
To test the hypothesis
Ho: Bdistance = 0
Ha: Bdistance not = 0
The t-statistic = 1.23 and the p-value = 0.2850
Therefore, at the 5% significance level, 0.2850 > 0.05, so we
fail to reject the null hypothesis and conclude that, in the
presence of weight, distance is NOT a significant predictor
of damage.
6. What percent of the variation seen in damage is explained by the
regression on weight and distance?
R-square = 0.9195. That is, approximately 91.95% of the
variability in damage can be explained by the regression
on weight and distance.
7. Is there any evidence of multicollinearity in this regression?
There is slight evidence of multicollinearity that might need
further investigation. First, the correlation between the 2
independent variables is fairly high (0.84). Also, the
coefficient on distance in the multiple regression model is 12.7.
However, if you fit a simple linear regression of damage on
distance, the coefficient on distance is only 5.9. So, the value of
the coefficient changes by a fairly large amount. However, the
variance inflation factors are not large (none greater than 10).
So, if multicollinearity does exist, it probably does not have
much impact on the standard errors of the parameter estimates.
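The three diagnostics mentioned above (the correlation between the two predictors, the change in the distance coefficient, and the variance inflation factors) can be checked with a sketch like the following. It assumes hw3 contains variables named weight, distance, and damage. The correlation between the predictors is the key number in this argument, so it is worth confirming the value quoted above against your own PROC CORR output.

```sas
* Sketch: assumes hw3 contains weight, distance, and damage ;
TITLE 'Correlation between the two predictors' ;
PROC CORR DATA=hw3 ;
  VAR weight distance ;
RUN ;

TITLE 'Multicollinearity checks' ;
PROC REG DATA=hw3 ;
  * Multiple regression with variance inflation factors ;
  MODEL damage = weight distance / VIF ;
  * Simple regression, for comparing the coefficient on distance ;
  MODEL damage = distance ;
RUN ;
QUIT ;
```

With only two predictors, both variance inflation factors are equal to 1 / (1 - r*r), where r is the correlation between weight and distance, so the PROC CORR and VIF results should tell the same story.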