
Homework 3


This homework is a modified version of Homework 1 from Lab 3.
Questions

After Lab 3, you should be able to plot the data (always a good idea before
doing anything else), evaluate how the variables are correlated
with one another, and run a regression to determine which variables
are important. Note that SAS produces a LOT of output. Please
label it carefully or (preferably) cut out and attach only the relevant
parts (or save the output to a file, then pull it into the program
editor to delete the unnecessary parts).



A moving company gives you the following dataset:

Weight          Distance Moved    Damage
(1,000 lbs.)    (1,000 miles)     (dollars)

4               1.5               160
3               2.2               112
1.6             1.0               69
1.2             2.0               90
3.4             0.8               123
4.8             1.6               186
3.2             0.9               120


1. Create a SAS data set called hw3.
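
One way to create the data set is sketched below. It assumes the variable names WEIGHT, DISTANCE, and DAMAGE (the same names used by the data step code given for question 4) and reads the seven observations in-line from the table above. The PROC PRINT step is only a suggested check that the data were read correctly.

```sas
/* Read the moving company data in-line.                        */
/* WEIGHT is in 1,000 lbs., DISTANCE in 1,000 miles, DAMAGE in  */
/* dollars, matching the column headings of the table.          */
DATA HW3 ;
  INPUT WEIGHT DISTANCE DAMAGE ;
  CARDS ;
4   1.5 160
3   2.2 112
1.6 1.0 69
1.2 2.0 90
3.4 0.8 123
4.8 1.6 186
3.2 0.9 120
;
RUN ;

/* Optional check: list the seven observations */
PROC PRINT DATA=HW3 ;
  TITLE 'Homework 3: moving damage data' ;
RUN ;
```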

2. Plot damage*weight. Use appropriate titles for your plot.
 Does there appear to be a relationship?
 Plot damage*distance. Again, does there appear to be a
 relationship? Based on these results what do you expect to
 find from your correlation analysis and regression analysis?
 Run PROC CORR and comment on how the correlation values relate
 to what you see in the plots.
 
 There appears to be a strong linear relationship between damage
 and weight. The relationship between damage and distance appears
 to be fairly weak. These observations are consistent with the
 results from PROC CORR. The correlation between damage and weight
 is 0.94, indicating a very strong, positive, linear association
 between damage and weight. The correlation of 0.08 between damage
 and distance is very close to 0, indicating that there is almost
 no linear relationship between damage and distance.
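
The plots and correlations discussed above could be produced with something like the following sketch. It assumes the data set hw3 from question 1 with variables WEIGHT, DISTANCE, and DAMAGE. The title text is only a suggestion.

```sas
/* Scatter plot of damage against weight */
PROC PLOT DATA=HW3 ;
  PLOT DAMAGE*WEIGHT ;
  TITLE 'Damage (dollars) versus weight (1,000 lbs.)' ;
RUN ;
QUIT ;

/* Scatter plot of damage against distance moved */
PROC PLOT DATA=HW3 ;
  PLOT DAMAGE*DISTANCE ;
  TITLE 'Damage (dollars) versus distance moved (1,000 miles)' ;
RUN ;
QUIT ;

/* Pearson correlations among all three variables */
PROC CORR DATA=HW3 ;
  VAR DAMAGE WEIGHT DISTANCE ;
  TITLE 'Correlations among damage, weight, and distance' ;
RUN ;
```

In the PROC CORR output, each cell shows the correlation coefficient with its p-value printed directly beneath it, so take care to report the first number in each cell as the correlation.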

3. Regress damage on weight and distance. Assume the following
 equation: y = b0 + b1*x1 + b2*x2 + e. Report the prediction
 equation (based on the SAS output) and report the MSE.
 
 Prediction equation:
 Damage = 13.55 + 30.1*weight + 12.7*distance
 The MSE for the model is 190.58.
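
A minimal sketch of the regression step, assuming the data set hw3 from question 1:

```sas
/* Multiple regression of damage on weight and distance */
PROC REG DATA=HW3 ;
  MODEL DAMAGE = WEIGHT DISTANCE ;
  TITLE 'Regression of damage on weight and distance' ;
RUN ;
QUIT ;
```

The coefficients of the prediction equation come from the Parameter Estimates table. The MSE is the Mean Square entry in the Error row of the Analysis of Variance table (Root MSE is its square root).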

4. You are planning to move from Raleigh to Kansas City
 (about 1,100 miles) and the weight of the load is about
 2,000 lbs. How much damage (in dollars) do you
 expect to incur? Construct a 95% prediction interval
 for your answer. (Hint: You can answer both of these
 questions by creating a new data set using the following
 data step code and then running a regression (with the CLI
 option) on the new data set.)

Data Step Code for Question 4 (Assumes that your original
data set is called hw3):

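/* New observation to predict: 2,000 lbs. moved 1,100 miles. */
/* DAMAGE is set to missing (.) because it is unknown.       */
DATA NEWVAL ;
INPUT WEIGHT DISTANCE DAMAGE ;
CARDS ;
2 1.1 .
;
RUN ;
/* Append the new observation to the original data as observation 8 */
DATA BOTH ;
SET HW3 NEWVAL ;
RUN ;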

Now run the regression on the data set BOTH.

 

From the SAS output for observation 8, the predicted value is
87.73. That is, we would expect about $87.73 worth of damage
when moving from Raleigh to Kansas City. The lower limit of the
95% prediction interval is 43.5 and the upper limit is 131.9.
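
The regression on the combined data set might look like the sketch below. It assumes the data set BOTH built by the data step code for question 4.

```sas
/* Refit the model on BOTH and request 95% prediction limits. */
/* Observation 8 has a missing DAMAGE value, so it is left    */
/* out of the fit but still receives a predicted value and    */
/* prediction limits.                                         */
PROC REG DATA=BOTH ;
  MODEL DAMAGE = WEIGHT DISTANCE / CLI ;
  TITLE 'Prediction for a 2,000 lb. load moved 1,100 miles' ;
RUN ;
QUIT ;
```

Because the new observation does not contribute to the fit, the parameter estimates are the same as those from the regression on hw3 alone.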

5. Do either weight or distance or both significantly affect damage?
Support your answer with the appropriate test results from your
analyses.

 

To test the hypothesis

Ho: Bweight = 0
Ha: Bweight not = 0

The t-statistic = 6.74 and the p-value = 0.0025.
Since 0.0025 < 0.05, at the 5% significance level we
reject the null hypothesis and conclude that, in the presence
of distance, weight is a significant predictor of damage.

 

To test the hypothesis

Ho: Bdistance = 0
Ha: Bdistance not = 0

The t-statistic = 1.23 and the p-value = 0.2850.
Since 0.2850 > 0.05, at the 5% significance level we
fail to reject the null hypothesis and conclude that, in the
presence of weight, distance is NOT a significant predictor
of damage.
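
The t-statistics and p-values for both tests are printed in the Parameter Estimates table of the PROC REG output. As an optional cross-check, the same hypotheses can be tested with TEST statements, as in this sketch. It assumes the data set hw3 from question 1. The labels T_WEIGHT and T_DIST are arbitrary names chosen here.

```sas
/* Test each slope against zero with the other variable in the model */
PROC REG DATA=HW3 ;
  MODEL DAMAGE = WEIGHT DISTANCE ;
  T_WEIGHT: TEST WEIGHT = 0 ;
  T_DIST:   TEST DISTANCE = 0 ;
RUN ;
QUIT ;
```

Each TEST statement reports an F statistic. For a test of a single coefficient, this F statistic is the square of the corresponding t-statistic and has the same p-value.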


6. What percent of the variation seen in damage is explained by the
regression on weight and distance?

 

R-square = 0.9195. That is, approximately 91.95% of the
variability in damage can be explained by the regression
relationship with weight and distance.

7. Is there any evidence of multicollinearity in this regression?

 

There is little evidence of multicollinearity. First, the
correlation between the 2 independent variables is weak (about
-0.10, with a p-value of 0.84). The coefficient on distance
does change by a fairly large amount: it is 12.7 in the multiple
regression model but only about 5.97 in a simple linear
regression of damage on distance, which might need further
investigation. However, the variance inflation factors are not
large (none greater than 10). So, if multicollinearity does
exist, it probably does not have much impact on the standard
errors of the parameter estimates.
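
The diagnostics used in this answer could be obtained with a sketch like the following. It assumes the data set hw3 from question 1. The model labels FULL and DISTONLY are arbitrary names chosen here.

```sas
/* Correlation between the two independent variables */
PROC CORR DATA=HW3 ;
  VAR WEIGHT DISTANCE ;
  TITLE 'Correlation between weight and distance' ;
RUN ;

/* Full model with variance inflation factors, followed by the */
/* simple regression of damage on distance for comparison of   */
/* the distance coefficient.                                   */
PROC REG DATA=HW3 ;
  FULL:     MODEL DAMAGE = WEIGHT DISTANCE / VIF ;
  DISTONLY: MODEL DAMAGE = DISTANCE ;
  TITLE 'Multicollinearity diagnostics' ;
RUN ;
QUIT ;
```

With only two independent variables, both variance inflation factors equal 1/(1 - r^2), where r is the correlation between weight and distance, so the VIF column and the PROC CORR output should tell the same story.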