Homework 3


 This homework is a modified version of Homework 1 from Lab 3.
Questions  

After Lab 3, you should be able to plot the data (always a good idea
before doing anything else), evaluate how the variables are correlated
with one another, and run a regression to determine which variables
are important.  Note that SAS produces a LOT of output.  Please
label it carefully or (preferably) cut out and attach only the relevant
parts (or save the output to a file, then pull it into the program
editor to delete the unnecessary parts).



  A moving company gives you the following dataset:

        Weight         Distance Moved          Damage
      (1,000 lbs.)     (1,000 miles)          (dollars)

          4                  1.5                 160
          3                  2.2                 112
         1.6                 1.0                  69
         1.2                 2.0                  90
         3.4                 0.8                 123
         4.8                 1.6                 186
         3.2                 0.9                 120


1. Create a SAS data set called hw3.
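
    One way to set up the data set is sketched below.  The variable
    names WEIGHT, DISTANCE, and DAMAGE are taken from the Data Step
    Code given for Question 4; the PROC PRINT step is an added
    check, not something the question asks for.

```sas
* Weight in 1,000 lbs., distance in 1,000 miles, damage in dollars ;
DATA HW3 ;
  INPUT WEIGHT DISTANCE DAMAGE ;
  CARDS ;
4   1.5 160
3   2.2 112
1.6 1.0  69
1.2 2.0  90
3.4 0.8 123
4.8 1.6 186
3.2 0.9 120
;
RUN ;

* Print the data to check that all 7 observations were read correctly ;
PROC PRINT DATA=HW3 ;
RUN ;
```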

2. Plot damage*weight.  Use appropriate titles for your plot.
    Does there appear to be a relationship?
    Plot damage*distance.  Again, does there appear to be a
    relationship?  Based on these results, what do you expect to
    find from your correlation analysis and regression analysis?
    Run PROC CORR and comment on how the correlation values relate
    to what you see in the plots.
 
    There appears to be a strong linear relationship between damage
    and weight.  The relationship between damage and distance appears
    to be fairly weak.  These results are consistent with the results
    from PROC CORR.  The correlation between damage and weight is 0.94,
    indicating that there is a very strong, positive, linear 
    association between damage and weight.  The correlation of 0.08
    between damage and distance is very close to 0, indicating that
    there is almost no linear relationship between damage and distance.
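
    The plots and correlations discussed above could be produced
    with code along these lines.  This is a sketch that assumes
    the data set HW3 has variables named WEIGHT, DISTANCE, and
    DAMAGE, and that PROC PLOT is the plotting procedure used in
    the lab; the titles are only suggestions.

```sas
* Scatter plot of damage against weight ;
PROC PLOT DATA=HW3 ;
  PLOT DAMAGE*WEIGHT ;
  TITLE 'Damage (dollars) vs. Weight (1,000 lbs.)' ;
RUN ;

* Scatter plot of damage against distance ;
PROC PLOT DATA=HW3 ;
  PLOT DAMAGE*DISTANCE ;
  TITLE 'Damage (dollars) vs. Distance Moved (1,000 miles)' ;
RUN ;

* Pairwise correlations among all three variables ;
PROC CORR DATA=HW3 ;
  VAR DAMAGE WEIGHT DISTANCE ;
  TITLE 'Correlations among Damage, Weight, and Distance' ;
RUN ;
```

    In the PROC CORR output, each correlation coefficient is
    printed with its p-value directly beneath it, so take care to
    read the top number as the correlation.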

3.  Regress damage on weight and distance.  Assume the following
    equation:  y=b0 + b1x1 + b2x2 + e.  Report the prediction
    equation (based on the SAS output) and report the MSE.
 
    Prediction equation:
    Damage = 13.55 + 30.1*weight + 12.7*distance
    The MSE for the model is 190.58.
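
    A minimal sketch of the regression step, again assuming the
    data set HW3 with variables WEIGHT, DISTANCE, and DAMAGE (the
    title is only a suggestion):

```sas
* Multiple regression of damage on weight and distance ;
PROC REG DATA=HW3 ;
  MODEL DAMAGE = WEIGHT DISTANCE ;
  TITLE 'Regression of Damage on Weight and Distance' ;
RUN ;
QUIT ;
```

    The estimates of b0, b1, and b2 appear in the Parameter
    Estimates table.  The MSE is the Mean Square on the Error line
    of the Analysis of Variance table.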

4.  You are planning to move from Raleigh to Kansas City
    (about 1,100 miles), and the weight of the load is about
    2,000 lbs.  How much damage (in dollars) do you
    expect to incur?  Construct a 95% prediction interval
    for your answer.  (Hint:  You can answer both of these
    questions by creating a new data set using the following
    data step code and then running a regression (with the CLI
    option) on the new data set.)

    Data Step Code for Question 4 (assumes that your original
    data set is called hw3):

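    DATA NEWVAL ;
      * New observation: 2,000 lbs. moved 1,100 miles with damage missing ;
      INPUT WEIGHT DISTANCE DAMAGE ;
      CARDS ;
      2 1.1 .
      ;
    RUN ;
    * Append the new observation to the original data ;
    DATA BOTH ;
      SET HW3 NEWVAL ;
    RUN ;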
   
    Now run the regression on the data set BOTH.
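
    The regression on BOTH might look like the sketch below.  It
    assumes the variable names used in the data step code above;
    the title is only a suggestion.

```sas
* Rerun the regression on BOTH ;
* Observation 8 has a missing DAMAGE so it is left out of the fit ;
* but CLI still prints its predicted value and 95% prediction limits ;
PROC REG DATA=BOTH ;
  MODEL DAMAGE = WEIGHT DISTANCE / CLI ;
  TITLE 'Regression with 95% Prediction Limits' ;
RUN ;
QUIT ;
```

    Because the new observation is not used in fitting, the
    parameter estimates are the same as those from the regression
    on HW3 alone.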

    From the SAS output for observation 8, the predicted value is
    87.73.  That is, we would expect about $87.73 worth of
    damage when moving from Raleigh to Kansas City.  The lower limit
    of the 95% prediction interval is 43.5 and the upper limit is
    131.9.

5.  Do either weight or distance or both significantly affect damage?
    Support your answer with the appropriate test results from your
    analyses.

    To test the hypothesis
    Ho: Bweight = 0
    Ha: Bweight not = 0

    The t-statistic = 6.74 and the p-value = 0.0025.
    Therefore, at the 5% significance level, 0.0025 < 0.05, so we
    reject the null hypothesis and conclude that, in the presence
    of distance, weight is a significant predictor of damage.

    To test the hypothesis
    Ho: Bdistance = 0
    Ha: Bdistance not = 0

    The t-statistic = 1.23 and the p-value = 0.2850.
    Therefore, at the 5% significance level, 0.2850 > 0.05, so we
    fail to reject the null hypothesis and conclude that, in the
    presence of weight, distance is NOT a significant predictor
    of damage.


6.  What percent of the variation seen in damage is explained by the
    regression on weight and distance?

    R-square = 0.9195.  That is, approximately 91.95% of the
    variability in damage can be explained by the regression
    relationship with weight and distance.

7.  Is there any evidence of multicollinearity in this regression?

    There is little evidence of multicollinearity.  First, the
    correlation between the 2 independent variables is weak:
    computed from the data above, it is about -0.10 (a value of
    0.84 matches the p-value that PROC CORR prints beneath this
    correlation rather than the correlation itself).  The
    coefficient on distance does change between models: it is 12.7
    in the multiple regression model, but if you fit a simple
    linear regression of damage on distance, it is only 5.9.
    However, the variance inflation factors are not large (none
    greater than 10).  So, if multicollinearity does exist, it
    probably does not have much impact on the standard errors of
    the parameter estimates.
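
    The three checks mentioned in this answer could be run as
    sketched below, assuming the data set HW3 with variables
    WEIGHT, DISTANCE, and DAMAGE.

```sas
* Correlation between the two independent variables ;
PROC CORR DATA=HW3 ;
  VAR WEIGHT DISTANCE ;
RUN ;

* Multiple regression with variance inflation factors ;
PROC REG DATA=HW3 ;
  MODEL DAMAGE = WEIGHT DISTANCE / VIF ;
RUN ;
QUIT ;

* Simple linear regression of damage on distance for comparison ;
PROC REG DATA=HW3 ;
  MODEL DAMAGE = DISTANCE ;
RUN ;
QUIT ;
```

    With only two independent variables, each variance inflation
    factor equals 1/(1 - r^2), where r is the correlation between
    weight and distance, so the first two checks should agree.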