Homework 1

Linear Regression

The data for this problem illustrate the relationship between the average monthly outdoor temperature (in degrees Fahrenheit) and the average monthly gas consumption (in therms) for a household.  Use the attached SAS output to answer the following questions.

1. Based on the first scatterplot, does it appear that it would be appropriate to fit a linear regression model to these data?  Why or why not?

Yes, it appears that it is appropriate to fit a linear regression model to these data.  The data show a fairly strong decreasing linear trend.  The correlation (r) is -0.9030, which also supports our findings from the scatter plot.
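As a cross-check on the correlation quoted above, r can be recomputed directly from the 16 observations in the data listing.  The following is only a minimal Python sketch (the original analysis was done in SAS with PROC CORR); the variable names are our own.

```python
import math

# (avtemp, avgas) pairs copied from the data listing in the SAS output
data = [
    (29, 8.9), (30, 11.6), (31, 10.7), (37, 11.6),
    (48, 7.5), (57, 3.5), (68, 1.5), (71, 0.8),
    (53, 1.9), (40, 5.0), (39, 7.3), (29, 9.3),
    (36, 9.7), (37, 7.9), (46, 5.8), (56, 3.2),
]

n = len(data)
mean_temp = sum(t for t, _ in data) / n
mean_gas = sum(g for _, g in data) / n

# Corrected sums of squares and cross-products
sxy = sum((t - mean_temp) * (g - mean_gas) for t, g in data)
sxx = sum((t - mean_temp) ** 2 for t, _ in data)
syy = sum((g - mean_gas) ** 2 for _, g in data)

# Pearson correlation coefficient
r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.4f}")
```

A value of r this close to -1 is consistent with the strong decreasing linear trend seen in the scatterplot.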

2. A regression model was fit to these data using SAS.  Use the output from PROC REG to write down the prediction equation for this regression model.

Based on the values from the parameter estimates section of the output, the prediction equation is given by:

Y = 17.37 - 0.24x
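The parameter estimates reported by PROC REG can be reproduced from the data listing with the usual least-squares formulas (slope = Sxy / Sxx, intercept = ybar - slope * xbar).  The sketch below is in Python rather than SAS and is meant only as an illustration; the variable names are assumptions of ours.

```python
# (avtemp, avgas) pairs copied from the data listing in the SAS output
data = [
    (29, 8.9), (30, 11.6), (31, 10.7), (37, 11.6),
    (48, 7.5), (57, 3.5), (68, 1.5), (71, 0.8),
    (53, 1.9), (40, 5.0), (39, 7.3), (29, 9.3),
    (36, 9.7), (37, 7.9), (46, 5.8), (56, 3.2),
]

n = len(data)
xbar = sum(x for x, _ in data) / n
ybar = sum(y for _, y in data) / n

# Corrected sum of cross-products and sum of squares of x
sxy = sum((x - xbar) * (y - ybar) for x, y in data)
sxx = sum((x - xbar) ** 2 for x, _ in data)

slope = sxy / sxx                  # least-squares estimate of beta1
intercept = ybar - slope * xbar    # least-squares estimate of beta0

print(f"Y = {intercept:.5f} + ({slope:.5f})x")
```

Rounding these estimates to two decimal places gives the prediction equation written above.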

3. What is the value of the slope for the model?  What does the slope tell you about the relationship between average temperature and average gas consumption?

The value of the slope is -0.24.  This value tells us that for each 1-degree Fahrenheit increase in average monthly temperature, average monthly gas consumption is expected to decrease by about 0.24 therms.

4. Using information from the SAS output and table A.3 in Steel & Torrie, construct a 95% confidence interval for the slope.  How do you interpret this confidence interval in non-statistical terms?

Beta1 +/- t(0.025, 14)*s(beta1)

-0.24 +/- (2.145)(0.03)

-0.24 +/- 0.06435

(-0.30435 , -0.17565)

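We are 95% confident that the true value of the slope parameter is between -0.30435 and -0.17565.  In non-statistical terms, each 1-degree increase in average monthly temperature is associated with a decrease in average monthly gas consumption of roughly 0.18 to 0.30 therms.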
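The hand calculation above uses the estimate and standard error rounded to two decimal places.  As a hedged illustration, the same interval can be computed from the unrounded values in the Parameter Estimates table; the critical value 2.145 is the tabled t(0.025, 14) already used above, and the variable names are our own.

```python
# Values read from the Parameter Estimates table of the SAS output
estimate = -0.24295    # slope estimate for avtemp
std_error = 0.03089    # standard error of the slope
t_crit = 2.145         # t(0.025, 14) from the t table, as used above

# Margin of error and the two endpoints of the 95% interval
margin = t_crit * std_error
lower = estimate - margin
upper = estimate + margin

print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

With the unrounded values the interval is approximately (-0.309, -0.177), slightly wider than the interval obtained from the rounded figures, and the interpretation is unchanged.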

5. Does it appear that there is a significant linear relationship between average temperature and average gas consumption?  Use the SAS output to conduct a hypothesis test to support your answer.  Be sure to state the hypotheses and report the t-statistic and p-value for your test.

To answer this question, we need to test the hypothesis that the slope is equal to 0.

Ho:  beta1 = 0

Ha:  beta1 not equal 0

The value of the test statistic for this test is t = -7.86 on 14 degrees of freedom.  The p-value for the test is reported as <0.0001.  Since the p-value is less than the 5% significance level (i.e., p < 0.0001 < 0.05), we reject the null hypothesis.  Thus, we conclude that there is a significant linear relationship between average gas consumption and average temperature.
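The t-statistic in the SAS output is simply the slope estimate divided by its standard error, and in simple linear regression its square equals the F value in the Analysis of Variance table.  The short Python sketch below checks both facts using the printed (rounded) SAS values; it compares |t| with the tabled critical value 2.145 rather than computing a p-value, and the variable names are our own.

```python
# Values read from the SAS output
estimate = -0.24295    # slope estimate for avtemp
std_error = 0.03089    # standard error of the slope
f_value = 61.85        # F value from the Analysis of Variance table
t_crit = 2.145         # t(0.025, 14), two-sided 5% critical value

# Test statistic for H0: beta1 = 0
t_stat = estimate / std_error

# Reject H0 at the 5% level when |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, F = {f_value}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Small differences in the last digit relative to the SAS output come from working with the rounded printed values.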

6. Assuming that this model is correct, how much gas should we expect to use, on average, in a month when the average temperature was 45 degrees Fahrenheit?

Plug 45 into the prediction equation.

Y = 17.37 - 0.24*45 = 17.37 - 10.8 = 6.57 therms
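Because the slope is multiplied by 45, rounding the coefficients has a visible effect on the prediction.  The Python sketch below evaluates the prediction equation with both the rounded coefficients used above and the unrounded estimates from the SAS output; the helper name predict is hypothetical and used only for illustration.

```python
def predict(temp, intercept, slope):
    """Predicted average gas consumption (therms) at a given average temperature."""
    return intercept + slope * temp

# Rounded coefficients, as in the prediction equation written above
pred_rounded = predict(45, 17.37, -0.24)

# Unrounded estimates from the Parameter Estimates table
pred_full = predict(45, 17.37277, -0.24295)

print(f"rounded coefficients:   {pred_rounded:.2f} therms")
print(f"unrounded coefficients: {pred_full:.2f} therms")
```

The two answers differ by a little over a tenth of a therm (about 6.57 versus about 6.44), so it is worth stating which set of coefficients was used.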

7. Would it be appropriate to use this model to predict the average gas usage for a month when the average temperature was 85 degrees?  Why or why not?

No.  That would be extrapolation since the data that were used to construct the model only had average temperatures ranging from 29 degrees to 71 degrees.  85 degrees is outside of the range of these data.
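The extrapolation check amounts to comparing the requested temperature with the smallest and largest avtemp values in the data listing.  A minimal Python sketch, with a hypothetical helper name, might look like this:

```python
# avtemp values copied from the data listing in the SAS output
observed_temps = [29, 30, 31, 37, 48, 57, 68, 71, 53, 40, 39, 29, 36, 37, 46, 56]

temp_min = min(observed_temps)
temp_max = max(observed_temps)

def is_extrapolation(temp):
    """Return True when temp lies outside the observed range of avtemp."""
    return temp < temp_min or temp > temp_max

for temp in (45, 85):
    label = "extrapolation" if is_extrapolation(temp) else "within the observed range"
    print(f"{temp} degrees: {label}")
```

A temperature of 45 degrees falls inside the observed range, while 85 degrees does not.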

8. Report the coefficient of determination (R^2) for this model and explain (in non-statistical terms) what it tells you about the relationship between average temperature and average gas consumption.

The value of R-square is 0.8154.  This value tells us that 81.54% of the variability in average gas consumption can be explained by the linear relationship with average temperature.
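R-square can be recovered from the Analysis of Variance table as the model sum of squares divided by the corrected total sum of squares, and in simple linear regression it also equals the square of the correlation coefficient.  The following Python sketch checks both relationships using the printed SAS values; the variable names are our own.

```python
# Values read from the Analysis of Variance table and PROC CORR output
ss_model = 161.51498    # Model sum of squares
ss_error = 36.56252     # Error sum of squares
ss_total = 198.07750    # Corrected Total sum of squares
r = -0.90300            # Pearson correlation between avgas and avtemp

# Two equivalent ways of computing R-square from the ANOVA table
r_square = ss_model / ss_total
r_square_alt = 1 - ss_error / ss_total

print(f"SSR/SST     = {r_square:.4f}")
print(f"1 - SSE/SST = {r_square_alt:.4f}")
print(f"r squared   = {r ** 2:.4f}")
```

All three quantities agree with the reported R-Square of 0.8154 to four decimal places.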

9. Examine the plot of residuals vs. predicted values and residuals vs. time.  Does it appear that any of our assumptions have been violated or that there is need for further investigation before accepting this as our final regression model?  Explain.

The plot of residuals vs. predicted values allows us to check for the following departures from the model assumptions:

·        Outliers

·        Non-constant variance

·        Non-linearity

If the residuals are randomly scattered about 0 with roughly constant spread, then the plot gives no evidence that these assumptions have been violated.  In this plot, it appears that there may be a slight megaphone effect (the spread of the residuals is not the same across the range of predicted values), which suggests that the variance may not be constant.

The sequence plot of the residuals vs. time allows us to look for the possibility of correlated error terms.  Patterns or trends in this plot indicate the presence of correlation.  Although there is a lot of noise in this plot, it does appear that there may be a cyclical pattern to the residuals.  This pattern suggests that we may need to apply some time series methods to adjust for temporal correlation before accepting this as our final regression model.
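The residuals shown in both plots can be recomputed from the data listing by fitting the least-squares line and subtracting the predicted values from the observed ones.  The Python sketch below is only an illustration (the original plots were produced in SAS); it lists the residuals in time order so that runs of positive and negative values can be inspected.

```python
# (time, avtemp, avgas) triples copied from the data listing in the SAS output
data = [
    (0, 29, 8.9), (1, 30, 11.6), (2, 31, 10.7), (3, 37, 11.6),
    (4, 48, 7.5), (5, 57, 3.5), (6, 68, 1.5), (8, 71, 0.8),
    (10, 53, 1.9), (11, 40, 5.0), (12, 39, 7.3), (13, 29, 9.3),
    (14, 36, 9.7), (15, 37, 7.9), (16, 46, 5.8), (17, 56, 3.2),
]

n = len(data)
xbar = sum(x for _, x, _ in data) / n
ybar = sum(y for _, _, y in data) / n

# Least-squares slope and intercept
sxy = sum((x - xbar) * (y - ybar) for _, x, y in data)
sxx = sum((x - xbar) ** 2 for _, x, _ in data)
slope = sxy / sxx
intercept = ybar - slope * xbar

# Predicted values and residuals (observed minus predicted), in time order
rows = []
for time, x, y in sorted(data):
    predicted = intercept + slope * x
    rows.append((time, predicted, y - predicted))

for time, predicted, residual in rows:
    sign = "+" if residual >= 0 else "-"
    print(f"time {time:2d}  predicted {predicted:6.2f}  residual {residual:6.2f}  {sign}")

# Error sum of squares, for comparison with the Analysis of Variance table
sse = sum(residual ** 2 for _, _, residual in rows)
```

The sign column makes it easier to see whether positive and negative residuals cluster together in time, which is the kind of pattern the sequence plot is meant to reveal.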

The SAS System         13:35 Monday, January 20, 2003   9

Plot of avgas*avtemp.  Legend: A = 1 obs, B = 2 obs, etc.

[Line-printer scatterplot: avgas (vertical axis, 0 to 12) against avtemp (horizontal axis, 25 to 75); 16 points, each plotted as A.]


Obs    time    avtemp    avgas
  1      0       29        8.9
  2      1       30       11.6
  3      2       31       10.7
  4      3       37       11.6
  5      4       48        7.5
  6      5       57        3.5
  7      6       68        1.5
  8      8       71        0.8
  9     10       53        1.9
 10     11       40        5.0
 11     12       39        7.3
 12     13       29        9.3
 13     14       36        9.7
 14     15       37        7.9
 15     16       46        5.8
 16     17       56        3.2

The CORR Procedure

2  Variables:    avgas    avtemp

Pearson Correlation Coefficients, N = 16
Prob > |r| under H0: Rho=0

               avgas        avtemp
avgas        1.00000      -0.90300
avtemp      -0.90300       1.00000

The REG Procedure
Model: MODEL1
Dependent Variable: avgas

Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     1      161.51498      161.51498      61.85    <.0001
Error                    14       36.56252        2.61161
Corrected Total          15      198.07750

Root MSE              1.61605    R-Square     0.8154
Dependent Mean        6.63750    Adj R-Sq     0.8022
Coeff Var            24.34723

Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       17.37277        1.42362      12.20      <.0001
avtemp        1       -0.24295        0.03089      -7.86      <.0001


Plot of res*pred.  Legend: A = 1 obs, B = 2 obs, etc.

[Line-printer plot: Residual (vertical axis, -3 to 4) against Predicted Value of avgas (horizontal axis, 0 to 12); 16 points, each plotted as A.]

Plot of res*time.  Legend: A = 1 obs, B = 2 obs, etc.

[Line-printer plot: Residual (vertical axis, -3 to 4) against time (horizontal axis, 0 to 17); 16 points, each plotted as A.]