Multiple Regression: Part 1

Estimating the parameters of a Multiple Linear Regression Model using SAS

The following data are an excerpt from a public data set containing the number of employees, sales, and profits for several major companies in 1993.  The data set is part of the SAS/Insight sample data library (SASUSER.BUSINESS) and was used to produce this example.  A partial listing of the data follows:

OBS  COMPANY               NATION   INDUSTRY     EMPLOYS      SALES    PROFITS
1    Lucas Industries      Britain  Automobiles       46    \$3,864       \$39
2    GKN                   Britain  Automobiles       27    \$3,037       \$58
3    GEC                   Britain  Electronics       93    \$9,491      \$907
4    Grand Metropolitan    Britain  Food              87   \$11,164      \$629
5    Unilever              Britain  Food             303   \$41,843    \$1,945
6    Allied-Lyons          Britain  Food              71    \$7,231      \$488
7    Guinness              Britain  Food              23    \$7,006      \$650
8    Hillsdown Holdings    Britain  Food              43    \$6,900      \$142
9    Assoc. British Foods  Britain  Food              50    \$6,798      \$353
10   Tate & Lyle           Britain  Food              16    \$5,633      \$227
11   Cadbury Schweppes     Britain  Food              39    \$5,594      \$365
12   United Biscuits       Britain  Food              39    \$5,174      \$101
13   Harrisons Crossfield  Britain  Food              31    \$3,319       \$91
14   Unigate               Britain  Food              26    \$2,978      \$110
15   Royal Dutch / Shell   Britain  Oil              116   \$95,136    \$4,504
16   British Petroleum     Britain  Oil               73   \$52,485      \$924
17   Renault               France   Automobiles      140   \$29,977      \$189
18   Peugeot               France   Automobiles      144   \$25,670     \$-258
19   Valeo                 France   Automobiles       25    \$3,572      \$125
20   Alcatel Alsthom       France   Electronics      197   \$27,600    \$1,248
21   Thomson               France   Electronics      100   \$11,917          .
22   Schneider             France   Electronics       91    \$9,953       \$52
23   Danone Group          France   Food              56   \$12,377      \$604
24   Besnier               France   Food              12    \$4,103       \$78
25   ELF Aquitaine         France   Oil               94   \$37,016      \$189
26   Total                 France   Oil               50   \$23,917      \$523
27   Daimler-Benz          Germany  Automobiles      366   \$59,102      \$364
28   Volkswagen            Germany  Automobiles      252   \$46,312   \$-1,232
29   Robert Bosch          Germany  Automobiles      157   \$19,634      \$258
30   BMW                   Germany  Automobiles       71   \$17,546      \$317
31   MAN                   Germany  Automobiles       61   \$12,106      \$142
32   ZF Friedrichshafen    Germany  Automobiles       27    \$3,167       \$34
33   Siemens               Germany  Electronics      391   \$50,381    \$1,113
34   Veba Oel              Germany  Oil                7    \$6,246       \$-1
35   Nissan Motor          Japan    Automobiles      143   \$53,760     \$-805
36   Toyota Motor          Japan    Automobiles      109   \$85,283    \$1,474
37   Honda Motor           Japan    Automobiles       91   \$35,798      \$220
38   Mitsubishi Motors     Japan    Automobiles       46   \$27,311       \$52
39   Mazda Motor           Japan    Automobiles       33   \$20,279     \$-454
40   Isuzu Motors          Japan    Automobiles       13   \$13,731      \$-38

We will use these data to predict profits (in millions of dollars) from a company's sales (in millions of dollars) and the number of employees of the company (in thousands).

1.      Create a scatterplot matrix illustrating the relationships among the variables:
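One way to produce such a matrix in code is sketched below. It assumes a SAS release that includes PROC SGSCATTER (the original plot was most likely drawn interactively in SAS/Insight) and uses the variable names shown in the listing above.

```sas
/* Scatterplot matrix of the three analysis variables.          */
/* Assumes PROC SGSCATTER is available in the SAS release used. */
proc sgscatter data=sasuser.business;
   matrix employs sales profits;
run;
```

Each off-diagonal panel plots one pair of variables, so the PROFITS row shows how the response relates to each explanatory variable on its own.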

2.      Create Partial Regression Plots for each of the variables.

DEFINITION:  A partial regression plot (sometimes called an added-variable plot) displays the relationship between the response variable, y, and an explanatory variable, xi, after removing the effect of the other explanatory variables.
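To make the definition concrete, the partial regression plot for EMPLOYS can be built by hand from two sets of residuals. The sketch below is an illustration only: the data set names WORK.COMPLETE, WORK.STEP1, and WORK.STEP2 and the residual variable names are made up for this example, and PROC SGPLOT is assumed to be available.

```sas
/* Keep only complete cases so both regressions use the same rows */
data work.complete;
   set sasuser.business;
   if nmiss(profits, employs, sales) = 0;
run;

/* Step 1: residuals of PROFITS after regressing on SALES */
proc reg data=work.complete noprint;
   model profits = sales;
   output out=work.step1 r=profits_res;
run;
quit;

/* Step 2: residuals of EMPLOYS after regressing on SALES */
proc reg data=work.step1 noprint;
   model employs = sales;
   output out=work.step2 r=employs_res;
run;
quit;

/* Step 3: plot one set of residuals against the other, with a fitted line */
proc sgplot data=work.step2;
   reg x=employs_res y=profits_res;
run;
```

The slope of the fitted line in the last step equals the coefficient of EMPLOYS in the multiple regression of PROFITS on EMPLOYS and SALES, which is why the plot shows the relationship "after removing the effect" of SALES.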

You can create partial regression plots in SAS automatically by using the REG procedure with the PARTIAL option in the MODEL statement.  The code and output for this data set are given below:

proc reg data=sasuser.business ;
model profits = employs sales / partial ;
run ;

The REG Procedure

Model: MODEL1

Partial Regression Residual Plot

[Line-printer plot not reproduced.  Vertical axis: PROFITS, Profits in \$ Millions (-4000 to 3000).  Horizontal axis: EMPLOYS, Employees in Thousands (-250 to 350).]

The SAS System         15:48 Sunday, February 9, 2003   4

The REG Procedure

Model: MODEL1

Partial Regression Residual Plot

[Line-printer plot not reproduced.  Vertical axis: PROFITS, Profits in \$ Millions (-4000 to 6000).  Horizontal axis: SALES, Sales in \$ Millions (-60000 to 80000).]

Look at the two simple regression models:

Regression of PROFITS on SALES:

Regression of PROFITS on EMPLOYS:

Note:  The output shown above is from SAS/Insight.  You could also fit these two regression models using the following SAS code:

proc reg data=sasuser.business ;
model profits = sales ;
plot profits*sales ;
model profits = employs ;
plot profits*employs ;
run ;

Now look at the correlation matrix and the output from regressing PROFITS on EMPLOYS and SALES.

The SAS System

Correlation

CORR        EMPLOYS      SALES    PROFITS
EMPLOYS      1.0000     0.7298     0.3619
SALES        0.7298     1.0000     0.5969
PROFITS      0.3619     0.5969     1.0000

Model: MODEL1
Dependent Variable: PROFITS    Profits in \$ Millions

Analysis of Variance

                      Sum of           Mean
Source     DF        Squares         Square    F Value    Prob>F
Model       2    42711485.23   21355742.615     35.503    0.0001
Error     122    73385789.57   601522.86532
C Total   124    116097274.8

Root MSE     775.57905     R-square       0.3679
Dep Mean     442.16000     Adj R-sq       0.3575
C.V.         175.40688

Parameter Estimates

                 Parameter       Standard     T for H0:
Variable  DF      Estimate          Error   Parameter=0    Prob > |T|
INTERCEP   1    -13.325360    90.80843729        -0.147        0.8836
EMPLOYS    1     -1.553555     1.03671710        -1.499        0.1366
SALES      1      0.030346     0.00448738         6.762        0.0001
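The notes do not show the code behind this output, so the following is only a sketch of one way to request a comparable correlation matrix and regression fit. The NOMISS option is an assumption made here so that the correlations use only complete cases; newer SAS releases label some columns differently (for example "Pr > F" and "Corrected Total").

```sas
/* Pairwise Pearson correlations, restricted to complete cases */
proc corr data=sasuser.business nomiss;
   var employs sales profits;
run;

/* Multiple regression of PROFITS on EMPLOYS and SALES */
proc reg data=sasuser.business;
   model profits = employs sales;
run;
quit;
```

PROC REG drops any observation with a missing value in a model variable, which is consistent with the 125 observations implied by the C Total degrees of freedom (124).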

3.      Based on the computer printout, we can write the prediction equation as follows:

Ŷ =  -13.3254  -  1.5536 X1  +  0.0303 X2

where X1 = EMPLOYS (employees in thousands) and X2 = SALES (sales in millions of dollars).
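As a quick illustration of how the equation is used, take the first company in the listing (Lucas Industries), with 46 thousand employees and sales of 3,864 million dollars. Using the rounded coefficients above, with the first predictor being EMPLOYS and the second SALES:

$$\hat{y} = -13.3254 - 1.5536(46) + 0.0303(3864) = -13.3254 - 71.4656 + 117.0792 \approx 32.29$$

The predicted profit is about 32.3 million dollars, compared with the observed 39 million, so the residual for this company is roughly 6.7 million dollars. Carrying the unrounded coefficients from the output would change the prediction slightly.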

4.      Inference for the multiple regression model:

Overall test for the model: basically asking, "Is this model worthwhile?"

Hypothesis test:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$

$H_a$: At least one $\beta_i$ is not equal to 0

Test Statistic:

$F = MSR / MSE$

So, if F is large, then MSR (the mean square for the model) is large relative to MSE (the mean square for error).

Large F (small p-value) says to reject $H_0$.

Rule:  If the p-value (Prob > F) is small, then we reject the null hypothesis and conclude that at least one of the explanatory variables has some effect on the response variable.

For this example, the value of F is 35.503 and the p-value is 0.0001.  Therefore, we reject the null hypothesis and conclude that either the amount of sales or the number of employees (or both) has some effect on profit.
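The F value in the printout can be checked directly from the Analysis of Variance table. Here k = 2 explanatory variables and n = 125 observations are read from the DF column (Model DF = k, C Total DF = n - 1):

$$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)} = \frac{42711485.23/2}{73385789.57/122} = \frac{21355742.615}{601522.865} \approx 35.50$$

This statistic is compared with an F distribution with 2 and 122 degrees of freedom, which is where the reported p-value comes from.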

Can we be more specific and determine which individual variables are significant?

Tests of Independence for Individual Explanatory Variables

Does xi have a significant effect on the response variable in the presence of the other variables?

Hypothesis test:

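$H_0: \beta_i = 0$

$H_a: \beta_i \neq 0$

Test Statistic:

$t = b_i / s_{b_i}$, where $b_i$ is the estimate of $\beta_i$ and $s_{b_i}$ is its standard error; under $H_0$, t has $n - k - 1$ degrees of freedom.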

Example from SAS output:

Rule:  If the p-value (Prob>|T|) is small, then we reject the null hypothesis and conclude that there is a significant relationship between xi and y after controlling for the other explanatory variables in the model.
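Applied to the Parameter Estimates table shown earlier, each t value is the estimate divided by its standard error:

$$t_{\mathrm{EMPLOYS}} = \frac{-1.553555}{1.03671710} \approx -1.499, \qquad t_{\mathrm{SALES}} = \frac{0.030346}{0.00448738} \approx 6.76$$

The p-value for EMPLOYS is 0.1366, so at the usual 0.05 level there is not enough evidence of a relationship between the number of employees and profits once sales is in the model. The p-value for SALES is 0.0001, so sales is significantly related to profits after controlling for the number of employees.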

Constructing confidence intervals for individual partial regression coefficients:

Example from SAS output:

To construct a 95% confidence interval for the partial regression parameter for sales:

$t_{0.025,\,122} \approx 1.98$

$b_i = 0.0303$

$s_{b_i} = 0.0045$

CI:  $0.0303 \pm 1.98(0.0045) = 0.0303 \pm 0.0089$

So, after controlling for the number of employees, we are 95% confident that the true value of the partial regression parameter for sales is between 0.021 and 0.039.
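SAS can also print these limits directly. The sketch below assumes the same data set and model as above and uses the CLB option of the MODEL statement; ALPHA=0.05 requests 95% limits.

```sas
/* Confidence limits for each parameter estimate */
proc reg data=sasuser.business;
   model profits = employs sales / clb alpha=0.05;
run;
quit;
```

Because SAS uses the unrounded estimate, standard error, and t quantile, its limits may differ from the hand calculation in the last reported digit.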

Coefficient of Multiple Determination:  $R^2$

Definition:  $R^2$ is the proportion of the total variation in y explained by the simultaneous predictive power of all of the explanatory variables through the multiple regression model.

$$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$$

Properties of $R^2$:

·        $0 \le R^2 \le 1$

·        $R^2$ represents the proportional reduction in the total variation in y associated with the use of the set of predictor variables

·        $R^2$ never decreases as more variables are added to the model

Example:

In our example, $R^2 = 0.3679$, so we can say that 36.79% of the variability that we see in profits is explained by the regression on the amount of sales and the number of employees.
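This value can be reproduced from the sums of squares in the Analysis of Variance table (Model = SSR, Error = SSE, C Total = SST):

$$R^2 = \frac{42711485.23}{116097274.8} \approx 0.3679, \qquad 1 - \frac{73385789.57}{116097274.8} \approx 1 - 0.6321 = 0.3679$$

Both forms of the definition give the same answer, as they must, because SSR + SSE = SST.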

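Adjusted $R^2$: SAS also includes an adjusted $R^2$ value as part of its output.  This statistic also looks at the proportion of variation explained by the model.  However, it then adjusts that value for the number of explanatory variables in the model, so adding a variable that explains very little can lower it.

$$R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$$

where n is the number of observations and k is the number of explanatory variables.

Use the adjusted $R^2$ value to compare multiple regression models that have different numbers of explanatory variables.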
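As a check on the printout, the adjusted value can be computed from the Analysis of Variance table, taking n = 125 and k = 2 from the DF column (Error DF = n - k - 1 = 122, C Total DF = n - 1 = 124):

$$R^2_{adj} = 1 - \frac{73385789.57/122}{116097274.8/124} = 1 - \frac{601522.87}{936268.35} \approx 1 - 0.6425 = 0.3575$$

This matches the Adj R-sq value of 0.3575 in the output and is slightly smaller than the unadjusted 0.3679, reflecting the penalty for using two explanatory variables.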