Multiple Regression – Part II

Multicollinearity

Multicollinearity occurs when two or more of the independent variables are highly correlated with one another.

Primary effects of multicollinearity:

• Inflates the variance of the estimated partial regression coefficients
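• Invalidates the traditional interpretation of the partial regression coefficients (the change in the response per unit change in one variable with the others held constant), since highly correlated variables do not vary independently in the data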

Recognizing Multicollinearity:

• High correlation between two or more independent variables
• Large changes in the values of the estimated regression coefficients when a variable is added to or deleted from the model
• Estimated regression coefficients whose sign is the opposite of what would commonly be expected
• Non-significant t-tests for the regression coefficients of important independent variables
• Wide confidence intervals for the regression coefficients of important independent variables
• Variance Inflation Factor (VIF) – a maximum VIF greater than 10 is a common rule of thumb indicating that serious multicollinearity is likely present.
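
For reference, a sketch of the usual definition behind the VIF rule of thumb. The symbols below (R_j^2, b_j, sigma^2, S_jj) are standard textbook notation assumed here for illustration, not notation defined in these notes: R_j^2 is the R-square obtained by regressing the j-th independent variable on all of the other independent variables.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\operatorname{Var}(b_j) = \frac{\sigma^2}{S_{jj}} \cdot \mathrm{VIF}_j,
\qquad
S_{jj} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 .
\]
```

Under this definition a VIF of 10 corresponds to R_j^2 = 0.90, and a VIF of 1 means the variable is uncorrelated with the other independent variables.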

Remedial Measures:

• Remove some of the correlated variables
• Principal components regression
• Ridge regression
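
A minimal sketch of how the three remedies might be tried in PROC REG, assuming the data set boq and the variable names from the BOQ example that follows. The output data set names (ridge_est, pcr_est), the grid of ridge constants, the choice of which variables to drop, and the number of components to omit are all illustrative assumptions, not recommendations from these notes.

```sas
/* Remedy 1: remove some of the correlated variables.                 */
/* Dropping cap and rooms is only an example; the choice is a         */
/* judgment call that depends on the subject matter.                  */
proc reg data=boq ;
   model manh = occup checkin hours common wings / vif ;
run ;
quit ;

/* Remedy 2: principal components regression.                         */
/* PCOMIT=1 2 refits the model omitting the last 1, then the last 2,  */
/* principal components; estimates are written to the OUTEST= set.    */
proc reg data=boq outest=pcr_est outvif pcomit=1 2 ;
   model manh = occup checkin hours common wings cap rooms ;
run ;
quit ;

/* Remedy 3: ridge regression over a grid of ridge constants k.       */
/* RIDGE= requires OUTEST=; OUTVIF adds the VIFs for each k.          */
proc reg data=boq outest=ridge_est outvif ridge=0 to 0.10 by 0.01 ;
   model manh = occup checkin hours common wings cap rooms ;
run ;
quit ;

/* Inspect how the coefficients and VIFs change with k */
proc print data=ridge_est ;
run ;
```

In the ridge output, one would typically look for the smallest k at which the coefficients stabilize and the VIFs fall to acceptable levels.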

Example:

US Navy Bachelor Officers Quarters data (BOQ)

Goal: predict the number of man-hours (manh) required to operate each establishment

SAS Code to create data set and run regression:

data boq ;

input id $ occup checkin hours common wings cap rooms manh ;

* Establishment W is excluded from the analysis, leaving 24 observations ;
if id = 'W' then delete ;

cards ;

A 2 4 4 1.26 1 6 6 180.23

B 3 1.58 40 1.25 1 5 5 182.61

C 16.6 23.78 40 1 1 13 13 164.38

D 7 2.37 168 1 1 7 8 284.55

E 5.3 1.67 42.5 7.79 3 25 25 199.92

F 16.5 8.25 168 1.12 2 19 19 267.38

G 25.89 3 40 0 3 36 36 999.09

H 44.42 159.75 168 .6 18 48 48 1103.24

I 39.63 50.86 40 27.37 10 77 77 944.21

J 31.92 40.08 168 5.52 6 47 47 931.84

K 97.33 255.08 168 19 6 165 130 2268.06

L 56.63 373.42 168 6.03 4 36 37 1489.5

M 96.67 206.67 268 17.86 14 120 120 1891.7

N 54.58 207.08 168 7.77 6 66 66 1387.82

O 113.88 981 168 24.48 6 166 179 3559.92

P 149.58 233.83 168 31.07 14 185 202 3115.29

Q 134.32 145.82 168 25.99 12 192 192 2227.76

R 188.74 937 168 45.44 26 237 237 4804.24

S 110.24 410 168 20.05 12 115 115 2628.32

T 96.83 677.33 168 20.31 10 302 210 1880.84

U 102.33 288.83 168 21.01 14 131 131 3036.63

V 274.92 695.25 168 46.63 58 363 363 5539.98

W 811.08 714.33 168 22.76 17 242 242 3534.49

X 384.5 1473.66 168 7.36 24 540 453 8266.77

Y 95 368 168 30.26 9 292 196 1845.89

;

* CORR prints the correlation matrix and VIF adds the variance inflation factors ;
proc reg data=boq corr ;

model manh = occup checkin hours common wings cap rooms / vif ;

run ;

quit ;

SAS Output:

The SAS System        21:08 Monday, February 10, 2003   2

The REG Procedure

Correlation

Variable      occup  checkin    hours   common    wings      cap    rooms     manh
occup        1.0000   0.8571   0.4421   0.5688   0.7668   0.9270   0.9708   0.9808
checkin      0.8571   1.0000   0.4018   0.4640   0.5460   0.8452   0.8545   0.9027
hours        0.4421   0.4018   1.0000   0.3592   0.3581   0.4166   0.4373   0.4344
common       0.5688   0.4640   0.3592   1.0000   0.6827   0.5878   0.6579   0.5653
wings        0.7668   0.5460   0.3581   0.6827   1.0000   0.6722   0.7581   0.7323
cap          0.9270   0.8452   0.4166   0.5878   0.6722   1.0000   0.9785   0.8900
rooms        0.9708   0.8545   0.4373   0.6579   0.7581   0.9785   1.0000   0.9428
manh         0.9808   0.9027   0.4344   0.5653   0.7323   0.8900   0.9428   1.0000


Model: MODEL1

Dependent Variable: manh

Analysis of Variance

                                 Sum of           Mean
Source              DF          Squares         Square    F Value    Pr > F
Model                7         87506375       12500911     155.38    <.0001
Error               16          1287285          80455
Corrected Total     23         88793659

Root MSE           283.64643    R-Square    0.9855
Dependent Mean    2050.00708    Adj R-Sq    0.9792
Coeff Var           13.83636

Parameter Estimates

                     Parameter     Standard                            Variance
Variable    DF        Estimate        Error    t Value    Pr > |t|    Inflation
Intercept    1       198.88441    140.96751       1.41      0.1774            0
occup        1        21.21732      4.28692       4.95      0.0001     43.88352
checkin      1         1.42972      0.32819       4.36      0.0005      4.50284
hours        1        -0.34814      1.03073      -0.34      0.7399      1.28982
common       1         8.03540      8.41280       0.96      0.3537      4.06356
wings        1        -5.32923      9.42088      -0.57      0.5795      3.79988
cap          1        -4.00425      3.28993      -1.22      0.2412     56.57171
rooms        1         0.14603      6.79523       0.02      0.9831    178.92034
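
As a worked reading of the Variance Inflation column, the three VIFs above 10 can be converted back to the R-square of each variable on the other independent variables, assuming the usual definition VIF_j = 1/(1 - R_j^2) (standard, but not stated in these notes). The square root of the VIF gives the factor by which the standard error is inflated relative to an uncorrelated design.

```latex
\[
R_{\text{rooms}}^2 = 1 - \frac{1}{178.92034} \approx 0.9944,
\qquad
R_{\text{cap}}^2 = 1 - \frac{1}{56.57171} \approx 0.9823,
\qquad
R_{\text{occup}}^2 = 1 - \frac{1}{43.88352} \approx 0.9772
\]
\[
\sqrt{178.92034} \approx 13.4,
\qquad
\sqrt{56.57171} \approx 7.5,
\qquad
\sqrt{43.88352} \approx 6.6
\]
```

This is consistent with the symptoms listed earlier: rooms has a simple correlation of 0.9428 with manh, yet its t value in the full model is only 0.02, and the coefficient of cap is negative.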

Outlier Detection

Outliers may not be immediately obvious from the residual plots.  The following influence statistics may be useful for outlier detection:

• Studentized residuals
• Hat matrix diagonals (leverage)
• DFFITS
• Cook’s D
• DFBETAS
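
For orientation, a sketch of how the first two statistics are usually defined, followed by widely quoted size-adjusted cutoffs. The notation is assumed for illustration and is not defined in these notes: e_i is the i-th residual, s is the root MSE, s_(i) is the root MSE with observation i deleted, p is the number of parameters including the intercept, and n is the number of observations. In the SAS output, r_i is labelled Student Residual and t_i is labelled RStudent.

```latex
\[
h_{ii} = x_i^{\top}(X^{\top}X)^{-1}x_i, \qquad
r_i = \frac{e_i}{s\sqrt{1-h_{ii}}}, \qquad
t_i = \frac{e_i}{s_{(i)}\sqrt{1-h_{ii}}}
\]
\[
|t_i| > 2, \qquad
h_{ii} > \frac{2p}{n}, \qquad
|\mathrm{DFFITS}_i| > 2\sqrt{\frac{p}{n}}, \qquad
|\mathrm{DFBETAS}_{j,i}| > \frac{2}{\sqrt{n}}
\]
```

These cutoffs are rules of thumb rather than formal tests, and conventions vary. For Cook's D, a value near or above 1, or one clearly separated from the rest, is often treated as a flag.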

SAS Code (INFLUENCE and R options in the model statement):

proc reg data=boq ;

model manh = occup checkin hours common wings cap rooms / influence r ;

run ;

quit ;

SAS Output:

The REG Procedure

Model: MODEL1

Dependent Variable: manh

Output Statistics

       Dep Var  Predicted     Std Error            Std Error  Student                    Cook's
Obs       manh      Value  Mean Predict   Residual  Residual Residual     -2-1 0 1 2          D
  1   180.2300   227.2915      136.7833   -47.0615     248.5   -0.189   |      |      |   0.001
  2   182.6100   236.2935      111.0639   -53.6835     261.0   -0.206   |      |      |   0.001
  3   164.3800   523.7144      114.6708  -359.3344     259.4   -1.385   |    **|      |   0.047
  4   284.5500   268.1507      107.8024    16.3993     262.4   0.0625   |      |      |   0.000
  5   199.9200   249.0805      109.5588   -49.1605     261.6   -0.188   |      |      |   0.001
  6   267.3800   427.3125      106.0943  -159.9325     263.1   -0.608   |     *|      |   0.008
  7   999.0900   583.6810      121.1537   415.4090     256.5    1.620   |      |***   |   0.073
  8       1103       1035      171.6435    68.2710     225.8    0.302   |      |      |   0.007
  9   944.2100   968.0711      148.7522   -23.8611     241.5  -0.0988   |      |      |   0.000
 10   931.8400   706.0006       99.8980   225.8394     265.5    0.851   |      |*     |   0.013
 11       2268       2049      126.3167   218.9069     254.0    0.862   |      |*     |   0.023
 12       1490       1764      153.3497  -274.7074     238.6   -1.151   |    **|      |   0.068
 13       1892       2058      149.7979  -166.3589     240.9   -0.691   |     *|      |   0.023
 14       1388       1370       82.4553    17.4975     271.4   0.0645   |      |      |   0.000
 15       3560       3485      253.9162    74.5703     126.4    0.590   |      |*     |   0.175
 16       3115       3112      178.7169     3.1305     220.3   0.0142   |      |      |   0.000
 17       2228       2603      170.9561  -375.1418     226.3   -1.657   |   ***|      |   0.196
 18       4804       4797      204.5948     7.4635     196.5   0.0380   |      |      |   0.000
 19       2628       2719      129.9785   -90.7250     252.1   -0.360   |      |      |   0.004
 20       1881       2095      211.7136  -213.7154     188.8   -1.132   |    **|      |   0.202
 21       3037       2313       74.5211   723.3295     273.7    2.643   |      |***** |   0.065
 22       5540       5633      256.1430   -92.5620     121.8   -0.760   |     *|      |   0.319
 23       8267       8240      274.4380    26.2878    71.688    0.367   |      |      |   0.246
 24       1846       1737      209.5631   109.1391     191.2    0.571   |      |*     |   0.049

Output Statistics

               Hat Diag       Cov
Obs  RStudent         H     Ratio    DFFITS
  1   -0.1836    0.2325    2.1448   -0.1011
  2   -0.1994    0.1533    1.9378   -0.0849
  3   -1.4295    0.1634    0.7211   -0.6319
  4    0.0605    0.1444    1.9549    0.0249
  5   -0.1821    0.1492    1.9352   -0.0763
  6   -0.5956    0.1399    1.6161   -0.2402
  7    1.7152    0.1824    0.4892    0.8102
  8    0.2936    0.3662    2.5256    0.2231
  9   -0.0957    0.2750    2.3003   -0.0589
 10    0.8430    0.1240    1.3211    0.3172
 11    0.8547    0.1983    1.4290    0.4251
 12   -1.1639    0.2923    1.1856   -0.7480
 13   -0.6789    0.2789    1.8242   -0.4222
 14    0.0624    0.0845    1.8267    0.0190
 15    0.5774    0.8014    7.0757    1.1598
 16    0.0138    0.3970    2.7788    0.0112
 17   -1.7633    0.3633    0.5832   -1.3318
 18    0.0368    0.5203    3.4908    0.0383
 19   -0.3499    0.2100    1.9877   -0.1804
 20   -1.1430    0.5571    1.9400   -1.2819
 21    3.4092    0.0690    0.0183    0.9283
 22   -0.7492    0.8155    6.7693   -1.5749
 23    0.3566    0.9361   24.5230    1.3650
 24    0.5585    0.5459    3.1298    0.6123
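
As a worked check on how these statistics combine residual size and leverage, the printed Cook's D and DFFITS values can be approximately reproduced from the printed residuals and hat diagonals. The identities below are the standard ones (assumed here, not stated in these notes), with r the Student Residual, t the RStudent value, h the hat diagonal, and p = 8 parameters (seven independent variables plus the intercept).

```latex
\[
D_{21} = \frac{r_{21}^2}{p}\cdot\frac{h_{21}}{1-h_{21}}
       = \frac{(2.643)^2}{8}\cdot\frac{0.0690}{0.9310} \approx 0.065,
\qquad
\mathrm{DFFITS}_{21} = t_{21}\sqrt{\frac{h_{21}}{1-h_{21}}}
       = 3.4092\sqrt{\frac{0.0690}{0.9310}} \approx 0.928
\]
\[
D_{23} = \frac{(0.367)^2}{8}\cdot\frac{0.9361}{0.0639} \approx 0.247,
\qquad
\mathrm{DFFITS}_{23} = 0.3566\sqrt{\frac{0.9361}{0.0639}} \approx 1.365
\]
```

The results match the printed values up to rounding of the inputs. Observation 21 has the largest residual but very low leverage, so its Cook's D is small; observation 23 has a small residual but a hat diagonal of 0.9361, which makes it far more influential on the fit.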

Output Statistics

    ----------------------------------------DFBETAS-----------------------------------------
Obs  Intercept      occup    checkin      hours     common      wings        cap      rooms
  1    -0.1008     0.0004    -0.0060     0.0771     0.0086    -0.0053     0.0002     0.0006
  2    -0.0829    -0.0016    -0.0036     0.0521     0.0095    -0.0052    -0.0018     0.0033
  3    -0.6076    -0.1569     0.0220     0.4046    -0.0086     0.0252    -0.0757     0.1198
  4     0.0018    -0.0034    -0.0016     0.0146    -0.0084     0.0009    -0.0026     0.0025
  5    -0.0690     0.0192     0.0001     0.0473     0.0092    -0.0024     0.0117    -0.0155
  6    -0.0169     0.0297     0.0356    -0.1389     0.0913    -0.0066     0.0283    -0.0290
  7     0.7144    -0.1507    -0.1499    -0.4508    -0.3388    -0.0016    -0.1994     0.2313
  8     0.0176    -0.0550     0.0532     0.0725    -0.1136     0.1780     0.0064     0.0010
  9    -0.0354    -0.0103     0.0075     0.0392    -0.0335     0.0054    -0.0103     0.0115
 10     0.0104    -0.1051    -0.0626     0.1808    -0.1555     0.0399    -0.0826     0.0998
 11    -0.0035     0.2715    -0.1668     0.0596     0.2615    -0.1730     0.2616    -0.2598
 12    -0.0879    -0.4336    -0.3125    -0.1478    -0.2745    -0.0276    -0.2929     0.4634
 13     0.2248    -0.0230     0.1097    -0.3662     0.0175    -0.0031     0.0166    -0.0006
 14     0.0014    -0.0009     0.0014     0.0102    -0.0046    -0.0003    -0.0031     0.0016
 15    -0.0166    -0.6731     0.7411     0.0917    -0.2709    -0.2601    -0.7668     0.7072
 16    -0.0011    -0.0005    -0.0065     0.0008     0.0006    -0.0068    -0.0058     0.0052
 17     0.1282     0.1628     0.9503    -0.1769     0.0939     0.7478     0.5877    -0.6366
 18    -0.0009     0.0140     0.0197    -0.0064     0.0264     0.0018     0.0082    -0.0157
 19    -0.0090    -0.1397    -0.0228    -0.0125    -0.1213     0.0199    -0.0791     0.1244
 20     0.0491     0.7450    -0.3763    -0.1661     0.3003    -0.2561    -0.1999    -0.2128
 21    -0.0279     0.2444    -0.2003     0.2720     0.2835    -0.0891    -0.0187    -0.1039
 22     0.0961     0.2820     0.0883     0.1759     0.2386    -1.0713     0.1386    -0.1921
 23     0.1100     0.3246    -0.0004    -0.2146    -0.3676    -0.2106     0.1308    -0.0977
 24    -0.0444     0.0253    -0.1088     0.0362     0.2340    -0.0297     0.3860    -0.2228
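
Rather than scanning the listings by eye, the same statistics can be saved to a data set and screened against cutoffs. The following is a minimal sketch under stated assumptions: the data set names diag and flagged, the new variable names, and the rule-of-thumb cutoffs (|RStudent| > 2, leverage > 2p/n, |DFFITS| > 2*sqrt(p/n)) are illustrative choices and not part of these notes. The values n = 24 and p = 8 are taken from the output above.

```sas
/* Refit the model and save the influence statistics */
proc reg data=boq ;
   model manh = occup checkin hours common wings cap rooms ;
   output out=diag rstudent=rstud h=lev cookd=cooksd dffits=dfits ;
run ;
quit ;

/* Flag observations that exceed any of the assumed cutoffs */
data flagged ;
   set diag ;
   n = 24 ;                          /* observations used in the fit   */
   p = 8 ;                           /* parameters, incl. intercept    */
   big_resid  = (abs(rstud) > 2) ;
   high_lev   = (lev > 2*p/n) ;
   big_dffits = (abs(dfits) > 2*sqrt(p/n)) ;
   if big_resid or high_lev or big_dffits ;   /* keep flagged rows only */
run ;

proc print data=flagged ;
   var id manh rstud lev cooksd dfits big_resid high_lev big_dffits ;
run ;
```

With these cutoffs (about 0.667 for leverage and about 1.155 for DFFITS), the tables above point to observations 15, 17, 20, 21, 22 and 23, which correspond to establishments O, Q, T, U, V and X. A flag is a prompt to examine the observation, not an instruction to delete it.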