Multiple Regression – Part II
Multicollinearity
Multicollinearity occurs when there is a high degree of correlation among several of the independent variables.
Primary effects of multicollinearity:
Recognizing Multicollinearity:
Remedial Measures:
Example:
US Navy Bachelor Officers Quarters data (BOQ)
Goal: predict the number of man-hours (manh) required to operate each establishment
SAS Code to create data set and run regression:
data boq;
  input id $ occup checkin hours common wings cap rooms manh;
  if id = 'W' then delete;
  cards;
A 2 4 4 1.26 1 6 6 180.23
B 3 1.58 40 1.25 1 5 5 182.61
C 16.6 23.78 40 1 1 13 13 164.38
D 7 2.37 168 1 1 7 8 284.55
E 5.3 1.67 42.5 7.79 3 25 25 199.92
F 16.5 8.25 168 1.12 2 19 19 267.38
G 25.89 3 40 0 3 36 36 999.09
H 44.42 159.75 168 .6 18 48 48 1103.24
I 39.63 50.86 40 27.37 10 77 77 944.21
J 31.92 40.08 168 5.52 6 47 47 931.84
K 97.33 255.08 168 19 6 165 130 2268.06
L 56.63 373.42 168 6.03 4 36 37 1489.5
M 96.67 206.67 268 17.86 14 120 120 1891.7
N 54.58 207.08 168 7.77 6 66 66 1387.82
O 113.88 981 168 24.48 6 166 179 3559.92
P 149.58 233.83 168 31.07 14 185 202 3115.29
Q 134.32 145.82 168 25.99 12 192 192 2227.76
R 188.74 937 168 45.44 26 237 237 4804.24
S 110.24 410 168 20.05 12 115 115 2628.32
T 96.83 677.33 168 20.31 10 302 210 1880.84
U 102.33 288.83 168 21.01 14 131 131 3036.63
V 274.92 695.25 168 46.63 58 363 363 5539.98
W 811.08 714.33 168 22.76 17 242 242 3534.49
X 384.5 1473.66 168 7.36 24 540 453 8266.77
Y 95 368 168 30.26 9 292 196 1845.89
;
proc reg data=boq corr;
  model manh = occup checkin hours common wings cap rooms / vif;
run;
quit;
SAS Output:
The SAS System
The REG Procedure
Correlation
Variable occup checkin hours common
occup 1.0000 0.8571 0.4421 0.5688
checkin 0.8571 1.0000 0.4018 0.4640
hours 0.4421 0.4018 1.0000 0.3592
common 0.5688 0.4640 0.3592 1.0000
wings 0.7668 0.5460 0.3581 0.6827
cap 0.9270 0.8452 0.4166 0.5878
rooms 0.9708 0.8545 0.4373 0.6579
manh 0.9808 0.9027 0.4344 0.5653
Correlation
Variable wings cap rooms manh
occup 0.7668 0.9270 0.9708 0.9808
checkin 0.5460 0.8452 0.8545 0.9027
hours 0.3581 0.4166 0.4373 0.4344
common 0.6827 0.5878 0.6579 0.5653
wings 1.0000 0.6722 0.7581 0.7323
cap 0.6722 1.0000 0.9785 0.8900
rooms 0.7581 0.9785 1.0000 0.9428
manh 0.7323 0.8900 0.9428 1.0000
The REG Procedure
Model: MODEL1
Dependent Variable: manh

Analysis of Variance
                             Sum of          Mean
Source             DF       Squares        Square    F Value    Pr > F
Model               7      87506375      12500911     155.38    <.0001
Error              16       1287285         80455
Corrected Total    23      88793659

Root MSE            283.64643    R-Square    0.9855
Dependent Mean     2050.00708    Adj R-Sq    0.9792
Coeff Var            13.83636

Parameter Estimates
                   Parameter      Standard                            Variance
Variable    DF      Estimate         Error    t Value    Pr > |t|    Inflation
Intercept    1     198.88441     140.96751       1.41      0.1774            0
occup        1      21.21732       4.28692       4.95      0.0001     43.88352
checkin      1       1.42972       0.32819       4.36      0.0005      4.50284
hours        1      -0.34814       1.03073      -0.34      0.7399      1.28982
common       1       8.03540       8.41280       0.96      0.3537      4.06356
wings        1      -5.32923       9.42088      -0.57      0.5795      3.79988
cap          1      -4.00425       3.28993      -1.22      0.2412     56.57171
rooms        1       0.14603       6.79523       0.02      0.9831    178.92034
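To make the Variance Inflation column easier to read, recall that the VIF for predictor j equals 1/(1 - R-square_j), where R-square_j comes from regressing predictor j on the other predictors. The following is a minimal sketch, not part of the original analysis, that converts the VIF values printed above into tolerances and implied R-square values. The data set name vif_check, its variable names, and the cutoff of 10 are assumptions made for illustration; the cutoff is a common rule of thumb rather than something stated in these notes.

```sas
/* VIF values copied by hand from the Parameter Estimates table above. */
data vif_check;
  input variable $ vif;
  tolerance = 1 / vif;         /* tolerance = 1 - R-square_j                    */
  rsq_j     = 1 - tolerance;   /* R-square of x_j regressed on other predictors */
  high_vif  = (vif > 10);      /* assumed rule-of-thumb cutoff, not from notes  */
  datalines;
occup    43.88352
checkin   4.50284
hours     1.28982
common    4.06356
wings     3.79988
cap      56.57171
rooms   178.92034
;
run;

proc print data=vif_check noobs;
  format tolerance rsq_j 6.4;
run;
```

For example, the VIF of 178.92 for rooms corresponds to an R-square_j of about 0.994, meaning rooms is almost completely determined by the other predictors. Tolerances can also be requested directly with the TOL option on the MODEL statement.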
Outlier Detection
Outliers may not be immediately obvious from the residual plots. The following influence statistics may be useful for outlier detection:
Studentized Residuals
Hat matrix diagonals (leverage)
DFFITS
Cook’s D
DFBETAS
SAS Code (INFLUENCE and R options in the model statement):
proc reg data=boq;
  model manh = occup checkin hours common wings cap rooms / influence r;
run;
quit;
SAS Output:
The REG Procedure
Model: MODEL1
Dependent Variable: manh

Output Statistics
      Dep Var Predicted    Std Error            Std Error  Student                   Cook's
Obs      manh     Value Mean Predict   Residual  Residual Residual    -2-1 0 1 2          D
  1  180.2300  227.2915     136.7833   -47.0615     248.5   -0.189  |      |      |  0.001
  2  182.6100  236.2935     111.0639   -53.6835     261.0   -0.206  |      |      |  0.001
  3  164.3800  523.7144     114.6708  -359.3344     259.4   -1.385  |    **|      |  0.047
  4  284.5500  268.1507     107.8024    16.3993     262.4   0.0625  |      |      |  0.000
  5  199.9200  249.0805     109.5588   -49.1605     261.6   -0.188  |      |      |  0.001
  6  267.3800  427.3125     106.0943  -159.9325     263.1   -0.608  |     *|      |  0.008
  7  999.0900  583.6810     121.1537   415.4090     256.5    1.620  |      |***   |  0.073
  8      1103      1035     171.6435    68.2710     225.8    0.302  |      |      |  0.007
  9  944.2100  968.0711     148.7522   -23.8611     241.5  -0.0988  |      |      |  0.000
 10  931.8400  706.0006      99.8980   225.8394     265.5    0.851  |      |*     |  0.013
 11      2268      2049     126.3167   218.9069     254.0    0.862  |      |*     |  0.023
 12      1490      1764     153.3497  -274.7074     238.6   -1.151  |    **|      |  0.068
 13      1892      2058     149.7979  -166.3589     240.9   -0.691  |     *|      |  0.023
 14      1388      1370      82.4553    17.4975     271.4   0.0645  |      |      |  0.000
 15      3560      3485     253.9162    74.5703     126.4    0.590  |      |*     |  0.175
 16      3115      3112     178.7169     3.1305     220.3   0.0142  |      |      |  0.000
 17      2228      2603     170.9561  -375.1418     226.3   -1.657  |   ***|      |  0.196
 18      4804      4797     204.5948     7.4635     196.5   0.0380  |      |      |  0.000
 19      2628      2719     129.9785   -90.7250     252.1   -0.360  |      |      |  0.004
 20      1881      2095     211.7136  -213.7154     188.8   -1.132  |    **|      |  0.202
 21      3037      2313      74.5211   723.3295     273.7    2.643  |      |***** |  0.065
 22      5540      5633     256.1430   -92.5620     121.8   -0.760  |     *|      |  0.319
 23      8267      8240     274.4380    26.2878    71.688    0.367  |      |      |  0.246
 24      1846      1737     209.5631   109.1391     191.2    0.571  |      |*     |  0.049
Output Statistics
Obs  RStudent  Hat Diag H  Cov Ratio    DFFITS
  1   -0.1836      0.2325     2.1448   -0.1011
  2   -0.1994      0.1533     1.9378   -0.0849
  3   -1.4295      0.1634     0.7211   -0.6319
  4    0.0605      0.1444     1.9549    0.0249
  5   -0.1821      0.1492     1.9352   -0.0763
  6   -0.5956      0.1399     1.6161   -0.2402
  7    1.7152      0.1824     0.4892    0.8102
  8    0.2936      0.3662     2.5256    0.2231
  9   -0.0957      0.2750     2.3003   -0.0589
 10    0.8430      0.1240     1.3211    0.3172
 11    0.8547      0.1983     1.4290    0.4251
 12   -1.1639      0.2923     1.1856   -0.7480
 13   -0.6789      0.2789     1.8242   -0.4222
 14    0.0624      0.0845     1.8267    0.0190
 15    0.5774      0.8014     7.0757    1.1598
 16    0.0138      0.3970     2.7788    0.0112
 17   -1.7633      0.3633     0.5832   -1.3318
 18    0.0368      0.5203     3.4908    0.0383
 19   -0.3499      0.2100     1.9877   -0.1804
 20   -1.1430      0.5571     1.9400   -1.2819
 21    3.4092      0.0690     0.0183    0.9283
 22   -0.7492      0.8155     6.7693   -1.5749
 23    0.3566      0.9361    24.5230    1.3650
 24    0.5585      0.5459     3.1298    0.6123
Output Statistics
     -----------------------------------DFBETAS------------------------------------
Obs Intercept     occup   checkin     hours    common     wings       cap     rooms
  1   -0.1008    0.0004   -0.0060    0.0771    0.0086   -0.0053    0.0002    0.0006
  2   -0.0829   -0.0016   -0.0036    0.0521    0.0095   -0.0052   -0.0018    0.0033
  3   -0.6076   -0.1569    0.0220    0.4046   -0.0086    0.0252   -0.0757    0.1198
  4    0.0018   -0.0034   -0.0016    0.0146   -0.0084    0.0009   -0.0026    0.0025
  5   -0.0690    0.0192    0.0001    0.0473    0.0092   -0.0024    0.0117   -0.0155
  6   -0.0169    0.0297    0.0356   -0.1389    0.0913   -0.0066    0.0283   -0.0290
  7    0.7144   -0.1507   -0.1499   -0.4508   -0.3388   -0.0016   -0.1994    0.2313
  8    0.0176   -0.0550    0.0532    0.0725   -0.1136    0.1780    0.0064    0.0010
  9   -0.0354   -0.0103    0.0075    0.0392   -0.0335    0.0054   -0.0103    0.0115
 10    0.0104   -0.1051   -0.0626    0.1808   -0.1555    0.0399   -0.0826    0.0998
 11   -0.0035    0.2715   -0.1668    0.0596    0.2615   -0.1730    0.2616   -0.2598
 12   -0.0879   -0.4336   -0.3125   -0.1478   -0.2745   -0.0276   -0.2929    0.4634
 13    0.2248   -0.0230    0.1097   -0.3662    0.0175   -0.0031    0.0166   -0.0006
 14    0.0014   -0.0009    0.0014    0.0102   -0.0046   -0.0003   -0.0031    0.0016
 15   -0.0166   -0.6731    0.7411    0.0917   -0.2709   -0.2601   -0.7668    0.7072
 16   -0.0011   -0.0005   -0.0065    0.0008    0.0006   -0.0068   -0.0058    0.0052
 17    0.1282    0.1628    0.9503   -0.1769    0.0939    0.7478    0.5877   -0.6366
 18   -0.0009    0.0140    0.0197   -0.0064    0.0264    0.0018    0.0082   -0.0157
 19   -0.0090   -0.1397   -0.0228   -0.0125   -0.1213    0.0199   -0.0791    0.1244
 20    0.0491    0.7450   -0.3763   -0.1661    0.3003   -0.2561   -0.1999   -0.2128
 21   -0.0279    0.2444   -0.2003    0.2720    0.2835   -0.0891   -0.0187   -0.1039
 22    0.0961    0.2820    0.0883    0.1759    0.2386   -1.0713    0.1386   -0.1921
 23    0.1100    0.3246   -0.0004   -0.2146   -0.3676   -0.2106    0.1308   -0.0977
 24   -0.0444    0.0253   -0.1088    0.0362    0.2340   -0.0297    0.3860   -0.2228
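
Scanning the listings by eye is error-prone, so it may help to save the influence statistics to a data set and flag large values automatically. The following is a rough sketch under stated assumptions, not the method used in these notes. The data set names (boq_infl, boq_flag), the variable names, and all cutoffs are illustrative choices; the cutoffs are commonly quoted rules of thumb, with n = 24 observations and p = 8 estimated parameters (7 predictors plus the intercept) taken from the output above. DFBETAS is left out because it is not one of the OUTPUT statement keywords in PROC REG.

```sas
/* Save predicted values and influence statistics instead of printing them. */
proc reg data=boq noprint;
  model manh = occup checkin hours common wings cap rooms;
  output out=boq_infl
         p=pred r=resid student=student rstudent=rstudent
         h=leverage cookd=cooksd dffits=dffits;
run;
quit;

/* Flag observations that exceed assumed rule-of-thumb cutoffs. */
data boq_flag;
  set boq_infl;
  n_obs  = 24;   /* observations used in the fit (W was deleted) */
  n_parm = 8;    /* 7 predictors plus the intercept              */
  flag_rstudent = (abs(rstudent) > 2);
  flag_leverage = (leverage > 2 * n_parm / n_obs);
  flag_dffits   = (abs(dffits) > 2 * sqrt(n_parm / n_obs));
  flag_cooksd   = (cooksd > 4 / n_obs);
  any_flag = max(flag_rstudent, flag_leverage, flag_dffits, flag_cooksd);
run;

/* List only the flagged observations. */
proc print data=boq_flag;
  where any_flag = 1;
  var id manh pred rstudent leverage dffits cooksd
      flag_rstudent flag_leverage flag_dffits flag_cooksd;
run;
```

A flag marks an observation for closer inspection; it is not by itself a reason to delete the observation.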