
Multiple Regression – Part II

 

Multicollinearity

 

Multicollinearity occurs when two or more of the independent variables are highly correlated with one another.

 

Primary effects of multicollinearity:

 

  • Inflates the variance of the estimated partial regression coefficients
  • Invalidates the traditional interpretation of a partial regression coefficient as the change in the response per unit change in one independent variable with the others held fixed

 

 

 

Recognizing Multicollinearity:

 

  • High correlation between 2 or more independent variables
  • Large changes in the values of the estimated regression coefficients when a variable is added to or deleted from the model
  • Estimated regression coefficients whose sign is opposite to what would be expected
  • Non-significant t-tests for the regression coefficients of important independent variables
  • Wide confidence intervals for the regression coefficients of important independent variables
  • Variance Inflation Factor (VIF) – as a common rule of thumb, a maximum VIF greater than 10 indicates serious multicollinearity.
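
The VIF rule can be made concrete with the standard definition. In the sketch below, $R_j^2$, $\sigma^2$, and $x_{ij}$ are symbols introduced here for illustration (the notes do not define them): $R_j^2$ is the coefficient of determination from regressing the $j$-th independent variable on all the other independent variables, and $\sigma^2$ is the error variance.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
\operatorname{Var}(b_j) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2} \cdot \mathrm{VIF}_j
\]
\[
\mathrm{VIF}_j > 10 \iff R_j^2 > 0.9
\]
```

Read this way, a VIF of 10 means the other independent variables explain 90% of the variation in variable $j$, and the variance of its estimated coefficient is 10 times what it would be if variable $j$ were uncorrelated with the others.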

 

Remedial Measures:

 

  • Remove some of the correlated variables
  • Principal components regression
  • Ridge regression
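
The three remedies above might be tried in SAS roughly as follows. This is a sketch only: it assumes the boq data set and model from the example later in these notes, the choice of which variables to drop (cap and rooms) is purely illustrative, and the output data set names (pc_est, ridge_est), the number of omitted components, and the ridge grid are arbitrary choices made here. PCOMIT=, RIDGE=, OUTEST=, and OUTVIF are PROC REG options.

```sas
/* Remedy 1: refit without some of the correlated variables.
   Dropping cap and rooms is an illustrative choice, not a recommendation. */
proc reg data=boq ;
  model manh = occup checkin hours common wings / vif ;
run ;
quit ;

/* Remedy 2: principal components regression.
   PCOMIT=1 uses all but the last principal component. */
proc reg data=boq outest=pc_est outvif pcomit=1 ;
  model manh = occup checkin hours common wings cap rooms ;
run ;
quit ;

/* Remedy 3: ridge regression over a grid of ridge constants.
   OUTVIF writes the VIFs for each ridge constant to the OUTEST= data set. */
proc reg data=boq outest=ridge_est outvif ridge=0 to 0.10 by 0.01 ;
  model manh = occup checkin hours common wings cap rooms ;
run ;
quit ;

/* Inspect how the estimates and VIFs change with the ridge constant */
proc print data=ridge_est ;
run ;
```

In the OUTEST= data sets, the estimates and VIFs for each ridge constant or omitted-component count appear as separate rows, so the stabilizing effect on the coefficients can be compared against the ordinary least squares fit.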

 

 

Example:

 

US Navy Bachelor Officers Quarters data (BOQ)

 

Goal:  predict the number of manhours required to operate each establishment

 

SAS Code to create data set and run regression:

 

/* Read the BOQ data: one record per establishment */
data boq ;
  input id $ occup checkin hours common wings cap rooms manh ;
  if id = 'W' then delete ;   /* exclude establishment W from the analysis */
  cards ;
A 2 4 4 1.26 1 6 6 180.23
B 3 1.58 40 1.25 1 5 5 182.61
C 16.6 23.78 40 1 1 13 13 164.38
D 7 2.37 168 1 1 7 8 284.55
E 5.3 1.67 42.5 7.79 3 25 25 199.92
F 16.5 8.25 168 1.12 2 19 19 267.38
G 25.89 3 40 0 3 36 36 999.09
H 44.42 159.75 168 .6 18 48 48 1103.24
I 39.63 50.86 40 27.37 10 77 77 944.21
J 31.92 40.08 168 5.52 6 47 47 931.84
K 97.33 255.08 168 19 6 165 130 2268.06
L 56.63 373.42 168 6.03 4 36 37 1489.5
M 96.67 206.67 268 17.86 14 120 120 1891.7
N 54.58 207.08 168 7.77 6 66 66 1387.82
O 113.88 981 168 24.48 6 166 179 3559.92
P 149.58 233.83 168 31.07 14 185 202 3115.29
Q 134.32 145.82 168 25.99 12 192 192 2227.76
R 188.74 937 168 45.44 26 237 237 4804.24
S 110.24 410 168 20.05 12 115 115 2628.32
T 96.83 677.33 168 20.31 10 302 210 1880.84
U 102.33 288.83 168 21.01 14 131 131 3036.63
V 274.92 695.25 168 46.63 58 363 363 5539.98
W 811.08 714.33 168 22.76 17 242 242 3534.49
X 384.5 1473.66 168 7.36 24 540 453 8266.77
Y 95 368 168 30.26 9 292 196 1845.89
;

/* CORR prints the correlation matrix; VIF adds the variance inflation factors */
proc reg data=boq corr ;
  model manh = occup checkin hours common wings cap rooms / vif ;
run ;
quit ;

 

SAS Output:

 

                                  The SAS System        21:08 Monday, February 10, 2003   2

 

                                         The REG Procedure

 

                                           Correlation

 

         Variable             occup           checkin             hours            common

 

         occup               1.0000            0.8571            0.4421            0.5688

         checkin             0.8571            1.0000            0.4018            0.4640

         hours               0.4421            0.4018            1.0000            0.3592

         common              0.5688            0.4640            0.3592            1.0000

         wings               0.7668            0.5460            0.3581            0.6827

         cap                 0.9270            0.8452            0.4166            0.5878

         rooms               0.9708            0.8545            0.4373            0.6579

         manh                0.9808            0.9027            0.4344            0.5653

 

                                           Correlation

 

         Variable             wings               cap             rooms              manh

 

         occup               0.7668            0.9270            0.9708            0.9808

         checkin             0.5460            0.8452            0.8545            0.9027

         hours               0.3581            0.4166            0.4373            0.4344

         common              0.6827            0.5878            0.6579            0.5653

         wings               1.0000            0.6722            0.7581            0.7323

         cap                 0.6722            1.0000            0.9785            0.8900

         rooms               0.7581            0.9785            1.0000            0.9428

         manh                0.7323            0.8900            0.9428            1.0000

 

 

                                         The REG Procedure

                                           Model: MODEL1

                                     Dependent Variable: manh

 

                                       Analysis of Variance

 

                                              Sum of           Mean

          Source                   DF        Squares         Square    F Value    Pr > F

 

          Model                     7       87506375       12500911     155.38    <.0001

          Error                    16        1287285          80455

          Corrected Total          23       88793659

 

 

                       Root MSE            283.64643    R-Square     0.9855

                       Dependent Mean     2050.00708    Adj R-Sq     0.9792

                       Coeff Var            13.83636

 

 

                                        Parameter Estimates

 

                             Parameter       Standard                              Variance

        Variable     DF       Estimate          Error    t Value    Pr > |t|      Inflation

 

        Intercept     1      198.88441      140.96751       1.41      0.1774              0

        occup         1       21.21732        4.28692       4.95      0.0001       43.88352

        checkin       1        1.42972        0.32819       4.36      0.0005        4.50284

        hours         1       -0.34814        1.03073      -0.34      0.7399        1.28982

        common        1        8.03540        8.41280       0.96      0.3537        4.06356

        wings         1       -5.32923        9.42088      -0.57      0.5795        3.79988

        cap           1       -4.00425        3.28993      -1.22      0.2412       56.57171

        rooms         1        0.14603        6.79523       0.02      0.9831      178.92034
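
Reading the Variance Inflation column against the rule of thumb, occup, cap, and rooms all have VIFs well above 10. Inverting the usual definition $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ (a symbol introduced here, not in the notes) is the R-square from regressing variable $j$ on the other six independent variables, gives the share of each variable's variation that the others explain:

```latex
\[
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}
\]
\[
R_{\mathrm{occup}}^2 = 1 - \frac{1}{43.88352} \approx 0.977, \qquad
R_{\mathrm{cap}}^2 = 1 - \frac{1}{56.57171} \approx 0.982, \qquad
R_{\mathrm{rooms}}^2 = 1 - \frac{1}{178.92034} \approx 0.994
\]
```

Other symptoms from the list of warning signs are visible as well: cap has a negative estimated coefficient (-4.00425) although its correlation with manh is 0.8900, and rooms is far from significant (t = 0.02) although its correlation with manh is 0.9428.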

 

 

 

Outlier Detection

 

Outliers and influential observations may not be obvious from residual plots alone.  The following diagnostic statistics can help detect them:

 

 

Studentized Residuals
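
In standard notation (assumed here, since the notes do not fix symbols), let $e_i$ be the $i$-th residual, $h_{ii}$ the $i$-th diagonal element of the hat matrix, $s^2$ the mean squared error, and $s_{(i)}^2$ the mean squared error from the fit with observation $i$ deleted. The two quantities that SAS labels Student Residual and RStudent can then be written as:

```latex
\[
r_i = \frac{e_i}{s\sqrt{1 - h_{ii}}}, \qquad
t_i = \frac{e_i}{s_{(i)}\sqrt{1 - h_{ii}}}
\]
```

For observation 1 in the SAS output later in this section, $r_1 = -47.0615 / 248.5 \approx -0.189$, where 248.5 is the reported Std Error Residual, $s\sqrt{1 - h_{11}}$. Values of $|t_i|$ larger than about 2 are commonly taken as worth a closer look.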

 

 

Hat Matrix
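
A sketch of the usual definition, with notation assumed here rather than taken from the notes: $X$ is the $n \times p$ design matrix including the column of ones for the intercept, $x_i^{\top}$ is its $i$-th row, $y$ is the vector of responses, and $\hat{y}$ the vector of fitted values.

```latex
\[
H = X (X^{\top} X)^{-1} X^{\top}, \qquad
\hat{y} = H y, \qquad
h_{ii} = x_i^{\top} (X^{\top} X)^{-1} x_i
\]
\[
0 \le h_{ii} \le 1, \qquad \sum_{i=1}^{n} h_{ii} = p
\]
```

The diagonal element $h_{ii}$ (the leverage, printed by SAS as Hat Diag H) measures how far observation $i$ lies from the bulk of the data in the space of the independent variables. In the BOQ example $p = 8$ and $n = 24$, and the Hat Diag H column in the output later in this section sums to 8 up to rounding.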

 

DFFITS
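
DFFITS measures how much the fitted value for observation $i$ changes, in standard-error units, when that observation is deleted. In the standard form below, the symbols are assumptions of this sketch: $\hat{y}_{i(i)}$ is the prediction for observation $i$ from the fit without it, $s_{(i)}$ is the root mean squared error of that fit, $h_{ii}$ is the hat diagonal, and $t_i$ is the externally studentized residual (RStudent).

```latex
\[
\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{s_{(i)}\sqrt{h_{ii}}}
= t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}}
\]
```

As a check against the output later in this section, observation 21 has RStudent 3.4092 and Hat Diag H 0.0690, giving $3.4092 \sqrt{0.0690 / 0.9310} \approx 0.928$, which agrees with the reported 0.9283 up to rounding of $h_{ii}$.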

 

 

 

Cook’s D
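
Cook's D summarizes how much the whole vector of estimated coefficients moves when observation $i$ is deleted. In the standard form below, the notation is assumed for this sketch: $b$ and $b_{(i)}$ are the coefficient vectors with and without observation $i$, $X$ is the design matrix, $p$ is the number of parameters including the intercept, $s^2$ is the mean squared error, $r_i$ is the internally studentized residual (Student Residual), and $h_{ii}$ is the hat diagonal.

```latex
\[
D_i = \frac{(b - b_{(i)})^{\top} X^{\top} X \, (b - b_{(i)})}{p \, s^2}
= \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}
\]
```

As a check against the output later in this section, observation 22 has Student Residual -0.760 and Hat Diag H 0.8155, so with $p = 8$ the formula gives $(0.760^2 / 8)(0.8155 / 0.1845) \approx 0.319$, matching the reported value.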

 

 

 

 

DFBETAS
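
DFBETAS measures, one coefficient at a time, how much an estimate changes in standard-error units when observation $i$ is deleted. In the standard form below, the notation is assumed for this sketch: $b_j$ and $b_{j(i)}$ are the $j$-th estimated coefficient with and without observation $i$, $s_{(i)}$ is the root mean squared error of the fit without observation $i$, and $X$ is the design matrix.

```latex
\[
\mathrm{DFBETAS}_{j(i)} = \frac{b_j - b_{j(i)}}{s_{(i)}\sqrt{c_{jj}}}, \qquad
c_{jj} = \left[ (X^{\top} X)^{-1} \right]_{jj}
\]
```

There is one value per observation per coefficient, which is why SAS prints DFBETAS as a table with a column for the intercept and each independent variable.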

 

 

 

 

 

SAS Code (INFLUENCE and R options in the model statement):

 

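/* INFLUENCE prints RStudent, hat diagonals, Cov Ratio, DFFITS, and DFBETAS;
   R prints residuals, studentized residuals, and Cook's D */
proc reg data=boq ;
  model manh = occup checkin hours common wings cap rooms / influence r ;
run ;
quit ;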
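
When the printed tables get long, it can be convenient to save the same statistics to a data set and list only the observations that exceed some screening cutoffs. The sketch below is one way to do that with the OUTPUT statement of PROC REG. The data set names (infl, flagged), the variable names (rstud, lev, cd, dff), and the cutoffs (|RStudent| > 2, leverage > 2p/n, |DFFITS| > 2*sqrt(p/n), with p = 8 parameters and n = 24 observations) are assumptions made here, not part of the original notes.

```sas
/* Save influence statistics for each observation */
proc reg data=boq ;
  model manh = occup checkin hours common wings cap rooms ;
  output out=infl rstudent=rstud h=lev cookd=cd dffits=dff ;
run ;
quit ;

/* Keep only observations exceeding at least one rule-of-thumb cutoff.
   p = 8 parameters (7 slopes + intercept), n = 24 observations. */
data flagged ;
  set infl ;
  if abs(rstud) > 2
     or lev > 2 * 8 / 24
     or abs(dff) > 2 * sqrt(8 / 24) ;
run ;

proc print data=flagged ;
  var id manh rstud lev cd dff ;
run ;
```

The cutoffs are screening aids rather than formal tests, so flagged observations are candidates for closer inspection, not automatic deletions.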

 

SAS Output:

 

                                         The REG Procedure

                                           Model: MODEL1

                                     Dependent Variable: manh

 

                                         Output Statistics

 

             Dep Var Predicted    Std Error           Std Error  Student                   Cook's

       Obs      manh     Value Mean Predict  Residual  Residual Residual   -2-1 0 1 2           D

 

         1  180.2300  227.2915     136.7833  -47.0615     248.5   -0.189 |      |      |    0.001

         2  182.6100  236.2935     111.0639  -53.6835     261.0   -0.206 |      |      |    0.001

         3  164.3800  523.7144     114.6708 -359.3344     259.4   -1.385 |    **|      |    0.047

         4  284.5500  268.1507     107.8024   16.3993     262.4   0.0625 |      |      |    0.000

         5  199.9200  249.0805     109.5588  -49.1605     261.6   -0.188 |      |      |    0.001

         6  267.3800  427.3125     106.0943 -159.9325     263.1   -0.608 |     *|      |    0.008

         7  999.0900  583.6810     121.1537  415.4090     256.5    1.620 |      |***   |    0.073

         8      1103      1035     171.6435   68.2710     225.8    0.302 |      |      |    0.007

         9  944.2100  968.0711     148.7522  -23.8611     241.5  -0.0988 |      |      |    0.000

        10  931.8400  706.0006      99.8980  225.8394     265.5    0.851 |      |*     |    0.013

        11      2268      2049     126.3167  218.9069     254.0    0.862 |      |*     |    0.023

        12      1490      1764     153.3497 -274.7074     238.6   -1.151 |    **|      |    0.068

        13      1892      2058     149.7979 -166.3589     240.9   -0.691 |     *|      |    0.023

        14      1388      1370      82.4553   17.4975     271.4   0.0645 |      |      |    0.000

        15      3560      3485     253.9162   74.5703     126.4    0.590 |      |*     |    0.175

        16      3115      3112     178.7169    3.1305     220.3   0.0142 |      |      |    0.000

        17      2228      2603     170.9561 -375.1418     226.3   -1.657 |   ***|      |    0.196

        18      4804      4797     204.5948    7.4635     196.5   0.0380 |      |      |    0.000

        19      2628      2719     129.9785  -90.7250     252.1   -0.360 |      |      |    0.004

        20      1881      2095     211.7136 -213.7154     188.8   -1.132 |    **|      |    0.202

        21      3037      2313      74.5211  723.3295     273.7    2.643 |      |***** |    0.065

        22      5540      5633     256.1430  -92.5620     121.8   -0.760 |     *|      |    0.319

        23      8267      8240     274.4380   26.2878    71.688    0.367 |      |      |    0.246

        24      1846      1737     209.5631  109.1391     191.2    0.571 |      |*     |    0.049

 

                                         Output Statistics

 

                                            Hat Diag         Cov

                           Obs  RStudent           H       Ratio      DFFITS

 

                             1   -0.1836      0.2325      2.1448     -0.1011

                             2   -0.1994      0.1533      1.9378     -0.0849

                             3   -1.4295      0.1634      0.7211     -0.6319

                             4    0.0605      0.1444      1.9549      0.0249

                             5   -0.1821      0.1492      1.9352     -0.0763

                             6   -0.5956      0.1399      1.6161     -0.2402

                             7    1.7152      0.1824      0.4892      0.8102

                             8    0.2936      0.3662      2.5256      0.2231

                             9   -0.0957      0.2750      2.3003     -0.0589

                            10    0.8430      0.1240      1.3211      0.3172

                            11    0.8547      0.1983      1.4290      0.4251

                            12   -1.1639      0.2923      1.1856     -0.7480

                            13   -0.6789      0.2789      1.8242     -0.4222

                            14    0.0624      0.0845      1.8267      0.0190

                            15    0.5774      0.8014      7.0757      1.1598

                            16    0.0138      0.3970      2.7788      0.0112

                            17   -1.7633      0.3633      0.5832     -1.3318

                            18    0.0368      0.5203      3.4908      0.0383

                            19   -0.3499      0.2100      1.9877     -0.1804

                            20   -1.1430      0.5571      1.9400     -1.2819

                            21    3.4092      0.0690      0.0183      0.9283

                            22   -0.7492      0.8155      6.7693     -1.5749

                            23    0.3566      0.9361     24.5230      1.3650

                            24    0.5585      0.5459      3.1298      0.6123
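
To read this table, commonly cited rule-of-thumb cutoffs can be worked out for the BOQ fit. The cutoffs themselves are an assumption of this sketch (the notes do not state them); $p = 8$ is the number of parameters including the intercept and $n = 24$ the number of observations.

```latex
\[
\text{leverage: } h_{ii} > \frac{2p}{n} = \frac{2 \cdot 8}{24} \approx 0.667, \qquad
|\mathrm{DFFITS}_i| > 2\sqrt{\frac{p}{n}} = 2\sqrt{\frac{8}{24}} \approx 1.155
\]
```

By these cutoffs, observations 15, 22, and 23 have high leverage (Hat Diag H of 0.8014, 0.8155, and 0.9361), and observations 15, 17, 20, 22, and 23 have large DFFITS in absolute value. Observation 21 is the only one with |RStudent| above 2 (3.4092), yet its leverage is the lowest in the table (0.0690): it is unusual in its response value rather than in its independent variables.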

 

                                         Output Statistics

 

           ----------------------------------------DFBETAS---------------------------------------

       Obs Intercept      occup    checkin      hours     common      wings        cap      rooms

 

         1   -0.1008     0.0004    -0.0060     0.0771     0.0086    -0.0053     0.0002     0.0006

         2   -0.0829    -0.0016    -0.0036     0.0521     0.0095    -0.0052    -0.0018     0.0033

         3   -0.6076    -0.1569     0.0220     0.4046    -0.0086     0.0252    -0.0757     0.1198

         4    0.0018    -0.0034    -0.0016     0.0146    -0.0084     0.0009    -0.0026     0.0025

         5   -0.0690     0.0192     0.0001     0.0473     0.0092    -0.0024     0.0117    -0.0155

         6   -0.0169     0.0297     0.0356    -0.1389     0.0913    -0.0066     0.0283    -0.0290

         7    0.7144    -0.1507    -0.1499    -0.4508    -0.3388    -0.0016    -0.1994     0.2313

         8    0.0176    -0.0550     0.0532     0.0725    -0.1136     0.1780     0.0064     0.0010

         9   -0.0354    -0.0103     0.0075     0.0392    -0.0335     0.0054    -0.0103     0.0115

        10    0.0104    -0.1051    -0.0626     0.1808    -0.1555     0.0399    -0.0826     0.0998

        11   -0.0035     0.2715    -0.1668     0.0596     0.2615    -0.1730     0.2616    -0.2598

        12   -0.0879    -0.4336    -0.3125    -0.1478    -0.2745    -0.0276    -0.2929     0.4634

        13    0.2248    -0.0230     0.1097    -0.3662     0.0175    -0.0031     0.0166    -0.0006

        14    0.0014    -0.0009     0.0014     0.0102    -0.0046    -0.0003    -0.0031     0.0016

        15   -0.0166    -0.6731     0.7411     0.0917    -0.2709    -0.2601    -0.7668     0.7072

        16   -0.0011    -0.0005    -0.0065     0.0008     0.0006    -0.0068    -0.0058     0.0052

        17    0.1282     0.1628     0.9503    -0.1769     0.0939     0.7478     0.5877    -0.6366

        18   -0.0009     0.0140     0.0197    -0.0064     0.0264     0.0018     0.0082    -0.0157

        19   -0.0090    -0.1397    -0.0228    -0.0125    -0.1213     0.0199    -0.0791     0.1244

        20    0.0491     0.7450    -0.3763    -0.1661     0.3003    -0.2561    -0.1999    -0.2128

        21   -0.0279     0.2444    -0.2003     0.2720     0.2835    -0.0891    -0.0187    -0.1039

        22    0.0961     0.2820     0.0883     0.1759     0.2386    -1.0713     0.1386    -0.1921

        23    0.1100     0.3246    -0.0004    -0.2146    -0.3676    -0.2106     0.1308    -0.0977

        24   -0.0444     0.0253    -0.1088     0.0362     0.2340    -0.0297     0.3860    -0.2228
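
A commonly cited screening cutoff for DFBETAS, assumed here rather than taken from the notes, scales with the number of observations $n$:

```latex
\[
|\mathrm{DFBETAS}_{j(i)}| > \frac{2}{\sqrt{n}} = \frac{2}{\sqrt{24}} \approx 0.408
\]
```

The largest value in the table is -1.0713, for observation 22 on the wings coefficient. Observation 22 is establishment V, which has 58 wings, more than twice as many as any other establishment in the data, so deleting it changes the wings estimate substantially.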