Examining Regression for Multicollinearity Using Stata

- PS401

Multicollinearity is a ubiquitous problem in regression analysis. It is something we need to examine routinely in every model we run, since its presence produces results that can lead us to erroneous inferences in our hypothesis tests.

We will use the Presidential Approval data set (Presapp.dta) from last year's final for this analysis.

The steps we will follow will be:

- Examine the correlation matrix for the data
- Run the regression analysis

- Examine the betas
- Compare the t values and the F statistic
- Examine the Variance Inflation Factors (VIFs)

Examine the correlation matrix for the data

Correlations are easy to obtain. Get a correlation matrix of the entire data set with the following command:

cor year-cpi

which produces the pairwise correlation matrix for every variable from year through cpi.
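For intuition about what `cor` is computing, here is a sketch in Python with numpy rather than Stata. The variable names mimic those in Presapp.dta, but the values are simulated stand-ins (with cpi built to track realgnp, so the matrix shows the kind of high pairwise correlation that signals trouble):

```python
import numpy as np

# Simulated stand-ins for the Presapp variables (not the real data):
# cpi is constructed to move almost one-for-one with realgnp.
rng = np.random.default_rng(0)
n = 40
realgnp = rng.normal(100, 10, n)
cpi = 0.9 * realgnp + rng.normal(0, 2, n)   # deliberately collinear with realgnp
unemrate = rng.normal(6, 1, n)

X = np.column_stack([unemrate, realgnp, cpi])
corr = np.corrcoef(X, rowvar=False)          # same idea as Stata's `cor`
print(np.round(corr, 2))
```

The off-diagonal entry for realgnp and cpi comes out near 1.0, which is exactly the kind of entry to scan for before running the regression.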

Run the regression analysis

The syntax is straightforward: regress dependentVar independentVars

regress approval unemrate realgnp cpi

Some supplementary statistics are also available:

regress approval unemrate realgnp cpi, beta

The beta option produces betas (standardized regression coefficients).
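A beta is just the OLS coefficient you get after z-scoring the dependent and independent variables, so the coefficients become comparable across predictors measured in different units. A minimal numpy sketch of that computation (simulated data, not Presapp.dta):

```python
import numpy as np

def standardized_betas(y, X):
    """Betas: OLS coefficients after z-scoring y and each column of X,
    the quantity Stata's beta option reports."""
    z = lambda a: (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    Zy, ZX = z(y), z(X)
    # No constant is needed: standardized variables have mean zero.
    coef, *_ = np.linalg.lstsq(ZX, Zy, rcond=None)
    return coef

# Hypothetical example: x1 has a much larger effect on y than x2.
rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)
betas = standardized_betas(y, np.column_stack([x1, x2]))
print(np.round(betas, 2))
```

The beta for x1 comes out much larger than the beta for x2, matching how the data were generated.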


Examine the Variance Inflation Factors (VIFs)

Simply type

vif

in the command line to obtain the results.

Note that the 1/VIF column is the Tolerance.

The VIF ranges from 1.0 to infinity. VIFs greater than 10.0 are generally seen as indicative of severe multicollinearity. Tolerance ranges from 0.0 to 1.0, with 1.0 indicating the complete absence of multicollinearity.
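The VIF for predictor j is 1/(1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. A self-contained numpy sketch of that definition, on simulated data where one pair of predictors is nearly duplicated:

```python
import numpy as np

def vifs(X):
    """VIF for each column of X: regress column j on the remaining
    columns (plus a constant) and return 1 / (1 - R^2_j)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(2)
n = 50
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)   # nearly a duplicate of a -> huge VIF
c = rng.normal(size=n)             # independent of a and b -> VIF near 1
print(np.round(vifs(np.column_stack([a, b, c])), 1))
```

The VIFs for a and b blow up well past the 10.0 rule of thumb, while c stays near 1.0; the corresponding Tolerances (1/VIF) are near 0.0 and 1.0 respectively.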

Now, using a 50-state data set, try looking at the following model:

regress rate96 urb96 urbrnk96 emprat96 emprnk96, beta

where rate96 is the crime rate, urb96 is urbanization, and emprat96 is the percentage of the workforce employed. The rnk suffix denotes the state's rank on that variable rather than the raw value. How does multicollinearity affect this model?
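As a hint, consider that a rank is a monotone transform of the raw variable, so a variable and its rank tend to be almost perfectly correlated. A quick numpy illustration with simulated urbanization values for 50 hypothetical states (the names only echo the data set's variables):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50                                # 50 "states"
urb = rng.uniform(20, 95, n)          # hypothetical urbanization values
urbrnk = urb.argsort().argsort() + 1  # rank 1 = lowest (stand-in for urbrnk96)

# The rank is a monotone transform of the raw variable, so the two are
# nearly perfectly correlated -- entering both invites multicollinearity.
r = np.corrcoef(urb, urbrnk)[0, 1]
print(round(r, 3))
```

The correlation comes out very close to 1.0, so including both the raw variable and its rank in one regression builds severe multicollinearity into the model by construction.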