Examining Residuals from Regression Analysis Using Stata

- PS602

Using Stata to examine the residuals from a regression analysis is a relatively straightforward task. It requires a number of steps, but working through them gives the analyst a much better feel for the analysis than a canned package does.

We will use the Presidential Approval data set from last year's final for this analysis.

The steps we will follow are:

- Obtain the data
- Examine the data
- Print the results
- Plot the data
- Run the regression analysis
- Save the residuals
- Examine the residuals
- Correct for the problem

The data set is easy to get (using Internet Explorer; Netscape needs to be properly configured to use it):

Presidential Approval 1949-1985 (Sorry it is an old data set!)

First, you always need to look at your data. Since this is a time series data set, we need to look at plots of the data as well. [Note: Stata code lines are in *bold italics*.]

- Simple descriptive statistics: Obtain means, standard deviations, and the number of observations. Check whether the data are normally distributed. Do the data need to be N(mean, var)?

*summ approval unemrate realgnp milforce, d*

These results are written to the results window. Examine them there.
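To make the normality check concrete, here is a minimal sketch of how the detailed summary can be read. The data below are random numbers generated only so that the lines run on their own. They are not the Presidential Approval data, and the seed and the values 55 and 10 are arbitrary assumptions. For a normal distribution the skewness is 0 and the kurtosis is 3 (Stata reports kurtosis itself, not excess kurtosis).

```stata
* Synthetic stand-in data (random numbers, NOT the approval data set)
clear
set seed 602
set obs 37
generate year = 1948 + _n
generate approval = 55 + 10*rnormal()

* The detail option adds percentiles, skewness and kurtosis
summarize approval, detail

* summarize leaves its results behind in r() for later use
display "N        = " r(N)
display "mean     = " r(mean)
display "skewness = " r(skewness)
display "kurtosis = " r(kurtosis)
```

With the real data in memory, only the `summarize` and `display` lines are needed.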

If your results look acceptable (you got the correct output), open the log file. See the log icon/button at the top. Rerun the command with the log file open. Then, if desired, print the log file. The results window can also be printed directly.

Note: You can open and close the log file. When you do, you can choose between appending the next run to the log, or overwriting the entire log.
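The same open, close, append, and overwrite choices can also be typed as commands rather than made through the log button. The sketch below assumes a hypothetical log file name, `approval_run`, and uses `display` lines as placeholders for the commands you actually want to record.

```stata
* "approval_run" is a hypothetical file name used only for illustration
* Close any log that happens to be open already (no error if none is)
capture log close

* Open a plain-text log, overwriting an earlier file of the same name
log using approval_run, text replace
display "first run"
log close

* Reopen the same log and add the next run to the end of it
log using approval_run, text append
display "second run"
log close
```

The `replace` option corresponds to overwriting the entire log, and `append` to adding the next run to it.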

Plotting the data

Plot the data series in question.

- Plot Approval

*graph twoway line approval year*

or simply

*line approval year*

- Unemployment

*line unemrate year*

The format of the regression command is straightforward: *regress dependentVar independentVars*

*regress approval unemrate*

Some supplementary statistics are also available.

*regress approval unemrate, b*

produces betas (standardized regression coefficients).

After the regress statement, use the predict statement.

*predict res, r*

This will save the residuals from the analysis as a new variable, res. We can also save the predicted values of the dependent variable by

*predict yhat*
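As a check on what the two predict statements store, the sketch below confirms that the saved residual is simply the observed value minus the predicted value. The data are random numbers with made-up coefficients, used as a stand-in so that the example runs by itself. They are not the approval data, and `check` is a hypothetical helper variable.

```stata
* Synthetic stand-in data (random numbers, NOT the approval data set)
clear
set seed 602
set obs 37
generate year = 1948 + _n
generate unemrate = 5.5 + 1.5*rnormal()
generate approval = 70 - 3*unemrate + 8*rnormal()

regress approval unemrate

* With no option, predict after regress stores the fitted values
predict yhat

* The residuals option stores observed minus fitted
predict res, residuals

* Rebuild the residual by hand and compare it with the saved one
generate check = approval - yhat
summarize res check
```

The two variables should agree up to rounding, and because the model includes a constant, their mean should be essentially zero.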

First, simply look at the residuals.

*summ res, d*

*plot res year*
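A first look of this kind can be sketched end to end as follows. This version uses the graphical `twoway scatter` command in place of `plot` and adds a reference line at zero. Both are our choices for the illustration, and the data are again random stand-ins rather than the approval series.

```stata
* Synthetic stand-in data (random numbers, NOT the approval data set)
clear
set seed 602
set obs 37
generate year = 1948 + _n
generate unemrate = 5.5 + 1.5*rnormal()
generate approval = 70 - 3*unemrate + 8*rnormal()

quietly regress approval unemrate
predict res, residuals

* Detailed summary: compare mean with median, and check skewness and kurtosis
summarize res, detail

* Residuals against time, with a horizontal reference line at zero
twoway scatter res year, yline(0)
```

In a time series, long runs of residuals on the same side of the zero line would suggest that the errors are not independent from year to year.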

Then perform statistical tests for skewness and kurtosis.

*sktest res*

Look at some other residual plots and try to figure out how well the model works.

*plot res yhat*

*plot res approval*

Look at another data set!

State Crime Rates

How to correct for non-normal errors

Transform the data

Bootstrap it!
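Neither correction is spelled out here, so the following is only a rough sketch of what each might look like in Stata, not a prescribed method. It assumes a log transformation of the dependent variable as the transformation, and 200 replications for the bootstrap. Both are illustrative choices, and the data are random numbers built so that the log model fits, not the approval data.

```stata
* Synthetic stand-in data (random numbers, NOT the approval data set)
clear
set seed 602
set obs 37
generate unemrate = 5.5 + 1.5*rnormal()
generate approval = exp(4 - 0.05*unemrate + 0.15*rnormal())

* 1. Transform: regress the log of the dependent variable,
*    then test the new residuals for skewness and kurtosis
generate lnapproval = ln(approval)
regress lnapproval unemrate
predict lnres, residuals
sktest lnres

* 2. Bootstrap: keep the original model, but obtain the standard
*    errors by resampling observations (200 replications assumed)
bootstrap, reps(200) seed(602): regress approval unemrate
```

Note that resampling individual observations treats them as independent, which may be a strong assumption for time series data such as the approval series.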