Question

I am running Logit Regression in Stata.

  1. How can I know the explanatory power of the regression (in OLS, I look at R^2)?

  2. Is there a meaningful approach to expanding the regression with other independent variables? (In OLS, I manually keep adding independent variables and watch the adjusted R^2; my guess is that Stata has a way to simplify this manual process.)


Solution

I'm worried that you are getting the fundamentals of modelling wrong here:

  1. The explanatory power of a regression model is determined by your theoretical interpretation of the coefficients, not by the R-squared alone. R^2 measures the proportion of variance that your linear model explains, which may or may not be an appropriate benchmark for your model.

  2. Likewise, the presence or absence of an independent variable in your model requires substantive justification. If you want to see how the R-squared changes when adding or removing parts of your model, see help nestreg in Stata for nested regression.

To summarize: the explanatory power of your model and its variable composition cannot be determined just by crunching the numbers. You first need an adequate theory on which to build your model.

Now, if you are running logit:

  • Read Long and Freese (Ch. 3) to understand how log likelihood converges (or not) in your model.
  • Do not expect to find something as straightforward as the R-squared for logit.
  • Run logit diagnostics on your model, just as you would after running OLS.

You might also want to read up on the likelihood-ratio chi-squared test, or run additional lrtest commands as explained by Eric.

OTHER TIPS

The concept of R^2 is meaningless in logit regression, and you should disregard the McFadden pseudo-R2 in the Stata output altogether. Hosmer and Lemeshow recommend that 'to assess the significance of an independent variable we compare the value of D with and without the independent variable in the equation' using the likelihood-ratio test statistic G: G = D(model without the variables [B]) - D(model with the variables [A]).

The likelihood-ratio test (G) evaluates:

H0: coefficients for eliminated variables are all equal to 0

Ha: at least one coefficient is not equal to 0

When the LR test yields p > .05, do not reject H0, which implies that, statistically speaking, there is no advantage to including the additional IVs in the model.
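To make the arithmetic concrete, here is a minimal Python sketch of the G computation described above. The deviance values are invented for illustration (in Stata, lrtest reports G and its p-value for you); the closed-form p-value shown is valid only for the 1-degree-of-freedom case, i.e. one dropped variable:

```python
import math

# Hypothetical deviances D = -2 * log-likelihood for two nested logit models
# (made-up numbers; Stata's lrtest reports all of this for you).
D_without = 210.5  # model B: DV on IV1 only (IV2 dropped)
D_with = 198.2     # model A: DV on IV1 and IV2 (full model)

# Likelihood-ratio statistic: G = D(model without variable) - D(model with variable)
G = D_without - D_with

# Under H0, G follows a chi-squared distribution with df = number of dropped
# variables. For df = 1, the upper tail is P(chi2_1 >= G) = erfc(sqrt(G / 2)).
p = math.erfc(math.sqrt(G / 2))

print(f"G = {G:.2f}, p = {p:.5f}")  # here p < .05, so H0 would be rejected
```

With these toy numbers G is about 12.3 and p falls well below .05, so the extra variable would be retained; with p > .05 you would drop it, as stated above.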

Example Stata syntax to do this:

logit DV IV1 IV2
estimates store A
logit DV IV1
estimates store B
lrtest A B    // likelihood-ratio test of the restricted model B against the full model A

Note, however, that many more aspects have to be checked and tested before we can conclude whether or not a logit model is 'acceptable'. For more details, I recommend visiting: http://www.ats.ucla.edu/stat/stata/topics/logistic_regression.html

and consult:

Applied Logistic Regression, David W. Hosmer and Stanley Lemeshow, ISBN-13: 978-0471356325

I certainly agree with the posters above that almost any measure of R^2 for a binary model like logit or probit shouldn't be considered very important. There are, however, ways to see how good a job your model does at predicting. For example, check out the following commands:

lroc 
estat class
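For intuition about what those commands report, here is a small Python sketch: lroc plots the ROC curve and its area, and estat class tabulates classifications at a probability cutoff. The predicted probabilities and outcomes below are invented for illustration (in Stata you would obtain them with predict after logit):

```python
# Toy predicted probabilities from a logit model and observed 0/1 outcomes
phat = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2]
y    = [1,   1,   0,   1,   0,   1,    0,   0]

pos = [p for p, out in zip(phat, y) if out == 1]
neg = [p for p, out in zip(phat, y) if out == 0]

# Area under the ROC curve (what lroc reports): the probability that a randomly
# chosen positive case scores higher than a randomly chosen negative case
# (ties count one half).
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

# Classification table at the usual 0.5 cutoff (what estat class tabulates)
cut = 0.5
tp = sum(1 for p, out in zip(phat, y) if p >= cut and out == 1)
fp = sum(1 for p, out in zip(phat, y) if p >= cut and out == 0)
fn = sum(1 for p, out in zip(phat, y) if p < cut and out == 1)
tn = sum(1 for p, out in zip(phat, y) if p < cut and out == 0)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"AUC = {auc:.3f}, sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

An AUC near 0.5 means the model discriminates no better than chance, while values approaching 1.0 indicate strong discrimination; the sensitivity/specificity pair shows the trade-off at the chosen cutoff.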

Also, here's a good article for further reading: http://www.statisticalhorizons.com/r2logistic

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow