Question

The usual definition of regression (as far as I am aware) is predicting a continuous output variable from a given set of input variables.

Logistic regression is a binary classification algorithm, so it produces a categorical output.

Is it really a regression algorithm? If so, why?

Solution

Logistic regression is regression, first and foremost. It becomes a classifier by adding a decision rule. I will give an example that goes backwards. That is, instead of taking data and fitting a model, I'm going to start with the model in order to show how this is truly a regression problem.

In logistic regression, we are modeling the log odds, or logit, that an event occurs, which is a continuous quantity. If the probability that event $A$ occurs is $P(A)$, the odds are:

$$\frac{P(A)}{1 - P(A)}$$

The log odds, then, are:

$$\log \left( \frac{P(A)}{1 - P(A)}\right)$$
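
For example, if $P(A) = 0.8$, the odds are $0.8 / 0.2 = 4$ and the log odds are $\log 4 \approx 1.39$; any probability in $(0, 1)$ maps to a log odds somewhere on the whole real line.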

As in linear regression, we model this with a linear combination of coefficients and predictors:

$$\operatorname{logit} = b_0 + b_1x_1 + b_2x_2 + \cdots$$

Imagine we are given a model that predicts whether a person has gray hair, using age as the only predictor. Here, our event is $A$ = a person has gray hair:

$$\text{log odds of gray hair} = -10 + 0.25 \times \text{age}$$

...Regression! Here is some Python code and a plot:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # optional: seaborn styling, not required for these plots

# Ages from 0 to 100
x = np.linspace(0, 100, 100)

def log_odds(x):
    # Log odds of gray hair as a linear function of age
    return -10 + .25 * x

plt.plot(x, log_odds(x))
plt.xlabel("age")
plt.ylabel("log odds of gray hair")

[Plot: log odds of gray hair vs. age for the toy example]

Now, let's make it a classifier. First, we need to transform the log odds to recover the probability $P(A)$, using the sigmoid (inverse logit) function:

$$P(A) = \frac{1}{1 + \exp(-\text{log odds})}$$

Here's the code:

# Apply the sigmoid to map the log odds onto (0, 1)
plt.plot(x, 1 / (1 + np.exp(-log_odds(x))))
plt.xlabel("age")
plt.ylabel("probability of gray hair")

[Plot: probability of gray hair vs. age for the toy example]

The last thing we need to make this a classifier is to add a decision rule. One very common rule is to classify a success whenever $P(A) > 0.5$. We will adopt that rule, which means our classifier predicts gray hair whenever a person is older than 40 and non-gray hair otherwise: the log odds cross zero, and hence $P(A)$ crosses $0.5$, at age $10 / 0.25 = 40$.
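
To make the rule concrete, here is a minimal sketch (not part of the original answer) that reuses the log_odds function above and thresholds the probability at 0.5:

def predict_gray_hair(age, threshold=0.5):
    # Transform the log odds to a probability, then apply the decision rule
    p = 1 / (1 + np.exp(-log_odds(age)))
    return p > threshold

print(predict_gray_hair(35))  # False: P(A) is about 0.22 at age 35
print(predict_gray_hair(45))  # True:  P(A) is about 0.78 at age 45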

Logistic regression works great as a classifier in more realistic examples too, but before it can be a classifier, it must be a regression technique!

OTHER TIPS

Short Answer

Yes, logistic regression is a regression algorithm and it does predict a continuous outcome: the probability of an event. That we use it as a binary classifier is due to the interpretation of the outcome.

Detail

Logistic regression is a type of generalized linear model (GLM).

In an ordinary linear regression model, a continuous outcome $y$ is modeled as a linear combination of the predictors and their effects, plus an error term:

$$y = b_0 + b_1x_1 + b_2x_2 + \cdots + b_nx_n + e$$

where $e$ is the error.

Generalized linear models do not model $y$ directly. Instead, they apply a transformation, called the link function, that maps the bounded expected value of $y$ onto the whole real line. For logistic regression the link function is the logit (usually; see the note below).

The logit function is defined as

$$\ln\left(\frac{y}{1 - y}\right)$$

Thus the form of logistic regression is:

$$\ln\left(\frac{y}{1 - y}\right) = b_0 + b_1x_1 + b_2x_2 + \cdots + b_nx_n$$

where $y$ is the probability of an event. Note that, unlike ordinary linear regression, there is no additive error term on the logit scale; the randomness comes from the binomial outcome itself.

The fact that we use it as a binary classifier is due to the interpretation of the outcome.

Note: the probit is another link function for binary outcomes (a model using it is called probit regression), but the logit is the most widely used.
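
To make the GLM framing concrete, here is a minimal sketch (not from the original answer) that fits a logistic regression as a binomial GLM with the logit link, assuming the statsmodels library and synthetic data:

import numpy as np
import statsmodels.api as sm

# Synthetic data: age and a noisy gray-hair indicator (hypothetical example)
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, 500)
p = 1 / (1 + np.exp(-(-10 + 0.25 * age)))
gray = rng.binomial(1, p)

# Fit the GLM; the Binomial family uses the logit link by default
X = sm.add_constant(age)
result = sm.GLM(gray, X, family=sm.families.Binomial()).fit()
print(result.params)  # estimated b_0 and b_1, close to -10 and 0.25

The fitted model returns continuous predictions (probabilities); nothing about the fit itself is a classification.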

As you discuss, the usual definition of regression is predicting a continuous variable, while logistic regression is typically used as a binary classifier. Logistic regression applies the logistic (inverse logit) function to the output of an ordinary linear regression; this function maps $(-\infty, +\infty)$ to $(0, 1)$. I think it is largely for historical reasons that it keeps the name.

Saying something like "I did some regression to classify images; in particular, I used logistic regression" is wrong.

To put it simply, any hypothesis function $f$ with $f: X \rightarrow \mathbb{R}$ makes for a regression algorithm. Thus the logistic function, $$P(Y=1 \mid \lambda, x) = \frac{1}{1 + e^{-\lambda^T x}} \in (0, 1),$$ makes for a regression algorithm. Here $\lambda$ is the coefficient vector (defining a hyperplane) learned from the training data and $x$ is a data point. The class is then obtained by thresholding, e.g. predicting $Y = 1$ whenever $P(Y=1 \mid \lambda, x) > 0.5$, which is equivalent to taking $\operatorname{sign}(\lambda^T x)$.
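
A quick numerical check of that equivalence (illustrative code with hypothetical values, not from the original answer):

import numpy as np

lam = np.array([0.5, -1.2])              # hypothetical coefficient vector
x = np.array([[4.0, 1.0], [1.0, 2.0]])   # two example data points

scores = x @ lam                          # lambda^T x for each point
probs = 1 / (1 + np.exp(-scores))         # continuous output in (0, 1)

print(probs)                              # [0.69, 0.13]: a regression output
print((probs > 0.5).astype(int))          # [1, 0]: threshold at 0.5
print((scores > 0).astype(int))           # [1, 0]: same classes via the sign of lambda^T x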

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange