What is the difference between a generative and a discriminative algorithm?

https://stackoverflow.com/questions/879432

22-08-2019

Question

Please help me understand the difference between a generative and a discriminative algorithm, keeping in mind that I am just a beginner.

Solution

Let's say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution p(x,y) and a discriminative model learns the conditional probability distribution p(y|x) - which you should read as "the probability of y given x".

Here's a really simple example. Suppose you have the following data in the form (x,y):

(1,0), (1,0), (2,0), (2,1)

p(x,y) is

      y=0   y=1
     -----------
x=1 | 1/2   0
x=2 | 1/4   1/4

p(y|x) is

      y=0   y=1
     -----------
x=1 | 1     0
x=2 | 1/2   1/2

If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions.
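If you would rather compute the two tables than stare at them, here is a minimal sketch that derives both from the four (x,y) pairs. It uses exact fractions so the printed values match the tables: p(x,y) divides each pair's count by the total number of pairs, while p(y|x) divides it by the number of pairs that share the same x.

```python
from collections import Counter
from fractions import Fraction

# The four (x, y) training pairs from the example above.
data = [(1, 0), (1, 0), (2, 0), (2, 1)]

pair_counts = Counter(data)               # how often each (x, y) pair occurs
x_counts = Counter(x for x, _ in data)    # how often each x occurs
n = len(data)

# Joint distribution p(x, y): count of (x, y) divided by the total count.
joint = {(x, y): Fraction(pair_counts[(x, y)], n) for x in (1, 2) for y in (0, 1)}

# Conditional distribution p(y | x): count of (x, y) divided by the count of x.
conditional = {(x, y): Fraction(pair_counts[(x, y)], x_counts[x]) for x in (1, 2) for y in (0, 1)}

for x in (1, 2):
    print(f"x={x}", "p(x,y):", [str(joint[(x, y)]) for y in (0, 1)],
          "p(y|x):", [str(conditional[(x, y)]) for y in (0, 1)])
# x=1 p(x,y): ['1/2', '0'] p(y|x): ['1', '0']
# x=2 p(x,y): ['1/4', '1/4'] p(y|x): ['1/2', '1/2']
```

Notice that all entries of the joint table sum to 1, while each row of the conditional table sums to 1 on its own.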

The distribution p(y|x) is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly are called discriminative algorithms. Generative algorithms model p(x,y), which can be transformed into p(y|x) by applying Bayes' rule and then used for classification. However, the distribution p(x,y) can also be used for other purposes. For example, you could use p(x,y) to generate likely (x,y) pairs.
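Both uses of p(x,y) mentioned here can be sketched in a few lines. This is an illustrative sketch on the toy joint table from the example, not a general implementation: `conditional_from_joint` and `sample_pairs` are hypothetical helper names. The first turns the joint into p(y|x) by dividing by the marginal p(x) = sum over y of p(x,y); the second draws new (x,y) pairs in proportion to their joint probability.

```python
import random
from fractions import Fraction

# Joint distribution p(x, y) from the table above.
joint = {
    (1, 0): Fraction(1, 2), (1, 1): Fraction(0),
    (2, 0): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def conditional_from_joint(joint):
    """p(y | x) = p(x, y) / p(x), where p(x) is the sum over y of p(x, y)."""
    marginal_x = {}
    for (x, _), p in joint.items():
        marginal_x[x] = marginal_x.get(x, 0) + p
    return {(x, y): p / marginal_x[x] for (x, y), p in joint.items()}

def sample_pairs(joint, k, seed=0):
    """Draw k (x, y) pairs, each chosen with probability p(x, y)."""
    rng = random.Random(seed)
    pairs = list(joint)
    weights = [float(joint[pair]) for pair in pairs]
    return rng.choices(pairs, weights=weights, k=k)

print(conditional_from_joint(joint)[(2, 1)])   # 1/2
print(sample_pairs(joint, 5))                  # five pairs; (1, 1) never appears since p(1, 1) = 0
```

A discriminative model that only stores p(y|x) cannot do the second step, because it has no idea how often each x occurs.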

From the description above, you might be thinking that generative models are more generally useful and therefore better, but it's not as simple as that. This paper is a very popular reference on the subject of discriminative vs. generative classifiers, but it's pretty heavy going. The overall gist is that discriminative models generally outperform generative models in classification tasks.

OTHER TIPS

A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal?

A discriminative algorithm does not care about how the data was generated; it simply categorizes a given signal.

Imagine your task is to identify the language of a speech sample.

You can do it by either:

learning each language, and then classifying the speech using the knowledge you just gained, or

determining the differences between the linguistic models without learning the languages, and then classifying the speech.

The first one is the generative approach and the second one is the discriminative approach.

Check this reference for more details: http://www.cedar.buffalo.edu/~srihari/CSE574/Discriminative-Generative.pdf.

In practice, the models are used as follows.

In discriminative models, to predict the label y from the training example x, you must evaluate:

$$\hat{y} = \arg\max_{y} \, p(y \mid x)$$

which simply chooses the most likely class y given x. It is as if we were trying to model the decision boundary between the classes. This behavior is very clear in neural networks, where the learned weights can be seen as defining a complexly shaped curve that isolates the elements of a class in the space.

Now, using Bayes' rule, replace $p(y \mid x)$ in the equation with $\frac{p(x \mid y)\, p(y)}{p(x)}$. Since you are only interested in the arg max, you can drop the denominator $p(x)$, which is the same for every y. So you are left with

$$\hat{y} = \arg\max_{y} \, p(x \mid y)\, p(y)$$

which is the equation you use in generative models.
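To see that dropping p(x) does not change the prediction, here is a small sketch on the toy data from the first answer, (1,0), (1,0), (2,0), (2,1). The prior p(y) and likelihood p(x|y) below are simply counted from those four pairs; the function names are hypothetical. One function takes the arg max of the full posterior p(y|x), the other takes the arg max of the unnormalized score p(x|y) p(y).

```python
from fractions import Fraction

# Generative model counted from the toy pairs (1,0), (1,0), (2,0), (2,1):
# prior[y] = p(y) and likelihood[y][x] = p(x | y).
prior = {0: Fraction(3, 4), 1: Fraction(1, 4)}
likelihood = {
    0: {1: Fraction(2, 3), 2: Fraction(1, 3)},
    1: {1: Fraction(0), 2: Fraction(1)},
}

def posterior(x):
    """Full Bayes' rule: p(y | x) = p(x | y) p(y) / p(x)."""
    evidence = sum(likelihood[y][x] * prior[y] for y in prior)   # p(x), the same for every y
    return {y: likelihood[y][x] * prior[y] / evidence for y in prior}

def predict_with_posterior(x):
    """arg max over y of p(y | x)."""
    post = posterior(x)
    return max(post, key=post.get)

def predict_without_denominator(x):
    """arg max over y of p(x | y) p(y); dividing every score by p(x) cannot change the winner."""
    return max(prior, key=lambda y: likelihood[y][x] * prior[y])

for x in (1, 2):
    print(x, predict_with_posterior(x), predict_without_denominator(x))
# prints "1 0 0" and then "2 0 0"
```

At x = 2 the two classes tie (the posterior is 1/2 for each), and because Python's `max` returns the first maximal key in both functions, the tie is resolved the same way.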

While in the first case you had the conditional probability distribution p(y|x), which modeled the boundary between classes, in the second you had the joint probability distribution p(x, y), since p(x, y) = p(x | y) p(y), which explicitly models the actual distribution of each class.

With the joint probability distribution, given a y, you can calculate p(x | y) and use it to "generate" a corresponding x. For this reason, these models are called "generative" models.

Here's the most important part of the CS229 lecture notes (by Andrew Ng) related to this topic, which really helped me understand the difference between discriminative and generative learning algorithms.

Suppose we have two classes of animals, elephants (y = 1) and dogs (y = 0), and x is the feature vector of an animal.

Given a training set, an algorithm like logistic regression or the perceptron algorithm (basically) tries to find a straight line, that is, a decision boundary, that separates the elephants and dogs. Then, to classify a new animal as either an elephant or a dog, it checks which side of the decision boundary the animal falls on and makes its prediction accordingly. We call these discriminative learning algorithms.
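As an illustration of the discriminative approach, here is a minimal perceptron sketch. The two features (height in metres, weight in tonnes) and all six training animals are invented for this example; they are not from the lecture notes. The perceptron never models what an elephant or a dog looks like: it only moves the line w · x + b = 0 each time a training point lands on the wrong side.

```python
# Hypothetical training data: x = (height in metres, weight in tonnes); y = 1 elephant, 0 dog.
X = [(3.0, 5.0), (2.8, 4.5), (3.2, 6.0), (0.5, 0.03), (0.7, 0.05), (0.3, 0.01)]
Y = [1, 1, 1, 0, 0, 0]

def train_perceptron(X, Y, epochs=100, lr=1.0):
    """Classic perceptron rule: nudge the boundary w . x + b = 0 whenever a point is misclassified."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(X, Y):
            target = 1 if y == 1 else -1                  # the perceptron uses labels +1 / -1
            activation = w[0] * x[0] + w[1] * x[1] + b
            if target * activation <= 0:                  # on the wrong side of (or on) the boundary
                w = [w[0] + lr * target * x[0], w[1] + lr * target * x[1]]
                b += lr * target
                mistakes += 1
        if mistakes == 0:                                 # every training point is on the correct side
            break
    return w, b

def classify(w, b, x):
    """Predict 1 (elephant) if x is on the positive side of the boundary, else 0 (dog)."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

w, b = train_perceptron(X, Y)
print(classify(w, b, (2.5, 4.0)), classify(w, b, (0.6, 0.04)))   # 1 0
```

The learned (w, b) is just a line; it says nothing about how likely a given animal is in the first place.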

Here's a different approach. First, looking at elephants, we can build a model of what elephants look like. Then, looking at dogs, we can build a separate model of what dogs look like. Finally, to classify a new animal, we can match the new animal against the elephant model, and match it against the dog model, to see whether the new animal looks more like the elephants or more like the dogs we had seen in the training set. We call these generative learning algorithms.
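The generative approach can be sketched on the same invented data (height in metres, weight in tonnes). As a simplifying assumption, each class model here is a product of independent 1-D Gaussians, one per feature (a Gaussian naive Bayes style model); this is not necessarily the model the lecture notes use. A new animal is assigned to the class with the highest log p(x|y) + log p(y), and because each class has its own model, the same model can also generate a new, made-up animal of that class.

```python
import math
import random

# Hypothetical training data: x = (height in metres, weight in tonnes); y = 1 elephant, 0 dog.
X = [(3.0, 5.0), (2.8, 4.5), (3.2, 6.0), (0.5, 0.03), (0.7, 0.05), (0.3, 0.01)]
Y = [1, 1, 1, 0, 0, 0]

def fit_class_models(X, Y):
    """For each class: prior p(y) plus a mean and variance per feature (features assumed independent)."""
    models = {}
    for label in sorted(set(Y)):
        rows = [x for x, y in zip(X, Y) if y == label]
        columns = list(zip(*rows))                        # one tuple of values per feature
        means = [sum(col) / len(col) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(col) for col, m in zip(columns, means)]
        models[label] = (len(rows) / len(X), means, variances)
    return models

def log_gaussian(v, mean, var):
    """Log density of the 1-D normal distribution N(mean, var) at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def classify(models, x):
    """Match x against every class model: arg max over y of log p(x | y) + log p(y)."""
    def score(label):
        prior, means, variances = models[label]
        return math.log(prior) + sum(log_gaussian(v, m, s) for v, m, s in zip(x, means, variances))
    return max(models, key=score)

def generate(models, label, rng):
    """Sample a new, made-up example of the given class from its model."""
    _, means, variances = models[label]
    return tuple(rng.gauss(m, math.sqrt(s)) for m, s in zip(means, variances))

models = fit_class_models(X, Y)
print(classify(models, (2.5, 4.0)), classify(models, (0.6, 0.04)))   # 1 0
print(generate(models, 1, random.Random(0)))   # an invented elephant-like (height, weight)
```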

Generally, there is a practice in the machine learning community of not learning something you don’t need. For example, consider a classification problem where the goal is to assign a label y to a given input x. If we use a generative model

p(x,y) = p(y|x) p(x)

we have to model p(x), which is irrelevant for the task at hand. Practical limitations like data sparseness will force us to model p(x) with some weak independence assumptions. Therefore, we intuitively use discriminative models for classification.

An additional informative point that goes well with the answer by StompChicken above.

The fundamental difference between discriminative models and generative models is:

Discriminative models learn the (hard or soft) boundary between classes

Generative models model the distribution of individual classes

Edit:

A generative model is one that can generate data. It models both the features and the class (i.e., the complete data).

If we model P(x,y), we can use this probability distribution to generate data points, and hence all algorithms modeling P(x,y) are generative.

Examples of generative models:

Naive Bayes (it models P(c) and P(d|c), where c is the class and d is the feature vector; since P(c,d) = P(c) * P(d|c), Naive Bayes in effect models P(c,d))
Bayes nets
Markov nets
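Here is a minimal Naive Bayes sketch showing P(c,d) = P(c) * P(d|c) in action. Everything concrete is an assumption for illustration: the four tiny "documents", the two classes, treating each document d as a set of words, and add-one (Laplace) smoothing. The "naive" step is factorizing P(d|c) into one presence/absence probability per vocabulary word (a Bernoulli Naive Bayes variant).

```python
from collections import Counter, defaultdict

# Hypothetical training documents: (class c, set of words in document d).
docs = [
    ("sports", {"ball", "goal", "team"}),
    ("sports", {"team", "win"}),
    ("politics", {"vote", "win", "party"}),
    ("politics", {"party", "vote"}),
]
vocabulary = sorted(set().union(*(words for _, words in docs)))

class_counts = Counter(c for c, _ in docs)
word_counts = defaultdict(Counter)            # word_counts[c][w] = number of class-c docs containing w
for c, words in docs:
    word_counts[c].update(words)

def p_class(c):
    """Prior P(c): fraction of training documents in class c."""
    return class_counts[c] / len(docs)

def p_word_given_class(w, c):
    """P(w present | c) with add-one smoothing, so an unseen word is never impossible."""
    return (word_counts[c][w] + 1) / (class_counts[c] + 2)

def joint(c, words):
    """P(c, d) = P(c) * P(d | c), with P(d | c) factorized over vocabulary words (the naive assumption)."""
    p = p_class(c)
    for w in vocabulary:
        q = p_word_given_class(w, c)
        p *= q if w in words else (1 - q)
    return p

new_doc = {"team", "goal"}
scores = {c: joint(c, new_doc) for c in class_counts}
print(max(scores, key=scores.get))   # sports
```

Because `joint` returns P(c, d) rather than only P(c | d), the same model could also be used to sample made-up documents for a class.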

A discriminative model is one that can only be used to discriminate/classify data points. In such cases you only need to model P(y|x) (i.e., the probability of the class given the feature vector).

Examples of discriminative models:

Logistic regression
Neural networks
Conditional random fields
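Logistic regression is the simplest model on this list, and a short sketch makes it concrete that it parameterizes P(y|x) directly, with no model of x at all. The 1-D data, learning rate, and step count below are arbitrary choices for illustration, and plain batch gradient ascent on the log-likelihood is used instead of a library optimizer. Unlike the hard boundary of a perceptron, the output is a soft probability.

```python
import math

# Hypothetical 1-D data: x = a single feature value, y = class label (0 or 1).
X = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
Y = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(X, Y, lr=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient ascent on the average log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [y - sigmoid(w * x + b) for x, y in zip(X, Y)]   # y minus predicted probability
        w += lr * sum(e * x for e, x in zip(errors, X)) / len(X)
        b += lr * sum(errors) / len(X)
    return w, b

w, b = fit_logistic_regression(X, Y)
for x in (1.0, 2.25, 3.5):
    print(x, round(sigmoid(w * x + b), 3))   # estimated P(y = 1 | x), increasing with x
```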

In general, generative models need to model much more than discriminative models and hence are sometimes not as effective. In fact, most (I'm not sure if all) unsupervised learning algorithms, such as clustering, can be called generative, since they model P(d) (and there are no classes :P).

PS: Part of the answer is taken from source

The different models are summed up in the table below:

My two cents: discriminative approaches highlight differences. Generative approaches do not focus on differences; they try to build a model that is representative of the class. There is an overlap between the two. Ideally both approaches should be used: one will be useful for finding similarities and the other for finding dissimilarities.

A generative model learns a complete model of the training data and uses it to predict the response.

A discriminative algorithm's job is just to classify, or differentiate between, the two outcomes.

All previous answers are great, and I'd like to plug in one more point.

From generative models we can derive any distribution over the variables, whereas from discriminative models we can only obtain the conditional distribution P(Y|X) (in other words, they are only useful for discriminating Y's label), which is why they are called discriminative models. A discriminative model also doesn't assume that the X's are conditionally independent given Y ($X_i \perp X_{-i} \mid Y$), and hence is usually more powerful for calculating that conditional distribution.

Licensed under: CC-BY-SA with attribution
