Question

Can both Naive Bayes and Logistic Regression classify both of these datasets perfectly? My understanding is that Naive Bayes can, and that Logistic Regression with complex (e.g. polynomial) feature terms can classify these datasets. Please correct me if I am wrong.

Image of datasets is here:

[image: the two datasets, one circular (two concentric rings) and one rectangular]


Solution

Let's run both algorithms on two datasets similar to the ones you posted and see what happens...
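The exact datasets aren't available, so as a rough stand-in here is a minimal scikit-learn sketch: `make_circles` for the circular data and two uniform blocks for the rectangular one (the sample sizes and noise levels are my own assumptions):

```python
# A minimal sketch of the experiment; make_circles and the uniform
# blocks below are stand-ins for the pictured datasets.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Circular dataset: an inner ring (class 1) inside an outer ring (class 0).
X_circ, y_circ = make_circles(n_samples=400, factor=0.3, noise=0.05,
                              random_state=0)

# Rectangular dataset: two side-by-side blocks, separable by a vertical line.
X_rect = np.vstack([rng.uniform([-2.0, -1.0], [-0.2, 1.0], size=(200, 2)),
                    rng.uniform([0.2, -1.0], [2.0, 1.0], size=(200, 2))])
y_rect = np.array([0] * 200 + [1] * 200)

for name, X, y in [("circular", X_circ, y_circ),
                   ("rectangular", X_rect, y_rect)]:
    lr = LogisticRegression().fit(X, y)
    nb = GaussianNB().fit(X, y)
    print(f"{name}: LR accuracy = {lr.score(X, y):.2f}, "
          f"NB accuracy = {nb.score(X, y):.2f}")
```

On data like this, LR should end up near chance on the circular set, while both models fit the rectangular one.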

EDIT: The previous answer I posted was incorrect. I forgot to account for the variance in Gaussian Naive Bayes. (The previous solution was for Naive Bayes using Gaussians with a fixed, identity covariance, which gives a linear decision boundary.)

It turns out that LR fails on the circular dataset, while NB can succeed. Both methods succeed on the rectangular dataset.

The LR decision boundary is linear while the NB boundary is quadratic (the boundary between two axis-aligned Gaussians with different covariances).
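To make that concrete, for a single feature $x$ with class-conditional Normals $\mathcal{N}(\mu_c, \sigma_c^2)$, the NB log-odds are

$$\log\frac{p(x \mid c_1)\,p(c_1)}{p(x \mid c_0)\,p(c_0)} = \log\frac{\sigma_0\,p(c_1)}{\sigma_1\,p(c_0)} + \frac{(x-\mu_0)^2}{2\sigma_0^2} - \frac{(x-\mu_1)^2}{2\sigma_1^2},$$

and the multi-feature case just adds one such term per feature. When $\sigma_0 = \sigma_1$ the quadratic terms cancel and the boundary (log-odds $= 0$) is linear in $x$; with unequal variances the $x^2$ terms survive, so the boundary is quadratic.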

Applying NB to the circular dataset gives two means in roughly the same position but with different variances, leading to a roughly circular decision boundary: as the radius increases, the probability under the higher-variance Gaussian grows relative to that of the lower-variance one.

The two plots below show a Gaussian NB solution with fixed (identity) variance instead; there the boundary is linear, and many of the points on the inner circle are incorrectly classified.

Circular dataset (identity-covariance Gaussian Naive Bayes)

Rectangular dataset (identity-covariance Gaussian Naive Bayes)

In the plots below, the contours represent probability contours of the NB solution. This Gaussian NB solution also learns the variance of each individual feature, leading to an axis-aligned covariance in the solution.

Circular dataset (Gaussian Naive Bayes, axis-aligned covariance)

Rectangular dataset (Gaussian Naive Bayes, axis-aligned covariance)
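As an aside, contours like these can be reproduced by evaluating the fitted model on a grid; a minimal matplotlib sketch on `make_circles` stand-in data:

```python
# Sketch: draw Gaussian NB probability contours on a circular dataset.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_circles
from sklearn.naive_bayes import GaussianNB

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
nb = GaussianNB().fit(X, y)

# Evaluate P(class 1 | x) over a grid covering the data.
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 200),
                     np.linspace(-1.5, 1.5, 200))
proba = nb.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, proba, levels=20, cmap="RdBu", alpha=0.6)
plt.contour(xx, yy, proba, levels=[0.5], colors="k")  # decision boundary
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolor="k", s=15)
plt.show()
```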

OTHER TIPS

Naive Bayes/Logistic Regression can get the second (right) of these two pictures, in principle, because there is a linear decision boundary that perfectly separates the classes.

If you used a continuous version of Naive Bayes with class-conditional Normal distributions on the features, you could separate the circular dataset. You'd end up with distributions for the two classes that have the same mean (the centre point of the two rings), but where the variance of the features conditioned on the red class is greater than that of the features conditioned on the blue class, giving a circular decision boundary somewhere in the margin between the rings. This is a non-linear classifier, though.
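You can check this concretely by inspecting the fitted class-conditional parameters on ring-shaped data (a sketch; note that GaussianNB's `var_` attribute was named `sigma_` in scikit-learn versions before 1.0):

```python
# Sketch: inspect the class-conditional Gaussians fitted to ring data.
from sklearn.datasets import make_circles
from sklearn.naive_bayes import GaussianNB

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
nb = GaussianNB().fit(X, y)

print("class means:\n", nb.theta_)    # both rows close to (0, 0)
print("class variances:\n", nb.var_)  # the outer ring's variance is larger
```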

You could get the same effect with histogram binning of the feature space, so long as the bins were narrow enough. In that case both Logistic Regression and Naive Bayes will work, operating on the histogram-like feature vectors; see the sketch below.
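A minimal sketch of that binning idea, assuming scikit-learn's `KBinsDiscretizer` for the histogram step, with one-hot encoded bins feeding Logistic Regression:

```python
# Sketch: histogram-bin each feature, then fit a linear model on the bins.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Narrow uniform bins per feature, one-hot encoded; logistic regression
# then learns one weight per bin and can carve out the inner ring.
model = make_pipeline(
    KBinsDiscretizer(n_bins=20, encode="onehot", strategy="uniform"),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```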

How would you use Naive Bayes on these data sets?

In its usual form, Naive Bayes needs binary or categorical data.
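So one common route for continuous features is to discretize them first; a sketch, assuming `KBinsDiscretizer` for the binning and scikit-learn's `CategoricalNB` as the categorical Naive Bayes model:

```python
# Sketch: turn continuous features into categories, then fit a
# categorical Naive Bayes model on the bin indices.
from sklearn.datasets import make_circles
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

binner = KBinsDiscretizer(n_bins=20, encode="ordinal", strategy="uniform")
Xb = binner.fit_transform(X).astype(int)  # bin indices 0..19 per feature

nb = CategoricalNB().fit(Xb, y)
print("training accuracy:", nb.score(Xb, y))
```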

Licensed under: CC-BY-SA with attribution