Question

Can both Naive Bayes and Logistic Regression classify both of these datasets perfectly? My understanding is that Naive Bayes can, and that Logistic Regression with complex (e.g. polynomial) feature terms can classify these datasets. Please correct me if I am wrong.

Image of datasets is here:

[image: the two datasets, one circular (two concentric rings) and one rectangular]


Solution

Let's run both algorithms on two datasets similar to the ones you posted and see what happens...
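The exact datasets aren't available, so as a rough stand-in here is a minimal scikit-learn sketch: `make_circles` for the circular data and two uniform blocks for the rectangular one (the sample sizes and noise levels are my own assumptions):

```python
# A minimal sketch of the experiment; make_circles and the uniform
# blocks below are stand-ins for the pictured datasets.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Circular dataset: an inner ring (class 1) inside an outer ring (class 0).
X_circ, y_circ = make_circles(n_samples=400, factor=0.3, noise=0.05,
                              random_state=0)

# Rectangular dataset: two side-by-side blocks, separable by a vertical line.
X_rect = np.vstack([rng.uniform([-2.0, -1.0], [-0.2, 1.0], size=(200, 2)),
                    rng.uniform([0.2, -1.0], [2.0, 1.0], size=(200, 2))])
y_rect = np.array([0] * 200 + [1] * 200)

for name, X, y in [("circular", X_circ, y_circ),
                   ("rectangular", X_rect, y_rect)]:
    lr = LogisticRegression().fit(X, y)
    nb = GaussianNB().fit(X, y)
    print(f"{name}: LR accuracy = {lr.score(X, y):.2f}, "
          f"NB accuracy = {nb.score(X, y):.2f}")
```

On data like this, LR should end up near chance on the circular set, while both models fit the rectangular one.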

EDIT: The previous answer I posted was incorrect. I forgot to account for the variance in Gaussian Naive Bayes. (The previous solution was for Naive Bayes using Gaussians with a fixed, identity covariance, which gives a linear decision boundary.)

It turns out that LR fails on the circular dataset, while NB can succeed. Both methods succeed on the rectangular dataset.

The LR decision boundary is linear while the NB boundary is quadratic (the boundary between two axis-aligned Gaussians with different covariances).
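To make that concrete, for a single feature $x$ with class-conditional Normals $\mathcal{N}(\mu_c, \sigma_c^2)$, the NB log-odds are

$$\log\frac{p(x \mid c_1)\,p(c_1)}{p(x \mid c_0)\,p(c_0)} = \log\frac{\sigma_0\,p(c_1)}{\sigma_1\,p(c_0)} + \frac{(x-\mu_0)^2}{2\sigma_0^2} - \frac{(x-\mu_1)^2}{2\sigma_1^2},$$

and the multi-feature case just adds one such term per feature. When $\sigma_0 = \sigma_1$ the quadratic terms cancel and the boundary (log-odds $= 0$) is linear in $x$; with unequal variances the $x^2$ terms survive, so the boundary is quadratic.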

Applying NB to the circular dataset gives two means in roughly the same position but with different variances, leading to a roughly circular decision boundary: as the radius increases, the probability under the higher-variance Gaussian grows relative to that of the lower-variance one.

The two plots below show a Gaussian NB solution with fixed (identity) variance instead; there the boundary is linear, and many of the points on the inner circle are incorrectly classified.

Circular dataset (identity-covariance Gaussian Naive Bayes)

Rectangular dataset (identity-covariance Gaussian Naive Bayes)

In the plots below, the contours represent probability contours of the NB solution. This Gaussian NB solution also learns the variance of each individual feature, leading to an axis-aligned covariance in the solution.

Circular dataset (Gaussian Naive Bayes, axis-aligned covariance)

Rectangular dataset (Gaussian Naive Bayes, axis-aligned covariance)
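As an aside, contours like these can be reproduced by evaluating the fitted model on a grid; a minimal matplotlib sketch on `make_circles` stand-in data:

```python
# Sketch: draw Gaussian NB probability contours on a circular dataset.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_circles
from sklearn.naive_bayes import GaussianNB

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
nb = GaussianNB().fit(X, y)

# Evaluate P(class 1 | x) over a grid covering the data.
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 200),
                     np.linspace(-1.5, 1.5, 200))
proba = nb.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)

plt.contourf(xx, yy, proba, levels=20, cmap="RdBu", alpha=0.6)
plt.contour(xx, yy, proba, levels=[0.5], colors="k")  # decision boundary
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolor="k", s=15)
plt.show()
```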

OTHER TIPS

Naive Bayes/Logistic Regression can get the second (right) of these two pictures, in principle, because there is a linear decision boundary that perfectly separates the classes.

If you used a continuous version of Naive Bayes with class-conditional Normal distributions on the features, you could separate the circular dataset. You'd end up with distributions for the two classes that have the same mean (the centre point of the two rings), but where the variance of the features conditioned on the red class is greater than that of the features conditioned on the blue class, giving a circular decision boundary somewhere in the margin between the rings. This is a non-linear classifier, though.
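You can check this concretely by inspecting the fitted class-conditional parameters on ring-shaped data (a sketch; note that GaussianNB's `var_` attribute was named `sigma_` in scikit-learn versions before 1.0):

```python
# Sketch: inspect the class-conditional Gaussians fitted to ring data.
from sklearn.datasets import make_circles
from sklearn.naive_bayes import GaussianNB

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
nb = GaussianNB().fit(X, y)

print("class means:\n", nb.theta_)    # both rows close to (0, 0)
print("class variances:\n", nb.var_)  # the outer ring's variance is larger
```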

You could get the same effect with histogram binning of the feature space, so long as the bins were narrow enough. In that case both Logistic Regression and Naive Bayes will work, operating on the histogram-like feature vectors; see the sketch below.
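A minimal sketch of that binning idea, assuming scikit-learn's `KBinsDiscretizer` for the histogram step, with one-hot encoded bins feeding Logistic Regression:

```python
# Sketch: histogram-bin each feature, then fit a linear model on the bins.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Narrow uniform bins per feature, one-hot encoded; logistic regression
# then learns one weight per bin and can carve out the inner ring.
model = make_pipeline(
    KBinsDiscretizer(n_bins=20, encode="onehot", strategy="uniform"),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```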

How would you use Naive Bayes on these data sets?

In its usual form, Naive Bayes needs binary or categorical data.
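So one common route for continuous features is to discretize them first; a sketch, assuming `KBinsDiscretizer` for the binning and scikit-learn's `CategoricalNB` as the categorical Naive Bayes model:

```python
# Sketch: turn continuous features into categories, then fit a
# categorical Naive Bayes model on the bin indices.
from sklearn.datasets import make_circles
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

binner = KBinsDiscretizer(n_bins=20, encode="ordinal", strategy="uniform")
Xb = binner.fit_transform(X).astype(int)  # bin indices 0..19 per feature

nb = CategoricalNB().fit(Xb, y)
print("training accuracy:", nb.score(Xb, y))
```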

Licensed under: CC-BY-SA with attribution