Question

How do I add a contour map of the logistic regression results to my scatterplot? I want colored 0/1 zones that delineate the decision boundary of the classifier.

import pandas as pd
import numpy as np
import matplotlib.pyplot as pl
import statsmodels.api as sm

# Build X, Y from file
x1 = []
x2 = []
y = []
with open('ex2data2.txt') as f:
    for line in f:
        vals = line.strip().split(",")
        x1.append(float(vals[0]))
        x2.append(float(vals[1]))
        y.append(int(vals[2]))

x1 = np.array(x1)
x2 = np.array(x2)
y = np.array(y)

x = np.vstack([x1, x2]).T

# Scatter plot 0/1s
pos_mask = y == 1
neg_mask = y == 0
pos_x1 = x1[pos_mask]
neg_x1 = x1[neg_mask]
pos_x2 = x2[pos_mask]
neg_x2 = x2[neg_mask]
pl.clf()
pl.scatter(pos_x1, pos_x2, c='r')
pl.scatter(neg_x1, neg_x2, c='g')

# Run logistic regression
logit = sm.Logit(y, x)
result = logit.fit()
print(result.params)
print(result.predict([1.0, 1.0]))

# Now I want to add a contour of the 0/1 regression results to the scatter plot.

Solution

I will try to answer, but there are a few assumptions behind my answer that may or may not apply to your code:

My imports:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

X contains your features; it looks like this:

print(type(X))
<class 'numpy.ndarray'>

Its shape is (102, 2), as shown:

print(X)
[[-13.15490196 -23.        ]
 [-22.95490196 -25.        ]
 [-12.75490196  -8.        ]
 [  0.14509804  -6.        ]
 ...

ytrain contains the ground truth, which in this case is boolean, but 0/1 labels work just the same.

print(type(ytrain))
<class 'numpy.ndarray'>

Its shape is (51,):

print(ytrain)
[False False False False  True  True  True  True  True  True False  True
 False  True  True  True False False False  True  True  True  True  True
 False False False False  True  True  True  True  True  True False  True
 False  True  True  True False False False False False  True  True  True
 False  True False]

Finally, clf contains your model; in my case it is a fitted LogisticRegression from scikit-learn. The code below relies on clf.predict_proba giving me the probabilities I need to build the labels and contours. I am not familiar with the exact package you are using (statsmodels), so keep this in mind; a minimal sketch of how such a clf could be fitted is shown below.
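For reference, here is a minimal sketch of how a clf like mine could be produced, assuming Xtrain and ytrain are the training feature matrix and label array described above (the exact fitting code is not part of my original snippet):

# Minimal sketch, not the exact code I used: fit a scikit-learn
# LogisticRegression so that clf.predict_proba is available below.
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(Xtrain, ytrain)   # Xtrain: (n, 2) features, ytrain: (n,) boolean/0-1 labels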

# Evenly sampled grid covering the data range, padded by 0.5 on each side
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())

# Plot the background colors: probability of class 1 evaluated over the grid
ax = plt.gca()
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)
cs = ax.contourf(xx, yy, Z, cmap='RdBu', alpha=.5)
cs2 = ax.contour(xx, yy, Z, cmap='RdBu', alpha=.5)
plt.clabel(cs2, fmt='%2.1f', colors='k', fontsize=14)

# Plot the points
ax.plot(Xtrain[ytrain == 0, 0], Xtrain[ytrain == 0, 1], 'ro', label='y = 0')
ax.plot(Xtrain[ytrain == 1, 0], Xtrain[ytrain == 1, 1], 'bo', label='y = 1')

# make legend
plt.legend(loc='upper left', scatterpoints=1, numpoints=1)
plt.show()

Your result will look something like this:

[Image: scatter points over red/blue shaded probability regions with labeled contour lines]
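If you want to stay with the statsmodels Logit from your question rather than switch to scikit-learn, the same grid trick should carry over: for a Logit model, result.predict returns the probability of class 1 directly, so it plays the role of clf.predict_proba(...)[:, 1] above. A rough sketch using the variable names from your snippet (x, result, pos_x1, and so on); note that your model has no intercept term because you never added a constant column:

# Sketch only: adapting the grid evaluation to the statsmodels model
# from the question. Assumes x, result, pos_x1/pos_x2/neg_x1/neg_x2
# exist exactly as defined in the question code.
x_min, x_max = x[:, 0].min() - .5, x[:, 0].max() + .5
y_min, y_max = x[:, 1].min() - .5, x[:, 1].max() + .5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))

# result.predict returns P(y = 1) for each grid point
Z = result.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

pl.contourf(xx, yy, Z, cmap='RdBu', alpha=.5)     # colored 0/1 zones
pl.contour(xx, yy, Z, levels=[0.5], colors='k')   # decision boundary at P = 0.5
pl.scatter(pos_x1, pos_x2, c='r')
pl.scatter(neg_x1, neg_x2, c='g')
pl.show()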

Licensed under: CC-BY-SA with attribution