Question

I am using the 1st and 3rd columns of the Iris dataset as features, with Iris setosa labelled -1 and Iris versicolor labelled 1. I am training ADALINE as a simple classifier and using gradient descent to minimize the cost. But the error increases on every iteration. What am I doing wrong in the Python code below?

import numpy as np
import pandas as pd

class AdalineGD(object):

    def __init__(self, eta = 0.01, n_iter = 50):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        """Fit training data."""

        self.w_ = np.random.random(X.shape[1])
        self.cost_ = []
        print ('Initial weights are: %r' %self.w_)
        for i in range(self.n_iter):
            output = self.net_input(X)
            print ("On iteration %d, output is: %r" %(i, output))
            errors = output - y
            print("On iteration %d, Error is: %r" %(i, errors))
            self.w_ += self.eta * X.T.dot(errors)
            print ('Weights on iteration %d: %r' %(i, self.w_))
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)
            print ("On iteration %d, Cost is: %r" %(i, cost))
            prediction = self.predict(X)
            print ("Prediction after iteration %d is: %r" %(i, prediction))
            input()  # pause so each iteration can be inspected
        return self

    def net_input(self, X):
        """Calculate net input"""
        return X.dot(self.w_)

    def activation(self, X):
        """Computer Linear Activation"""
        return self.net_input(X)

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)

####### END OF THE CLASS ########
#importing the Iris Dataset 
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header = None)
y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', -1, 1)
X = df.iloc[0:100, [0, 2]].values
#Adding the ones column to the X matrix
X = np.insert(X, 0,  np.ones(X.shape[0]), axis = 1)
ada = AdalineGD(n_iter = 20, eta = 0.001).fit(X, y)

Solution

I think something is wrong here:

self.w_ += self.eta * X.T.dot(errors)

You are stepping in the positive direction of the gradient, while you should be stepping in the negative direction:

self.w_ -= self.eta * X.T.dot(errors)

or

self.w_ += -self.eta * X.T.dot(errors)

see this for more clarification.
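For concreteness, here is a minimal, self-contained sketch of the corrected loop. It uses small synthetic data rather than the Iris set, and the names (eta, n_iter, w) only mirror the question's code for readability; with the sign flipped, the cost should shrink rather than grow for a small enough eta.

    import numpy as np

    # Synthetic, linearly separable data with a bias column of ones,
    # standing in for the Iris features used in the question.
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
    true_w = np.array([0.5, 1.0, -2.0])
    y = np.where(X.dot(true_w) >= 0.0, 1, -1)

    eta, n_iter = 0.001, 20
    w = rng.random(X.shape[1])
    costs = []
    for _ in range(n_iter):
        output = X.dot(w)
        errors = output - y
        w -= eta * X.T.dot(errors)        # step AGAINST the gradient
        costs.append((errors ** 2).sum() / 2.0)

    print(costs[:3], costs[-3:])          # cost should decrease, not increase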

OTHER TIPS

If you want to keep

self.w_ += self.eta * X.T.dot(errors)

as I prefer to do,

then you just have to change

errors = output - y

to

errors = y - output
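The two forms are algebraically identical: flipping the sign of the error flips the sign of the gradient term, so adding it has the same effect as subtracting the original. A small sketch below shows one update step written this way; the helper name adaline_step is only for illustration and is not part of the question's code.

    import numpy as np

    def adaline_step(w, X, y, eta):
        """One gradient-descent step written with errors = y - output."""
        output = X.dot(w)
        errors = y - output                  # sign flipped relative to output - y
        return w + eta * X.T.dot(errors)     # identical to w - eta * X.T.dot(output - y)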

Hope this helps : )
