Question

So I'm a little bit baffled. I have just started working with the Keras framework for Python (which is awesome by the way!).

However, just trying a few simple tests of neural networks has left me a bit confused. I initially tried to classify the Iris data, as it is a small, quick and simple dataset. However, when I constructed a neural network for it (4 input dimensions, an 8-node hidden layer, and a 3-node output layer with a binary/one-hot encoding of the 3 classes), it didn't produce any predictive power at all (100 samples in the training set, a separate 50 in the test set; both had been shuffled so as to include a good distribution of classes in each).

Now I thought I was doing something wrong, so I gave the network a quick test on the MNIST dataset just in case. I used essentially the same network (other than changing the input dimension to 784, the hidden nodes to 30, and the output nodes to 10 for the binary-encoded 0-9 output). And this worked perfectly, with a 97% accuracy rate and a 5% loss.

So now I'm not sure why the Iris dataset isn't playing ball. Does anyone have any clues? I've tried changing the number of hidden-layer nodes and also normalizing the X input.

Here's the output produced when I manually ran some of the test set through the trained Iris model. As you can see, it's basically producing a uniform random guess.

Target: [ 1.  0.  0.]  | Predicted: [[ 0.44635904  0.43874186  0.45729554]]
Target: [ 0.  0.  1.]  | Predicted: [[ 0.44618103  0.43869928  0.45735642]]
Target: [ 0.  1.  0.]  | Predicted: [[ 0.44612524  0.43863046  0.45729461]]
Target: [ 0.  0.  1.]  | Predicted: [[ 0.44617626  0.43870446  0.45736298]]
Target: [ 0.  0.  1.]  | Predicted: [[ 0.44613886  0.43865535  0.45731983]]

And here's my full code for the Iris MLP (the MNIST one is essentially the same):

import numpy as np
import random
from keras.models import Sequential
from keras.layers import Dense, Activation
from sklearn.datasets import load_iris
from sklearn import preprocessing

# Model Layers Definition
# Input layer (4 dimensions), hidden layer (8 neurons), output layer (3 neurons)
model = Sequential()
model.add(Dense(8, input_dim=4, init='uniform', activation='sigmoid'))
model.add(Dense(3, init='uniform', activation='sigmoid'))


model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Iris has 150 samples, each sample has 4 dimensions
iris = load_iris()
iris_xy = zip(iris.data, iris.target)
random.shuffle(iris_xy) # Iris data is sequential in its labels
iris_x, iris_y = zip(*iris_xy)
iris_x = preprocessing.normalize(np.array(iris_x))
iris_y = np.array(iris_y)

# Encode decimal numbers to array
iris_y_enc = np.zeros(shape=(len(iris_y),3))
for i, y in enumerate(iris_y):
    iris_y_enc[i][y] = 1

train_data = np.array(iris_x[:100]) # 100 samples for training
test_data  = np.array(iris_x[100:]) # 50 samples for testing
train_targets = np.array(iris_y_enc[:100])
test_targets  = np.array(iris_y_enc[100:])

model.fit(train_data, train_targets, nb_epoch=10)

#score = model.evaluate(test_data, test_targets)

for test in zip(test_data, test_targets):
    prediction = model.predict(np.array(test[0:1]))
    print "Target:", test[1], " | Predicted:", prediction

Solution

By default, sklearn.preprocessing.normalize normalizes samples (rows), not features (columns). Replace sklearn.preprocessing.normalize with sklearn.preprocessing.scale, which will center every feature and scale it to unit variance.
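For example, the preprocessing line in the question's code would become something like the following (a minimal sketch; everything else stays the same):

from sklearn import preprocessing
import numpy as np

# scale() standardizes each feature (column) to zero mean and unit variance,
# whereas normalize() rescales each sample (row) to unit norm.
iris_x = preprocessing.scale(np.array(iris_x))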

Also give it more than 10 epochs. Here are learning curves (log loss) for 5000 epochs:

[Figure: learning curves (log loss) over 5000 epochs]

This should end up with an accuracy of about 96%.
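For reference, a sketch of the adjusted training and evaluation calls, using the question's variable names and the older Keras nb_epoch argument (newer Keras versions use epochs instead):

# Train for considerably longer than 10 epochs
model.fit(train_data, train_targets, nb_epoch=5000)

# Evaluate on the held-out 50 samples; returns [loss, accuracy]
loss, acc = model.evaluate(test_data, test_targets)
print "Test accuracy:", acc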

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange