Classifying boat images

https://datascience.stackexchange.com/questions/73424

10-12-2020

Question

I am trying to get some experience by exploring this Kaggle dataset.

It consists of 1500 pictures of boats classified into 9 categories. The data is as follows:

# x_train consists of 1159 images, with 80% of the images from each category

x_train.shape = (1159,200,200,3)

# y_train contains the numeric label for each boat

y_train.shape = (1159,)

I have tried many variations of models like the following one, but without any success.

model = Sequential()

model.add( Conv2D(32, (3,3),  input_shape = x_train.shape[1:] , activation='relu') )
model.add(MaxPooling2D(pool_size=(3,3)))

model.add(Flatten())    
model.add(Dense(4, activation='relu'))

model.add(Dense(2, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])

h = model.fit(x_train, y_train, epochs=50,
              batch_size = 64, 
              validation_data = (x_val, y_val) )

Could you give me any advice on how to get a model with decent test_accuracy?

Solution

Looking at your code snippet, I see that you are training your CNN from scratch.

Use transfer learning instead. Training a new model from scratch (choosing the architecture, i.e. how deep the model should be, tuning the hyperparameters, etc.) is very difficult, if not impossible, with only 1500 images. You can achieve good results quickly by starting from an already-trained model, which is what transfer learning means.

If you are not familiar with the subject, read the article "Transfer learning from pre-trained models" or "First steps with Transfer Learning for custom image classification with Keras". Both include code that helps you get started faster.

One of the more recent model families used for transfer learning is EfficientNet, so you may want to jump straight to that one. But I would guess boats would be easy even with earlier models.
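As a rough illustration of the advice above, here is a minimal Keras sketch of transfer learning for the nine boat classes. It is a sketch under assumptions, not the answerer's code: the choice of ResNet50, the global-average-pooling head, the dropout rate, and the helper name build_transfer_model are all illustrative, and the images are assumed to be 224x224 RGB arrays with integer labels 0 to 8.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

NUM_CLASSES = 9   # nine boat categories in the dataset
IMG_SIZE = 224    # assumed input size (ResNet50's ImageNet default)


def build_transfer_model(weights="imagenet"):
    """Frozen ResNet50 base plus a small trainable head (hypothetical helper)."""
    base = ResNet50(include_top=False, weights=weights,
                    input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")
    base.trainable = False  # keep the pre-trained convolutional weights fixed

    inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = base(inputs, training=False)  # run BatchNorm layers in inference mode
    x = layers.Dropout(0.2)(x)
    # One output unit per class, matching the integer labels 0..8.
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="adam", metrics=["accuracy"])
    return model


# Usage sketch (x_train, y_train, x_val, y_val as in the question, resized to 224x224):
#   x_train_p = preprocess_input(x_train.astype("float32"))  # expects raw 0-255 pixels
#   x_val_p = preprocess_input(x_val.astype("float32"))
#   model = build_transfer_model()
#   model.fit(x_train_p, y_train, epochs=10, batch_size=32,
#             validation_data=(x_val_p, y_val))
```

Only the final Dense layer is trained here, which keeps the number of learned parameters small relative to 1500 images. Note that preprocess_input for ResNet50 expects RGB input, so images loaded with cv2.imread (which returns BGR) would need a channel swap first.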

OTHER TIPS

Further to the great reply above: I tried a transfer learning approach where I use the convolutional layers of a pre-trained model for feature extraction and then use these features to train a DNN classifier.

However, there is still a problem with my code: during training, no matter the number of epochs, I always get accuracy 0.0463 and val_accuracy 0.0479. Obviously something is wrong, but I cannot find it.

I would be grateful if someone could advise me where the bug is.

boat_categories = ['buoy','cruise ship','ferry boat','freight boat',
                  'gondola','inflatable boat','kayak','paper boat','sailboat']
labels = [0,1,2,3,4,5,6,7,8]
img_size = 224

def create_sets () :

    train = []
    val = []
    test = []

    for category, label in zip(boat_categories, labels) :

        path = os.path.join (data_dir, category) 
        data = []

        for img in os.listdir(path):     
            dir_for_image = os.path.join(path,img)
            img_array= cv2.imread(dir_for_image)
            #print(img_array.shape)
            img_array = cv2.resize( img_array , (img_size, img_size) )
            data.append([img_array,label])

        tr = data[:int(len(data)*0.8)]
        v = data[int(len(data)*0.8):int(len(data)*0.9)]
        te = data[int(len(data)*0.9):]

        for j in range(len(tr)):
            train.append(tr[j])
        for j in range(len(v)):
            val.append(v[j])
        for j in range(len(te)):
            test.append(te[j])

    return train, val, test

# create train, validation and test set
sets = create_sets()
train = sets[0]
val = sets[1]
test = sets[2]

# shuffle the data
random.shuffle(train)
random.shuffle(val)
random.shuffle(test)

#separate images from labels
x_train = []; y_train = []
for j in range(len(train)):
    x_train.append(train[j][0])
    y_train.append(train[j][1])

x_val = []; y_val = []
for j in range(len(val)):
    x_val.append(val[j][0])
    y_val.append(val[j][1])

x_test = []; y_test = []
for j in range(len(test)):
    x_test.append(test[j][0])
    y_test.append(test[j][1])

#normalize
max_value = max(np.max(x_train), np.max(x_val), np.max(x_test))
x_train = x_train/max_value
x_val = x_val/max_value
x_test = x_test/max_value
y_train = np.array(y_train)/10
y_val = np.array(y_val)/10
y_test = np.array(y_test)/10

# transfer learning
#convolutional layers + flatten 
resnet = ResNet50(include_top=False, weights='imagenet', 
                  input_shape=(224,224,3))
output = resnet.layers[-1].output
output = layers.Flatten()(output)
resnet_model = Model(resnet.input, output)

#get features
x_train_feat = resnet_model.predict(x_train, verbose=0)
x_val_feat = resnet_model.predict(x_val, verbose=0)

#the model
model = Sequential()
model.add(layers.Dense(256, activation='relu', 
                       input_dim=resnet_model.output_shape[1]))
model.add(layers.Dense(9, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train_feat, y_train,
         epochs = 10,
         validation_data = (x_val_feat, y_val))
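One likely culprit in the snippet above, offered as an assumption rather than a confirmed diagnosis: the labels are divided by 10 in the normalization step. sparse_categorical_crossentropy expects integer class ids (0 to 8 here), so the scaled values 0.0, 0.1, ..., 0.8 no longer name nine classes. The pure-Python sketch below only illustrates the arithmetic; it assumes the fractional labels end up truncated to integers inside the loss, which may differ between Keras versions.

```python
# Class ids as defined in the post: one integer per boat category.
labels = list(range(9))

# What the normalization step does to them (y / 10).
scaled = [label / 10 for label in labels]

# Truncating to integers, as an integer cast would, collapses every class to 0.
truncated = [int(value) for value in scaled]

# Only the label 0.0 is still a whole number, so a metric that compares labels
# with predicted class indices can match at most the class-0 samples.
still_integer = [value for value in scaled if value == int(value)]

print("scaled labels:   ", scaled)
print("after truncation:", truncated)
print("whole-number ids:", still_integer)
```

If this is the cause, keeping the labels as integers (y_train = np.array(y_train), with no division) would be the fix, and an accuracy that never moves across epochs is consistent with this picture. Separately, the ImageNet weights of ResNet50 were trained on inputs prepared with keras.applications.resnet50.preprocess_input rather than on pixels scaled to [0, 1], so replacing the division by max_value with that function may also help. Both points are untested against this dataset.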

I was reading a guide to image classification just the other day that uses this very dataset. It covers preprocessing, training, and modeling. Here's the article.

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange