Question

I have developed a Recurrent Neural Network to perform sentiment analysis on tweets using the Kazanova/sentiment140 dataset from Kaggle.

The model looks like this:

def scheduler(epoch):
  if epoch < 10:
    return 0.001
  else:
    return 0.001 * tf.math.exp(0.1 * (10 - epoch))
callback1 = tf.keras.callbacks.LearningRateScheduler(scheduler)
callback2 = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', patience=10, verbose=0, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)
callback3 = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, patience=3, verbose=0, mode='auto', baseline=None, restore_best_weights=True)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=max_length, weights=[embeddings_matrix], trainable=False),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv1D(64, 5, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
num_epochs = 50

training_padded = np.array(training_sequences)
training_labels = np.array(training_labels)
testing_padded = np.array(test_sequences)
testing_labels = np.array(test_labels)

history = model.fit(training_padded, training_labels, epochs=num_epochs, validation_data=(testing_padded, testing_labels), verbose=2, callbacks=[callback1, callback2])

print("Training Complete")
model.save('sentiment_final.h5')

The model trains fine and predicts correctly when the saved model is loaded back in Colab itself.

The Colab loading code:

load_model= tf.keras.models.load_model('sentiment_final.h5')
#load_model.summary()

def decode_sentiment(score):

    if score < 0.5:
        return "NEGATIVE"
    else:
        return "POSITIVE"

def predict(text):
    
    x_test = pad_sequences(tokenizer.texts_to_sequences([text]), maxlen=16)
    
    score = load_model.predict([x_test])[0]

    return {"label": decode_sentiment(score), "score": float(score)}
predict("I love this day") #Outputs -> {'label': 'POSITIVE', 'score': 0.793081521987915}
predict("I hate this day") #Outputs -> {'label': 'NEGATIVE', 'score': 0.38644927740097046}
predict("I shouldn't be alive") #Outputs -> {'label': 'NEGATIVE', 'score': 0.12737956643104553}

But if I load the model in VS Code, the output is the same for every input.

VS Code implementation:

import tensorflow
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import os


tokenizer=Tokenizer()
model = load_model('sentiment_final.h5')

def decode_sentiment(score):

    if score<0.5:
        return "Negative"
    else:
        return "Positive"

def predict_score(text):

    x_test=pad_sequences(tokenizer.texts_to_sequences([text]),maxlen=16)
    score=model.predict([x_test])[0]
    return {"label":decode_sentiment(score),"score": float(score)}

def call_predict_function(text):
    return predict_score(text)

        
print(call_predict_function("I love this day")) #Outputs -> {'label': 'POSITIVE', 'score': 0.793081521987915}
print(call_predict_function("I hate this day")) #Outputs -> {'label': 'POSITIVE', 'score': 0.793081521987915}
print(call_predict_function("I shouldn't be alive")) #Outputs -> {'label': 'POSITIVE', 'score': 0.793081521987915}
 

Where am I going wrong? Can somebody help me resolve this problem?


Solution

As far as I am aware, you also need to save and load the tokenizer you used for training. In the VS Code script you create a brand-new Tokenizer() that is never fitted on the training texts, so it produces nothing sensible for the model to predict on.
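For example, here is a minimal sketch of what that could look like, assuming the tokenizer in the Colab notebook was fitted with fit_on_texts before training (the filename tokenizer.pickle is just an illustrative choice):

In Colab, right after fitting the tokenizer:

import pickle

# Persist the fitted tokenizer so its learned word_index travels with the model
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

In the VS Code script, load that file instead of creating a fresh, empty Tokenizer():

import pickle
from tensorflow.keras.preprocessing.sequence import pad_sequences

with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)  # reuse the vocabulary learned during training

# An unfitted Tokenizer() has an empty word_index, so texts_to_sequences returns
# empty lists and every padded input becomes all zeros, which is why every
# sentence produced the same score.
x_test = pad_sequences(tokenizer.texts_to_sequences(["I love this day"]), maxlen=16)

If you prefer not to use pickle, Keras also offers tokenizer.to_json() together with tensorflow.keras.preprocessing.text.tokenizer_from_json() to save and restore the tokenizer as JSON.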

Licensed under: CC-BY-SA with attribution