Question

As a newcomer to TensorFlow, I am following this tutorial from Google on binary classification with a simple dense neural network.

The slightly annoying thing about this (and a few other) tutorials is that they gloss over how to actually make a prediction from a dataframe of features. They move straight to model evaluation through a method of the trained model, which hides the prediction step. So I have finished training the model, but I still see no explanation of how to use it to predict the classes of unknown samples.

Concretely, I have a trained model, a pandas dataframe called test, and a list of column names that correspond to the feature names. Based on the variables declared in the tutorial, I tried

feature_names=['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']

model.predict_proba(x=test[feature_names])

But it throws a type error. Basically, I need a function that gives me the classes, or ideally the softmax probabilities of the classes, from a feature frame without the label, because that is how any model gets used in practice.


Solution

First, you need to convert the dataframe into a NumPy array or a tf.data dataset, which are the input types the model understands. For this purpose, the tutorial provides you with a function:

import tensorflow as tf

# A utility method to create a tf.data dataset from a Pandas Dataframe.
# Note: it pops the 'target' column, so the dataframe must contain labels.
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()
  labels = dataframe.pop('target')
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)
  return ds

# shuffle=False keeps the predictions in the same row order as the dataframe
test_ds = df_to_dataset(test, shuffle=False, batch_size=32)
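Since the question asks about a feature frame without the label, note that the tutorial helper fails with a KeyError when there is no 'target' column to pop. A minimal sketch of a variant that handles both cases is shown here; the label_column parameter and the two-column sample frame are hypothetical additions for illustration, not part of the tutorial.

```python
import pandas as pd
import tensorflow as tf

def df_to_dataset(dataframe, shuffle=True, batch_size=32, label_column='target'):
    """Build a tf.data dataset from a dataframe, with or without labels."""
    dataframe = dataframe.copy()
    if label_column in dataframe.columns:
        # Labeled frame: yield (features, label) pairs, as in the tutorial.
        labels = dataframe.pop(label_column)
        ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    else:
        # Unlabeled frame: yield only the feature dictionary.
        ds = tf.data.Dataset.from_tensor_slices(dict(dataframe))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    return ds.batch(batch_size)

# Made-up unlabeled samples with two of the numeric features.
unlabeled = pd.DataFrame({'age': [63, 37, 41], 'chol': [233, 250, 204]})
predict_ds = df_to_dataset(unlabeled, shuffle=False, batch_size=2)
```

A dataset whose elements are plain feature dictionaries can be passed to model.predict in the same way as a labeled one; Keras treats each element as the input only.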

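Then you can get the predictions like this (predict_proba existed only on Sequential models and was removed in TensorFlow 2.6; predict returns the same values and works in every version):

model.predict(test_ds)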
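For a binary classifier with a single output unit, the prediction call returns one value per row rather than class labels. The sketch below shows one way to turn those values into classes. It assumes the values are sigmoid probabilities, or raw logits if the final layer has no activation (which depends on the tutorial version); the helper names and the sample array are hypothetical stand-ins for the real model output.

```python
import numpy as np

def to_probabilities(raw_outputs, from_logits=False):
    """Flatten an (n, 1) prediction array and map logits to probabilities if needed."""
    raw = np.asarray(raw_outputs, dtype=float).reshape(-1)
    if from_logits:
        # Sigmoid converts a raw logit into a probability of the positive class.
        return 1.0 / (1.0 + np.exp(-raw))
    return raw

def to_classes(probabilities, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return (np.asarray(probabilities) >= threshold).astype(int)

# Stand-in for the array returned by the model for three samples.
raw = np.array([[0.12], [0.87], [0.50]])
probs = to_probabilities(raw)
classes = to_classes(probs)
```

The 0.5 threshold is only a default; a different cut-off may suit the problem better if false positives and false negatives have different costs.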

Keep in mind that the test input must have the same format as the training input, so if you applied any preprocessing to the training dataset (one-hot encoding, standardization, bucketization, etc.), you must apply the same preprocessing to the test dataset.
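As an illustration of matching the preprocessing, here is a minimal sketch of standardization in which the statistics are computed on the training frame only and then reused for the test frame. The tiny frames and the standardize helper are made up for the example; if the preprocessing lives inside the model (for instance in feature columns), this step is already handled for you.

```python
import pandas as pd

# Made-up training and test frames with two numeric features.
train = pd.DataFrame({'age': [40.0, 50.0, 60.0], 'chol': [200.0, 240.0, 280.0]})
test = pd.DataFrame({'age': [50.0, 70.0], 'chol': [240.0, 200.0]})

numeric_columns = ['age', 'chol']

# Statistics come from the training data only, never from the test data.
train_mean = train[numeric_columns].mean()
train_std = train[numeric_columns].std()

def standardize(frame):
    """Scale the numeric columns with the training mean and standard deviation."""
    out = frame.copy()
    out[numeric_columns] = (out[numeric_columns] - train_mean) / train_std
    return out

train_scaled = standardize(train)
test_scaled = standardize(test)
```

Recomputing the mean and standard deviation on the test frame would put its values on a different scale from the one the model was trained on.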

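Alternatively, if your dataframe is already in the form the model expects (all features numeric, and a model that takes a single array as input rather than a dictionary of named features), you can just convert it to a NumPy array:

model.predict(test[feature_names].values)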
Licensed under: CC-BY-SA with attribution