Keras Model Predict is not predicting all images flowing from directory?

https://datascience.stackexchange.com/questions/75347

11-12-2020

Question

I have the following code, in which training is already done and the test set is fed to the model with flow_from_directory. When I pass that generator to model.predict, the array I get back is not the same length as the test set. Code:

PATH = '/content/testing'
testGen.reset()
testGen = valAug.flow_from_directory(
    PATH,
    class_mode="categorical",
    target_size=(75, 75),
    color_mode="rgb",
    shuffle=False,
    batch_size=BS)

predIdxs = model1.predict_generator(testGen,
    steps=(totalTest // 32))
print(len(predIdxs))
# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)
print(len(predIdxs))
import pandas as pd
from glob import glob
test_df = pd.DataFrame()
id = []
for x in glob('/content/testing/0/*'):
    id.append(x)
for x in glob('/content/testing/1/*'):
    id.append(x)
test_df['id'] = id
test_df['category'] = predIdxs
print(test_df)
test_df.to_csv('submission.csv', index=False)

After that the output I got is as follows:

Found 55505 images belonging to 2 classes.
55488
55488
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-81-09c1488e5ac6> in <module>()
     25     id.append(x)
     26 test_df['id'] = id
---> 27 test_df['category'] = predIdxs
     28 print(test_df)
     29 test_df.to_csv('submission.csv', index=False)

3 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
    609 
    610     if len(data) != len(index):
--> 611         raise ValueError("Length of values does not match length of index")
    612 
    613     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index

The test set has 55505 images (totalTest = 55505), but only 55488 predictions are returned. Why is data lost here? P.S.: The model is a pretrained Inception V3 whose weights I downloaded beforehand; it reaches about 85% accuracy. I tried the same method with a ResNet block and got the results without an error. Why am I getting an error here? Any help would be appreciated.

Solution

I presume the steps argument you pass to predict_generator is at fault here:

predIdxs = model1.predict_generator(testGen,
    steps=(totalTest // 32))

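totalTest // 32 is an integer division, so the quotient is floored: 55505 // 32 = 1734 steps, and 1734 × 32 = 55488 predictions (assuming BS is 32, as your output suggests). The last 17 images never get predicted. Later on you (presumably) build the id column from all 55505 files and then try to assign the shorter array from predict_generator to another column of that same DataFrame, which creates the length mismatch. Rounding the step count up instead of down, for example with math.ceil(totalTest / BS), makes the generator also yield the final partial batch.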
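To make the arithmetic concrete, here is a minimal sketch that compares floored and ceiled step counts without touching Keras at all. The helper name steps_for is hypothetical, and batch_size = 32 is an assumption taken from the hard-coded 32 in the question (BS itself is never shown).

```python
import math


def steps_for(total_samples: int, batch_size: int) -> int:
    """Batches needed to cover every sample, counting a final partial batch."""
    return math.ceil(total_samples / batch_size)


total_test = 55505  # from "Found 55505 images" in the question
batch_size = 32     # assumption: BS in the question equals 32

floored_steps = total_test // batch_size          # what the question's code uses
ceiled_steps = steps_for(total_test, batch_size)  # rounds up instead

# A generator yields at most total_test samples, so cap the product.
covered_floor = min(floored_steps * batch_size, total_test)
covered_ceil = min(ceiled_steps * batch_size, total_test)

print(floored_steps, covered_floor)  # 1734 55488
print(ceiled_steps, covered_ceil)    # 1735 55505
```

In the question's code this would amount to passing steps=math.ceil(totalTest / BS). As far as I know, len(testGen) on a Keras directory iterator already returns this rounded-up batch count, so it may be simpler to pass that.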

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange