In Mini Batch Gradient Descent what happens to remaining examples
13-12-2020
Question
Suppose my dataset has 1000 samples (X = 1000) and I choose a batch size of 32.
Since 1000 is not perfectly divisible by 32, the remainder is 8.
My question is: what happens to the last 8 examples? Are they considered? If they are, will they affect the efficiency of my model?
import numpy as np

def next_batch(X, y, batchSize):
    # walk through the dataset in strides of batchSize
    for i in np.arange(0, X.shape[0], batchSize):
        # the final slice may be shorter than batchSize
        yield (X[i:i + batchSize], y[i:i + batchSize])
This code is from a book, and as far as I can tell it does not consider the last remaining data points.
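A quick way to check is to run the generator on a dummy 1000-row array. NumPy slicing past the end of an array is safe, so the last slice simply returns whatever rows remain; the leftover 8 samples do form a final, smaller batch:

```python
import numpy as np

def next_batch(X, y, batchSize):
    # same generator as above: stride through the data in batchSize steps
    for i in np.arange(0, X.shape[0], batchSize):
        # slicing past the end of a NumPy array just truncates,
        # so the last yield returns the remaining rows
        yield (X[i:i + batchSize], y[i:i + batchSize])

# dummy data matching the question: 1000 samples, batch size 32
X = np.zeros((1000, 4))
y = np.zeros(1000)

sizes = [len(bx) for bx, _ in next_batch(X, y, 32)]
print(len(sizes), sizes[-1])  # 32 batches in total, the last one holds 8 samples
```

So the book's generator does not drop the remainder: it yields 31 full batches of 32 plus one batch of 8.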
Solution
This is implementation-dependent, but there is no reason the last few records should be dropped.
In Keras, the remaining data points are taken as the final (smaller) step.
Adding one extra element increases the number of steps by 1.
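The step count Keras reports in the two cases below matches ceil(n / batch_size). A minimal sketch of that count (steps_per_epoch here is an illustrative helper written for this answer, not a Keras API):

```python
import math

def steps_per_epoch(n_samples, batch_size):
    # Keras-style behaviour: any leftover samples form one extra, smaller batch
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(864, 16))  # 54  (864 divides evenly by 16)
print(steps_per_epoch(865, 16))  # 55  (one leftover sample adds a step)
```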
Case-I - Data count is divisible by batch_size
epochs = 1
batch_size = 16
history = model.fit(x_train.iloc[:864], y_train[:864], batch_size=batch_size, epochs=epochs)
54/54 [==============================] - 0s 3ms/step
Case-II - Adding an extra data point
epochs = 1
batch_size = 16
history = model.fit(x_train.iloc[:865], y_train[:865], batch_size=batch_size, epochs=epochs)
55/55 [==============================] - 0s 3ms/step
The same thing happens in your example:
batch_size = 16
np.arange(0, x_train.shape[0], batch_size)
..., 672, 688, 704, 720, 736, 752, 768, 784, 800, 816, 832, 848, 864])
When the last slice happens, it is a batch of only 11 data points:
len(x_train[864:880])  # x_train ends at row 875, so the slice stops there
11
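Putting it together, a quick sketch with a dummy 875-row array (standing in for x_train) confirms that the strided slicing yields 55 batches, the last holding the remaining 11 points:

```python
import numpy as np

x = np.arange(875)  # stand-in for the 875 training rows
batch_size = 16

# the same start indices np.arange produced above: 0, 16, ..., 864
starts = np.arange(0, x.shape[0], batch_size)
batches = [x[i:i + batch_size] for i in starts]

print(len(batches), len(batches[-1]))  # 55 batches; the last one holds 11 points
```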