How to ignore vectors of zeros (i.e. padding) in Keras?
10-12-2020
Question
I'm implementing an LSTM model with Keras. My dataset is composed of words, and each word is a vector of length 837. I grouped the words into groups of 20, which required padding: initially I had groups of words of variable length, and the longest group I found had 20 words, so I padded all groups to a length of 20.
For example, a group of 5 words is:
[[x1,x2....x837],
[x1,x2....x837],
[x1,x2....x837],
[x1,x2....x837],
[x1,x2....x837]]
where xi is the i-th feature of the vector.
To pad this group to a length of 20, I added 15 vectors of 837 features, all equal to zero:
[[0.......0],
............
............
[0........0]]
So, at the end, my group is of the form:
[[x1,x2....x837],
[x1,x2....x837],
[x1,x2....x837],
[x1,x2....x837],
[x1,x2....x837],
[0...........0],
..............
..............
[0...........0]]
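For concreteness, the padding described above can be sketched in NumPy. This is a minimal sketch, assuming the groups are stored as a Python list of (words, 837) arrays; the helper name pad_groups and the random example data are hypothetical, not the asker's actual code.

```python
import numpy as np

MAX_LEN = 20      # longest group found in the dataset (from the question)
N_FEATURES = 837  # length of each word vector (from the question)

def pad_groups(groups, max_len=MAX_LEN, n_features=N_FEATURES):
    """Hypothetical helper: stack variable-length groups of word vectors into
    one array of shape (n_groups, max_len, n_features), appending all-zero
    vectors after the real words of every group shorter than max_len."""
    padded = np.zeros((len(groups), max_len, n_features), dtype=np.float32)
    for i, group in enumerate(groups):
        group = np.asarray(group, dtype=np.float32)
        padded[i, :len(group)] = group  # real words first, zeros after
    return padded

# Hypothetical data: two groups of random word vectors, 5 and 20 words long.
rng = np.random.default_rng(0)
groups = [rng.random((5, N_FEATURES)), rng.random((20, N_FEATURES))]
x = pad_groups(groups)
print(x.shape)  # (2, 20, 837)
```

The result has the (samples, timesteps, features) layout that a Keras LSTM expects, with the padding vectors at the end of each group.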
How could I ignore the vectors of zeros during training?
Solution
You can use a Masking layer (with a mask value of zero) before the LSTM layer to ignore every timestep that contains only zeros (i.e. a zero vector). You can find more information about this layer in its documentation. Here is an example from the documentation that uses a Masking layer:
Consider a Numpy data array x of shape (samples, timesteps, features), to be fed to an LSTM layer. You want to mask sample #0 at timestep #3, and sample #2 at timestep #5, because you lack features for these sample timesteps. You can do:
- set x[0, 3, :] = 0. and x[2, 5, :] = 0.
- insert a Masking layer with mask_value=0. before the LSTM layer:

    from keras.models import Sequential
    from keras.layers import Masking, LSTM

    model = Sequential()
    model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
    model.add(LSTM(32))
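The Masking documentation states the rule: a timestep is masked only if all of its features equal mask_value, and downstream layers that support masking (such as LSTM) then skip it. The NumPy sketch below reproduces that rule on the documentation's example so you can check which timesteps would be ignored. It is an illustration, not the layer itself; the helper name timestep_mask and the array sizes are assumptions.

```python
import numpy as np

def timestep_mask(x, mask_value=0.0):
    """Hypothetical helper: return a boolean array of shape (samples, timesteps)
    that is True where a timestep is kept and False where every feature
    equals mask_value (the rule the Keras Masking layer documents)."""
    return np.any(x != mask_value, axis=-1)

# The documentation's example with assumed sizes: 3 samples, 6 timesteps, 4 features.
samples, timesteps, features = 3, 6, 4
x = np.ones((samples, timesteps, features), dtype=np.float32)
x[0, 3, :] = 0.  # whole timestep zeroed -> masked
x[2, 5, :] = 0.  # whole timestep zeroed -> masked
x[1, 2, 0] = 0.  # only one feature zeroed -> timestep is still kept

mask = timestep_mask(x)
print(mask.astype(int))
# [[1 1 1 0 1 1]
#  [1 1 1 1 1 1]
#  [1 1 1 1 1 0]]
```

For the data in the question, the same applies with input_shape=(20, 837): the padded zero vectors at the end of each group are masked, while a real word vector is masked only if all 837 of its features happen to be exactly zero.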