Question

A bidirectional RNN consists of two RNNs, one for the forward and another for the backward sequential direction, whose outputs are concatenated at each time step. Would this configuration restrict the model to always using a fixed sequence length? Or would it still work like a unidirectional RNN, which can be applied to any sequence length?

This question was raised because the bidirectional architecture merges the outputs of the forward and backward RNNs at each time step. Thus, if the sequence length were 4, the outputs would merge this way: 1st forward with 4th backward, 2nd forward with 3rd backward, ..., 4th forward with 1st backward. However, if a different sequence length were used, this merge order would change:

Let's say the network was trained with sequence length 4, but at test time a sequence of length 5 was used. The merge would be: 1st forward with 5th backward, 2nd forward with 4th backward, ..., 5th forward with 1st backward. Would this shift in merge order negatively affect the bidirectional RNN's performance?


Solution

The short answer is no: a bidirectional architecture will still accept variable sequence lengths. To understand why, it helps to understand how padding works.

For example, let's say you are implementing a bidirectional LSTM-RNN in TensorFlow on variable-length time series data for multiple subjects. The input is a 3D array with shape [n_subjects, n_timesteps, n_features], so to ensure that the array has consistent dimensions, you pad every other subject's features up to the length of the subject whose features were measured for the longest period of time.

Let's say Subject 1 has one feature with values = [22,20,19,21,33,22,44,21,19,26,27] measured at times = [0,1,2,3,4,5,6,7,8,9,10]. Subject 2 has one feature with values = [21,12,22,30,13,42,20] measured at times = [0,1,2,3,4,5,6]. You would pad the features for Subject 2 by extending the array so that padded_values = [21,12,22,30,13,42,20,0,0,0,0] at times = [0,1,2,3,4,5,6,7,8,9,10], then do the same thing for every subsequent subject.
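The padding step above can be sketched with plain NumPy (a minimal illustration using the same example values; the helper name `pad` is just for this sketch):

```python
import numpy as np

# Feature values from the example above
subject1 = [22, 20, 19, 21, 33, 22, 44, 21, 19, 26, 27]  # 11 timesteps
subject2 = [21, 12, 22, 30, 13, 42, 20]                   # 7 timesteps

max_len = max(len(subject1), len(subject2))

def pad(seq, length, value=0):
    """Right-pad a sequence with `value` up to `length`."""
    return seq + [value] * (length - len(seq))

# Stack into one rectangular array: shape (n_subjects, n_timesteps)
padded = np.array([pad(subject1, max_len), pad(subject2, max_len)])
# padded[1] -> [21, 12, 22, 30, 13, 42, 20, 0, 0, 0, 0]
```

The zeros are placeholders only; a masking layer (shown later) tells the network to ignore them, so the padded positions do not contribute to the forward or backward pass.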

This means the number of timesteps for each subject can vary, and the merge you refer to happens along the time dimension of that particular subject's sequence.

Below is an example of a bidirectional LSTM-RNN architecture for a model that predicts sleep stages for different subjects using biometric features measured over variable lengths of time.

(figure: diagram of the bidirectional LSTM-RNN architecture)
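Such an architecture can be sketched in Keras (a minimal, illustrative sketch, not the original model; the layer sizes, `n_features`, and `n_classes` are assumptions). Note that the time dimension is declared as `None`, so the same trained model accepts sequences of any length:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_features = 1  # e.g. one biometric signal per timestep (assumed)
n_classes = 5   # e.g. number of sleep stages (assumed)

# Variable-length input: shape (timesteps=None, features)
inputs = tf.keras.Input(shape=(None, n_features))
# Skip padded timesteps (the zeros added during padding)
x = layers.Masking(mask_value=0.0)(inputs)
# Forward and backward LSTM outputs are concatenated at each timestep,
# so the feature dimension doubles: 32 -> 64
x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
# Per-timestep class prediction
outputs = layers.Dense(n_classes, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# The same model runs on length-4 and length-5 sequences:
out4 = model.predict(np.random.rand(1, 4, n_features), verbose=0)
out5 = model.predict(np.random.rand(1, 5, n_features), verbose=0)
print(out4.shape, out5.shape)  # (1, 4, 5) (1, 5, 5)
```

Because the concatenation happens per timestep within each sequence, nothing in the weights depends on the sequence length itself, which is why the shifted merge order in the question does not break the model.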

Licensed under: CC-BY-SA with attribution