Need for Dense layer in Text Classification
12-12-2020
Question
While creating a model for text classification, why is a Dense layer needed? I noticed the following structure in multiple examples. Isn't a softmax what is required, instead of the Dense layer?
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1)
])
Consider the following sentence in 5-class classification: "movie is good". The model structure could be:

a = activation_unit
emb = embedding_vector(word)

a0 -> emb("movie") -> a1 -> emb("is") -> a2 -> emb("good") -> a3, and

sample_y = softmax(np.dot(Wya, a3))
and
sample_y = [0.1,0.2,0.2,0.4,0.1]
which says the sentence belongs to "class 4". So where is the need for a "Dense Layer"? Can anyone please explain this?
Solution
In neural networks meant for classification, you need a linear layer before the softmax to project the internal representation, which has some dimensionality $d_i$, to the output space, which has dimensionality $d_o$ equal to the number of classes (5 in your case).
So you either place a Dense(5) layer after the BiLSTM, or you take the output of the BiLSTM "manually" and implement the projection yourself.
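As a concrete illustration, here is a minimal sketch of the 5-class version of the model, with the Dense(5) projection and the softmax folded into a single layer (the vocabulary size is a made-up placeholder):

```python
import tensorflow as tf

vocab_size = 10000  # placeholder; in the question this would be encoder.vocab_size

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    # Projects the 128-dim BiLSTM output to the 5 classes and
    # normalizes the result into a probability distribution.
    tf.keras.layers.Dense(5, activation="softmax"),
])
```

With this structure there is no separate np.dot step: the Dense layer *is* the projection.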
The code above has some strange things:

- It uses numpy.dot to multiply the output of the BiLSTM. Is this a typo, and did you actually mean tf.matmul?
- The model ends with tf.keras.layers.Dense(1), maybe because it was originally meant for binary classification.
- It has both a Dense layer and then a dot product (i.e. a matrix multiplication). These two operations are equivalent to a single Dense layer, so it is pointless to have both.
So to answer your question: assuming that the np.dot actually means a TensorFlow matrix multiplication, the Dense layer in the model is pointless.
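The equivalence of a (bias-free, activation-free) Dense layer followed by a matrix multiplication to a single Dense layer can be checked numerically: two linear maps applied in sequence collapse into one linear map with the combined weight matrix. A small NumPy sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 128))    # e.g. BiLSTM outputs for 3 samples
W1 = rng.standard_normal((128, 64))  # weights of an intermediate Dense layer (no bias/activation)
W2 = rng.standard_normal((64, 5))    # weights of the extra np.dot projection

two_step = (x @ W1) @ W2   # Dense layer, then dot product
one_step = x @ (W1 @ W2)   # one Dense layer with merged weights

assert np.allclose(two_step, one_step)
```

This is just associativity of matrix multiplication, which is why having both operations adds parameters but no expressive power.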
OTHER TIPS
Softmax is simply an activation function. In your example of 5-class classification, you need a dense layer with 5 output neurons, to which you can then apply softmax to obtain a probability for each class.
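To make that concrete, here is a small sketch of what the softmax activation does to the 5 raw outputs (logits) of such a dense layer; the logit values are made up:

```python
import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the result.
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([0.1, 0.5, 0.5, 1.2, 0.1])  # hypothetical Dense(5) output
probs = softmax(logits)

# probs sums to 1, and the largest logit gets the largest probability,
# so the predicted class is simply the argmax.
predicted_class = int(np.argmax(probs))
```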