I'm trying to tackle a classification problem with a neural net in TensorFlow. I have some continuous features and some categorical features. The continuous features are normalized using sklearn's StandardScaler. For the categorical features I'm using a series of embedding columns, which I concatenate with my continuous features.

The embedding columns are created like so:

    import tensorflow as tf

    airline = tf.feature_column.categorical_column_with_hash_bucket(
        'AIRLINE', hash_bucket_size=10)

then:

    airline_emb = tf.feature_column.embedding_column(airline, dimension=8)

However, I am having trouble choosing the output size of my embedding columns. I understand this transforms my sparse one-hot encoded 'AIRLINE' feature into a dense float vector of size 8.
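
To make the setup concrete, here is a minimal, self-contained sketch of the kind of pipeline I mean (the DISTANCE column and its sample values are made up for illustration):

    import numpy as np
    import tensorflow as tf

    # Categorical feature: hashed into 10 buckets, then embedded as 8 floats.
    airline = tf.feature_column.categorical_column_with_hash_bucket(
        'AIRLINE', hash_bucket_size=10)
    airline_emb = tf.feature_column.embedding_column(airline, dimension=8)

    # A continuous feature (already standardized upstream with StandardScaler).
    distance = tf.feature_column.numeric_column('DISTANCE')

    # DenseFeatures concatenates everything into one dense input vector.
    feature_layer = tf.keras.layers.DenseFeatures([airline_emb, distance])

    batch = {
        'AIRLINE': np.array(['AA', 'UA', 'DL']),
        'DISTANCE': np.array([[0.3], [-1.2], [0.8]], dtype=np.float32),
    }
    print(feature_layer(batch).shape)  # (3, 9): 8 embedding dims + 1 numeric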

Is there a heuristic I can use to choose an embedding output size?
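
The only candidate I've found is the fourth-root rule of thumb that I've seen cited in some of TensorFlow's own material (dimension ≈ num_categories ** 0.25), but I'm not sure how well it applies to hashed columns:

    def embedding_dim(num_categories: int) -> int:
        # Fourth-root rule of thumb: dimension ~ num_categories ** 0.25,
        # clamped to at least 1.
        return max(1, round(num_categories ** 0.25))

    print(embedding_dim(10))    # 2 -- much smaller than the 8 I picked
    print(embedding_dim(1000))  # 6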

My neural net's accuracy remains stuck at 31%; it doesn't seem to be learning even after 100 epochs. Could the size of the embeddings be a cause of such behaviour?
