Can I train a neural network on a labelled training set and test it on a dataset without labels?

https://stackoverflow.com/questions/20634116

18-09-2022
Question

This question has been nagging me for a while. Can I train a neural network on a labelled dataset (i.e. a dataset that includes the target) and then apply it to another dataset that has no labels?

I want to train the network with the examples I have, and then, in the real situation, have it classify new examples (which have no target associated). For example:

Training set:

Var1  Var2  Var3  Var4  Target
1     2     3     1     blue

Test set (no target; that is exactly what I want to predict):

Var1  Var2  Var3  Var4
1     2     3     1

The expected prediction would be blue.

I'm using RapidMiner to test neural networks, but I soon found that I cannot apply this test set because it is missing the label.

How can I address this problem? I wonder whether I need to explore unsupervised neural networks, but I honestly don't think so.

Kind regards.

Solution

For supervised learning, you use a labeled training set to train whatever model you have. You can then use the trained model to predict labels for an unlabeled set.

If you happen to have labels for the test set too, you can compare the predicted values with the test set labels. This way you can assess the prediction error (i.e. test the model, hence the name "test set").

If, however, you are only interested in the predictions, you definitely don't need the labels.
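To make the train-then-predict workflow concrete, here is a minimal sketch in pure Python. It uses a multiclass perceptron (a single-layer network) as a small stand-in for a full neural network; this is an illustration, not what RapidMiner does internally. The function names, the extra rows and the "red" class are made up for the example; only the first row comes from the question.

```python
# Minimal sketch: train on labeled rows, then predict for rows that have no label.
# Assumption: a multiclass perceptron stands in for a real neural network.

def predict_one(weights, x):
    """Return the class whose weight vector scores x highest; no label is needed."""
    xb = list(x) + [1.0]  # append a constant 1.0 so the last weight acts as a bias
    return max(sorted(weights),
               key=lambda c: sum(w * v for w, v in zip(weights[c], xb)))

def train_perceptron(rows, labels, epochs=20):
    """Learn one weight vector (plus bias) per class from labeled rows."""
    n_features = len(rows[0])
    weights = {c: [0.0] * (n_features + 1) for c in sorted(set(labels))}
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            guess = predict_one(weights, x)
            if guess != y:
                # Pull the true class towards x and push the wrong guess away.
                for i, v in enumerate(list(x) + [1.0]):
                    weights[y][i] += v
                    weights[guess][i] -= v
    return weights

# Labeled training set: features plus target (hypothetical data).
train_rows = [(1, 2, 3, 1), (1, 2, 2, 1), (5, 1, 0, 4), (6, 0, 1, 5)]
train_labels = ["blue", "blue", "red", "red"]
model = train_perceptron(train_rows, train_labels)

# Unlabeled set: features only, as in the question.
unlabeled_rows = [(1, 2, 3, 1), (6, 1, 0, 4)]
predictions = [predict_one(model, x) for x in unlabeled_rows]
print(predictions)
```

The labels are only consumed inside `train_perceptron`; `predict_one` takes features alone, which is why the unlabeled set poses no problem at prediction time.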

Other tips

Applying a classifier (including a neural network such as an MLP) to a dataset without labels is what classifiers are actually used for. But if by "test" you mean that you want to see quality measures such as false alarm rate or precision, then you need labels to compute them.

Assuming you want to train a classifier and then use it in a real case, I strongly recommend testing it on labeled data first and choosing the model with the best precision. Otherwise you may end up with a large number of false predictions, which will of course cause you trouble.
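As a rough illustration of why these measures need labels, the sketch below computes precision and false alarm rate (false positive rate) by comparing predictions with known labels. The function name and the sample labels are hypothetical, and treating "blue" as the positive class is an assumption made only for the example.

```python
# Minimal sketch: quality measures require the true labels of the test set.

def precision_and_false_alarm_rate(true_labels, predicted_labels, positive):
    """Count the confusion-matrix cells for one positive class and derive two rates."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if p == positive and t == positive:
            tp += 1          # correctly flagged
        elif p == positive:
            fp += 1          # false alarm
        elif t == positive:
            fn += 1          # missed positive
        else:
            tn += 1          # correctly ignored
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, false_alarm_rate

# Hypothetical held-out labels and the model's predictions for the same rows.
true_labels = ["blue", "blue", "red", "red", "red"]
predicted_labels = ["blue", "red", "blue", "red", "red"]

precision, false_alarm_rate = precision_and_false_alarm_rate(
    true_labels, predicted_labels, positive="blue")
print(precision, false_alarm_rate)
```

Without `true_labels` there is nothing to compare against, so neither number can be computed for a truly unlabeled set.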

If all you have is a labeled dataset with only a few samples, you could try k-fold cross-validation.
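For readers unfamiliar with the idea, k-fold cross-validation splits the labeled data into k parts and uses each part once for validation while training on the rest. The sketch below shows only the index splitting; the function name is hypothetical, and in practice you would usually shuffle the rows first and train and score a model inside the loop.

```python
# Minimal sketch: generate train/validation index splits for k-fold cross-validation.

def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=6, k=3))
for train_idx, val_idx in folds:
    # A model would be trained on train_idx and evaluated on val_idx here.
    print(train_idx, val_idx)
```

Every sample is used for validation exactly once, so even a small labeled set yields an error estimate based on all of its rows.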

License: CC-BY-SA with attribution

Not affiliated with StackOverflow