Question

In a typical supervised learning setting with only a few positive and a few negative labeled examples, it is clear that unlabeled data carries some information that can benefit learning and that is not captured in the labeled data. For example, one can estimate mean values, bounds and other geometrical characteristics of the dataset with much higher precision if one does not discard the (massive) unlabeled data.
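As a toy illustration of the precision claim, assuming a feature drawn from a standard normal distribution, the spread of the sample mean shrinks roughly as 1/sqrt(n), so estimating it from 1000 points instead of 10 is about ten times more precise. The helper name `mean_estimate_spread` and the sample sizes are arbitrary choices for this sketch.

```python
import numpy as np

def mean_estimate_spread(n_samples, n_trials=2000, seed=0):
    """Empirical standard deviation of the sample mean over many simulated datasets."""
    rng = np.random.default_rng(seed)
    # Each row is one hypothetical dataset of n_samples draws from N(0, 1).
    draws = rng.standard_normal((n_trials, n_samples))
    return draws.mean(axis=1).std()

labeled_only = mean_estimate_spread(10)       # e.g. 10 labeled points
with_unlabeled = mean_estimate_spread(1000)   # e.g. the same 10 plus 990 unlabeled
print(f"std of mean estimate: {labeled_only:.3f} (n=10) vs {with_unlabeled:.3f} (n=1000)")
```

Theory gives sigma/sqrt(n), i.e. about 0.316 and 0.032 here; the simulated values should land close to those.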

On the other hand, the most common ML algorithms, from neural networks to SVMs, do not take advantage of this information (at least in their standard, most common form). My question:

  • Is there any theoretical framework where unlabeled data is treated in the supervised setting?

I can think of semi-supervised ways to approach this (first cluster, then label the clusters). Are there any others?
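The cluster-then-label idea can be sketched in a few lines of NumPy. This assumes plain k-means as the clustering step and a majority vote of the labeled points inside each cluster; `kmeans`, `cluster_then_label` and the two-blob toy data are illustrative, not a reference implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index of each point)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign

def cluster_then_label(X_all, labeled_idx, y_labeled, k, seed=0):
    """Cluster labeled + unlabeled points together, then give every point in a
    cluster the majority label of the labeled points that fell into it."""
    _, assign = kmeans(X_all, k, seed=seed)
    y_pred = np.full(len(X_all), -1)            # -1: cluster contains no labeled point
    for j in range(k):
        in_cluster = assign[labeled_idx] == j
        if in_cluster.any():
            y_pred[assign == j] = np.bincount(y_labeled[in_cluster]).argmax()
    return y_pred

# Toy data: two well-separated blobs, 400 points, only 4 of them labeled.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y_true = np.array([0] * 200 + [1] * 200)
labeled_idx = np.array([0, 1, 200, 201])
y_pred = cluster_then_label(X, labeled_idx, y_true[labeled_idx], k=2)
print("accuracy:", (y_pred == y_true).mean())
```

The approach is only as good as the cluster assumption: if the classes do not form separate clusters, the majority vote spreads labels to the wrong points.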


Solution

In a neural network model, you can use autoencoders.

The basic idea of an autoencoder is to learn a hidden layer of features by training a network to reproduce its input vector at its output. The training inputs and training "labels" are therefore identical, so no supervised labels are required. A classic architecture is a triangular (funnel-shaped) network whose layers get progressively smaller, so that the narrowest layer captures a compressed and hopefully useful set of derived features. The network's hidden layers thus learn representations from the larger unlabeled data set. These layers can then be used to initialise a regular supervised network, which is trained (fine-tuned) using the actual labels.
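A minimal NumPy sketch of the two phases described above, assuming a single tanh hidden layer, a linear decoder, plain full-batch gradient descent and a binary task. The function names, layer sizes, learning rates and synthetic data are illustrative assumptions; practical autoencoders are usually built in a deep-learning framework, with more layers and a stochastic optimiser.

```python
import numpy as np

def pretrain_autoencoder(X, n_hidden, lr=0.1, n_epochs=500, seed=0):
    """One-hidden-layer autoencoder (tanh encoder, linear decoder) trained by
    full-batch gradient descent on mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1, b1 = rng.normal(0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0, 0.1, (n_hidden, d)), np.zeros(d)
    losses = []
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)        # encoder: compressed features
        err = (H @ W2 + b2) - X         # the training "label" is the input itself
        losses.append((err ** 2).mean())
        # Backpropagation of the mean squared error.
        g_out = 2 * err / err.size
        g_h = (g_out @ W2.T) * (1 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return W1, b1, losses

def finetune_classifier(X_lab, y_lab, W1, b1, lr=0.1, n_epochs=300, seed=0):
    """Binary classifier whose hidden layer starts from the pretrained encoder;
    all weights are then fine-tuned on the labeled data (logistic loss)."""
    rng = np.random.default_rng(seed)
    W1, b1 = W1.copy(), b1.copy()
    w2, b2 = rng.normal(0, 0.1, W1.shape[1]), 0.0
    for _ in range(n_epochs):
        H = np.tanh(X_lab @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ w2 + b2)))   # predicted P(y = 1)
        g = (p - y_lab) / len(y_lab)               # d(mean cross-entropy)/d(logit)
        g_h = np.outer(g, w2) * (1 - H ** 2)
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X_lab.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return W1, b1, w2, b2

rng = np.random.default_rng(0)
Z = rng.standard_normal((500, 2))                          # hidden 2-D structure
X_all = Z @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((500, 10))
X_all = (X_all - X_all.mean(axis=0)) / X_all.std(axis=0)  # statistics from all 500 points
W1, b1, losses = pretrain_autoencoder(X_all, n_hidden=3)  # no labels used here
y = (Z[:, 0] > 0).astype(float)                            # pretend only 20 labels are known
W1f, b1f, w2, b2 = finetune_classifier(X_all[:20], y[:20], W1, b1)
print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```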

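A related idea is to pre-train layers as Restricted Boltzmann Machines (RBMs). They can be used in much the same way, although they are based on different principles: an RBM is an energy-based generative model, usually trained with contrastive divergence, rather than a network trained to minimise reconstruction error.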
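For comparison, a minimal NumPy sketch of a single RBM layer trained with one-step contrastive divergence (CD-1), assuming binary inputs. The function names, hyperparameters and toy patterns are illustrative assumptions. In layer-wise pre-training, the hidden activations of one trained RBM would become the input to the next RBM, and the stacked weights would then initialise a supervised network, as with the autoencoder.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.1, n_epochs=1000, seed=0):
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1).
    V is an (n_samples, n_visible) array of 0/1 values; no labels are used."""
    rng = np.random.default_rng(seed)
    n, n_visible = V.shape
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    a = np.zeros(n_visible)                       # visible biases
    b = np.zeros(n_hidden)                        # hidden biases
    for _ in range(n_epochs):
        # Positive phase: hidden probabilities driven by the data, then a binary sample.
        ph = sigmoid(V @ W + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and back up
        # (visible probabilities are used instead of samples, a common simplification).
        pv = sigmoid(h @ W.T + a)
        ph_neg = sigmoid(pv @ W + b)
        # CD-1 update: data statistics minus one-step reconstruction statistics.
        W += lr * (V.T @ ph - pv.T @ ph_neg) / n
        a += lr * (V - pv).mean(axis=0)
        b += lr * (ph - ph_neg).mean(axis=0)
    return W, a, b

def reconstruction_error(V, W, a, b):
    """Mean squared error after mapping V to hidden probabilities and back."""
    ph = sigmoid(V @ W + b)
    return ((V - sigmoid(ph @ W.T + a)) ** 2).mean()

# Toy binary data: noisy copies of two complementary 12-bit patterns.
rng = np.random.default_rng(0)
patterns = np.array([[1] * 6 + [0] * 6, [0] * 6 + [1] * 6], dtype=float)
V = patterns[rng.integers(0, 2, 100)]
V = np.where(rng.random(V.shape) < 0.05, 1 - V, V)   # flip about 5% of the bits

W, a, b = train_rbm(V, n_hidden=4)
err_untrained = reconstruction_error(V, np.zeros_like(W), np.zeros_like(a), np.zeros_like(b))
err_trained = reconstruction_error(V, W, a, b)
features = sigmoid(V @ W + b)   # hidden activations: input to the next layer
print(f"reconstruction error: {err_untrained:.3f} -> {err_trained:.3f}")
```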

OTHER TIPS

You already dropped the keyword "semi-supervised" in your question. Indeed, semi-supervised learning is the answer to your question: it is precisely the setting in which a small labeled set is combined with a large unlabeled one. Search for this term with your favourite search engine or library catalog to find algorithms for semi-supervised learning.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange