Question

I have time series data of which only 10% is labeled, across 10 classes. Which methods or models should I look at for this problem? I know the question is a bit vague, but I am not looking for exact answers; I would appreciate any pointers or resources on how to approach this type of problem.


Solution

As mentioned in the comments, semi-supervised methods are worth reading up on. If the time series are labelled per time step, you can make use of dynamical systems and Gaussian processes. Some neural-network-based methods that could be useful are:

  • Pretraining a neural network with an autoencoder. You can use an RNN to encode the temporal aspect of the series (https://papers.nips.cc/paper/5271-pre-training-of-recurrent-neural-networks-via-linear-autoencoders.pdf).
  • GANs (generative adversarial networks), currently a very popular way of making use of copious amounts of unlabelled data by jointly training two networks (a generator and a discriminator) to play a zero-sum game against each other. See https://arxiv.org/pdf/1611.09904.pdf for some work on RNN-GANs, again if you want to account for the temporal structure of your data. Note that these are very hard to train.
  • Ladder networks (https://arxiv.org/pdf/1507.02672.pdf), a very interesting semi-supervised method for regular fully connected networks, which yielded very interesting results on the MNIST data set (the paper reports state-of-the-art results with as few as 100 labelled samples).

  • CNNs are known to be effective with time series too, since they implement adaptive (learned) FIR filters on the data. It can be worth trialling them in conjunction with RNNs.
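
To make the FIR-filter remark in the last bullet concrete, here is a minimal pure-Python sketch. The function names and the toy signal are made up for illustration, and the kernel is hand-picked rather than learned. It shows that what a 1D convolutional layer computes (no padding, stride 1) is the same thing as a causal FIR filter whose taps are the kernel in reverse order; in a CNN those taps would simply be learned from the data.

```python
def conv1d_valid(x, w):
    """What a 1D conv layer computes: cross-correlation, no padding, stride 1."""
    k = len(w)
    return [sum(w[j] * x[n + j] for j in range(k))
            for n in range(len(x) - k + 1)]


def fir_filter(x, b):
    """Causal FIR filter y[n] = sum_k b[k] * x[n - k], kept only where all taps see data."""
    k = len(b)
    return [sum(b[j] * x[n - j] for j in range(k))
            for n in range(k - 1, len(x))]


signal = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]   # toy series (t squared)
kernel = [-1.0, 1.0]                         # hypothetical "learned" weights: a first difference

conv_out = conv1d_valid(signal, kernel)
fir_out = fir_filter(signal, kernel[::-1])   # same taps, reversed order

print(conv_out)
print(conv_out == fir_out)
```

With this particular kernel the layer acts as a differencing filter; a trained CNN would pick whichever taps help the classification loss.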

All the methods above still require some reasonable amount of labelled data.
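
As a rough illustration of the pretraining idea from the first bullet, and of why some labels are still needed, the sketch below is a deliberately simplified stand-in: a one-unit linear autoencoder with tied weights instead of an RNN, two synthetic classes instead of ten, and a nearest-centroid classifier on top. The data, names, and hyperparameters are all assumptions chosen for illustration. The encoder is fitted on every window without looking at labels, and only the labelled 10% is used to place the class centroids.

```python
import random

random.seed(0)
LENGTH = 8  # samples per window (arbitrary choice)


def make_window(label):
    """Toy window: a rising (label 0) or falling (label 1) ramp plus noise."""
    sign = 1.0 if label == 0 else -1.0
    return [sign * (t / (LENGTH - 1) - 0.5) + random.gauss(0.0, 0.1)
            for t in range(LENGTH)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def recon_loss(w, data):
    """Mean squared reconstruction error of x_hat = w * (w . x)."""
    total = 0.0
    for x in data:
        h = dot(w, x)
        total += sum((wi * h - xi) ** 2 for wi, xi in zip(w, x))
    return total / len(data)


labels = [i % 2 for i in range(200)]
windows = [make_window(y) for y in labels]
labelled_idx = list(range(20))                 # only 10% of the labels are used
unlabelled_idx = [i for i in range(200) if i not in labelled_idx]

# Step 1: unsupervised pretraining on ALL windows (labels are never read here).
w = [random.uniform(-0.1, 0.1) for _ in range(LENGTH)]
loss_before = recon_loss(w, windows)
lr = 0.05
for epoch in range(20):
    for x in windows:
        h = dot(w, x)                               # 1-D code
        r = [wi * h - xi for wi, xi in zip(w, x)]   # reconstruction residual
        rw = dot(r, w)
        # Gradient of ||w*h - x||^2 with respect to the tied weights w.
        w = [wi - lr * 2.0 * (ri * h + rw * xi) for wi, ri, xi in zip(w, r, x)]
loss_after = recon_loss(w, windows)

# Step 2: supervised step using only the labelled windows.
centroids = {}
for y in (0, 1):
    codes = [dot(w, windows[i]) for i in labelled_idx if labels[i] == y]
    centroids[y] = sum(codes) / len(codes)


def predict(x):
    h = dot(w, x)
    return min(centroids, key=lambda y: abs(h - centroids[y]))


accuracy = sum(predict(windows[i]) == labels[i]
               for i in unlabelled_idx) / len(unlabelled_idx)
print(loss_before, loss_after, accuracy)
```

On real data the encoder would be a recurrent or convolutional network and the classifier would be fine-tuned end to end, but the split is the same: representation learning uses everything, while the decision boundary depends on the labelled subset.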

OTHER TIPS

I'm not an expert, but here are my 2 preliminary thoughts:

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange