Question

I'm working on a binary classification problem where each data instance has several time series of different metrics, plus some other attributes. How should I deal with the time series? Treat them as separate attributes? But that loses the information in the time dimension.

To make it more concrete, an example of the training instance will look like this:

ID MetricA_Day1 MetricA_Day2 ... MetricA_Day31 MetricB_Day1 ... MetricB_Day31 AttributeC AttributeD AttributeE
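For reference, here is a hypothetical sketch (in pandas, with placeholder values) of how one such flattened row looks:

```python
# Hypothetical example of one flattened training row; every value is a placeholder.
import pandas as pd

row = {"ID": 1}
row.update({f"MetricA_Day{d}": 0.0 for d in range(1, 32)})  # 31 daily values of metric A
row.update({f"MetricB_Day{d}": 0.0 for d in range(1, 32)})  # 31 daily values of metric B
row.update({"AttributeC": "x", "AttributeD": 3, "AttributeE": 1.5})

df = pd.DataFrame([row])
print(df.shape)  # (1, 66): ID + 2 * 31 time-series columns + 3 static attributes
```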

Is there a best practice on how to handle this?

Solution

First, the "generic" answer: no, there is no such thing as a best practice in machine learning. In fact, there are only bad ones and currently popular ones, not "the best ones". Even for binary, linear classification it is data dependent whether it is better to train a geometrical model (an SVM?) or a probabilistic one. Even if we narrow down to one particular model (let's say a neural network), there is no best practice even for something as basic as data scaling. Should the data be in [0, 1]? Or maybe [-1, 1]? What about mean 0 and variance 1? And what about whitening the covariance matrix?
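To make that scaling point concrete, here is a minimal sketch of those four options, assuming scikit-learn and an arbitrary numeric feature matrix X (everything here is illustrative, not a recommendation):

```python
# Illustrative only: the scaling choices mentioned above, applied to placeholder data.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)                        # placeholder feature matrix

X_01    = MinMaxScaler((0, 1)).fit_transform(X)   # each feature scaled to [0, 1]
X_11    = MinMaxScaler((-1, 1)).fit_transform(X)  # each feature scaled to [-1, 1]
X_std   = StandardScaler().fit_transform(X)       # mean 0, variance 1 per feature
X_white = PCA(whiten=True).fit_transform(X)       # decorrelated components with unit variance
```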

Now getting to details: your question gives almost no information regarding the data. Saying "I have a few attributes and a few time series" gives little chance of a really insightful and valuable answer. It also does not seem like a task with a definite answer; it is a long-term research task.

What could you explore during that research?

  • Is such a rich representation really required here? People tend to think that "more is better", which in ML is not necessarily true. A simpler, more informative representation is often much better. Maybe try to represent each time series with a few general features, like frequency characteristics, amplitudes, trends, etc. (see the feature-extraction sketch after this list).
  • Kernel methods can use a very broad range of similarity measures; in particular, you could define a different measure for each "part" of your representation, so the time series use a specialized kernel while the simple attributes use a simple one, and then combine them by taking the sum of those kernels (which is also a valid kernel). See the sum-of-kernels sketch below.
  • An analogous approach can be used with distance-based methods like kNN, by defining a separate distance for each part of the representation (see the composite-distance sketch below).
  • You can also train a separate classifier for each "part" of the representation, so the time series get a specialized time-series classifier and the plain attributes a standard one, and combine them as an ensemble (there are many methods for working with such ensembles; it is currently quite a hot topic in ML). See the per-part ensemble sketch below.
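A minimal sketch of the first point, assuming the column names from the question; the particular summary features (mean, standard deviation, amplitude, linear trend slope, dominant FFT frequency) are only illustrative choices:

```python
# Collapse each daily series into a handful of descriptive features, then append the static attributes.
import numpy as np
import pandas as pd

def series_features(values, prefix):
    """Summarize one time series with simple amplitude/trend/frequency descriptors."""
    values = np.asarray(values, dtype=float)
    spectrum = np.abs(np.fft.rfft(values - values.mean()))
    return {
        f"{prefix}_mean": values.mean(),
        f"{prefix}_std": values.std(),
        f"{prefix}_amplitude": values.max() - values.min(),
        f"{prefix}_trend": np.polyfit(np.arange(len(values)), values, 1)[0],  # slope of a linear fit
        f"{prefix}_dominant_freq": int(spectrum[1:].argmax()) + 1,            # strongest non-DC component
    }

def build_features(df):
    a_cols = [f"MetricA_Day{d}" for d in range(1, 32)]
    b_cols = [f"MetricB_Day{d}" for d in range(1, 32)]
    rows = []
    for _, r in df.iterrows():
        feats = {}
        feats.update(series_features(r[a_cols], "A"))
        feats.update(series_features(r[b_cols], "B"))
        feats.update({c: r[c] for c in ["AttributeC", "AttributeD", "AttributeE"]})
        rows.append(feats)
    return pd.DataFrame(rows)
```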
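A minimal sketch of the sum-of-kernels idea, assuming the representation is already split into a time-series block X_ts and an attribute block X_attr; the RBF and linear kernels are placeholders (a DTW-based kernel could take the place of the RBF one):

```python
# Sum of valid kernels is a valid kernel: one kernel per "part", added together,
# then passed to an SVM as a precomputed Gram matrix. All data here is placeholder.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(0)
X_ts, X_attr = rng.random((100, 62)), rng.random((100, 3))
y = rng.integers(0, 2, 100)

def combined_kernel(A_ts, A_attr, B_ts, B_attr):
    # Placeholder RBF kernel on the time-series block plus a linear kernel on the attributes.
    return rbf_kernel(A_ts, B_ts, gamma=0.1) + linear_kernel(A_attr, B_attr)

K_train = combined_kernel(X_ts, X_attr, X_ts, X_attr)
clf = SVC(kernel="precomputed").fit(K_train, y)
pred = clf.predict(K_train)  # at predict time: rows = new samples, columns = training samples
```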
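The same idea sketched for kNN with a composite distance; the split point N_TS and the plain Euclidean distances are assumptions (a proper time-series distance such as DTW could replace the first term):

```python
# kNN with a different distance for each "part" of the flattened row; placeholder data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

N_TS = 62  # first 62 columns: the two flattened daily series; the remaining columns: static attributes

def mixed_distance(u, v):
    d_ts = np.linalg.norm(u[:N_TS] - v[:N_TS])    # distance over the time-series block
    d_attr = np.linalg.norm(u[N_TS:] - v[N_TS:])  # distance over the static attributes
    return d_ts + d_attr

rng = np.random.default_rng(0)
X, y = rng.random((100, 65)), rng.integers(0, 2, 100)

knn = KNeighborsClassifier(n_neighbors=5, metric=mixed_distance).fit(X, y)
pred = knn.predict(X[:5])
```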
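Finally, a sketch of the per-part ensemble; using gradient boosting as the stand-in time-series model and averaging predicted probabilities are both just assumptions for illustration:

```python
# One classifier per "part" of the representation, combined by soft voting; placeholder data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_ts, X_attr = rng.random((200, 62)), rng.random((200, 3))
y = rng.integers(0, 2, 200)

clf_ts = GradientBoostingClassifier().fit(X_ts, y)           # stand-in for a dedicated time-series classifier
clf_attr = LogisticRegression(max_iter=1000).fit(X_attr, y)  # simple model for the static attributes

# Soft voting: average the predicted class probabilities from both models.
proba = (clf_ts.predict_proba(X_ts) + clf_attr.predict_proba(X_attr)) / 2
pred = (proba[:, 1] >= 0.5).astype(int)
```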
Licensed under: CC-BY-SA with attribution