Question

I have a dataset composed of multivariate time series: each series has 8 time points and about 40 dimensions (so each series is an 8 by 40 array). The corresponding output (the class label) is either 0 or 1.

What would be the best approach to designing a classifier for time series with multiple dimensions?

My initial strategy was to extract features from those time series: mean, standard deviation, and maximum variation for each dimension. I used the resulting dataset to train a random forest. I am aware of how naive this is, and after obtaining poor results I am now looking for a better model.
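For concreteness, the feature extraction described above might look like the following minimal sketch. The array layout (n_series, 8, 40), the synthetic data, and the reading of "maximum variation" as the largest absolute step-to-step change are all assumptions made for illustration, not details from the question.

```python
import numpy as np

def summary_features(X):
    """Collapse each series into per-dimension summary statistics.

    X has shape (n_series, n_steps, n_dims); the result has shape
    (n_series, 3 * n_dims): means, then stds, then max variations.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1)
    std = X.std(axis=1)
    # Assumption: "maximum variation" = largest absolute change between
    # two consecutive time steps.
    max_variation = np.abs(np.diff(X, axis=1)).max(axis=1)
    return np.hstack([mean, std, max_variation])

# Synthetic stand-in data: 100 series, 8 time points, 40 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8, 40))
features = summary_features(X)
print(features.shape)
```

The resulting 2-D matrix is what a tabular classifier such as a random forest would be trained on.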

My leads are the following: classify the series for each dimension separately (using a KNN algorithm and DWT), reduce the dimensionality with PCA, and use a final classifier on the resulting per-dimension categories. Being relatively new to ML, I don't know whether I am completely off track.


Solution

You're on the right track. Look at calculating a few more features, in both the time and frequency domains. As long as the number of samples is much larger than the number of features, you are unlikely to overfit. Is there any literature on a similar problem? If so, that always provides a great starting point.
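As one possible example of frequency-domain features, the sketch below takes the FFT of each dimension and keeps the total spectral power and the index of the dominant non-zero frequency bin. The function name, the choice of these two features, and the synthetic data are assumptions for illustration; with only 8 time points the spectrum has just 5 bins, so such features are necessarily coarse.

```python
import numpy as np

def spectral_features(X):
    """Per-dimension frequency-domain features.

    X has shape (n_series, n_steps, n_dims). Returns an array of shape
    (n_series, 2 * n_dims): total spectral power, then dominant bin index.
    """
    X = np.asarray(X, dtype=float)
    # Remove each series' mean so the zero-frequency bin carries no power.
    centered = X - X.mean(axis=1, keepdims=True)
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    total_power = power.sum(axis=1)
    # Skip bin 0 when searching for the dominant frequency.
    dominant_bin = power[:, 1:, :].argmax(axis=1) + 1
    return np.hstack([total_power, dominant_bin])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8, 40))  # synthetic stand-in data
spec_features = spectral_features(X)
print(spec_features.shape)
```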

Try a boosted tree classifier, like xgboost or LightGBM. Their hyperparameters tend to be easier to tune, and they give good results with default parameters. Both random forest and boosted tree classifiers can return feature importances, so you can see which features are relevant to the problem. You can also try removing features to check whether any of them are correlated and therefore redundant.
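A rough sketch of that workflow is shown here. To stay within one library it uses scikit-learn's GradientBoostingClassifier as a stand-in for xgboost or LightGBM (a substitution, not what the answer names), and the feature matrix is synthetic, with the label driven by the first column so that the importance ranking has something to find.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an extracted feature matrix: 300 samples, 10 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 10))
# Assumption for the demo: only feature 0 carries signal.
labels = (features[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0)  # default hyperparameters
scores = cross_val_score(model, features, labels, cv=5)
print("mean CV accuracy:", scores.mean())

# Fit on all data to inspect which features the model relies on.
model.fit(features, labels)
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]
print("features ranked by importance:", ranking)

# Pairwise correlation between feature columns helps spot redundant ones.
corr = np.corrcoef(features, rowvar=False)
```

Pairs of columns with an absolute correlation close to 1 in `corr` are candidates for removal.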

Most importantly though, if your results are unexpectedly poor, ensure your problem is properly defined. Manually check through your results to make sure there aren't any bugs in your pipeline.
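One way to check a pipeline for bugs is to compare the model against a trivial baseline and against the same model trained on shuffled labels. The sketch below does this with scikit-learn on synthetic data; the data, the random forest settings, and the helper name `mean_cv_accuracy` are illustrative assumptions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))          # synthetic stand-in features
labels = (features[:, 0] > 0).astype(int)     # label depends on feature 0

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

def mean_cv_accuracy(estimator, X, y):
    """Mean accuracy over the stratified folds defined above."""
    return cross_val_score(estimator, X, y, cv=cv).mean()

# Baseline that always predicts the most frequent class.
baseline = mean_cv_accuracy(DummyClassifier(strategy="most_frequent"), features, labels)

# The actual model on the real labels.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
real = mean_cv_accuracy(forest, features, labels)

# Same model on shuffled labels: accuracy should fall back to chance level.
shuffled = mean_cv_accuracy(forest, features, rng.permutation(labels))

print(f"baseline={baseline:.2f} real={real:.2f} shuffled={shuffled:.2f}")
```

If the real score is no better than the baseline, or the shuffled score is well above chance, the features, the labels, or the split are worth re-examining before tuning the model further.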

Other tips

If you're in Python, there are a couple of packages that can automatically extract hundreds or thousands of features from your time series, correlate them with your labels, choose the most significant ones, and train models for you.

https://github.com/blue-yonder/tsfresh

https://github.com/rtavenar/tslearn
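As a hedged sketch of how the 8 by 40 series could be fed to tsfresh: the library expects a long-format table with an id column and a time column, so the first step is a reshape. The helper names `to_long_frame` and `relevant_features`, the column names, and the synthetic data are assumptions; the tsfresh call is imported lazily and is not executed in this snippet.

```python
import numpy as np
import pandas as pd

def to_long_frame(X):
    """Turn an array of shape (n_series, n_steps, n_dims) into a table with
    one row per (series, time step) and one column per dimension."""
    X = np.asarray(X, dtype=float)
    n_series, n_steps, n_dims = X.shape
    frame = pd.DataFrame(
        X.reshape(n_series * n_steps, n_dims),
        columns=[f"dim_{j}" for j in range(n_dims)],
    )
    frame.insert(0, "time", np.tile(np.arange(n_steps), n_series))
    frame.insert(0, "id", np.repeat(np.arange(n_series), n_steps))
    return frame

def relevant_features(X, y):
    """Extract and filter features with tsfresh (requires tsfresh installed)."""
    from tsfresh import extract_relevant_features  # lazy import
    long_df = to_long_frame(X)
    # tsfresh matches labels to series through the index of y.
    target = pd.Series(np.asarray(y), index=np.arange(len(y)))
    return extract_relevant_features(
        long_df, target, column_id="id", column_sort="time"
    )

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8, 40))  # synthetic stand-in data
long_df = to_long_frame(X)
print(long_df.shape)
```

With only 8 points per series, many of the features tsfresh computes will be degenerate, so the filtered set may end up small.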

You can add more features to your dataset as below.

  1. Try the nolds package if your data come from a highly nonlinear process.

  2. Compute max, min, mean, skew, kurtosis and, if possible, some rolling statistics.
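The statistics in item 2 can be sketched with NumPy alone, as below. The function name, the window length of 3, the use of the range of the rolling mean as the "rolling stat", and the synthetic data are assumptions chosen for illustration; skew and excess kurtosis are computed from population moments.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def shape_features(X, window=3):
    """Per-dimension max, min, mean, skew, excess kurtosis, and the range
    of a rolling mean. X has shape (n_series, n_steps, n_dims); the result
    has shape (n_series, 6 * n_dims)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1)
    std = X.std(axis=1)
    # Avoid division by zero for constant series.
    safe_std = np.where(std > 0, std, 1.0)
    z = (X - mean[:, None, :]) / safe_std[:, None, :]
    skew = (z ** 3).mean(axis=1)
    kurtosis = (z ** 4).mean(axis=1) - 3.0
    # Windows along the time axis: shape (n_series, n_steps - window + 1, n_dims, window).
    windows = sliding_window_view(X, window, axis=1)
    rolling_mean = windows.mean(axis=-1)
    rolling_range = rolling_mean.max(axis=1) - rolling_mean.min(axis=1)
    return np.hstack([X.max(axis=1), X.min(axis=1), mean, skew, kurtosis, rolling_range])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8, 40))  # synthetic stand-in data
stat_features = shape_features(X)
print(stat_features.shape)
```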

I am working on something similar, and I asked a related question.

I do agree with Jan van der Vegt: scaling to a fixed range (e.g., [-1, 1]) or standardizing to zero mean and unit variance, N(0, 1), combined with the choice of activation function, can be very important with neural networks. I would check the dissertation of Pichaid Varoonchotikul, “Flood forecasting using artificial neural networks”, for the ins and outs of ANNs. It has very interesting caveats. Anyway, I usually try without scaling first and, when the results are unsatisfactory, run trials with either or both. I am not sure it will help, but I would also check the R package TSclust and its related docs. The authors are very kind and will help you find specific models for this. They are experts in time series analysis! Good luck!
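The two scaling options can be sketched as follows. The statistics are pooled per dimension over all training series and time steps and then reused on the test set; that pooling choice, the function names, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-dimension mean and std, pooled over series and time steps."""
    mean = X_train.mean(axis=(0, 1))
    std = X_train.std(axis=(0, 1))
    return mean, np.where(std > 0, std, 1.0)  # guard constant dimensions

def standardize(X, mean, std):
    """Rescale to zero mean and unit variance using training statistics."""
    return (X - mean) / std

def fit_range(X_train):
    """Per-dimension minimum and maximum, pooled over series and time steps."""
    return X_train.min(axis=(0, 1)), X_train.max(axis=(0, 1))

def scale_to_range(X, low, high):
    """Map each dimension linearly so that [low, high] becomes [-1, 1]."""
    span = np.where(high > low, high - low, 1.0)
    return 2.0 * (X - low) / span - 1.0

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 8, 40))  # synthetic data
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 8, 40))

mean, std = fit_standardizer(X_train)
standardized = standardize(X_train, mean, std)
standardized_test = standardize(X_test, mean, std)  # reuse training statistics

low, high = fit_range(X_train)
ranged = scale_to_range(X_train, low, high)
```

Fitting the statistics on the training set only keeps information from the test set out of the model.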

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange