Question

I am looking for clustering algorithm which can handle with multiple time series information for each objects.

For example, for company "A" we have time series of 3 features(ex. income, sales, inventory)
At the same way, company "B" also has same time series of same features. and so on..

Then, how we can make cluster between set of company? Is there some wise way to handle this?

Was it helpful?

Solution

A lot of clustering algorithms ask you to provide some measure of the similarity or distance between two points. It is really up to you to decide what features are important and what the distance really is. One way forwards would be to use the correlation between two time series. This gives you a similarity. If you have to convert this to a distance I would use sqrt(1-r), where r is the correlation, because if you look e.g. at the equation at the bottom of http://www.analytictech.com/mb876/handouts/distance_and_correlation.htm you can see that this is proportional to a distance if you have points in n-dimensional space. If you have three different time series (income, sales, inventory) I would use the sum of the three distances worked out from the correlations between the two time series of the same type.

Another option, especially if the time series are not very long, would be to regard a time series of length n as a point in n-dimensional space and feed this into the clustering algorithm, or use http://en.wikipedia.org/wiki/Principal_component_analysis to reduce the n dimensions down to 1 by looking at the most significant components (while you are doing this, it never hurts to plot the points using the least significant components and investigate points that stand out from the others. Points where the data is in error sometimes stand out here).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top