Question

Currently, I have 6 curves, plotted in 6 different colors (image not included here). The 6 curves were generated by 6 trials of the same experiment. Ideally they would be the same curve, but because of noise and different trial participants they look similar without being exactly the same.

Now I wish to create an algorithm that is able to identify that the 6 curves are essentially the same and cluster them together into one cluster. What similarity metrics should I use?

Note:

  1. The x-axis does NOT matter at all! I aligned the curves only for visual purposes. Thus, feel free to shift the curves left or right if doing so helps.
  2. "Sub-curves" that are part of the curves may appear. The "belongingness" is important and thus needs identifying as well. But again, left/right shifting is allowed.

I have attempted to learn some of the clustering algorithms, such as DBSCAN, K-means, Fuzzy C-means, etc. But I don't see how they fit this case, because the "belongingness" needs to be spotted!

Any suggestions or comments are welcome. I understand that it is hard to give an exact solution to this question. I am only expecting some enlightening suggestions here.


Solution

Have a look at time series similarity functions, such as dynamic time warping.

They can be used with e.g. DBSCAN but NOT with k-means (you cannot compute a reasonable "mean" for these distances; k-means is really designed for squared Euclidean distances).
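As a rough illustration of the suggestion above, here is a minimal pure-Python sketch of the classic dynamic time warping distance, plus a pairwise distance matrix built from it. The function names (`dtw_distance`, `pairwise_dtw`) and the toy curves are made up for this example, and the absolute-difference cost is just one common choice, not necessarily the best one for your data.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pairwise_dtw(curves):
    """Symmetric matrix of DTW distances between all pairs of curves."""
    k = len(curves)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(curves[i], curves[j])
    return dist


# Hypothetical toy data: two copies of one shape (one shifted right) and a flat line.
base = [0, 1, 3, 6, 3, 1, 0]
shifted = [0, 0, 1, 3, 6, 3, 1, 0]
other = [5, 5, 5, 5, 5, 5, 5]

dist = pairwise_dtw([base, shifted, other])
```

In this toy case the shifted copy ends up at distance 0 from the original, while the flat line is far away. A matrix like `dist` can then be handed to a clustering algorithm that accepts precomputed distances, for example scikit-learn's `DBSCAN(eps=..., min_samples=..., metric="precomputed")`, with `eps` tuned to your noise level. Note that this plain DTW aligns whole curves end to end; matching a sub-curve against part of a longer curve would need a subsequence (open-ended) variant, which this sketch does not implement.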

Licensed under: CC-BY-SA with attribution