Similarity algorithm (mathematics) of sampled signals

https://stackoverflow.com/questions/14087045

13-12-2021
|

Question

Let's say I have sampled some signals and constucted a vector of the samples for each. What is the most efficent way to calculate (dis)similarity of those vectors? Note that offset of the sampling must not count, for instance sample-vectors of sin and cos -signals should be considered similar since in sequential manner they are exately the same.

There is a simple way of doing this by "rolling" the units of the other vector, calculating euclidian distance for each roll-point and finally choosing the best match (smallest distance). This solution works fine since the only target for me is to find most similar sample-vector for input signal from a vector pool.

However, the solution above is also very inefficent when the dimension of the vectors grow. Compared to "non-sequential vector matching" for N-dimensional vector, the sequential one would have N-times more vector distance calculations to do.

Is there any higher/better mathematics/algorithms to compare two sequences with differing offsets?

Use case for this would be in sequence similarity visualization with SOM.

EDIT: How about comparing each vector's integrals and entropies? Both of them are "sequence-safe" (= time-invariant?) and very fast to calculate but I doubt they alone are enough to distinguish all possible signals from each other. Is there something else that could be used in addition for these?

EDIT2: Victor Zamanian's reply isn't directly the answer but it gave me an idea that might be. The solution might be to sample the original signals by calculating their Fourier transform coefficents and inserting those into sample vectors. First element (X_0) is the mean or "level" of the signal and the following (X_n) can be directly used to compare similarity with some other sample vector. The smaller the n is, the more it should have effect in similarity calculations, since the more coefficents there has been calculated with FT, the more accurate representation will the FT'd signal be. This brings up an bonus question:

Let's say we have FT-6 sampled vectors (values just fell out of the sky)

X = {4, 15, 10, 8, 11, 7}
Y = {4, 16, 9, 15, 62, 7}

Similarity value of these vectors could MAYBE be calculated like this: |16-15| + (|10 - 9| / 2 ) + (|8 - 15| / 3) + (|11-62| / 4) + (|7-7| / 5)

Those bolded ones are the bonus question. Is there some coefficents/some other way to know how much each FT-coefficent has effect on the similarity in relation to other coefficents?

Solution

If I understand your question correctly, maybe you would be interested in some type of cross-correlation implementation? I'm not sure if it's the most efficient thing to do or fits the purpose, but I thought I would mention it since it seems relevant.

Edit: Maybe a Fast Fourier Transform (FFT) could be an option? Fourier transforms are great for distinguishing signals from each other and I believe helpful to find similar signals too. E.g. a sine and a cosine wave would be identical in the real plane, and just have different imaginary parts (phase). FFTs can be done in O(N log N).

OTHER TIPS

Google "translation invariant signal classificiation" and you'll find things like these.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow