Similarity distance measures

Question 1

It sounds like you need the cosine similarity measure:

similarity = cos(v1, v2) = v1 * v2 / (|v1| |v2|)

where v1 * v2 is the dot product of v1 and v2:

v1 * v2 = v1[1]*v2[1] + v1[2]*v2[2] + ... + v1[n]*v2[n]

For 0/1 vectors, the dot product counts the positions where both vectors have a 1: if v1[k] == 1 and v2[k] == 1, the sum (and thus the similarity) increases by one; otherwise it is unchanged.

You can use the dot product itself, but often you want the final similarity to be normalized, e.g. to lie between 0 and 1. In that case, divide the dot product of v1 and v2 by the product of their lengths, |v1| and |v2|. The length of a vector is the square root of the dot product of the vector with itself:

|v| = sqrt(v[1]*v[1] + v[2]*v[2] + ... + v[n]*v[n])

With these pieces, it's easy to implement cosine similarity as follows (example in Python):

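from math import sqrt

def dot(v1, v2):
    # Sum of element-wise products.
    return sum(x*y for x, y in zip(v1, v2))

def length(v):
    # Euclidean norm of v.
    return sqrt(dot(v, v))

def sim(v1, v2):
    # Cosine similarity; raises ZeroDivisionError if either vector is all zeros.
    return dot(v1, v2) / (length(v1) * length(v2))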

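Note that I described similarity (how close two vectors are to each other), not distance (how far apart they are). If you need a distance, a common choice is dist = 1 - sim, which is 0 for vectors pointing in the same direction. (dist = 1 / sim also reverses the ordering, but it is undefined when sim is 0.)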
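As a quick check of the formulas above, here is a self-contained sketch that applies them to two made-up binary vectors and turns the similarity into a distance using 1 - sim, which is one common convention. The vectors `a` and `b` are hypothetical (they are not the vectors from the question), and the zero-vector guard is my own assumption rather than part of the definition.

```python
from math import sqrt

def dot(v1, v2):
    return sum(x * y for x, y in zip(v1, v2))

def cosine_similarity(v1, v2):
    norm = sqrt(dot(v1, v1)) * sqrt(dot(v2, v2))
    if norm == 0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return dot(v1, v2) / norm

def cosine_distance(v1, v2):
    # 0 for vectors pointing the same way, 1 for orthogonal vectors.
    return 1.0 - cosine_similarity(v1, v2)

# Hypothetical example vectors: they share a 1 at positions 0 and 2.
a = [1, 0, 1, 1, 0]
b = [1, 1, 1, 0, 0]
print(cosine_similarity(a, b))  # dot = 2, both lengths = sqrt(3), so about 2/3
print(cosine_distance(a, b))
```

Because of floating-point rounding, the distance of a vector to itself may come out as a tiny nonzero number rather than exactly 0.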

Question 2

There are literally hundreds of distance functions, including distance measures for sets, such as Dice and Jaccard.
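To make the two set measures concrete, here is a minimal sketch of Jaccard and Dice for 0/1 vectors, treating each vector as the set of positions that hold a 1. The function names and the example vectors are hypothetical, and returning 1.0 when both vectors are all zeros is an assumption, since the measures are not defined in that case.

```python
def jaccard(v1, v2):
    # |intersection| / |union| of the positions that hold a 1.
    both = sum(1 for x, y in zip(v1, v2) if x and y)
    either = sum(1 for x, y in zip(v1, v2) if x or y)
    return both / either if either else 1.0  # assumption for two empty sets

def dice(v1, v2):
    # 2 * |intersection| / (|set 1| + |set 2|).
    both = sum(1 for x, y in zip(v1, v2) if x and y)
    total = sum(1 for x in v1 if x) + sum(1 for y in v2 if y)
    return 2 * both / total if total else 1.0  # assumption for two empty sets

# Hypothetical example vectors, not taken from the question.
a = [1, 0, 1, 1, 0]
b = [1, 1, 1, 0, 0]
print(jaccard(a, b))  # 2 shared ones, 4 positions with at least one 1
print(dice(a, b))     # 2 * 2 shared ones over 3 + 3 ones in total
```

Both are similarities in the range 0 to 1; subtracting from 1 gives the corresponding distance.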

You may want to get the book "Dictionary of Distance Functions"; it's pretty good.

Question 3

Case 1: The position of the ones in the series is relevant:

I recommend Dynamic Time Warping (DTW) distance. It has proven incredibly useful in applications involving time-series data.

To check whether it can be applied to your problem, I used the code presented here: https://jeremykun.com/2012/07/25/dynamic-time-warping/

d13 = dynamicTimeWarp(v1,v3)
d12 = dynamicTimeWarp(v1,v2)
d23 = dynamicTimeWarp(v2,v3)

d23,d12,d13
(3, 1, 3)

As you can see, d12 is the lowest, so v1 and v2 are the most similar. Further information on DTW can be found elsewhere on this forum; for research papers, I recommend anything by Eamonn Keogh.
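For readers who want to see the idea without following the link, here is a minimal sketch of the textbook dynamic-programming form of DTW. It is not the linked implementation: the name `dtw_distance`, the absolute-difference cost, and the example sequences are my own assumptions, so it will not necessarily reproduce the numbers above.

```python
def dtw_distance(s, t, cost=lambda a, b: abs(a - b)):
    n, m = len(s), len(t)
    inf = float("inf")
    # table[i][j] = minimal cumulative cost of aligning s[:i] with t[:j].
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(s[i - 1], t[j - 1])
            table[i][j] = step + min(
                table[i - 1][j],      # advance in s only
                table[i][j - 1],      # advance in t only
                table[i - 1][j - 1],  # advance in both
            )
    return table[n][m]

# Hypothetical sequences: b is a shifted by one position.
a = [0, 1, 1, 0, 0]
b = [0, 0, 1, 1, 0]
c = [0, 0, 0, 0, 0]
print(dtw_distance(a, b))  # the shift is absorbed by warping
print(dtw_distance(a, c))  # the two ones in a have nothing to match
```

The point of the example is that a shifted copy stays close under DTW, whereas a position-by-position comparison would penalize the shift.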

Case 2: Position of ones is not relevant:

I agree with Deepu's suggestion to take the average as a feature.

Question 4

I think you can simply take the average of the values in each set. For example, v1 here has an average of 0.4545, v2 has an average of 0.6363, and v3 has an average of 0.0909. If the only possible values in the set are 0 and 1, then sets with equal or nearly equal averages can be treated as similar, which should serve your purpose.
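A minimal sketch of this idea, assuming 0/1 vectors: compare two vectors by the absolute difference of their averages. The helper names and the example vectors are hypothetical.

```python
def mean(v):
    # For a 0/1 vector this is the fraction of ones.
    return sum(v) / len(v)

def mean_difference(v1, v2):
    # Smaller values mean the two vectors hold a similar share of ones.
    return abs(mean(v1) - mean(v2))

# Hypothetical example vectors: same number of ones, no shared positions.
a = [1, 0, 1, 0]
b = [0, 1, 0, 1]
print(mean(a), mean(b))
print(mean_difference(a, b))
```

Note that `a` and `b` have identical averages even though they never have a 1 at the same position, so this feature only makes sense when the position of the ones does not matter.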

Question 5

There is a website that introduces various vector similarity measures:

http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/

I think it will help you decide which similarity measure to use.

To summarize the link above, there are five popular similarity measures between vectors:

Euclidean distance - the straight-line distance between the two vectors
Cosine - the cosine of the angle (theta) between the two vectors
Manhattan - the sum of the absolute differences of their Cartesian coordinates. For example, in a plane with p1 at (x1, y1) and p2 at (x2, y2), Manhattan distance = |x1 - x2| + |y1 - y2|
Minkowski - a generalization of both the Euclidean and Manhattan distances
Jaccard - a similarity between sets: the size of their intersection divided by the size of their union
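To show how Minkowski generalizes the other two distances in the list, here is a small sketch (my own, not taken from the linked page) in which the Manhattan and Euclidean distances are just the Minkowski distance with p = 1 and p = 2. The function names and the example points are hypothetical.

```python
def minkowski(v1, v2, p):
    # (sum of |x - y|^p) ^ (1/p); p is assumed to be >= 1.
    return sum(abs(x - y) ** p for x, y in zip(v1, v2)) ** (1.0 / p)

def manhattan(v1, v2):
    return minkowski(v1, v2, 1)

def euclidean(v1, v2):
    return minkowski(v1, v2, 2)

# Hypothetical example points in the plane.
p1 = (0, 0)
p2 = (3, 4)
print(manhattan(p1, p2))  # |0 - 3| + |0 - 4|
print(euclidean(p1, p2))  # sqrt(3^2 + 4^2)
```

For 0/1 vectors the Manhattan distance simply counts the positions where the two vectors differ.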

You can search for the keywords above for further explanation. I hope this helps.