Question

I want to use Treap structure,but I don't familiar with this type of tree well.

I have two set and i want to write a method to compare them with Treap. This method should return a value that show the similarity of two set. (My work is retrieve a set that is mostly similar to an input set)

How can i do this work?

Thanks

Was it helpful?

Solution

Treap

A Treap is an example of a Balanced Binary Search Tree (you can use any of them for this problem). The expected height of a Treap containing n elements is O(logn) - expected, because Treap is a randomized data structure.

The following solution works for any Binary Search Tree, but it performs much better if the Balanced Binary Search Tree is used (e.g. Treap).

Measure

One measure of a similarity between two sets is the Jaccard Index. Let's call our sets A and B. The Jaccard Index is defined by:

enter image description here

So to compute the Jaccard Index of A and B, we must compute the sum and the intersection of A and B.

Operations

Let's assume that A and B are implemented as Balanced Binary Search Trees.

A Binary Search Tree could support many operations, but three of them are sufficient for this problem:

  • find(x) - returns true if an only if x is in the Tree
  • insert(x) - inserts x in the Tree, if x is not in the Tree before this operation
  • size() - returns the number of elements in the Tree

In the Balanced Binary Search Tree, find(x) and insert(x) have O(logn) running time, where n is the number of elements in the Tree.

In addition, during insertion, we can keep track of the size of the Tree, so size() can be implemented in a constant time.

Of course, we could iterate over all elements of our Tree.

Pseudocode

Step 1.

sum(A, B):

    C = A 

    foreach x in B:
        C.insert(x)

    return C

Step 2.

intersection(A, B):

    C = new BalancedBinarySearchTree()

    foreach x in B:
        if(A.find(x) == true):
            C.insert(x)

    return C

Step 3.

Calculate the Jaccard index of A and B:

JaccardIndex(A, B)
    S = sum(A, B)
    I = intersect(A, B)

    return I.size() / S.size()

Complexity

Let's assume that:

n = A.size()
m = B.size()

Then the complexity of computing the sum is O(n + m * log(n + m)), and the complexity of calculating the intersection is O(m * log n).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top