Question

I know the STL has set_difference, but I need to just know if 2 sets are disjoint. I've profiled my code and this is slowing my app down quite a bit. Is there an easy way to see if 2 sets are disjoint, or do I need to just roll my own code?

EDIT: I also tried set_intersection but it took the same time...

Was it helpful?

Solution

Modified hjhill's code to reduce complexity by a factor of O(log n) by getting rid of the count() call.

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    if(set1.empty() || set2.empty()) return true;

    typename Set1::const_iterator 
        it1 = set1.begin(), 
        it1End = set1.end();
    typename Set2::const_iterator 
        it2 = set2.begin(), 
        it2End = set2.end();

    if(*it1 > *set2.rbegin() || *it2 > *set1.rbegin()) return true;

    while(it1 != it1End && it2 != it2End)
    {
        if(*it1 == *it2) return false;
        if(*it1 < *it2) { it1++; }
        else { it2++; }
    }

    return true;
}

I've complied and tested this code now so it should be good.

OTHER TIPS

Since std::set is a sorted container, you can compare the set boundaries to see if they possibly intersect, and if so, do the expensive STL operation.

Edit:

Here's where clam fishing gets serious ...

All the code posted so far tries to re-implement what's already in <algorithm>. Here's the gist of set_intersection:


  template<typename _InputIterator1, typename _InputIterator2,
           typename _OutputIterator>
    _OutputIterator
    set_intersection(_InputIterator1 __first1, _InputIterator1 __last1,
                     _InputIterator2 __first2, _InputIterator2 __last2,
                     _OutputIterator __result)
    {
      while (__first1 != __last1 && __first2 != __last2)
        if (*__first1 < *__first2)
          ++__first1;
        else if (*__first2 < *__first1)
          ++__first2;
        else
          {
            *__result = *__first1;
            ++__first1;
            ++__first2;
            ++__result;
          }
      return __result;
    }

Pretty optimal already except for the assignment to the output iterator, which is unnecessary for finding just the fact whether two sets are joint. This can be used though to flip a flag as follows:


  /// fake insert container
  template <typename T> struct intersect_flag
  {       
    typedef int iterator;
    typedef typename T::const_reference const_reference;

    bool flag_; // tells whether given sets intersect

    intersect_flag() : flag_( false ) {}

    iterator insert( iterator, const_reference )
    {       
      flag_ = true; return 0;
    }
  };

  // ...
  typedef std::set<std::string> my_set;

  my_set s0, s1;
  intersect_flag<my_set> intf;

  // ...        
  std::set_intersection( s0.begin(), s0.end(),
    s1.begin(), s1.end(), std::inserter( intf, 0 ));

  if ( intf.flag_ ) // sets intersect
  {
    // ...

This avoids copying the objects from the original sets and lets you reuse STL algorithms.

You could use set_intersection and test if the resulting set is empty, but I don't know if that's much faster.

The optimal implementation would stop the testing and return false as soon as the first equal element is found. I don't know of any ready-made solution for that, though

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    Set1::const_iterator it, itEnd = set1.end();
    for (it = set1.begin(); it != itEnd; ++it)
        if (set2.count(*it))
            return false;

    return true;
}

isn't too complex and should do the job nicely.

EDIT: If you want O(n) performance, use the slightly less compact

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    Set1::const_iterator it1 = set1.begin(), it1End = set1.end();
    if (it1 == it1End)
        return true; // first set empty => sets are disjoint

    Set2::const_iterator it2 = set2.begin(), it2End = set2.end();
    if (it2 == it2End)
        return true; // second set empty => sets are disjoint

    // first optimization: check if sets overlap (with O(1) complexity)
    Set1::const_iterator it1Last = it1End;
    if (*--it1Last < *it2)
        return true; // all elements in set1 < all elements in set2
    Set2::const_iterator it2Last = it2End;
    if (*--it2Last < *it1)
        return true; // all elements in set2 < all elements in set1

    // second optimization: begin scanning at the intersection point of the sets    
    it1 = set1.lower_bound(*it2);
    if (it1 == it1End)
        return true;
    it2 = set2.lower_bound(*it1);
    if (it2 == it2End)
        return true;

    // scan the (remaining part of the) sets (with O(n) complexity) 
    for(;;)
    {
        if (*it1 < *it2)
        {
            if (++it1 == it1End)
                return true;
        } 
        else if (*it2 < *it1)
        {
            if (++it2 == it2End)
                return true;
        }
        else
            return false;
    }
}

(modified Graphics Noob's modification further, using only operator <)

It is possible to get O(log(n)) by using the fact that both sets are sorted. Just use std::lower_bound instead of moving iterator by one.

enum Ordering { EQ = 0, LT = -1, GT = 1 };

template <typename A, typename B>
Ordering compare(const A &a, const B &b)
{
    return
        a == b ? EQ :
        a < b ? LT : GT;
}

template <typename SetA, typename SetB>
bool is_disjoint(const SetA &a, const SetB &b)
{
    auto it_a = a.begin();
    auto it_b = b.begin();
    while (it_a != a.end() && it_b != b.end())
    {
        switch (compare(*it_a, *it_b))
        {
        case EQ:
            return false;
        case LT:
            it_a = std::lower_bound(++it_a, a.end(), *it_b);
            break;
        case GT:
            it_b = std::lower_bound(++it_b, b.end(), *it_a);
            break;
        }
    }
    return true;
}

Use std::set_intersection, and see if the output is empty. You could check to see if the range of the two sets (area covered by the begin and end iterators) overlap first, but I suspect that set_intersection might already do this.

These are all O(n) operations, as is is_disjoint. If O(n) is unacceptable, you'll have to build some side storage to "keep track" of the disjoint-ness of the sets as you add/remove elements. Here, you would pay the overhead at insertion time (by updating your "isdisjoint" structure for each set modification), but is_disjoint could be implemented cheaply. This may or may not be a good tradeoff to make.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top