Question

Let A and B be two sets. I'm looking for really fast or elegant ways to compute the set difference (A - B or A \B, depending on your preference) between them. The two sets are stored and manipulated as Javascript arrays, as the title says.

Notes:

  • Gecko-specific tricks are okay
  • I'd prefer sticking to native functions (but I am open to a lightweight library if it's way faster)
  • I've seen, but not tested, JS.Set (see previous point)

Edit: I noticed a comment about sets containing duplicate elements. When I say "set" I'm referring to the mathematical definition, which means (among other things) that they do not contain duplicate elements.

Was it helpful?

Solution

if don't know if this is most effective, but perhaps the shortest

A = [1, 2, 3, 4];
B = [1, 3, 4, 7];

diff = A.filter(function(x) { return B.indexOf(x) < 0 })

console.log(diff);

Updated to ES6:

A = [1, 2, 3, 4];
B = [1, 3, 4, 7];

diff = A.filter(x => !B.includes(x) );

console.log(diff);

OTHER TIPS

Well, 7 years later, with ES6's Set object it's quite easy (but still not as compact as pythons A - B), and reportedly faster than indexOf for large arrays:

console.clear();
let a = new Set([1, 2, 3, 4]);
let b = new Set([5, 4, 3, 2]);


let a_minus_b = new Set([...a].filter(x => !b.has(x)));
let b_minus_a = new Set([...b].filter(x => !a.has(x)));
let a_intersect_b = new Set([...a].filter(x => b.has(x))); 

console.log([...a_minus_b]) // {1}
console.log([...b_minus_a]) // {5}
console.log([...a_intersect_b]) // {2,3,4}

You can use an object as a map to avoid linearly scanning B for each element of A as in user187291's answer:

function setMinus(A, B) {
    var map = {}, C = [];

    for(var i = B.length; i--; )
        map[B[i].toSource()] = null; // any other value would do

    for(var i = A.length; i--; ) {
        if(!map.hasOwnProperty(A[i].toSource()))
            C.push(A[i]);
    }

    return C;
}

The non-standard toSource() method is used to get unique property names; if all elements already have unique string representations (as is the case with numbers), you can speed up the code by dropping the toSource() invocations.

The shortest, using jQuery, is:

var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];

var diff = $(A).not(B);

console.log(diff.toArray());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

I would hash the array B, then keep values from the array A not present in B:

function getHash(array){
  // Hash an array into a set of properties
  //
  // params:
  //   array - (array) (!nil) the array to hash
  //
  // return: (object)
  //   hash object with one property set to true for each value in the array

  var hash = {};
  for (var i=0; i<array.length; i++){
    hash[ array[i] ] = true;
  }
  return hash;
}

function getDifference(a, b){
  // compute the difference a\b
  //
  // params:
  //   a - (array) (!nil) first array as a set of values (no duplicates)
  //   b - (array) (!nil) second array as a set of values (no duplicates)
  //
  // return: (array)
  //   the set of values (no duplicates) in array a and not in b, 
  //   listed in the same order as in array a.

  var hash = getHash(b);
  var diff = [];
  for (var i=0; i<a.length; i++){
    var value = a[i];
    if ( !hash[value]){
      diff.push(value);
    }
  }
  return diff;
}

Incorporating the idea from Christoph and assuming a couple of non-standard iteration methods on arrays and objects/hashes (each and friends), we can get set difference, union and intersection in linear time in about 20 lines total:

var setOPs = {
  minusAB : function (a, b) {
    var h = {};
    b.each(function (v) { h[v] = true; });
    return a.filter(function (v) { return !h.hasOwnProperty(v); });
  },
  unionAB : function (a, b) {
    var h = {}, f = function (v) { h[v] = true; };
    a.each(f);
    b.each(f);
    return myUtils.keys(h);
  },
  intersectAB : function (a, b) {
    var h = {};
    a.each(function (v) { h[v] = 1; });
    b.each(function (v) { h[v] = (h[v] || 0) + 1; });
    var fnSel = function (v, count) { return count > 1; };
    var fnVal = function (v, c) { return v; };
    return myUtils.select(h, fnSel, fnVal);
  }
};

This assumes that each and filter are defined for arrays, and that we have two utility methods:

  • myUtils.keys(hash): returns an array with the keys of the hash

  • myUtils.select(hash, fnSelector, fnEvaluator): returns an array with the results of calling fnEvaluator on the key/value pairs for which fnSelector returns true.

The select() is loosely inspired by Common Lisp, and is merely filter() and map() rolled into one. (It would be better to have them defined on Object.prototype, but doing so wrecks havoc with jQuery, so I settled for static utility methods.)

Performance: Testing with

var a = [], b = [];
for (var i = 100000; i--; ) {
  if (i % 2 !== 0) a.push(i);
  if (i % 3 !== 0) b.push(i);
}

gives two sets with 50,000 and 66,666 elements. With these values A-B takes about 75ms, while union and intersection are about 150ms each. (Mac Safari 4.0, using Javascript Date for timing.)

I think that's decent payoff for 20 lines of code.

Using Underscore.js (Library for functional JS)

>>> var foo = [1,2,3]
>>> var bar = [1,2,4]
>>> _.difference(foo, bar);
[4]

Some simple functions, borrowing from @milan's answer:

const setDifference = (a, b) => new Set([...a].filter(x => !b.has(x)));
const setIntersection = (a, b) => new Set([...a].filter(x => b.has(x)));
const setUnion = (a, b) => new Set([...a, ...b]);

Usage:

const a = new Set([1, 2]);
const b = new Set([2, 3]);

setDifference(a, b); // Set { 1 }
setIntersection(a, b); // Set { 2 }
setUnion(a, b); // Set { 1, 2, 3 }

As for the fasted way, this isn't so elegant but I've run some tests to be sure. Loading one array as an object is far faster to process in large quantities:

var t, a, b, c, objA;

    // Fill some arrays to compare
a = Array(30000).fill(0).map(function(v,i) {
    return i.toFixed();
});
b = Array(20000).fill(0).map(function(v,i) {
    return (i*2).toFixed();
});

    // Simple indexOf inside filter
t = Date.now();
c = b.filter(function(v) { return a.indexOf(v) < 0; });
console.log('completed indexOf in %j ms with result %j length', Date.now() - t, c.length);

    // Load `a` as Object `A` first to avoid indexOf in filter
t = Date.now();
objA = {};
a.forEach(function(v) { objA[v] = true; });
c = b.filter(function(v) { return !objA[v]; });
console.log('completed Object in %j ms with result %j length', Date.now() - t, c.length);

Results:

completed indexOf in 1219 ms with result 5000 length
completed Object in 8 ms with result 5000 length

However, this works with strings only. If you plan to compare numbered sets you'll want to map results with parseFloat.

This works, but I think another one is much more shorter, and elegant too

A = [1, 'a', 'b', 12];
B = ['a', 3, 4, 'b'];

diff_set = {
    ar : {},
    diff : Array(),
    remove_set : function(a) { ar = a; return this; },
    remove: function (el) {
        if(ar.indexOf(el)<0) this.diff.push(el);
    }
}

A.forEach(diff_set.remove_set(B).remove,diff_set);
C = diff_set.diff;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top