Domanda

I have two arrays with user id's and I want to check different items in them.

arr1 = [123, 456, 789];
arr2 = [123, 456, 789, 098];

The problem is: these arrays can have 10 or 20 millions of items.

I'm trying with underscore.difference() but it took 10 mins to complete.

Is there any faster way to do this?

È stato utile?

Soluzione 3

This considers you do not have the numbers 0 or 1 in the arrays:

var arr1 = [123, 456, 789,3],
    arr2 = [123, 456, 789, 098],    
    has = {},
    different=[],
    length1=arr1.length,
    length2=arr2.length;



for(var i=0;i<length1;i++){
        has[arr1[i]]=true;
}
for(var i=0;i<length2;i++){
    var val=arr2[i];
    if(has[val] === undefined){
       has[val]=val;
    }
    else{
        if(has[val]!=val){
            has[val]=false;
        }
   }
}
for(var i in has){
    if (has[i]) different.push(i);
}

If you want to check also for 0 and 1:

for(var i=0;i<length1;i++){
    has[arr1[i]]=NaN;
}
for(var i=0;i<length2;i++){
    var val=arr2[i];
    if(has[val] === undefined){
        has[val]=null;
    }
    else{
        if(has[val]!=null){
            has[val]=true;
        }
    }
}
for(var i in has){
    if (!has[i]) different.push(i);
}

Altri suggerimenti

How about converting the arrays to object to reduce the sorting complexity:

var arr1 = [123, 456, 789], arr2 = [123, 456, 789, 098];

function toObject(arr){
    return arr.reduce(function(o, v, i) {
      o[v] = i;
      return o;
    }, {});
}

var o1 = toObject(arr1), o2 = toObject(arr2), diff = [];

for(var prop in o2){
    if(o1[prop] === undefined)
        diff.push(prop);
}

console.log(diff);

You would obviously need to start on the biggest set.

http://jsfiddle.net/sHUw5/

Another thing to consider is to sort your collections and do a binary search to reduce the complexity from (O)N to (O)log2N for each array (if I'm thinking straight).

Use native js, rather than a library that tries to accommodate lots of scenarios / inputs.

  • Skip all the function calls and scope changes.
  • Minimize property lookup/namespace walking
  • Don't keep getting the array length on each loop

Simple optimization:

var array1 = [];
var array2 = [];

var difference = [];

for(var i = 0; len = array1.length; i < len; i++)
{
    var value = array1[i];

    if(value == array2[i])
    {
        continue;
    }

    if(array2.indexOf(value) == -1)
    {
        difference.push(value);
    }
}

here is a fast way of cheating the nested iteration that _.difference will invoke:

var arr1 = [123, 456, 789],
    arr2 = [123, 456, 789, 098],    
     has = {};

arr1.forEach(function(a){ this[a]=1;}, has);
alert(  arr2.filter(function(a){return !this[a]; }, has)   );

by using this in the iteration, we hand a pure function to JS that can be executed at the maximum possible speed.

note that this won't work for arrays of objects, or mixed-type arrays like [1, "1"], but it should work for the problem described and demonstrated by the question.

edit: it you want bi-directional compares (like arr1 having, arr2 missing or vice-versa), reverse and repeat the code above. you'll still only be at 40million computations compared to 100trillion that an indexOf()-using method will cost...

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top