Pregunta

I have two arrays with user id's and I want to check different items in them.

arr1 = [123, 456, 789];
arr2 = [123, 456, 789, 098];

The problem is: these arrays can have 10 or 20 millions of items.

I'm trying with underscore.difference() but it took 10 mins to complete.

Is there any faster way to do this?

¿Fue útil?

Solución 3

This considers you do not have the numbers 0 or 1 in the arrays:

var arr1 = [123, 456, 789,3],
    arr2 = [123, 456, 789, 098],    
    has = {},
    different=[],
    length1=arr1.length,
    length2=arr2.length;



for(var i=0;i<length1;i++){
        has[arr1[i]]=true;
}
for(var i=0;i<length2;i++){
    var val=arr2[i];
    if(has[val] === undefined){
       has[val]=val;
    }
    else{
        if(has[val]!=val){
            has[val]=false;
        }
   }
}
for(var i in has){
    if (has[i]) different.push(i);
}

If you want to check also for 0 and 1:

for(var i=0;i<length1;i++){
    has[arr1[i]]=NaN;
}
for(var i=0;i<length2;i++){
    var val=arr2[i];
    if(has[val] === undefined){
        has[val]=null;
    }
    else{
        if(has[val]!=null){
            has[val]=true;
        }
    }
}
for(var i in has){
    if (!has[i]) different.push(i);
}

Otros consejos

How about converting the arrays to object to reduce the sorting complexity:

var arr1 = [123, 456, 789], arr2 = [123, 456, 789, 098];

function toObject(arr){
    return arr.reduce(function(o, v, i) {
      o[v] = i;
      return o;
    }, {});
}

var o1 = toObject(arr1), o2 = toObject(arr2), diff = [];

for(var prop in o2){
    if(o1[prop] === undefined)
        diff.push(prop);
}

console.log(diff);

You would obviously need to start on the biggest set.

http://jsfiddle.net/sHUw5/

Another thing to consider is to sort your collections and do a binary search to reduce the complexity from (O)N to (O)log2N for each array (if I'm thinking straight).

Use native js, rather than a library that tries to accommodate lots of scenarios / inputs.

  • Skip all the function calls and scope changes.
  • Minimize property lookup/namespace walking
  • Don't keep getting the array length on each loop

Simple optimization:

var array1 = [];
var array2 = [];

var difference = [];

for(var i = 0; len = array1.length; i < len; i++)
{
    var value = array1[i];

    if(value == array2[i])
    {
        continue;
    }

    if(array2.indexOf(value) == -1)
    {
        difference.push(value);
    }
}

here is a fast way of cheating the nested iteration that _.difference will invoke:

var arr1 = [123, 456, 789],
    arr2 = [123, 456, 789, 098],    
     has = {};

arr1.forEach(function(a){ this[a]=1;}, has);
alert(  arr2.filter(function(a){return !this[a]; }, has)   );

by using this in the iteration, we hand a pure function to JS that can be executed at the maximum possible speed.

note that this won't work for arrays of objects, or mixed-type arrays like [1, "1"], but it should work for the problem described and demonstrated by the question.

edit: it you want bi-directional compares (like arr1 having, arr2 missing or vice-versa), reverse and repeat the code above. you'll still only be at 40million computations compared to 100trillion that an indexOf()-using method will cost...

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top