Question

I'm trying to run PageRank using MapReduce in MongoDB.

My documents are in this format:

{
        "_id" : "u: 10000",
        "value" : [
                [
                        "u: 10000",
                        "s: 985272",
                        1
                ],
                [
                        "s: 985272",
                        "u: 10000",
                        1
                ],
                [
                        "u: 10000",
                        "s: 303770",
                        1
                ],
                [
                        "s: 303770",
                        "u: 10000",
                        1
                ]
        ]
}

I think the first step is to collect the links by key. However, I have several outbound links per document (these all happen to be bidirectional).

Here are my map and reduce functions:

m = function () {
    for (var i = 0; i < this.value.length; i++){
        var out = {};
        out.out = this.value[i][1];
        out.weight = this.value[i][2];
        emit(this.value[i][0], [out]);
    }
}

r = function(key, values){
    var result = {
      value: [] 
    };
    values.forEach(function(val) {
        result.value.push({out: val.out, weight: val.weight});
    });
    return result;
}

The problem is that I'm not sure emit is producing multiple emissions per document, because I get results like:

{
        "_id" : "s: 1000082",
        "value" : [
                {
                        "out" : "u: 37317",
                        "weight" : 1
                }
        ]
}

I would expect multiple items per document instead.

Anyone have any ideas? Help would be appreciated!

EDIT:

I'm not completely satisfied. For example, how do things like this work? The reduce result doesn't look at all like the emit output.

Solution

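The issue is that the value your map emits does not have the same structure as the object your reduce returns: map emits a one-element array, while reduce reads val.out and val.weight as if each value were a plain object and then returns a document that wraps an array.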

If you want each key to map to an array of "out" and "weight" pairs, then your map needs to emit an array containing such a pair, and your reduce needs to concatenate the arrays it receives.

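Remember, the structure of the object returned by the reduce function must be identical to the structure of the value emitted by the map function: when your map emits (key, value), "value" must look exactly like what your reduce returns. This is because MongoDB can call reduce more than once for the same key, feeding an earlier result back in as one of the values, and does not call reduce at all for a key that has only one emitted value. That is likely why the result in your question looks like raw emit output rather than reduce output.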
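The requirement can be checked outside MongoDB. The following is a minimal sketch in plain JavaScript, assuming a reduce whose return value has the same shape as each emitted value; the name reduceLinks and the sample keys and values are made up for illustration. It compares reducing three values at once with reducing a partial result a second time (a "re-reduce"):

```javascript
// Hypothetical reduce: returns { value: [ {out, weight}, ... ] },
// the same shape as each value it receives.
function reduceLinks(key, values) {
    var result = { value: [] };
    for (var i = 0; i < values.length; i++) {
        result.value = values[i].value.concat(result.value);
    }
    return result;
}

// Three values as a map function might emit them for one key (made-up data).
var a = { value: [{ out: "s: 1", weight: 1 }] };
var b = { value: [{ out: "s: 2", weight: 1 }] };
var c = { value: [{ out: "s: 3", weight: 1 }] };

// Reducing everything in one call...
var allAtOnce = reduceLinks("u: 1", [a, b, c]);
// ...should match reducing a partial result together with a new value.
var inTwoSteps = reduceLinks("u: 1", [reduceLinks("u: 1", [a, b]), c]);

var sameResult = JSON.stringify(allAtOnce) === JSON.stringify(inTwoSteps);
console.log(sameResult);
```

If reduce returned a different shape from what it receives, the second call would not find a "value" array on the partial result and the two results would no longer agree.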

If you change your map function to this, so that value is a document with field "value" which is an array of documents each having field "out" and field "weight":

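function () {
    // Emit one value per link, keyed by the link's source node.
    for (var i = 0; i < this.value.length; i++) {
        var key = this.value[i][0];
        // Wrap the link in the same {value: [...]} structure that reduce returns.
        var value = {value: [{out: this.value[i][1], weight: this.value[i][2]}]};
        emit(key, value);
    }
}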

and your reduce function to this, which constructs result to have identical structure to the value you emit above (since it just concatenates what it gets passed in for each key):

function (key, values) {
    // Same structure as each emitted value, so a re-reduce also works.
    var result = {value: []};
    for (var i = 0; i < values.length; i++) {
        result.value = values[i].value.concat(result.value);
    }
    return result;
}

you will then get back what you are expecting:

{
    "_id" : "s: 303770",
    "value" : {
        "value" : [
            {
                "out" : "u: 10000",
                "weight" : 1
            }
        ]
    }
}
{
    "_id" : "s: 985272",
    "value" : {
        "value" : [
            {
                "out" : "u: 10000",
                "weight" : 1
            }
        ]
    }
}
{
    "_id" : "u: 10000",
    "value" : {
        "value" : [
            {
                "out" : "s: 303770",
                "weight" : 1
            },
            {
                "out" : "s: 985272",
                "weight" : 1
            }
        ]
    }
}
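
To try the map and reduce functions from this answer without a MongoDB instance, the grouping step can be simulated in plain JavaScript. This is only a sketch: runMapReduce is a hypothetical helper, not part of MongoDB, and emit is passed to map as an argument here instead of being a global. The helper groups emitted values by key and calls reduce only for keys with more than one value, mirroring the behaviour described above.

```javascript
// Sample document from the question.
var doc = {
    _id: "u: 10000",
    value: [
        ["u: 10000", "s: 985272", 1],
        ["s: 985272", "u: 10000", 1],
        ["u: 10000", "s: 303770", 1],
        ["s: 303770", "u: 10000", 1]
    ]
};

// Map: one emit per link; the emitted value already has the reduce shape.
// Assumption: emit is injected as a parameter for this simulation.
function map(emit) {
    for (var i = 0; i < this.value.length; i++) {
        var link = { out: this.value[i][1], weight: this.value[i][2] };
        emit(this.value[i][0], { value: [link] });
    }
}

// Reduce: concatenate the link arrays of all values for a key.
function reduce(key, values) {
    var result = { value: [] };
    for (var i = 0; i < values.length; i++) {
        result.value = values[i].value.concat(result.value);
    }
    return result;
}

// Hypothetical helper that imitates the map, group, and reduce phases.
function runMapReduce(docs, mapFn, reduceFn) {
    var groups = {};
    docs.forEach(function (d) {
        mapFn.call(d, function (key, value) {
            (groups[key] = groups[key] || []).push(value);
        });
    });
    return Object.keys(groups).sort().map(function (key) {
        var values = groups[key];
        // A key with a single value is passed through without calling reduce.
        var value = values.length === 1 ? values[0] : reduceFn(key, values);
        return { _id: key, value: value };
    });
}

var results = runMapReduce([doc], map, reduce);
console.log(JSON.stringify(results, null, 2));
```

Because the emitted value and the reduce result share one structure, the keys that skip reduce ("s: 303770" and "s: 985272") come out in the same form as the key that goes through it ("u: 10000").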
Licensed under: CC-BY-SA with attribution