Question

I have started learning MongoDB and am stuck on a problem. I have a collection named server_logs.

It contains the following fields (SOURCE_SERVER, SOURCE_PORT, DESTINATION_PORT, DESTINATION_SERVER, MBYTES).

I need each SOURCE_SERVER together with the total MBYTES transferred for it. There is one more condition: if a source server also appears as a destination server in other rows, the MBYTES of those rows must be added to its total as well.

For example, given the table below:

   SOURCE    S_PORT  DEST     D_PORT  MBYTES
1) server1   446     server2  555     10MB
2) server3   226     server1  666     2MB
3) server1   446     server3  226     5MB

I need the result below:

Server1  17MB
Server3  7MB
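
To make the expected arithmetic explicit, here is a minimal sketch in plain JavaScript, with no database involved. The `rows` array and the `totalsForSources` helper are hypothetical names used only for illustration, and MBYTES is assumed to be stored as a plain number rather than a string such as "10MB".

```javascript
// Sample rows from the table above; MBYTES is assumed to be numeric.
const rows = [
  { source: "server1", dest: "server2", mbytes: 10 },
  { source: "server3", dest: "server1", mbytes: 2 },
  { source: "server1", dest: "server3", mbytes: 5 },
];

// Hypothetical helper: for every server that appears as a source, add up
// the MBYTES of all rows where it is either the source or the destination.
function totalsForSources(rows) {
  const sources = new Set(rows.map((r) => r.source));
  const totals = {};
  for (const server of sources) {
    totals[server] = rows
      .filter((r) => r.source === server || r.dest === server)
      .reduce((sum, r) => sum + r.mbytes, 0);
  }
  return totals;
}

console.log(totalsForSources(rows));
// server1: 10 + 5 (as source) + 2 (as destination) = 17
// server3: 2 (as source) + 5 (as destination) = 7
```

server2 is absent from the result because it never appears as a source.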

I have written a query in MySQL that calculates the top SOURCE by MBYTES of data transferred. It works fine and returns the required results in MySQL:

SELECT SOURCE, DEST, sum( logs.MBYTES )+(
    SELECT SUM(log.MBYTES) as sum
    from logs as log
    where logs.DEST=log.SOURCE
) AS MBYTES

I want the equivalent of this query in MongoDB. Please help.

Thanks in advance.


Solution

Although it may not be immediately apparent how to express this kind of "self join" query in MongoDB, it can be done with the aggregation framework. It just requires a small change in your thinking.

With your data in MongoDB in this form, which is still very much like the original SQL source:

{ 
    "source" : "server1",
    "s_port" : 446,
    "dest" : "server2", 
    "d_port" : 555, 
    "transferMB" : 10
},
{ 
    "source" : "server3",
    "s_port" : 226,
    "dest" : "server1",
    "d_port" : 666,
    "transferMB" : 2
},
{ 
    "source" : "server1",
    "s_port" : 446, 
    "dest" : "server3",
    "d_port" : 226,
    "transferMB" : 5
}

With a pre-2.6 version of MongoDB, your query will look like this:

db.logs.aggregate([

    // Project a "type" tag in order to transform, then unwind
    { "$project": {
         "source": 1,
         "dest": 1,
         "transferMB": 1,
         "type": { "$cond": [ 1,[ "source", "dest" ],0] }
    }},
    { "$unwind": "$type" },

    // Map the "source" and "dest" servers onto the type, keep the source       
    { "$project": {
        "type": 1,
        "tag": { "$cond": [
            { "$eq": [ "$type", "source" ] },
            "$source",
            "$dest"
        ]},
        "mbytes": "$transferMB",
        "source": 1
    }},

    // Group for totals, keep an array of the "source" for each
    { "$group": {
        "_id": "$tag",
        "mbytes": { "$sum": "$mbytes" },
        "source": { "$addToSet": "$source" }
    }},


    // Unwind that array
    { "$unwind": "$source" },

    // Is our grouped tag one of the sources? This simulates an inner join
    { "$project": {
        "mbytes": 1,
        "matched": { "$eq": [ "$source", "$_id" ] }
    }},

    // Filter the results that did not match
    { "$match": { "matched": true }},


    // Discard duplicates for each server tag
    { "$group": { 
        "_id": "$_id",
        "mbytes": { "$first": "$mbytes" }
    }}
])

For versions 2.6 and above, you get a few additional operators that streamline this, or at least let you use different operators:

db.logs.aggregate([

    // Project a "type" tag in order to transform, then unwind
    { "$project": {
         "source": 1,
         "dest": 1,
         "transferMB": 1,
         "type": { "$literal": [ "source", "dest" ] }
    }},
    { "$unwind": "$type" },

    // Map the "source" and "dest" servers onto the type, keep the source       
    { "$project": {
        "type": 1,
        "tag": { "$cond": [
            { "$eq": [ "$type", "source" ] },
            "$source",
            "$dest"
        ]},
        "mbytes": "$transferMB",
        "source": 1
    }},

    // Group for totals, keep an array of the "source" for each
    { "$group": {
        "_id": "$tag",
        "mbytes": { "$sum": "$mbytes" },
        "source": { "$addToSet": "$source" }
    }},

    // Coerce the server tag into an array (of one element)
    { "$group": {
        "_id": "$_id",
        "mbytes": { "$first": "$mbytes" },
        "source": { "$first": "$source" },
        "tags": { "$push": "$_id" }
    }},

    // Use set intersection to count the elements the two arrays have in common
    { "$project": {
       "mbytes": 1,
       "matched": { "$size": { 
           "$setIntersection": [
               "$source",
               "$tags"
           ]
       }}
    }},

    // Filter those that had nothing in common
    { "$match": { "matched": { "$gt": 0 } }},

    // Remove the field that is no longer needed
    { "$project": { "mbytes": 1 }}
])

Both forms produce the results:

{ "_id" : "server1", "mbytes" : 17 }
{ "_id" : "server3", "mbytes" : 7 }

The general principle in both is that by keeping a list of the valid "source" servers you can then "filter" the combined results so that only those that were listed as a source will have their total transfer recorded.

So there are a couple of techniques you can use to "re-shape", "combine" and "filter" your documents to get your desired result.
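
The reshape, combine and filter steps can be mirrored in plain JavaScript to see what each stage contributes. This is only an in-memory sketch over the three sample documents, not how MongoDB executes a pipeline; the `logs` array and the `sourceTotals` function are hypothetical names.

```javascript
// In-memory copy of the sample documents (ports omitted for brevity).
const logs = [
  { source: "server1", dest: "server2", transferMB: 10 },
  { source: "server3", dest: "server1", transferMB: 2 },
  { source: "server1", dest: "server3", transferMB: 5 },
];

function sourceTotals(docs) {
  // "Re-shape": like the first $project + $unwind, produce one record per
  // document for its source and one for its dest, keeping the source.
  const unwound = docs.flatMap((d) => [
    { tag: d.source, mbytes: d.transferMB, source: d.source },
    { tag: d.dest, mbytes: d.transferMB, source: d.source },
  ]);

  // "Combine": like $group, total per tag and collect the set of sources.
  const groups = new Map();
  for (const r of unwound) {
    const g = groups.get(r.tag) || { _id: r.tag, mbytes: 0, source: new Set() };
    g.mbytes += r.mbytes;
    g.source.add(r.source);
    groups.set(r.tag, g);
  }

  // "Filter": like the final $match, keep a tag only if it is itself one
  // of the sources collected for it.
  return [...groups.values()]
    .filter((g) => g.source.has(g._id))
    .map((g) => ({ _id: g._id, mbytes: g.mbytes }));
}

console.log(sourceTotals(logs));
```

A server that was a source in any document always ends up in its own set of sources, which is why the membership check is enough to drop servers such as server2 that were only ever a destination.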

Read up on the aggregation operators. The SQL to Aggregation Mapping Chart in the documentation is also worth a look as an introduction, since it gives you some idea of how to convert common operations.

You can also browse the tags here on Stack Overflow to find some interesting transformation operations.

OTHER TIPS

You can use the aggregation framework for this:

db.logs.aggregate([
    {$group:{_id:"$SOURCE",MBYTES:{$sum:"$MBYTES"}}}
])

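This assumes the MBYTES field contains only numeric values. Because the pipeline groups on SOURCE alone, it sums only the rows where a server is the source, so for the sample data the result is:

{
    _id: "server1",
    MBYTES: 15
},
{
    _id: "server3",
    MBYTES: 2
}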

If you also need to count the rows where a server appears in the DEST field, you can use map-reduce:

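// Emit each row's MBYTES once for its source and once for its destination
var mapF = function () {
    emit(this.SOURCE, this.MBYTES);
    emit(this.DEST, this.MBYTES);
};

// Reduce must return the same kind of value that map emits (a number here),
// because MongoDB may feed reduce output back into reduce for the same key
var reduceF = function (serverId, mbytesValues) {
    var total = 0;

    mbytesValues.forEach(function (value) {
        total += value;
    });

    return total;
};

db.logs.mapReduce(mapF, reduceF, { out: "server_stats" });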

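After that you can find the results in the server_stats collection. Note that this also produces a total for servers that appear only in DEST (server2 in the sample data), so filter those out if you need source servers only.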
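
One detail of map-reduce is worth checking here: reduce may be called more than once for the same key, with its own earlier output among the values, and it is not called at all for a key that has a single emitted value. The value reduce returns therefore needs to have the same shape as what map emits. The sketch below is a plain JavaScript simulation of that behaviour, not MongoDB's implementation; `simulateMapReduce`, `sumReduce`, `sample` and the chunk size of 2 are hypothetical and exist only to force a second reduce pass.

```javascript
// Reduce that returns a plain number, the same kind of value map emits.
function sumReduce(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical stand-in for a map-reduce run: groups emitted values by key,
// reduces them in chunks, and feeds partial results back into reduce.
// chunkSize must be at least 2, otherwise the loop never shrinks the list.
function simulateMapReduce(docs, reduceFn, chunkSize = 2) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const d of docs) {
    emit(d.SOURCE, d.MBYTES); // one value for the source
    emit(d.DEST, d.MBYTES);   // one value for the destination
  }

  const result = {};
  for (const [key, values] of emitted) {
    const pending = values.slice();
    // Keys with a single value skip reduce entirely.
    while (pending.length > 1) {
      const chunk = pending.splice(0, chunkSize);
      pending.unshift(reduceFn(key, chunk)); // partial result goes back in
    }
    result[key] = pending[0];
  }
  return result;
}

const sample = [
  { SOURCE: "server1", DEST: "server2", MBYTES: 10 },
  { SOURCE: "server3", DEST: "server1", MBYTES: 2 },
  { SOURCE: "server1", DEST: "server3", MBYTES: 5 },
];

console.log(simulateMapReduce(sample, sumReduce));
```

For server1 the simulation reduces [10, 2] to 12 and then [12, 5] to 17. A reduce that returned an object while map emits numbers would, on that second pass, try to add an object to a number.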

Licensed under: CC-BY-SA with attribution