CouchDB Map/Reduce function to get average based on group=true and then get the max average of all grouped values

StackOverflow https://stackoverflow.com/questions/23667862

  •  22-07-2023

I have a map function that emits the time and a value. Let's say I have 4 docs in this format:

Doc1  ->(time1, 20)
Doc2  ->(time1, 60)
Doc1  ->(time2, 30)
Doc2  ->(time2, 15)

What I need is to group by time, compute the average for each group, and then return whichever average is highest.

So, with grouping, I get A = (val1+val2)/2 and B = (val3+val4)/2

I want to check which is a higher number between A and B and return that. So, in the above example, the max value returned would be A = (20+60)/2 = 40.

How do I write a reduce function that gives me that?
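To pin down the computation being asked for, here is a minimal sketch in plain JavaScript, run outside CouchDB. The row shape and the helper names (`emitted`, `averagesByKey`, `maxAverage`) are assumptions made for illustration; they are not part of any CouchDB API.

```javascript
// Hypothetical rows, shaped like the (time, value) pairs in the question.
var emitted = [
  { key: "time1", value: 20 },
  { key: "time1", value: 60 },
  { key: "time2", value: 30 },
  { key: "time2", value: 15 }
];

// Group by key, keeping a running sum and count per group,
// then turn each group into { key, avg }.
function averagesByKey(rows) {
  var groups = {};
  rows.forEach(function (row) {
    var g = groups[row.key] || (groups[row.key] = { sum: 0, count: 0 });
    g.sum += row.value;
    g.count += 1;
  });
  return Object.keys(groups).map(function (key) {
    return { key: key, avg: groups[key].sum / groups[key].count };
  });
}

// Pick the group with the highest average (null for empty input).
function maxAverage(rows) {
  return averagesByKey(rows).reduce(function (best, cur) {
    return best === null || cur.avg > best.avg ? cur : best;
  }, null);
}

var result = maxAverage(emitted); // { key: "time1", avg: 40 }
```

The two steps (average per group, then max across groups) are kept separate on purpose: a grouped reduce only sees one group at a time, so the second step has to happen somewhere else.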


Solution

Here are the rows my map function emits for the docs I have. You will see there are 2 docs per time (the 2nd element of the key):

{"id":"server_host177.lss.emc.com_2014-05-15_11:39:48","key":["SRMSuite_3.0.2_test1","2014-05-14 11:00:00"],"value":20},
{"id":"server_host180.lss.emc.com_2014-05-15_11:39:48","key":["SRMSuite_3.0.2_test1","2014-05-14 11:00:00"],"value":20},
{"id":"server_host090.lss.emc.com_2014-05-15_11:39:55","key":["SRMSuite_3.0.2_test1","2014-05-14 12:00:00"],"value":22},
{"id":"server_host091.lss.emc.com_2014-05-15_11:39:55","key":["SRMSuite_3.0.2_test1","2014-05-14 12:00:00"],"value":20},
{"id":"server_host177.lss.emc.com_2014-05-15_11:39:48","key":["SRMSuite_3.0.2_test1","2014-05-14 13:00:00"],"value":26},
{"id":"server_host180.lss.emc.com_2014-05-15_11:39:48","key":["SRMSuite_3.0.2_test1","2014-05-14 13:00:00"],"value":20},
{"id":"server_host090.lss.emc.com_2014-05-15_11:39:55","key":["SRMSuite_3.0.2_test1","2014-05-14 14:00:00"],"value":22},
{"id":"server_host091.lss.emc.com_2014-05-15_11:39:55","key":["SRMSuite_3.0.2_test1","2014-05-14 14:00:00"],"value":20}

I want to get the average value for each time. Here is my view, with its map and reduce functions:

"maxcpu": {
         "map": "function(doc) { if ((doc.type == 'performance_stats'))  emit([doc.test_id, doc.start_time], doc.CPU) }",
         "reduce": "function(keys, values) "
                   "{ "
                            "var avg = Math.round(sum(values)/values.length);"
                            "return(avg)"
                   " }"
            }
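One caveat: CouchDB can call a reduce function again on its own earlier results (the rereduce pass, signalled by a third argument), and averaging at that point would average averages. A minimal sketch of a variant that carries the sum and count instead, so the average can be derived at the very end, might look like this. The names `avgReduce` and `toAverage` are hypothetical and exist only so the sketch runs outside CouchDB; in a design document the reduce would be an anonymous function.

```javascript
// Reduce that stays correct under rereduce by returning { sum, count }
// rather than a finished average.
function avgReduce(keys, values, rereduce) {
  var sum = 0;
  var count = 0;
  values.forEach(function (v) {
    if (rereduce) {
      // v is an earlier result of this function: { sum, count }
      sum += v.sum;
      count += v.count;
    } else {
      // v is a raw emitted number
      sum += v;
      count += 1;
    }
  });
  return { sum: sum, count: count };
}

// Hypothetical helper: derive the rounded average from a reduced value.
function toAverage(reduced) {
  return Math.round(reduced.sum / reduced.count);
}
```

With this shape, each grouped row's value is an object, so whatever consumes the view has to compute `sum / count` itself.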

So, you will have 4 rows:

http://yourhostip:5984/longevity/_design/perfstats/_view/maxcpu?group=true

{"rows":[
{"key":["SRMSuite_3.0.2_test1","2014-05-14 11:00:00"],"value":20},
{"key":["SRMSuite_3.0.2_test1","2014-05-14 12:00:00"],"value":21},
{"key":["SRMSuite_3.0.2_test1","2014-05-14 13:00:00"],"value":23},
{"key":["SRMSuite_3.0.2_test1","2014-05-14 14:00:00"],"value":21}
]}

To report just the max value, which is 23, we need a list function. I got the idea from http://geekiriki.blogspot.com/2010/08/couchdb-using-list-functions-to-sort.html

    "lists":{
        "sort":"function(head, req) {"
                "var row;"
                "var rows=[];"
                "while(row = getRow()) {"
                    "rows.push(row)"
                "};"
                "rows.sort(function(a,b) {"
                "return b.value-a.value"
                "});"
                "send(JSON.stringify({\"rows\" : rows[0]}))"
        "}"
    }

Then this request gets you what you need:

 http://yourhostip:5984/longevity/_design/perfstats/_list/sort/maxcpu?group=true

{"rows":{"key":["SRMSuite_3.0.2_test1","2014-05-14 13:00:00"],"value":23}}
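Since only the top row is needed, the list function does not have to collect and sort every row. A sketch of a single-pass variant follows, assuming the same numeric row values as the grouped view output above. `getRow()` and `send()` are the functions CouchDB provides to list functions; here they are replaced by local stand-ins (`pending`, `output`) so the code runs under Node, and the name `maxList` is hypothetical.

```javascript
// Stand-ins for CouchDB's getRow() and send(), for local runs only.
var pending = [
  { key: ["SRMSuite_3.0.2_test1", "2014-05-14 11:00:00"], value: 20 },
  { key: ["SRMSuite_3.0.2_test1", "2014-05-14 12:00:00"], value: 21 },
  { key: ["SRMSuite_3.0.2_test1", "2014-05-14 13:00:00"], value: 23 },
  { key: ["SRMSuite_3.0.2_test1", "2014-05-14 14:00:00"], value: 21 }
];
var output = [];
function getRow() { return pending.shift(); }
function send(chunk) { output.push(chunk); }

// List function body: remember the row with the largest value
// while streaming through the rows once.
function maxList(head, req) {
  var row;
  var best = null;
  while ((row = getRow())) {
    if (best === null || row.value > best.value) {
      best = row;
    }
  }
  send(JSON.stringify({ rows: best }));
}

maxList({}, {});
```

This keeps memory use constant regardless of how many grouped rows the view returns, at the cost of no longer having the full sorted list available.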

Other answers

This one is a bit tricky, as you're comparing values across multiple documents in multiple ways. Here is my best attempt in a short amount of time; I'm sure others can improve on it to get closer to your ultimate goal.

I created 2 documents: (your example wasn't very clear, so I made my best guess)

{
   "times": [
       {
           "ts": 1388556000000,
           "value": 30
       },
       {
           "ts": 1391234400000,
           "value": 15
       }
   ]
}

{
   "times": [
       {
           "ts": 1388556000000,
           "value": 20
       },
       {
           "ts": 1391234400000,
           "value": 30
       }
   ]
}

My map function looks like this: (basically, for each time in each document, I'll emit its timestamp and value)

function(doc) {
  doc.times.forEach(function (time) {
    emit(time.ts, time.value);
  });
}

and my corresponding reduce function looks like this:

_stats

This is a built-in reduce function. It's implemented in Erlang, so it's fast. It exposes statistics about the emitted values: sum, count, min, max and sumsqr (sum and count can be used to compute an average).

If you call this view using group=true, (reduce=true is implied) you'll get results that look like:

{
  "rows": [
    {
      "key": 1388556000000,
      "value": {
        "sum": 50,
        "count": 2,
        "min": 20,
        "max": 30,
        "sumsqr": 1300
      }
    },
    {
      "key": 1391234400000,
      "value": {
        "sum": 45,
        "count": 2,
        "min": 15,
        "max": 30,
        "sumsqr": 1125
      }
    }
  ]
}

Like I said, this isn't a complete solution, but I meant to introduce 3 main concepts:

1) emitting multiple times for a single document
2) the group=true view query param
3) the built-in reduce function

I suspect that a computation like this will be hard to compute in a single map-reduce, but I wouldn't say it's impossible.
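One way to finish the job outside the reduce step is to post-process the grouped `_stats` rows, either on the client or in a list function. The sketch below assumes the rows have the shape shown in the `group=true` output above; the names `statsRows` and `maxAverageFromStats` are hypothetical.

```javascript
// Grouped _stats rows, copied from the view output shown earlier.
var statsRows = [
  { key: 1388556000000, value: { sum: 50, count: 2, min: 20, max: 30, sumsqr: 1300 } },
  { key: 1391234400000, value: { sum: 45, count: 2, min: 15, max: 30, sumsqr: 1125 } }
];

// Derive each group's average from sum and count,
// and keep the group whose average is largest.
function maxAverageFromStats(rows) {
  var best = null;
  rows.forEach(function (row) {
    var avg = row.value.sum / row.value.count;
    if (best === null || avg > best.avg) {
      best = { key: row.key, avg: avg };
    }
  });
  return best;
}

var highest = maxAverageFromStats(statsRows); // { key: 1388556000000, avg: 25 }
```

Because `_stats` already returns sum and count per key, the averages here are exact even when CouchDB rereduces internally.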

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow