SQL/MySQL SELECT and average over certain values

https://stackoverflow.com/questions/653800

19-08-2019

Question

I have to work with an analysis tool that measures the Web Service calls to a server per hour. These measurements are inserted into a database. The following is a snippet of such measurements:

mysql> SELECT * FROM sample s LIMIT 4;  
+---------+------+-------+  
| service | hour | calls |  
+---------+------+-------+  
| WS04    |   04 |    24 |  
| WS12    |   11 |    89 |  
| WSI64   |   03 |    35 |  
| WSX52   |   01 |    25 |  
+---------+------+-------+  
4 rows in set (0.00 sec)

As the end result I would like to know the sum of all web service calls per hour of day. Obviously, this can be easily done with SUM() and GROUP BY:

mysql> SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
+------+------------+  
| hour | SUM(calls) |  
+------+------------+  
|   00 |        634 |  
|   01 |        642 |  
|   02 |        633 |  
|   03 |        624 |  
|   04 |        420 |  
|   05 |        479 |  
|   06 |        428 |  
|   07 |        424 |  
|   08 |        473 |  
|   09 |        434 |  
|   10 |        485 |  
|   11 |        567 |  
|   12 |        526 |  
|   13 |        513 |  
|   14 |        555 |  
|   15 |        679 |  
|   16 |        624 |  
|   17 |        796 |  
|   18 |        752 |  
|   19 |        843 |  
|   20 |        827 |  
|   21 |        774 |  
|   22 |        647 |  
|   23 |        533 |  
+------+------------+  
24 rows in set (0.00 sec)

My problem is that in old sets, the web service calls in the hours from [00-11] were already summed up. The simple statement as listed above would therefore lead to

mysql> SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
+------+------------+  
| hour | SUM(calls) |  
+------+------------+  
|   00 |       6243 | <------ sum of hours 00-11!  
|   12 |        526 |  
|   13 |        513 |  
|   14 |        555 |  
|   15 |        679 |  
|   16 |        624 |  
|   17 |        796 |  
|   18 |        752 |  
|   19 |        843 |  
|   20 |        827 |  
|   21 |        774 |  
|   22 |        647 |  
|   23 |        533 |  
+------+------------+  
13 rows in set (0.00 sec)

This is an undesirable result. To make the old sets [00,12,...,23] comparable to the new sets [00,01,...,23] I would like to have one statement that averages the value of [00] and distributes it over the missing hours, e.g.:

+------+------------+  
| hour | SUM(calls) |  
+------+------------+  
|   00 |    6243/12 |  
|   01 |    6243/12 |  
            [...]  
|   12 |        526 |  
            [...]  
|   23 |        533 |  
+------+------------+

I can easily do this using temporary tables or views, but I don't know how to accomplish this without them.

Any ideas? Cause this is driving me crazy :P

Solution

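You'll need a rowset with 12 rows (one per hour from 0 to 11) to join against, so that each pre-summed hour-0 row of an old service is repeated once per morning hour.

The simplest solution is to combine 12 SELECT statements with UNION ALL: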

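SELECT  COALESCE(morning.hour, sample.hour) AS hour,
        -- Rows matched to the 12-row set hold a pre-summed total: spread it evenly.
        SUM(CASE WHEN morning.hour IS NULL THEN sample.calls ELSE sample.calls / 12 END) AS calls
FROM    sample
LEFT JOIN
        (
        SELECT 0 AS hour
        UNION ALL
        SELECT 1
        ...
        UNION ALL
        SELECT 11
        ) AS morning
        -- Only the aggregated hour-0 rows of the old services fan out to hours 0-11.
ON      sample.hour = 0 AND sample.service IN ('old_service1', 'old_service2')
GROUP BY
        1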
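
To see the join at work, here is a minimal sketch that runs the same idea against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the service names are made up for illustration, and SQLite stands in for MySQL, so the divisor is written as 12.0 to avoid SQLite's integer division. The 12-row set is generated in Python instead of being typed out.

```python
import sqlite3

# Hypothetical sample data: 'old_service1' stores hours 00-11 pre-summed under hour 0.
rows = [
    ("old_service1", 0, 1200),   # aggregated total for hours 00-11
    ("old_service1", 12, 50),
    ("new_service", 0, 10),
    ("new_service", 1, 20),
    ("new_service", 12, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (service TEXT, hour INTEGER, calls INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

# Build the 12-row set: SELECT 0 AS hour UNION ALL SELECT 1 ... UNION ALL SELECT 11
morning = " UNION ALL ".join(
    ["SELECT 0 AS hour"] + [f"SELECT {h}" for h in range(1, 12)]
)

query = f"""
SELECT  COALESCE(morning.hour, sample.hour) AS hour_of_day,
        -- 12.0 forces fractional division in SQLite
        SUM(CASE WHEN morning.hour IS NULL
                 THEN sample.calls
                 ELSE sample.calls / 12.0 END) AS calls
FROM    sample
LEFT JOIN ({morning}) AS morning
ON      sample.hour = 0 AND sample.service IN ('old_service1')
GROUP BY COALESCE(morning.hour, sample.hour)
ORDER BY hour_of_day
"""

# Map each hour of day to its total number of calls.
totals = dict(conn.execute(query).fetchall())
for hour, calls in totals.items():
    print(f"{hour:02d} {calls}")
```

With this data the old service's 1200 calls are split into 100 per morning hour and then added to whatever the other services recorded for the same hour.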

OTHER TIPS

You're probably best off doing this with temp tables or views (I'd recommend a view over a temp table); otherwise you will end up with a nasty, case-specific statement that will be a nightmare to maintain over time.
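
A rough sketch of what the view-based variant could look like, again run in an in-memory SQLite database via Python's sqlite3 module. The view names `morning_hours` and `sample_hourly`, the service names, and the data are all hypothetical. The point is that once the old rows are normalized inside a view, the reporting query goes back to a plain SUM() and GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (service TEXT, hour INTEGER, calls INTEGER);
INSERT INTO sample VALUES
    ('old_service1', 0, 1200),  -- pre-summed total for hours 00-11
    ('old_service1', 12, 50),
    ('new_service', 0, 10),
    ('new_service', 12, 30);

-- Hypothetical helper view: one row per morning hour 0..11.
CREATE VIEW morning_hours AS
    SELECT 0 AS hour UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
    UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
    UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11;

-- Hypothetical view that presents old and new data in the same hourly shape.
CREATE VIEW sample_hourly AS
    -- Old pre-summed rows: one copy per morning hour, each with 1/12 of the total.
    SELECT s.service, m.hour AS hour, s.calls / 12.0 AS calls
    FROM sample s CROSS JOIN morning_hours m
    WHERE s.hour = 0 AND s.service IN ('old_service1')
    UNION ALL
    -- Everything else passes through unchanged.
    SELECT s.service, s.hour, s.calls
    FROM sample s
    WHERE NOT (s.hour = 0 AND s.service IN ('old_service1'));
""")

# The report itself is the simple statement from the question, pointed at the view.
hourly = dict(conn.execute(
    "SELECT hour, SUM(calls) FROM sample_hourly GROUP BY hour ORDER BY hour"
).fetchall())
for hour, calls in hourly.items():
    print(f"{hour:02d} {calls}")
```

The list of old services then lives in one place (the view definition), which is the maintenance benefit this tip is getting at.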

Licensed under: CC-BY-SA with attribution
