Question

The MODE is the value that occurs the MOST times in the data; there can be ONE MODE or MANY MODES.

Here are some values in two tables (sqlFiddle):

create table t100(id int auto_increment primary key, value int);
create table t200(id int auto_increment primary key, value int);

insert into t100(value) values (1),
                               (2),(2),(2),
                               (3),(3),
                               (4);
insert into t200(value) values (1),
                               (2),(2),(2),
                               (3),(3),
                               (4),(4),(4);

Right now, to get the MODE(S) returned as a comma-separated list, I run the query below for table t100:

     SELECT GROUP_CONCAT(value) as modes,occurs
     FROM
        (SELECT value,occurs FROM 
           (SELECT value,count(*) as occurs
            FROM
            t100
            GROUP BY value)T1,
        (SELECT max(occurs) as maxoccurs FROM 
            (SELECT value,count(*) as occurs
             FROM
             t100
             GROUP BY value)T2
        )T3
        WHERE T1.occurs = T3.maxoccurs)T4
      GROUP BY occurs;

And I run the query below for table t200 (the same query with just the table name changed). I have two tables in this example to show that it works both when there is one MODE and when there are multiple MODES.

     SELECT GROUP_CONCAT(value) as modes,occurs
     FROM
        (SELECT value,occurs FROM 
           (SELECT value,count(*) as occurs
            FROM
            t200
            GROUP BY value)T1,
        (SELECT max(occurs) as maxoccurs FROM 
            (SELECT value,count(*) as occurs
             FROM
             t200
             GROUP BY value)T2
        )T3
        WHERE T1.occurs = T3.maxoccurs)T4
      GROUP BY occurs;

My question is: "Is there a simpler way?"

I was thinking of using something like HAVING count(*) = max(count(*)) to get rid of the extra join, but I couldn't get HAVING to return the result I wanted.

UPDATED: As suggested by @zneak, I can simplify T3 as shown below:

     SELECT GROUP_CONCAT(value) as modes,occurs
     FROM
        (SELECT value,occurs FROM 
           (SELECT value,count(*) as occurs
            FROM
            t200
            GROUP BY value)T1,
        (SELECT count(*) as maxoccurs
             FROM
             t200
             GROUP BY value
             ORDER BY count(*) DESC
             LIMIT 1
        )T3
        WHERE T1.occurs = T3.maxoccurs)T4
      GROUP BY occurs;

Now, is there a way to get rid of T3 altogether? I tried this, but it returns no rows for some reason:

  SELECT value,occurs FROM  
    (SELECT value,count(*) as occurs
     FROM t200
     GROUP BY `value`)T1
  HAVING occurs=max(occurs)  

Basically, I am wondering if there's a way to do it such that I only need to specify t100 or t200 once.

UPDATED: I found a way to specify t100 or t200 only once by adding a variable that tracks my own maxoccurs, as shown below:

  SELECT GROUP_CONCAT(CASE WHEN occurs=@maxoccurs THEN value ELSE NULL END) as modes 
  FROM 
    (SELECT value,occurs,@maxoccurs:=GREATEST(@maxoccurs,occurs) as maxoccurs
     FROM (SELECT value,count(*) as occurs
           FROM t200
           GROUP BY `value`)T1,(SELECT @maxoccurs:=0)mo
     )T2

Solution

You are very close with the last query. The following finds one mode:

SELECT value, occurs
FROM (SELECT value, count(*) as occurs
      FROM t200
      GROUP BY `value`
      ORDER BY occurs DESC
      LIMIT 1
     ) T1;

I think your question was about multiple modes, though:

SELECT value, occurs
FROM (SELECT value, count(*) as occurs
      FROM t200
      GROUP BY `value`
     ) T1
WHERE occurs = (select max(occurs)
                from (select `value`, count(*) as occurs
                      from t200
                      group by `value`
                     ) t
               );
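The same idea can be written with HAVING, which is closer to what the question was attempting, and wrapped in GROUP_CONCAT to produce the comma-separated list. This is only a sketch against the t200 table from the question; the alias `m` is made up for illustration, and the table is still named twice:

```sql
-- Keep only the groups whose count equals the largest group count,
-- then join the surviving values into one comma-separated string.
SELECT GROUP_CONCAT(value ORDER BY value) AS modes, occurs
FROM (SELECT value, COUNT(*) AS occurs
      FROM t200
      GROUP BY value
      -- Scalar subquery: the single highest count in the table.
      HAVING COUNT(*) = (SELECT COUNT(*)
                         FROM t200
                         GROUP BY value
                         ORDER BY COUNT(*) DESC
                         LIMIT 1)
     ) m
GROUP BY occurs;
```

With the sample data in the question, this should return modes `2,4` with occurs `3` for t200, and modes `2` with occurs `3` if the table name is changed to t100.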

EDIT:

This is much easier in almost any other database. MySQL versions before 8.0 support neither WITH (common table expressions) nor window/analytic functions.
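For readers on MySQL 8.0 or later, where window functions are available, a sketch along the following lines names the table only once. It assumes the t200 table from the question; the alias `ranked` and the column `rnk` are made up for illustration:

```sql
-- Requires MySQL 8.0+ (window functions).
-- RANK() gives every value tied for the highest count a rank of 1,
-- so ties (multiple modes) are all kept.
SELECT GROUP_CONCAT(value ORDER BY value) AS modes, occurs
FROM (SELECT value,
             COUNT(*) AS occurs,
             RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
      FROM t200
      GROUP BY value
     ) ranked
WHERE rnk = 1
GROUP BY occurs;
```

With the sample data, this should give modes `2,4` with occurs `3` for t200. Using RANK() rather than ROW_NUMBER() is what keeps ties together.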

Your query (shown below) does not do what you think it is doing:

  SELECT value, occurs  
  FROM (SELECT value, count(*) as occurs
        FROM t200
        GROUP BY `value`
       ) T1
  HAVING occurs = max(occurs) ; 

The final HAVING clause refers to the column occurs, but it also uses max(occurs). Because of max(occurs), this becomes an aggregation query that returns one row, summarizing all rows from the subquery.

The column occurs is not used for grouping. So what value does MySQL use? It uses an arbitrary value from one of the rows in the subquery. This arbitrary value might match the maximum, or it might not. Either way, the value comes from only one row; the comparison is not repeated for each row.
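To see the collapse into a single row, it may help to move the aggregate into the SELECT list. The sketch below assumes MySQL 5.7 or later, because it uses ANY_VALUE() to make the "arbitrary row" choice explicit; the output column names are made up for illustration:

```sql
-- An aggregate without GROUP BY summarizes the whole derived table,
-- so this returns exactly one row.
SELECT ANY_VALUE(value)  AS some_value,   -- taken from an arbitrary row
       ANY_VALUE(occurs) AS some_occurs,  -- taken from an arbitrary row
       MAX(occurs)       AS maxoccurs     -- computed over all rows
FROM (SELECT value, COUNT(*) AS occurs
      FROM t200
      GROUP BY value
     ) T1;
```

The HAVING version compares the equivalent of some_occurs against maxoccurs for that one row, so it returns either one row or none.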

OTHER TIPS

I realize this is a very old question, but while looking for the best way to find the MODE in a MySQL table, I came up with this:

SELECT [column name], count(*) as [ccount] FROM [table] WHERE [field] = [item] GROUP BY [column name] ORDER BY [ccount] DESC LIMIT 1 ;

In my actual situation, I had a log with recorded events in it. I wanted to know during which period (1, 2 or 3, as recorded in my log) a specific event occurred the most times (e.g., the MODE of the "period" column of the table for that specific event).

My table looked like this (abridged):

EVENT_TYPE |   PERIOD
-------------------------
   1       |     3
   1       |     3
   1       |     3
   1       |     2
   2       |     1
   2       |     1
   2       |     1
   2       |     3

Using the query:

SELECT event_type, period, count(*) as pcount FROM proto_log WHERE event_type = 1 GROUP BY period ORDER BY pcount DESC LIMIT 1 ;

I get the result:

EVENT_TYPE   |   PERIOD   |   PCOUNT
--------------------------------------
   1         |   3        |    3

Using this result, the period column ($result['period'], for example) should contain the MODE for that query, and of course pcount contains the actual count.

If you wanted to get multiple modes, I suppose you could keep adding other criteria to your WHERE clause using ORs:

SELECT event_type, period, count(*) as pcount FROM proto_log WHERE event_type = 1 OR event_type = 2 GROUP BY period ORDER BY pcount DESC LIMIT 2 ;

The multiple ORs should give you the additional results, and increasing the LIMIT will add the additional MODES to the results. (Otherwise it will still show only the top result.)

Results:

EVENT_TYPE   |   PERIOD   |   PCOUNT
--------------------------------------
   1         |   3        |    3
   2         |   1        |    3

I am not 100% sure this is doing exactly what I think it is doing, or if it will work in all situations, so please let me know if I am on or off track here.
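One caveat on the multi-event query: because it groups by period alone, counts from both event types are pooled. With only the eight sample rows shown, period 3 would be counted four times (three from event type 1 and one from event type 2). A sketch that instead finds the mode per event type, assuming the same proto_log table, could look like this; the aliases `c`, `x` and `m` and the column `maxcount` are made up for illustration:

```sql
-- Count each (event_type, period) pair, find the highest count per
-- event type, and keep the pairs that reach that count.
SELECT c.event_type, c.period, c.pcount
FROM (SELECT event_type, period, COUNT(*) AS pcount
      FROM proto_log
      GROUP BY event_type, period
     ) c
JOIN (SELECT event_type, MAX(pcount) AS maxcount
      FROM (SELECT event_type, period, COUNT(*) AS pcount
            FROM proto_log
            GROUP BY event_type, period
           ) x
      GROUP BY event_type
     ) m
  ON m.event_type = c.event_type
 AND m.maxcount = c.pcount
ORDER BY c.event_type, c.period;
```

For the eight sample rows, this should return (1, 3, 3) and (2, 1, 3). If an event type has tied periods, each tied period comes back as its own row, so no LIMIT is needed.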

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow