Why Mysql's Group By and Oracle's Group by behaviours are different

Question 1

The MySQL designers put in their nonstandard extension to GROUP BY in an attempt to make development easier and certain queries more efficient.

Here's their rationale.

https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html

There is a server mode called ONLY_FULL_GROUP_BY which disables the nonstandard extensions. You can set this mode using this statement.

 SET SESSION SQL_MODE='ONLY_FULL_GROUP_BY'

Here's a quote from that page, with emphasis added.

If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns... In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.

The important word here is nondeterministic. What does that mean? It means random, but worse. If the server chose random values, that implies it would return different values in different queries, so you have a chance of catching the problem when you test your software. But nondeterministic in this context means the server chooses the same value every time, until it doesn't.

Why might it change the value it chooses? A server upgrade is one reason. A change to table size might be another. The point is, the server is free to return whatever value it wants.

I wish people newly learning SQL would set this ONLY_FULL_GROUP_BY mode; they'd get much more predictable results from their queries, and the server would reject nondeterministic queries.

Question 2

Oracle does not extend the older SQL Standard that states that all items in the select list not contained in an aggregate function must be included in the group by clause.

The MySQL Docs state:

In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:

SELECT o.custid, c.name, MAX(o.payment)
  FROM orders AS o, customers AS c
  WHERE o.custid = c.custid
  GROUP BY o.custid;

For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.

So to answer your question as to why MySQL does this the most pertinent extract is:

You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.

I always advocate to steer clear of this particular MySQL extension unless you fully understand it.

Imagine the following simple table (T):

ID  | Column1 | Column2  |
----|---------+----------|
1   |    A    |    X     |
2   |    A    |    Y     |

In MySQL you can write

SELECT  ID, Column1, Column2
FROM    T
GROUP BY Column1;

This actually breaks the SQL Standard, but it works in MySQL, however the trouble is it is non-deterministic, the result:

ID  | Column1 | Column2  |
----|---------+----------|
1   |    A    |    X     |

Is no more or less correct than

ID  | Column1 | Column2  |  
----|---------+----------|
2   |    A    |    Y     |

So what you are saying is give me one row for each distinct value of Column1, which both results sets satisfy, so how do you know which one you will get? Well you don't, it seems to be a fairly popular misconception that you can add and ORDER BY clause to influence the results, so for example the following query:

SELECT  ID, Column1, Column2
FROM    T
GROUP BY Column1
ORDER BY ID DESC;

Would ensure that you get the following result:

ID  | Column1 | Column2  |  
----|---------+----------|
2   |    A    |    Y     |

because of the ORDER BY ID DESC, however this is not true (as demonstrated here).

The MySQL documents state:

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.

So even though you have an order by this does not apply until after one row per group has been selected, and this one row is non-determistic.

The SQL-Standard does allow columns in the select list not contained in the GROUP BY or an aggregate function, however these columns must be functionally dependant on a column in the GROUP BY. From the SQL-2003-Standard:

15) If T is a grouped table, then let G be the set of grouping columns of T. In each contained in , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a whose aggregation query is QS.

For example, ID in the sample table is the PRIMARY KEY, so we know it is unique in the table, so the following query conforms to the SQL standard and would run in MySQL and fail in many DBMS currently (At the time of writing Postgresql is the closest DBMS I know of to correctly implementing the standard - Example here):

SELECT  ID, Column1, Column2
FROM    T
GROUP BY ID;

Since ID is unique for each row, there can only be one value of Column1 for each ID, one value of Column2 there is no ambiguity about what to return for each row.

Question 3

group by is a very well defined SQL construct. To the best of my knowledge, almost all databases will treat equivalent SQL queries the same.

I can think of two differences that might arise when comparing results from Oracle and MySQL.

The first is that Oracle treats empty strings and NULL values as the same. So the following query in Oracle:

select c, count(*)
from (select '' as c from dual union all
      select NULL from dual
     ) t
group by c;

will return one row with a count of "2". Every other database (almost every other?) follows the ANSI-standard and would return two rows with a count of 1.

The second difference is that MySQL extends the standard to allow non-aggregated columns in the select. So, MySQL will allow this:

select a, b
from t
group by a;

This generates a syntax error in almost every other database. And, if a is not a unique column in t, then this behavior violates the ANSI standard. If you are encountering a syntax error, this is probably what you are stumbling upon. Oracle is probably doing the right thing in this case, and you should learn to write better aggregation queries.

Another difference is the ordering of results in a group by. MySQL has deprecated this feature, so no code should actually depend on it any more. However, result sets are inherently unordered, unless there is a specific order by clause, so two result sets in different orders would be equivalent.