Why does MySQL allow “group by” queries WITHOUT aggregate functions?

https://stackoverflow.com/questions/8727675

14-04-2021

Question

Surprise -- this is a perfectly valid query in MySQL:

select X, Y from someTable group by X

If you tried this query in Oracle or SQL Server, you’d get the natural error message:

Column 'Y' is invalid in the select list because it is not contained in 
either an aggregate function or the GROUP BY clause.

So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.

There’s even a MySQL configuration parameter to turn off this “looseness”. http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
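As a rough sketch of what that setting does, the statements below turn the mode on for one session and re-run the query from above. someTable, X and Y are the placeholder names from the question. Note that since MySQL 5.7.5 this mode is part of the default sql_mode, so on newer servers the query may already be rejected.

```sql
-- Enable strict GROUP BY checking for this session only.
-- Caution: assigning sql_mode like this replaces any other modes
-- that were active in the session.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Inspect the modes now in effect.
SELECT @@SESSION.sql_mode;

-- With the mode on, MySQL is expected to reject this statement,
-- because Y is neither aggregated nor named in GROUP BY.
SELECT X, Y FROM someTable GROUP BY X;
```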

This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard. http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html

My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?

Solution

I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:

SELECT user.id, user.name, COUNT(post.owner_id) AS posts
FROM user
  LEFT OUTER JOIN post ON post.owner_id = user.id
GROUP BY user.id

In this case user.name will always be unique per user.id, so it is convenient not to have to list user.name in the GROUP BY clause (although, as you say, there is definite scope for problems).
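For comparison, here is a sketch of the form that stricter engines such as Oracle or SQL Server would expect for the same report. It assumes the user and post tables from the example above, with user.id as the primary key. COUNT(post.owner_id) is used because it counts only rows that actually matched in the outer join.

```sql
-- Every non-aggregated column in the select list is repeated in GROUP BY.
-- Because user.name is determined by user.id, adding it does not change
-- the groups; it only satisfies the stricter rule.
SELECT user.id, user.name, COUNT(post.owner_id) AS posts
FROM user
  LEFT OUTER JOIN post ON post.owner_id = user.id
GROUP BY user.id, user.name;
```

The MySQL extension saves the extra column in GROUP BY, which adds up when the select list pulls many columns from the grouped table.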

OTHER TIPS

According to this page (the 5.0 online manual), it's for better performance and user convenience.

Unfortunately almost all the SQL varieties have situations where they break ANSI and have unpredictable results.

It sounds to me like they intended it to be treated like the "FIRST(Y)" function that many other systems have.
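MySQL 5.7.5 and later make this "just give me one of them" intent explicit with the ANY_VALUE() function. A minimal sketch, reusing the placeholder names someTable, X and Y from the question:

```sql
-- ANY_VALUE() tells the server that any Y from the group is acceptable,
-- so the query is allowed even when ONLY_FULL_GROUP_BY is enabled.
-- It makes no promise about which row's Y is returned.
SELECT X, ANY_VALUE(Y) AS some_y
FROM someTable
GROUP BY X;
```

Unlike a true FIRST(Y), this gives no ordering guarantee, so it only suits cases where the values within a group are identical or the choice does not matter.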

More than likely, this construct is something that the MySQL team regret, but don't want to stop supporting because of the number of applications that would break.

MySQL treats this as a single-column DISTINCT when you use GROUP BY without an aggregate function. With the other options, you either have the whole result be distinct or have to use subqueries, etc. The question is whether the results are truly predictable.

Also, good info is in this thread.

The MySQL reference manual says: "You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group."

I suggest you read this page of the MySQL reference manual: http://dev.mysql.com/doc/refman/5.5/en//group-by-extensions.html

It's actually a very useful tool that the other fields don't all have to be in an aggregate function when you group by a field. You can manipulate which result is returned by simply ordering first and grouping afterwards. For instance, if I wanted to get user login information and see the last time the user logged in, I would do this.

Tables

USERS
user_id | name

USER_LOGIN_HISTORY 
user_id | date_logged_in

USER_LOGIN_HISTORY has multiple rows for each user, so joining users to it would return many rows. As I am only interested in the last entry, I would do this:

select
  user_id,
  name,
  date_logged_in
from (
  select
    u.user_id,
    u.name,
    ulh.date_logged_in
  from users as u
    join user_login_history as ulh
      on u.user_id = ulh.user_id
  where u.user_id = 1234
  order by ulh.date_logged_in desc
) as table1
group by user_id

This would return one row with the name of the user and the last time that user logged in.
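One caveat: the MySQL manual states that the server is free to choose any value from each group and that adding an ORDER BY clause does not influence that choice, so the ordered subquery is not guaranteed to yield the latest login. A sketch that does not depend on row order, assuming the same users and user_login_history tables, asks for the maximum date directly:

```sql
-- MAX() picks the most recent login explicitly, so the result does not
-- depend on which row the server happens to read first.
SELECT
  u.user_id,
  u.name,
  MAX(ulh.date_logged_in) AS date_logged_in
FROM users AS u
  JOIN user_login_history AS ulh
    ON u.user_id = ulh.user_id
WHERE u.user_id = 1234
GROUP BY u.user_id, u.name;
```

This form also runs unchanged with ONLY_FULL_GROUP_BY enabled and on other database engines.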

Licensed under: CC-BY-SA with attribution