SQL max value in one column

Question 1

Well, I think I found the answer myself. As far as I can understand, a query like this will take a lot of time, and instead the database needs to be modified. I found this:

How to version control a record in a database

The suggestion to use startdate and enddate columns, with enddate set to NULL for the latest version, made it very easy to query for the latest version, and it is very fast again. So this is what I needed. Put together, it gives me something like this:

SELECT
  r.id, r.title,
  u.name AS 'created_by', m.name AS 'modified_by', r.version, r.version_displayname, r.informationtype,
  r.filetype, r.base_id, r.resource_id, r.created, r.modified,
  GROUP_CONCAT( CONCAT(CAST(c.id as CHAR),',',c.name,',',c.value) separator ';') AS 'categories', startdate, enddate
FROM
  resource r
  INNER JOIN
    (SELECT
       DISTINCT r.id AS id
     FROM
       resource r
       INNER JOIN category_resource cr1
         ON (r.id = cr1.resource_id)
     WHERE
       cr1.category_id IN (9)
    ) mr
    ON r.id = mr.id
  INNER JOIN category_resource cr
    ON r.id = cr.resource_id
  INNER JOIN category c
    ON cr.category_id = c.id
  INNER JOIN user u
    ON r.created_by = u.id
  INNER JOIN user m
    ON r.modified_by = m.id
WHERE r.enddate IS NULL
GROUP BY r.id;

And this query once again is back to the 20 ms execution time.
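To make the startdate/enddate idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The cut-down table layout and the add_version helper are assumptions for illustration only, not the schema from the question. Each new version closes the previous row by setting its enddate, so the latest version is always the row where enddate IS NULL.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns needed to show the idea.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE resource (
        id        INTEGER PRIMARY KEY,
        base_id   TEXT NOT NULL,
        version   INTEGER NOT NULL,
        title     TEXT NOT NULL,
        startdate TEXT NOT NULL,
        enddate   TEXT            -- NULL marks the current version
    )
""")


def add_version(conn, base_id, title, when):
    """Close the current version (if any) and insert the next one."""
    with conn:  # one transaction: both statements commit together
        row = conn.execute(
            "SELECT MAX(version) FROM resource WHERE base_id = ?", (base_id,)
        ).fetchone()
        next_version = (row[0] or 0) + 1
        conn.execute(
            "UPDATE resource SET enddate = ? "
            "WHERE base_id = ? AND enddate IS NULL",
            (when, base_id),
        )
        conn.execute(
            "INSERT INTO resource (base_id, version, title, startdate, enddate) "
            "VALUES (?, ?, ?, ?, NULL)",
            (base_id, next_version, title, when),
        )


add_version(conn, "a", "Warranty", "2014-01-01")
add_version(conn, "a", "Warranty (revised)", "2014-02-01")
add_version(conn, "b", "Introduction", "2014-01-15")

# The latest version of every resource needs no MAX() or self-join.
latest = conn.execute(
    "SELECT base_id, version, title FROM resource "
    "WHERE enddate IS NULL ORDER BY base_id"
).fetchall()
print(latest)
```

The cost moves to write time: every insert of a new version also updates the previous row, which is usually acceptable when reads dominate.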

Question 2

Given your own answer, your question was basically the same as in the link you supplied. Since you had some sub-questions I'll try to give you some additional help there.

If you want to have some kind of version control in your database, you basically extend your primary key by one or more version columns. I'd vote for using startdate/enddate columns too, for the reason you mentioned. Given your own answer, you could modify your layout accordingly. That's the route you should take if you can!

In your given example it is not clear what the primary key is, since the 'id' column has changing values, too. In your case the primary key would be the column 'title'. So you could use some query like

SELECT title, max(version) as version FROM resource GROUP BY title

to get a result in which you see your original primary key and the latest version -- which together form your actual primary key.

To get all other fields in that table, you'd join that result to the resource table and use the primary key fields as join condition.

SELECT * FROM (
        SELECT title, max(version) as version 
        FROM resource 
        GROUP BY title) as s 
    INNER JOIN resource r on (r.title = s.title AND r.version = s.version)
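The join above can be tried out quickly with Python's built-in sqlite3 module. This is only a sketch: the three-column table and the sample rows are made up for illustration, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)"
)
# Made-up sample rows; ids are assigned 1..5 in insertion order.
conn.executemany(
    "INSERT INTO resource (title, version) VALUES (?, ?)",
    [
        ("Introduction", 1),
        ("Warranty", 1),
        ("Warranty", 2),
        ("Introduction", 2),
        ("Technical Data", 1),
    ],
)

# Step 1: find (title, max version). Step 2: join back to fetch the full row.
latest_rows = conn.execute("""
    SELECT r.id, r.title, r.version
    FROM (
        SELECT title, MAX(version) AS version
        FROM resource
        GROUP BY title
    ) AS s
    INNER JOIN resource r
        ON r.title = s.title AND r.version = s.version
    ORDER BY r.title
""").fetchall()

for row in latest_rows:
    print(row)
```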

Why did your query give you wrong results?

The reason is that your query had an error that MySQL somewhat "fixed" for you. Normally, every column in the select list that is not used in an aggregate function (like MAX()) must also appear in the GROUP BY clause. In your example

SELECT id, title, MAX(version) AS 'version' FROM resource GROUP BY title

you had a column ('id') in the select part of your query that you didn't supply in your GROUP BY clause.

In MySQL you can ignore that rule when the ONLY_FULL_GROUP_BY SQL mode is disabled (see the MySQL documentation on its handling of GROUP BY).

When using this feature, all rows in each group should have the same values for the columns that are omitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.

Since the 'id' column had different values for your key (the 'title' column) you just got some result -- in that case MySQL probably just used the first row it found. But the result itself is undefined and might be subject to change e.g. when the database gets updated or the data grows. You should not depend on rules you deduce from results you see while testing!

On other databases like Oracle and SQL Server, you would have gotten an error trying to execute that last query.

I hope I could clarify the reason for your results a little.

Question 3

What if you try something like this:

SELECT r.id
     , r.title
     , u.name created_by
     , m.name modified_by
     , r.version
     , r.version_displayname
     , r.informationtype
     , r.filetype
     , r.base_id
     , r.resource_id
     , r.created
     , r.modified
     , GROUP_CONCAT( CONCAT(CAST(c.id as CHAR),',',c.name,',',c.value) separator ';') categories 
  FROM resource r 
  JOIN category_resource cr 
    ON r.id = cr.resource_id 
  JOIN category c 
    ON cr.category_id = c.id 
  JOIN user u 
    ON r.created_by = u.id 
  JOIN user m 
    ON r.modified_by = m.id 
 WHERE r.base_id = 'uuid_033a7198-a213-11e3-93de-2b47e5a489c2' 
   AND r.version = (SELECT MAX(r1.version) FROM resource r1 WHERE r1.base_id = r.base_id) 
 GROUP 
    BY r.id;
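A stripped-down version of the correlated subquery can be checked with Python's built-in sqlite3 module. The sketch below keeps only the version filter; the table layout, the sample base_id values and the latest_for helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource ("
    "id INTEGER PRIMARY KEY, base_id TEXT, version INTEGER, title TEXT)"
)
# Made-up sample rows; ids are assigned 1..4 in insertion order.
conn.executemany(
    "INSERT INTO resource (base_id, version, title) VALUES (?, ?, ?)",
    [
        ("uuid_a", 1, "Warranty"),
        ("uuid_a", 2, "Warranty"),
        ("uuid_b", 1, "Introduction"),
        ("uuid_a", 3, "Warranty v3"),
    ],
)


def latest_for(conn, base_id):
    """Return the row(s) holding the highest version for one base_id."""
    return conn.execute(
        """
        SELECT r.id, r.base_id, r.version, r.title
        FROM resource r
        WHERE r.base_id = ?
          AND r.version = (
              -- correlated subquery: evaluated against the outer row's base_id
              SELECT MAX(r1.version)
              FROM resource r1
              WHERE r1.base_id = r.base_id
          )
        """,
        (base_id,),
    ).fetchall()


print(latest_for(conn, "uuid_a"))
print(latest_for(conn, "uuid_b"))
```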

Question 4

Similar to Steve's answer, you could use the following:

-- T-SQL syntax: "alias = expression" names the output column
Select
id = (Select R2.id From Resources R2 Where R2.title = R1.title And R2.version = (Select Max(R4.version) From Resources R4 Where R4.title = R1.title)),
R1.title, 
version = (Select Max(R3.version) From Resources R3 Where R3.title = R1.title) 
From Resources R1 
Group By R1.title
Order By R1.title

Question 5

Try using windowing functions:

SELECT x.* FROM (
    SELECT 
       r.id
     , r.title
     , u.name created_by
     , m.name modified_by
     , r.version
     , ROW_NUMBER() OVER (PARTITION BY r.base_id ORDER BY r.version DESC) AS row_indicator
     , r.version_displayname
     , r.informationtype
     , r.filetype
     , r.base_id
     , r.resource_id
     , r.created
     , r.modified
     , GROUP_CONCAT( CONCAT(CAST(c.id as CHAR),',',c.name,',',c.value) separator ';')     categories 
     FROM resource r 
     JOIN category_resource cr 
     ON r.id = cr.resource_id 
     JOIN category c 
     ON cr.category_id = c.id 
     JOIN user u 
     ON r.created_by = u.id 
     JOIN user m 
     ON r.modified_by = m.id 
     WHERE r.base_id = 'uuid_033a7198-a213-11e3-93de-2b47e5a489c2'
     GROUP BY r.id  -- GROUP_CONCAT is an aggregate, so group per resource row
) x
where row_indicator = 1

The key part is the row_number() windowing function. If you look up SQL Server window functions, you will find they are very powerful and eliminate the need for subqueries and/or self-joins in many cases like this. (MySQL supports window functions from version 8.0.)

To filter by the row_number() (aliased as "row_indicator"), you have to wrap the query in an inline view, because a window function cannot be referenced in the WHERE clause of the same query level. Since the ORDER BY inside the OVER clause sorts by version descending within each base_id partition, the row with the highest version receives a row_number() of 1.
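The row_number() pattern on its own can be sketched with Python's built-in sqlite3 module, assuming the underlying SQLite library is version 3.25 or newer (the first release with window functions). The table and sample rows are made up for illustration, and the joins and GROUP_CONCAT are left out to keep the window function in focus.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, base_id TEXT, version INTEGER)"
)
# Made-up sample rows; ids are assigned 1..5 in insertion order.
conn.executemany(
    "INSERT INTO resource (base_id, version) VALUES (?, ?)",
    [("a", 1), ("b", 1), ("a", 2), ("b", 2), ("a", 3)],
)

# Number the rows of each base_id from newest to oldest, then keep number 1.
newest = conn.execute("""
    SELECT x.id, x.base_id, x.version
    FROM (
        SELECT r.id, r.base_id, r.version,
               ROW_NUMBER() OVER (
                   PARTITION BY r.base_id
                   ORDER BY r.version DESC
               ) AS row_indicator
        FROM resource r
    ) AS x
    WHERE x.row_indicator = 1
    ORDER BY x.base_id
""").fetchall()

print(newest)
```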

Good luck!

Question 6

I wrote this from the perspective of SQL Server (2005), but I suspect it will be the same in MySQL.

First, your example query would result in an error:

SELECT id, title, MAX(version) AS 'version' FROM Resource GROUP BY title

Msg 8120, Level 16, State 1, Line XX
Column 'Resource.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Adding the ID to the GROUP BY to fix the error also shows why this approach won't accomplish your goal. If you include the ID in your grouping, you won't filter out your "duplicate" titles. You could instead use MAX(ID), and that would probably return correct data, but (1) it would only be reliable if higher version numbers were always inserted after lower ones, and (2) the query would become more complicated as you added fields, because each of them would also need an aggregate or a place in the grouping.
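Point (1) is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with made-up data in which version 2 of a title happens to be inserted before version 1, so MAX(ID) and MAX(Version) end up describing two different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)"
)
# Out-of-order inserts: the higher version gets the lower id.
conn.execute("INSERT INTO resource (title, version) VALUES ('Warranty', 2)")  # id 1
conn.execute("INSERT INTO resource (title, version) VALUES ('Warranty', 1)")  # id 2

# Each aggregate is computed independently over the group.
max_id, title, max_version = conn.execute(
    "SELECT MAX(id), title, MAX(version) FROM resource GROUP BY title"
).fetchone()

# Look up which version the row with that id really holds.
actual_version = conn.execute(
    "SELECT version FROM resource WHERE id = ?", (max_id,)
).fetchone()[0]

print(max_id, max_version, actual_version)
```

The aggregated row claims id 2 with version 2, but row 2 is actually version 1.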

Instead, you can simply find the "TOP" entry in the table for each of the items in the distinct list. You can accomplish this with a query like this:

-- Populate Test Data
DECLARE @Resource TABLE
(
    ID int IDENTITY,
    Title varchar(100),
    Version int
);
INSERT INTO @Resource (Title, Version) VALUES ('Introduction', 1);
INSERT INTO @Resource (Title, Version) VALUES ('Technical Data', 1);
INSERT INTO @Resource (Title, Version) VALUES ('Warranty', 1);
INSERT INTO @Resource (Title, Version) VALUES ('Product Description', 1);
INSERT INTO @Resource (Title, Version) VALUES ('Warranty', 2);
INSERT INTO @Resource (Title, Version) VALUES ('Introduction', 2);
INSERT INTO @Resource (Title, Version) VALUES ('Technical Data', 3);

-- Query with desired results    
SELECT
    *
FROM        @Resource r1
WHERE       r1.ID =
            (
                SELECT
                    TOP 1 r2.ID
                FROM        @Resource r2
                WHERE       r2.Title = r1.Title
                ORDER BY    r2.Version DESC,
                            r2.ID DESC
            );

If you can guarantee that there won't be a duplicate Version number for a given Title, you can use either of these methods (each of which produces the same query plan):

SELECT
    *
FROM        @Resource r1
WHERE       r1.Version =
            (
                SELECT
                    MAX(r2.Version)
                FROM        @Resource r2
                WHERE       r2.Title = r1.Title
            )
ORDER BY    r1.Title;

SELECT      r1.*
FROM        (
                SELECT
                    r2.Title,
                    MAX(r2.Version) AS MaxVersion
                FROM        @Resource r2
                GROUP BY    r2.Title
            ) AS MaxVerList
JOIN        @Resource r1
ON          r1.Title = MaxVerList.Title
AND         r1.Version = MaxVerList.MaxVersion
ORDER BY    r1.Title;
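To see why the no-duplicates guarantee matters, here is a small sketch using Python's built-in sqlite3 module. SQLite has no TOP, so ORDER BY ... LIMIT 1 stands in for TOP 1; the sample rows are made up and deliberately contain two rows with the same Title and Version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)"
)
# Made-up rows; ids 2 and 3 share the same title and version.
conn.executemany(
    "INSERT INTO resource (title, version) VALUES (?, ?)",
    [("Warranty", 1), ("Warranty", 2), ("Warranty", 2), ("Introduction", 1)],
)

# MAX(version) comparison: every row that ties for the maximum is returned.
ids_by_max = [row[0] for row in conn.execute("""
    SELECT r1.id
    FROM resource r1
    WHERE r1.version = (
        SELECT MAX(r2.version) FROM resource r2 WHERE r2.title = r1.title
    )
    ORDER BY r1.id
""")]

# TOP 1 equivalent: the extra "id DESC" sort key breaks the tie.
ids_by_top = [row[0] for row in conn.execute("""
    SELECT r1.id
    FROM resource r1
    WHERE r1.id = (
        SELECT r2.id
        FROM resource r2
        WHERE r2.title = r1.title
        ORDER BY r2.version DESC, r2.id DESC
        LIMIT 1
    )
    ORDER BY r1.id
""")]

print(ids_by_max)
print(ids_by_top)
```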

Question 7

Using the data Riley produced (changing the @ to a # to get a temp table), and again from a SQL Server 2008 perspective, although it is core SQL, the following should work without causing undue performance issues.

SELECT
    *
FROM   #Resource r1
WHERE r1.Version = (SELECT MAX(r2.Version) 
FROM #Resource r2 WHERE r1.Title = r2.Title )
ORDER BY r1.ID

This gives the correct result:

ID    Title                  Version
4     Product Description    1
5     Warranty               2
6     Introduction           2
7     Technical Data         3

You're looking for the Max(Version) per Title from what I can see. The major cost of this query is the ORDER BY, as there are no indexes.
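On the index remark: one plausible candidate is a composite index on (Title, Version), since the correlated subquery looks up the highest Version for a given Title. The sketch below uses Python's built-in sqlite3 module with Riley's test data; the index name is hypothetical, and SQLite's planner is not SQL Server's, so whether such an index helps on the real table would have to be checked against the actual query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)"
)
conn.executemany(
    "INSERT INTO resource (title, version) VALUES (?, ?)",
    [
        ("Introduction", 1),
        ("Technical Data", 1),
        ("Warranty", 1),
        ("Product Description", 1),
        ("Warranty", 2),
        ("Introduction", 2),
        ("Technical Data", 3),
    ],
)

# Hypothetical index name; the columns match the subquery's filter and aggregate.
conn.execute(
    "CREATE INDEX idx_resource_title_version ON resource (title, version)"
)

QUERY = """
    SELECT r1.id, r1.title, r1.version
    FROM resource r1
    WHERE r1.version = (
        SELECT MAX(r2.version) FROM resource r2 WHERE r1.title = r2.title
    )
    ORDER BY r1.id
"""

indexed_rows = conn.execute(QUERY).fetchall()
for row in indexed_rows:
    print(row)

# Print the planner's steps to inspect whether the index is considered.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step)
```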