Question

MySQL version 5.7.31

How do I return one row per distinct mainthreadid, namely the row with the highest value of date?

Note that the data is filtered using

WHERE approved = 1 AND section != 150 

A fiddle is available here.

Fiddle DDL and DML

Table

CREATE TABLE forum (
    id int,
    mainthreadid int,
    section int,
    approved int,
    date int(20),
    title varchar(255),
    text varchar(255)
);

Sample Data

INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (1, 1, 5, 1, 1000, "title1", "text1");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (2, 2, 5, 1, 1000, "title2", "text2");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (3, 2, 5, 1, 1001, "title3", "text3");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (4, 2, 5, 1, 1002, "title4", "text4");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (5, 2, 5, 1, 1003, "title5", "text5");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (6, 2, 5, 1, 1004, "title6", "text6");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (7, 2, 5, 0, 1005, "title7", "text7");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (8, 8, 150, 1, 1004, "title8", "text8");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (9, 1, 5, 1, 1006, "title9", "text9");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (10, 1, 5, 1, 1005, "title10", "text10");

SQL Statement

SELECT date, id, mainthreadid 
FROM `forum` 
WHERE approved = 1 AND section != 150 ORDER BY date DESC

Resultset

My desired result set would be:

date    id  mainthreadid
1006    9   1
1004    6   2

Solution

Maybe you need something like this?

SELECT forum.*
FROM forum
JOIN ( SELECT mainthreadid, MAX(`date`) `date`
       FROM forum
       GROUP BY mainthreadid ) lastdates USING (mainthreadid, `date`)

fiddle

For each mainthreadid, only the last row (the row with the greatest date) is selected. If more than one row shares the same maximal date, all of them are returned.
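The join's behaviour, including the tie case, can be checked without a MySQL server. This is a minimal sketch under stated assumptions: it runs the query above on SQLite through Python's standard sqlite3 module (a stand-in for MySQL 5.7), drops the title and text columns, and adds a hypothetical row, id 11, that ties with id 9 on thread 1's maximal date.

```python
import sqlite3

# Sample data from the question: (id, mainthreadid, section, approved, date)
ROWS = [
    (1, 1, 5, 1, 1000), (2, 2, 5, 1, 1000), (3, 2, 5, 1, 1001),
    (4, 2, 5, 1, 1002), (5, 2, 5, 1, 1003), (6, 2, 5, 1, 1004),
    (7, 2, 5, 0, 1005), (8, 8, 150, 1, 1004), (9, 1, 5, 1, 1006),
    (10, 1, 5, 1, 1005),
]

# The unfiltered join: latest row(s) per mainthreadid, no WHERE clause yet.
LATEST_PER_THREAD = """
    SELECT forum.id, forum.mainthreadid, forum."date"
    FROM forum
    JOIN (SELECT mainthreadid, MAX("date") AS "date"
          FROM forum
          GROUP BY mainthreadid) lastdates USING (mainthreadid, "date")
    ORDER BY forum.mainthreadid, forum.id
"""


def make_db(rows):
    """Build an in-memory SQLite copy of the forum table (title/text omitted)."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE forum (id INTEGER, mainthreadid INTEGER, '
                'section INTEGER, approved INTEGER, "date" INTEGER)')
    con.executemany("INSERT INTO forum VALUES (?, ?, ?, ?, ?)", rows)
    return con


con = make_db(ROWS)
print(con.execute(LATEST_PER_THREAD).fetchall())
# [(9, 1, 1006), (7, 2, 1005), (8, 8, 1004)]

# Hypothetical extra row: id 11 ties with id 9 on thread 1's maximal date.
con.execute("INSERT INTO forum VALUES (11, 1, 5, 1, 1006)")
print(con.execute(LATEST_PER_THREAD).fetchall())
# [(9, 1, 1006), (11, 1, 1006), (7, 2, 1005), (8, 8, 1004)]
```

Without the question's filter, thread 2's latest row is the unapproved id 7 and thread 8 (section 150) shows up, so the filter has to be applied before MAX() is computed. The tie case returns both id 9 and id 11 for thread 1.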


You don't include the checks for approved = 1 and section != 150.

No problem.

SELECT forum.*
FROM forum
JOIN ( SELECT mainthreadid, MAX(`date`) `date`
       FROM forum
       WHERE approved = 1 AND section != 150
       GROUP BY mainthreadid ) lastdates USING (mainthreadid, `date`)

fiddle
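As a sanity check, here is a sketch that runs this filtered join on the question's sample data, again using SQLite via Python's sqlite3 as an assumed stand-in for MySQL. Because the filter sits only inside the derived table, the outer join could in principle also pick up an unapproved row that shares a thread's maximal approved date. The sketch probes that with a hypothetical row (id 12) and with a hypothetical extra outer WHERE clause (`OUTER_FILTER`), which is a suggested guard rather than part of the answer.

```python
import sqlite3

# Question's sample data: (id, mainthreadid, section, approved, date)
ROWS = [
    (1, 1, 5, 1, 1000), (2, 2, 5, 1, 1000), (3, 2, 5, 1, 1001),
    (4, 2, 5, 1, 1002), (5, 2, 5, 1, 1003), (6, 2, 5, 1, 1004),
    (7, 2, 5, 0, 1005), (8, 8, 150, 1, 1004), (9, 1, 5, 1, 1006),
    (10, 1, 5, 1, 1005),
]

# The filtered join; {outer_filter} is an optional extra clause for the outer table.
QUERY = """
    SELECT forum."date", forum.id, forum.mainthreadid
    FROM forum
    JOIN (SELECT mainthreadid, MAX("date") AS "date"
          FROM forum
          WHERE approved = 1 AND section != 150
          GROUP BY mainthreadid) lastdates USING (mainthreadid, "date")
    {outer_filter}
    ORDER BY forum."date" DESC, forum.id
"""
# Hypothetical guard: repeat the filter on the outer table as well.
OUTER_FILTER = "WHERE forum.approved = 1 AND forum.section != 150"

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE forum (id INTEGER, mainthreadid INTEGER, '
            'section INTEGER, approved INTEGER, "date" INTEGER)')
con.executemany("INSERT INTO forum VALUES (?, ?, ?, ?, ?)", ROWS)

as_posted = con.execute(QUERY.format(outer_filter="")).fetchall()
print(as_posted)  # [(1006, 9, 1), (1004, 6, 2)], the desired result set

# Hypothetical unapproved row sharing thread 2's maximal approved date.
con.execute("INSERT INTO forum VALUES (12, 2, 5, 0, 1004)")
leaky = con.execute(QUERY.format(outer_filter="")).fetchall()
guarded = con.execute(QUERY.format(outer_filter=OUTER_FILTER)).fetchall()
print(leaky)    # [(1006, 9, 1), (1004, 6, 2), (1004, 12, 2)]
print(guarded)  # [(1006, 9, 1), (1004, 6, 2)]
```

On the question's own data the query returns exactly the desired rows. The extra row only matters if unapproved posts can share a thread's latest approved date.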

OTHER TIPS

This is a classic greatest-n-per-group question, and the canonical way of solving these is to use window functions.

I decided to answer this problem using the latest version of MySQL 8, 8.0.22. I really can't urge you strongly enough to upgrade to the latest version; you get all sorts of goodies: window functions, which are extremely powerful and well worth mastering (see the use of ROW_NUMBER() below); GENERATED columns (also available in 5.7); and the powerful EXPLAIN ANALYZE functionality, a vast improvement over the much more limited EXPLAIN EXTENDED.

Having said that, @Akina's solution is simple and elegant and works with 5.5 (the oldest server available on the fiddle), so it's pretty good; I just wanted to explore the possibilities with MySQL 8.

So, to solve your issue, I started from @Akina's fiddle. dbfiddle.uk has many more servers than any of the other fiddles out there and also has the best interface.

I'm showing the progression of my thought process for this question: give a man a fish and he'll eat for a day; teach him to fish and he'll never go hungry! :-) Take a look at some of the questions and answers in the greatest-n-per-group tag and you'll learn a lot. That's really the ultimate goal of this site, not simply answering questions on demand!

The fiddle for my solution is available here. One piece of advice, for your own sanity and because it's good practice anyway: NEVER name your tables or columns after SQL keywords (in your example, date and text). I substituted my_date and my_text, which are easier to read and debug, and more portable!

Another couple of words of advice: if you want to store dates, use the DATE datatype. If you simply use an INTEGER, you deprive the optimizer of the chance to make the best use of the distribution of the data, so ALWAYS use the appropriate type for your data. Furthermore, INT(20) is meaningless (as Bill Karwin, who was a MySQL manager of some sort, has pointed out): the number in parentheses is only a display width and changes neither the storage size nor the range of the column. See also the fiddle here.

The first thing I did was to select only the columns I needed and filter out the records excluded by the question's criteria.

--
-- Eliminate approved = 0 and section = 150
-- We want the row with the max date for each mainthreadid,
-- as in Akina's solution. However, we will get it in a different way.
--
-- We can see that records with rn = 1 are the desired resultset!
--
-- Unfortunately, we can't use window functions in a WHERE clause
--
SELECT
  ROW_NUMBER() OVER (PARTITION BY mainthreadid ORDER BY my_date DESC) AS rn,
  my_date,
  id,
  mainthreadid
FROM forum
WHERE approved = 1
AND section != 150
ORDER BY mainthreadid, my_date DESC

Result:

rn  my_date id  mainthreadid
 1     1006  9      1         <<---- desired record - rn = 1
 2     1005 10      1
 3     1000  1      1
 1     1004  6      2         <<---- desired record - rn = 1
 2     1003  5      2
 3     1002  4      2
 4     1001  3      2
 5     1000  2      2
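The numbering above can be reproduced outside MySQL. This sketch assumes SQLite 3.25 or later (the first SQLite release with window functions), driven from Python's sqlite3 module as a stand-in for MySQL 8; the query text is the same as above, with the date column renamed to my_date.

```python
import sqlite3

# Question's sample data with date renamed to my_date:
# (id, mainthreadid, section, approved, my_date)
ROWS = [
    (1, 1, 5, 1, 1000), (2, 2, 5, 1, 1000), (3, 2, 5, 1, 1001),
    (4, 2, 5, 1, 1002), (5, 2, 5, 1, 1003), (6, 2, 5, 1, 1004),
    (7, 2, 5, 0, 1005), (8, 8, 150, 1, 1004), (9, 1, 5, 1, 1006),
    (10, 1, 5, 1, 1005),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE forum (id INTEGER, mainthreadid INTEGER, "
            "section INTEGER, approved INTEGER, my_date INTEGER)")
con.executemany("INSERT INTO forum VALUES (?, ?, ?, ?, ?)", ROWS)

# Number the rows of each thread, newest first, after filtering.
NUMBERED = """
    SELECT
      ROW_NUMBER() OVER (PARTITION BY mainthreadid ORDER BY my_date DESC) AS rn,
      my_date, id, mainthreadid
    FROM forum
    WHERE approved = 1
    AND section != 150
    ORDER BY mainthreadid, my_date DESC
"""
numbered = con.execute(NUMBERED).fetchall()
for row in numbered:
    print(row)  # (rn, my_date, id, mainthreadid)
```

The printed tuples match the result table row for row: rn restarts at 1 for each mainthreadid, and the rn = 1 rows are ids 9 and 6.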

As mentioned in the fiddle, window functions can't be used in a WHERE clause (they are evaluated after WHERE has filtered the rows), so we need to wrap the query in a derived table and filter on rn in the outer query:

SELECT my_date, id, mainthreadid, rn -- rn not strictly necessary
FROM
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY mainthreadid ORDER BY my_date DESC) AS rn,
    my_date,
    id,
    mainthreadid
  FROM forum
  WHERE approved = 1
  AND section != 150
) AS tab
WHERE rn = 1
ORDER BY my_date DESC, mainthreadid;

Result:

my_date id  mainthreadid    rn
   1006  9      1            1
   1004  6      2            1

QED!
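One behavioural difference from the join approach is worth checking: ROW_NUMBER() always yields exactly one row per thread, even when two posts share the latest date. This sketch runs the derived-table query on SQLite (assumed 3.25 or later, via Python's sqlite3, standing in for MySQL 8) and then adds a hypothetical row, id 11, that ties with id 9.

```python
import sqlite3

# Question's sample data with date renamed to my_date:
# (id, mainthreadid, section, approved, my_date)
ROWS = [
    (1, 1, 5, 1, 1000), (2, 2, 5, 1, 1000), (3, 2, 5, 1, 1001),
    (4, 2, 5, 1, 1002), (5, 2, 5, 1, 1003), (6, 2, 5, 1, 1004),
    (7, 2, 5, 0, 1005), (8, 8, 150, 1, 1004), (9, 1, 5, 1, 1006),
    (10, 1, 5, 1, 1005),
]

LATEST_BY_ROW_NUMBER = """
    SELECT my_date, id, mainthreadid
    FROM (
      SELECT
        ROW_NUMBER() OVER (PARTITION BY mainthreadid ORDER BY my_date DESC) AS rn,
        my_date, id, mainthreadid
      FROM forum
      WHERE approved = 1
      AND section != 150
    ) AS tab
    WHERE rn = 1
    ORDER BY my_date DESC, mainthreadid
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE forum (id INTEGER, mainthreadid INTEGER, "
            "section INTEGER, approved INTEGER, my_date INTEGER)")
con.executemany("INSERT INTO forum VALUES (?, ?, ?, ?, ?)", ROWS)

latest = con.execute(LATEST_BY_ROW_NUMBER).fetchall()
print(latest)  # [(1006, 9, 1), (1004, 6, 2)]

# Hypothetical tie: id 11 has the same my_date as id 9 on thread 1.
con.execute("INSERT INTO forum VALUES (11, 1, 5, 1, 1006)")
tied = con.execute(LATEST_BY_ROW_NUMBER).fetchall()
# Still one row per thread; whether id 9 or id 11 wins is not specified,
# because ORDER BY my_date DESC inside OVER () does not break the tie.
print(len(tied))  # 2
```

If ties matter, adding a tie-breaker such as id to the window's ORDER BY makes the choice deterministic, while RANK() in place of ROW_NUMBER() keeps every tied row, as the join does.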

I decided to look at the MySQL >= 8.0.18 EXPLAIN ANALYZE functionality for both my solution and Akina's answer. The result for my answer is as follows:

EXPLAIN - Vérace
-> Sort: tab.my_date DESC, tab.mainthreadid  (actual time=0.007..0.007 rows=2 loops=1)
    -> Index lookup on tab using <auto_key0> (rn=1)  (actual time=0.002..0.002 rows=2 loops=1)
        -> Materialize  (actual time=0.069..0.070 rows=2 loops=1)
            -> Window aggregate: row_number() OVER (PARTITION BY forum.mainthreadid ORDER BY forum.my_date desc )   (actual time=0.044..0.050 rows=8 loops=1)
                -> Sort: forum.mainthreadid, forum.my_date DESC  (cost=1.25 rows=10) (actual time=0.041..0.043 rows=8 loops=1)
                    -> Filter: ((forum.approved = 1) and (forum.section <> 150))  (cost=1.25 rows=10) (actual time=0.026..0.031 rows=8 loops=1)
                        -> Table scan on forum  (cost=1.25 rows=10) (actual time=0.024..0.027 rows=10 loops=1)

and then

EXPLAIN - Akina
-> Sort: forum.my_date DESC  (actual time=0.071..0.071 rows=2 loops=1)
    -> Stream results  (actual time=0.060..0.063 rows=2 loops=1)
        -> Inner hash join (forum.my_date = lastdates.my_date), (forum.mainthreadid = lastdates.mainthreadid)  (actual time=0.059..0.062 rows=2 loops=1)
            -> Table scan on forum  (cost=0.18 rows=10) (actual time=0.007..0.010 rows=10 loops=1)
            -> Hash
                -> Table scan on lastdates  (cost=2.73 rows=2) (actual time=0.000..0.001 rows=2 loops=1)
                    -> Materialize  (actual time=0.040..0.041 rows=2 loops=1)
                        -> Table scan on <temporary>  (actual time=0.000..0.001 rows=2 loops=1)
                            -> Aggregate using temporary table  (actual time=0.032..0.033 rows=2 loops=1)
                                -> Filter: ((forum.approved = 1) and (forum.section <> 150))  (cost=1.25 rows=1) (actual time=0.015..0.020 rows=8 loops=1)
                                    -> Table scan on forum  (cost=1.25 rows=10) (actual time=0.014..0.017 rows=10 loops=1)

I'm not sure how to interpret these different plans yet - watch this space!

With MySQL 5.7, which has no window functions, you can emulate ROW_NUMBER() with user-defined variables to get the latest row for each mainthreadid:

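SELECT
    date, id, mainthreadid
FROM
    (SELECT
        date,
        id,
        -- restart the counter at 1 when mainthreadid changes, otherwise increment it
        IF(mainthreadid = @mainid, @rownum:=@rownum + 1, @rownum:=1) rownum,
        -- remember the current mainthreadid for the next row
        @mainid:=mainthreadid mainthreadid
    FROM
        `forum`, (SELECT @mainid:=0, @rownum:=0) t2  -- initialise the variables
    WHERE
        approved = 1 AND section != 150
    ORDER BY mainthreadid, date DESC) t1
WHERE
    rownum = 1;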
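The variable trick is a hand-rolled ROW_NUMBER(). To make the per-row logic explicit, here is a hypothetical Python re-enactment of what the query computes; it is not MySQL, and the function name latest_per_thread is made up for illustration. Note that the MySQL manual states that the order of evaluation for expressions involving user variables is undefined, and setting user variables inside expressions is deprecated in MySQL 8.0, so the SQL version relies on how MySQL happens to evaluate it.

```python
# Question's sample data: (id, mainthreadid, section, approved, date)
ROWS = [
    (1, 1, 5, 1, 1000), (2, 2, 5, 1, 1000), (3, 2, 5, 1, 1001),
    (4, 2, 5, 1, 1002), (5, 2, 5, 1, 1003), (6, 2, 5, 1, 1004),
    (7, 2, 5, 0, 1005), (8, 8, 150, 1, 1004), (9, 1, 5, 1, 1006),
    (10, 1, 5, 1, 1005),
]


def latest_per_thread(rows):
    """Mimic the user-variable query: filter, sort, keep rownum = 1 per thread."""
    # WHERE approved = 1 AND section != 150
    kept = [r for r in rows if r[3] == 1 and r[2] != 150]
    # ORDER BY mainthreadid, date DESC
    kept.sort(key=lambda r: (r[1], -r[4]))
    result = []
    main_id, row_num = 0, 0  # (SELECT @mainid:=0, @rownum:=0)
    for r in kept:
        # IF(mainthreadid = @mainid, @rownum:=@rownum + 1, @rownum:=1)
        row_num = row_num + 1 if r[1] == main_id else 1
        main_id = r[1]  # @mainid:=mainthreadid
        if row_num == 1:  # outer WHERE rownum = 1
            result.append((r[4], r[0], r[1]))  # (date, id, mainthreadid)
    return result


print(latest_per_thread(ROWS))  # [(1006, 9, 1), (1004, 6, 2)]
```

The sort happens before the counter runs, which is exactly what the derived table's ORDER BY is there to guarantee.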
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange