SQL: select only the first row for each value
11-03-2021
Question
MySQL version 5.7.31
How do I show only one row per distinct value of the column mainthreadid, namely the row with the highest value of date?
Note that the data is filtered using
WHERE approved = 1 AND section != 150
A fiddle is available here.
Fiddle DDL and DML
Table
CREATE TABLE forum (
id int,
mainthreadid int,
section int,
approved int,
date int(20),
title varchar(255),
text varchar(255)
);
Sample Data
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (1, 1, 5, 1, 1000, "title1", "text1");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (2, 2, 5, 1, 1000, "title2", "text2");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (3, 2, 5, 1, 1001, "title3", "text3");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (4, 2, 5, 1, 1002, "title4", "text4");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (5, 2, 5, 1, 1003, "title5", "text5");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (6, 2, 5, 1, 1004, "title6", "text6");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (7, 2, 5, 0, 1005, "title7", "text7");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (8, 8, 150, 1, 1004, "title8", "text8");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (9, 1, 5, 1, 1006, "title9", "text9");
INSERT INTO forum (id, mainthreadid, section, approved, date, title, text)
VALUES (10, 1, 5, 1, 1005, "title10", "text10");
SQL Statement
SELECT date, id, mainthreadid
FROM `forum`
WHERE approved = 1 AND section != 150 ORDER BY date DESC
Resultset
My desired result set would be:
date  id  mainthreadid
1006  9   1
1004  6   2
Solution
Maybe you need this:
SELECT forum.*
FROM forum
JOIN ( SELECT mainthreadid, MAX(`date`) `date`
FROM forum
GROUP BY mainthreadid ) lastdates USING (mainthreadid, `date`)
For each mainthreadid, only the last row (the row with the greatest date) is selected. If more than one row shares the same maximal date, all of them are returned.
You don't include the checks for approved = 1 and section != 150.
No problem:
SELECT forum.*
FROM forum
JOIN ( SELECT mainthreadid, MAX(`date`) `date`
FROM forum
WHERE approved = 1 AND section != 150
GROUP BY mainthreadid ) lastdates USING (mainthreadid, `date`)
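One caveat with the filtered join is that the WHERE clause applies only inside the derived table, so an unapproved row that happens to share the maximal date of its mainthreadid would still be joined back in. The sketch below illustrates this under stated assumptions: it runs an equivalent query (with an explicit ON clause and the hypothetical alias last_date) in an in-memory SQLite database as a stand-in for MySQL, and the row with id 11 is invented for the demonstration.

```python
import sqlite3

# Sample rows from the question: (id, mainthreadid, section, approved, date).
SAMPLE_ROWS = [
    (1, 1, 5, 1, 1000), (2, 2, 5, 1, 1000), (3, 2, 5, 1, 1001),
    (4, 2, 5, 1, 1002), (5, 2, 5, 1, 1003), (6, 2, 5, 1, 1004),
    (7, 2, 5, 0, 1005), (8, 8, 150, 1, 1004), (9, 1, 5, 1, 1006),
    (10, 1, 5, 1, 1005),
]

# Hypothetical extra row (not in the question): unapproved, but with the
# same date as the newest approved row of mainthreadid 2.
UNAPPROVED_TIE = (11, 2, 5, 0, 1004)

QUERY = """
SELECT f.date, f.id, f.mainthreadid
FROM forum AS f
JOIN ( SELECT mainthreadid, MAX(date) AS last_date
       FROM forum
       WHERE approved = 1 AND section != 150
       GROUP BY mainthreadid ) AS lastdates
  ON lastdates.mainthreadid = f.mainthreadid
 AND lastdates.last_date = f.date
{outer_filter}
ORDER BY f.date DESC, f.id
"""


def latest_rows(rows, repeat_filter):
    """Run the join on an in-memory SQLite table and return the result rows."""
    outer_filter = (
        "WHERE f.approved = 1 AND f.section != 150" if repeat_filter else ""
    )
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE forum (id INTEGER, mainthreadid INTEGER, "
            "section INTEGER, approved INTEGER, date INTEGER)"
        )
        conn.executemany("INSERT INTO forum VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY.format(outer_filter=outer_filter)).fetchall()
    finally:
        conn.close()


# Filter only in the derived table: the unapproved row 11 is returned too.
print(latest_rows(SAMPLE_ROWS + [UNAPPROVED_TIE], repeat_filter=False))
# Filter repeated in the outer query: only rows 9 and 6 remain.
print(latest_rows(SAMPLE_ROWS + [UNAPPROVED_TIE], repeat_filter=True))
```

With the question's original ten rows both variants return the same two rows, so the difference only shows up when a filtered-out row ties on date within its mainthreadid.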
OTHER TIPS
This is a classic greatest-n-per-group question and the canonical way of solving these is to use Window functions.
I decided to answer this problem using the latest version of MySQL 8 (8.0.22). I really can't urge you strongly enough to upgrade to the latest version - you get all sorts of goodies: Window functions, which are extremely powerful and well worth mastering (see the use of ROW_NUMBER() below), GENERATED columns (also in 5.7), and the powerful EXPLAIN ANALYZE functionality - a vast improvement over the pathetic EXPLAIN EXTENDED.
Having said that, @Akina's solution is simple and elegant and works with 5.5 (the oldest available server), so pretty good - I just wanted to explore the possibilities with MySQL 8.
So, to solve your issue I did the following using @Akina's fiddle - dbfiddle.uk has many more servers than any of the other fiddles out there and also has the best interface.
I'm showing the progression of my thought process for this question - give a man a fish, he'll eat for a day, teach him to fish and he'll never go hungry! :-) Take a look at some of the questions and answers in the greatest-n-per-group tag and you'll learn a lot - that's really the ultimate goal of this site - not just simply answering questions on-demand!
The fiddle for my solution is available here. One piece of advice for your own sanity, and it's good practice anyway: NEVER use SQL keywords as table or column names (i.e. in your example, date and text) - I substituted my_date and my_text - easier to read and debug, and more portable!
Another couple of words of advice - if you want to use DATEs, then use the DATE datatype - if you simply use an INTEGER, you deprive the optimizer of a chance to make the best use of the distribution of the data - ALWAYS use the appropriate type for your data. Furthermore, INT(20) is meaningless: the number in parentheses is only a display width and changes neither the storage size nor the range of values (Bill Karwin, who was a MySQL manager of some sort, explains this) - plus see the fiddle here.
The first thing I did was to eliminate unwanted fields and also the records to be removed according to the question's criteria.
--
-- Eliminate approved = 0 and section = 150
-- We can see that we want the max of date and mainthreadid
-- as per Akina's solution. However, we will get them in a different way.
--
-- We can see that records with rn = 1 are the desired resultset!
--
-- Unfortunately, we can't use Window functions in a WHERE clause
--
SELECT
ROW_NUMBER() OVER (PARTITION BY mainthreadid ORDER BY my_date DESC) AS rn,
my_date,
id,
mainthreadid
FROM forum
WHERE approved = 1
AND section != 150
ORDER BY mainthreadid, my_date DESC
Result:
rn  my_date  id  mainthreadid
1   1006     9   1    <<---- desired record - rn = 1
2   1005     10  1
3   1000     1   1
1   1004     6   2    <<---- desired record - rn = 1
2   1003     5   2
3   1002     4   2
4   1001     3   2
5   1000     2   2
As mentioned in the fiddle, one can't use Window functions in a WHERE clause, so we need to do a sub-SELECT using this table thus:
SELECT my_date, id, mainthreadid, rn -- rn not strictly necessary
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY mainthreadid ORDER BY my_date DESC) AS rn,
my_date,
id,
mainthreadid
FROM forum
WHERE approved = 1
AND section != 150
) AS tab
WHERE rn = 1
ORDER BY my_date DESC, mainthreadid;
Result:
my_date  id  mainthreadid  rn
1006     9   1             1
1004     6   2             1
QED!
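A note on ties: ROW_NUMBER() hands out exactly one rn = 1 per mainthreadid, so unlike the join on MAX(date) this query never returns two rows for the same thread. Which of several rows with the same my_date gets rn = 1 is not determined unless the window's ORDER BY breaks the tie. The sketch below adds id DESC as a tie-breaker (an assumption, the question does not say which row should win) and runs the query in an in-memory SQLite database (3.25 or later, which supports window functions) as a stand-in for MySQL 8. The row with id 11 is invented for the demonstration.

```python
import sqlite3

# Sample rows from the question: (id, mainthreadid, section, approved, my_date).
FORUM_ROWS = [
    (1, 1, 5, 1, 1000), (2, 2, 5, 1, 1000), (3, 2, 5, 1, 1001),
    (4, 2, 5, 1, 1002), (5, 2, 5, 1, 1003), (6, 2, 5, 1, 1004),
    (7, 2, 5, 0, 1005), (8, 8, 150, 1, 1004), (9, 1, 5, 1, 1006),
    (10, 1, 5, 1, 1005),
]

# Hypothetical extra row: approved, same my_date as row 6 in mainthreadid 2.
APPROVED_TIE = (11, 2, 5, 1, 1004)

WINDOW_QUERY = """
SELECT my_date, id, mainthreadid
FROM (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY mainthreadid
               ORDER BY my_date DESC, id DESC   -- id DESC breaks ties
           ) AS rn,
           my_date, id, mainthreadid
    FROM forum
    WHERE approved = 1 AND section != 150
) AS tab
WHERE rn = 1
ORDER BY my_date DESC, mainthreadid
"""


def newest_per_thread(rows):
    """Return one (my_date, id, mainthreadid) row per thread, newest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE forum (id INTEGER, mainthreadid INTEGER, "
            "section INTEGER, approved INTEGER, my_date INTEGER)"
        )
        conn.executemany("INSERT INTO forum VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(WINDOW_QUERY).fetchall()
    finally:
        conn.close()


# Rows 6 and 11 tie on my_date; the tie-breaker picks the higher id, 11.
print(newest_per_thread(FORUM_ROWS + [APPROVED_TIE]))
```

If every tied row should be returned instead, RANK() in place of ROW_NUMBER() (without the tie-breaker) gives all of them rank 1.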
I decided to look at the MySQL >= 8.0.18 EXPLAIN ANALYZE functionality for both my solution and Akina's answer. The result for my answer is as follows:
EXPLAIN - Vérace
-> Sort: tab.my_date DESC, tab.mainthreadid (actual time=0.007..0.007 rows=2 loops=1)
    -> Index lookup on tab using <auto_key0> (rn=1) (actual time=0.002..0.002 rows=2 loops=1)
        -> Materialize (actual time=0.069..0.070 rows=2 loops=1)
            -> Window aggregate: row_number() OVER (PARTITION BY forum.mainthreadid ORDER BY forum.my_date desc ) (actual time=0.044..0.050 rows=8 loops=1)
                -> Sort: forum.mainthreadid, forum.my_date DESC (cost=1.25 rows=10) (actual time=0.041..0.043 rows=8 loops=1)
                    -> Filter: ((forum.approved = 1) and (forum.section <> 150)) (cost=1.25 rows=10) (actual time=0.026..0.031 rows=8 loops=1)
                        -> Table scan on forum (cost=1.25 rows=10) (actual time=0.024..0.027 rows=10 loops=1)
and then
EXPLAIN - Akina
-> Sort: forum.my_date DESC (actual time=0.071..0.071 rows=2 loops=1)
    -> Stream results (actual time=0.060..0.063 rows=2 loops=1)
        -> Inner hash join (forum.my_date = lastdates.my_date), (forum.mainthreadid = lastdates.mainthreadid) (actual time=0.059..0.062 rows=2 loops=1)
            -> Table scan on forum (cost=0.18 rows=10) (actual time=0.007..0.010 rows=10 loops=1)
            -> Hash
                -> Table scan on lastdates (cost=2.73 rows=2) (actual time=0.000..0.001 rows=2 loops=1)
                    -> Materialize (actual time=0.040..0.041 rows=2 loops=1)
                        -> Table scan on <temporary> (actual time=0.000..0.001 rows=2 loops=1)
                            -> Aggregate using temporary table (actual time=0.032..0.033 rows=2 loops=1)
                                -> Filter: ((forum.approved = 1) and (forum.section <> 150)) (cost=1.25 rows=1) (actual time=0.015..0.020 rows=8 loops=1)
                                    -> Table scan on forum (cost=1.25 rows=10) (actual time=0.014..0.017 rows=10 loops=1)
I'm not sure how to interpret these different plans yet - watch this space!
With MySQL 5.7, which has no window functions, you can use user-defined variables to number the rows within each mainthreadid and keep only the first one:
SELECT
date, id, mainthreadid
FROM
(SELECT
date,
id,
IF(mainthreadid = @mainid, @rownum:=@rownum + 1, @rownum:=1) rownum,
@mainid:=mainthreadid mainthreadid
FROM
`forum`, (SELECT @mainid:=0, @rownum:=0) t2
WHERE
approved = 1 AND section != 150
ORDER BY mainthreadid , date DESC) t1
WHERE
rownum = 1
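The user-variable approach relies on the variables being assigned in row order, and the MySQL manual states that the order of evaluation for expressions involving user variables is undefined. As a possible alternative for 5.7 that uses neither variables nor window functions, here is a minimal sketch of a NOT EXISTS anti-join: a row is kept only if no other qualifying row of the same mainthreadid is newer. The tie-breaker on id is an assumption, the alias newer is hypothetical, and the query is run in an in-memory SQLite database as a stand-in for MySQL. The row with id 11 is invented for the demonstration.

```python
import sqlite3

# Sample rows from the question: (id, mainthreadid, section, approved, date).
QUESTION_ROWS = [
    (1, 1, 5, 1, 1000), (2, 2, 5, 1, 1000), (3, 2, 5, 1, 1001),
    (4, 2, 5, 1, 1002), (5, 2, 5, 1, 1003), (6, 2, 5, 1, 1004),
    (7, 2, 5, 0, 1005), (8, 8, 150, 1, 1004), (9, 1, 5, 1, 1006),
    (10, 1, 5, 1, 1005),
]

# Hypothetical extra row: approved, same date as row 6 in mainthreadid 2.
TIED_ROW = (11, 2, 5, 1, 1004)

ANTI_JOIN_QUERY = """
SELECT f.date, f.id, f.mainthreadid
FROM forum AS f
WHERE f.approved = 1 AND f.section != 150
  AND NOT EXISTS (
        SELECT 1
        FROM forum AS newer
        WHERE newer.mainthreadid = f.mainthreadid
          AND newer.approved = 1 AND newer.section != 150
          -- "newer" means a later date, or the same date and a higher id
          AND (newer.date > f.date
               OR (newer.date = f.date AND newer.id > f.id))
  )
ORDER BY f.date DESC
"""


def newest_without_variables(rows):
    """Return one (date, id, mainthreadid) row per thread via NOT EXISTS."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE forum (id INTEGER, mainthreadid INTEGER, "
            "section INTEGER, approved INTEGER, date INTEGER)"
        )
        conn.executemany("INSERT INTO forum VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(ANTI_JOIN_QUERY).fetchall()
    finally:
        conn.close()


print(newest_without_variables(QUESTION_ROWS))
```

Because the filter appears in both the outer query and the subquery, unapproved rows and rows in section 150 can neither be returned nor hide an approved row.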