Select value in Left Join ordered by Date
Question
I am trying to LEFT JOIN a value sorted by the most recent date, but for some reason it only returns a value for the last row of the result.
These are my table layouts
cs_users
+---------+-------------+
| user_id | user_login |
+---------+-------------+
| 1 | cornerstone |
| 2 | claire |
| 3 | ben |
+---------+-------------+
cs_login_log
+----------+---------------+-----------------+---------------------+--------------+
| login_id | login_user_id | login_user_type | login_dtm | login_status |
+----------+---------------+-----------------+---------------------+--------------+
| 2 | 1 | 1 | 2019-09-02 15:23:38 | 1 |
| 6 | 1 | 1 | 2019-09-02 15:49:20 | 1 |
| 8 | 2 | 1 | 2019-09-02 16:10:14 | 1 |
| 9 | 3 | 1 | 2019-09-02 16:12:14 | 1 |
| 10 | 5 | 1 | 2019-09-02 16:12:33 | 1 |
| 25 | 2 | 1 | 2019-09-15 19:18:07 | 1 |
+----------+---------------+-----------------+---------------------+--------------+
The code I've tried so far is as follows:
Attempt 1
SELECT u.user_id,
u.user_login,
l.login_dtm
FROM cs_users AS u
LEFT JOIN (
SELECT * FROM cs_login_log WHERE login_user_type = '1' AND login_status = '1'
) l ON l.login_user_id = u.user_id
WHERE u.user_status = '1'
ORDER BY user_login ASC LIMIT 25
Attempt 2
SELECT u.user_id,
u.user_login,
l.login_dtm
FROM cs_users AS u
LEFT JOIN (
SELECT login_user_id, login_dtm, MAX(login_dtm) FROM cs_login_log WHERE login_user_type = '1' AND login_status = '1'
) l ON l.login_user_id = u.user_id
WHERE u.user_status = '1'
ORDER BY user_login ASC LIMIT 25
But for some reason in all of the results it only returns the most recent login of the last row!
Returned data
+---------+-------------+---------------------+
| user_id | user_login | login_dtm |
+---------+-------------+---------------------+
| 3 | ben | null |
| 2 | claire | null |
| 1 | cornerstone | 2019-09-02 15:23:38 |
+---------+-------------+---------------------+
I'm new to this kind of JOIN, so I have no idea if I'm anywhere near close. Basically, I want to get a list of my users and their most recent successful login timestamp.
Solution
You're on the right track, except that you need a RIGHT JOIN and not a LEFT one. That's IF you want to include the person with the login_user_id of 5 in your results. If not, a LEFT JOIN will indeed work.
Firstly, the RIGHT JOIN. You're either looking for this (works on version 5.6 - see fiddle here):
SELECT
tab.login_user_id AS id,
COALESCE(cs.user_login, '---------') AS name,
COALESCE(cs.user_id, 0) AS lid,
tab.dtm AS tme
FROM
cs_users cs
RIGHT JOIN
(
SELECT login_user_id, MAX(login_dtm) AS dtm
FROM cs_login_log
-- WHERE clauses as applicable
GROUP BY login_user_id
) tab
ON cs.user_id = tab.login_user_id
-- WHERE clauses as applicable
ORDER BY tab.login_user_id;
or something like this, which works from version 8 onwards (fiddle):
SELECT
cs.user_id AS id,
COALESCE(cs.user_login, '--------') AS name,
tab.ltm AS tme,
tab.rn
FROM
cs_users cs
RIGHT JOIN
(
SELECT
ROW_NUMBER() OVER (PARTITION BY csl.login_user_id
ORDER BY csl.login_dtm DESC) AS rn,
csl.login_user_id AS id,
csl.login_dtm AS ltm
FROM cs_login_log csl
-- WHERE clauses as applicable
) tab
ON cs.user_id = tab.id
WHERE tab.rn = 1
-- AND whatever other clauses are applicable
ORDER BY tab.id;
This second query uses the (not strictly necessary in this case) ROW_NUMBER() analytic (aka window) function. These functions are very powerful and well worth getting to know. I would urge you to explore this fiddle and at least "play with" an instance of version 8 on your own laptop if you don't have it at work.
Also, the COALESCE function is not strictly necessary here, but it can be a great help for reports, making them much more aesthetically pleasing.
Result (first query):
id name lid tme
1 cornerstone 1 2019-09-02 15:49:20
2 claire 2 2019-09-15 19:18:07
3 ben 3 2019-09-02 16:12:14
5 --------- 0 2019-09-02 16:12:33
You can use EXPLAIN EXTENDED (plain EXPLAIN on version 8, where the EXTENDED keyword has been removed) to see which one works better for you - I have no running MySQL instances, only fiddles, so I can't do that sort of performance analysis. From version 8.20 there will be serious enhancements to this area of MySQL's functionality - well worth an upgrade.
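As a rough sketch of that comparison, you can prefix either query with EXPLAIN to see the plan the optimizer chose. The example below uses the grouped-subquery version with the placeholder columns trimmed for brevity. I've left the output out because it depends on your server version, indexes and data.

```sql
-- Plain EXPLAIN works on both 5.6 and 8; on 5.6 you can write EXPLAIN EXTENDED instead.
EXPLAIN
SELECT
  tab.login_user_id AS id,
  cs.user_login     AS name,
  tab.dtm           AS tme
FROM cs_users cs
RIGHT JOIN
(
  SELECT login_user_id, MAX(login_dtm) AS dtm
  FROM cs_login_log
  GROUP BY login_user_id
) tab
  ON cs.user_id = tab.login_user_id;
```

Run the same thing against the ROW_NUMBER() version on a version 8 server and compare the two plans.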
On the other hand, if you're only looking for the most recent login times for the three users in the cs_users table, then all you have to do is turn the RIGHT JOIN into a LEFT one (see the fiddle here).
SELECT
tab.login_user_id AS id,
COALESCE(cs.user_login, '---------') AS name,
COALESCE(cs.user_id, 0) AS lid,
tab.dtm AS tme
FROM
cs_users cs
LEFT JOIN
(
SELECT login_user_id, MAX(login_dtm) AS dtm
FROM cs_login_log
WHERE login_user_type = '1' AND login_status = '1'
GROUP BY login_user_id
) tab
ON cs.user_id = tab.login_user_id
WHERE cs.user_status = 1 -- no data provided!
ORDER BY tab.login_user_id;
Result:
id name lid tme
1 cornerstone 1 2019-09-02 15:49:20
2 claire 2 2019-09-15 19:18:07
3 ben 3 2019-09-02 16:12:14
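One thing to watch with the LEFT JOIN version: tab.login_user_id is NULL for any user who has no matching row in cs_login_log, so the id column would come back empty for that user. A minimal variant, assuming you want every user listed whether or not they have ever logged in, takes the id from cs_users instead. The ORDER BY and LIMIT are copied from your attempts, and the user_status filter is commented out because my test table doesn't have that column.

```sql
SELECT
  cs.user_id    AS id,
  cs.user_login AS name,
  tab.dtm       AS tme      -- NULL if the user has no qualifying login
FROM cs_users cs
LEFT JOIN
(
  SELECT login_user_id, MAX(login_dtm) AS dtm
  FROM cs_login_log
  WHERE login_user_type = 1 AND login_status = 1
  GROUP BY login_user_id
) tab
  ON cs.user_id = tab.login_user_id
-- WHERE cs.user_status = 1   -- uncomment if your cs_users table has this column
ORDER BY cs.user_login
LIMIT 25;
```

With the sample data all three users have a login, so the timestamps are the same as in the result above, just ordered by name.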
As noted in the SQL, I have no data for the user_status field of cs_users (my working assumption is that it's 1 for everybody). The key to understanding this problem is to realise that there is what I consider an ENORMOUS bug in the default configuration of MySQL versions prior to 5.7 (ONLY_FULL_GROUP_BY has been on by default since 5.7.5, so version 8 doesn't suffer from it, at least by default). See the above fiddle also.
There is a (super-)variable called sql_mode which is (more or less) one massive hack! There are various bits and pieces you can add to or remove from this variable; it's basically a comma-separated list of different sub-settings. One of these is ONLY_FULL_GROUP_BY, which isn't on by default in version 5.6, the one that you're using. This means that queries like this:
SELECT
login_user_id,
login_dtm,
MAX(login_dtm)
FROM cs_login_log
WHERE login_user_type = '1'
AND login_status = '1'
give a result with your data instead of raising an error:
login_user_id login_dtm MAX(login_dtm)
1 2019-09-02 15:23:38 2019-09-15 19:18:07
Try running that query on PostgreSQL (a decent server!) and you'll get this (fiddle):
ERROR: column "cs_login_log.login_user_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2: login_user_id,
So, the reason that your (second) SQL didn't work as you'd hoped is that your inner query returned only one row (for login_user_id 1), so you could only join to that one record and got NULL for the rest, which is to be expected. When you GROUP BY the login_user_id (as @Akina also suggested), you get a result for each user, which is what you want.
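To make the difference concrete, here is the inner query on its own with the GROUP BY added, run against the sample data from the question. The ORDER BY is only there to make the output easy to read.

```sql
SELECT login_user_id, MAX(login_dtm) AS dtm
FROM cs_login_log
WHERE login_user_type = 1
  AND login_status = 1
GROUP BY login_user_id
ORDER BY login_user_id;

-- login_user_id   dtm
-- 1               2019-09-02 15:49:20
-- 2               2019-09-15 19:18:07
-- 3               2019-09-02 16:12:14
-- 5               2019-09-02 16:12:33
```

That is one row per login_user_id rather than a single row for the whole table, so every user in cs_users now has something to join to.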
My advice is to add ONLY_FULL_GROUP_BY to the sql_mode of your server and you won't be caught by this again! It bit me in the ass when I was starting out on my SQL journey also! :-)
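A minimal sketch of how you might switch it on for your current session, assuming you'd like to try it out before changing the server configuration. The CONCAT_WS/NULLIF wrapper is just my way of appending to whatever is already set without producing a leading comma when sql_mode is empty.

```sql
-- See what is currently set
SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode;

-- Append ONLY_FULL_GROUP_BY for this session only
SET SESSION sql_mode =
  CONCAT_WS(',', NULLIF(@@SESSION.sql_mode, ''), 'ONLY_FULL_GROUP_BY');

-- With the mode on, this is rejected with an error
-- instead of silently returning one arbitrary row
SELECT login_user_id, login_dtm, MAX(login_dtm)
FROM cs_login_log;
```

To apply it to new connections, use SET GLOBAL (which needs the appropriate privilege). To make it survive a restart, put the sql_mode setting in the [mysqld] section of your server's configuration file.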
What I did was create and populate the tables:
CREATE TABLE cs_users
(
user_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
user_login VARCHAR (20) NOT NULL
);
and
CREATE TABLE cs_login_log
(
login_id INTEGER NOT NULL,
login_user_id INTEGER NOT NULL,
login_user_type INTEGER NOT NULL,
login_dtm DATETIME NOT NULL,
login_status INTEGER NOT NULL
);
populate:
INSERT INTO cs_users (user_login)
VALUES
('cornerstone'), ('claire'), ('ben');
and
INSERT INTO cs_login_log
VALUES
( 2 , 1 , 1 , '2019-09-02 15:23:38' , 1 ),
( 6 , 1 , 1 , '2019-09-02 15:49:20' , 1 ),
( 8 , 2 , 1 , '2019-09-02 16:10:14' , 1 ),
( 9 , 3 , 1 , '2019-09-02 16:12:14' , 1 ),
( 10 , 5 , 1 , '2019-09-02 16:12:33' , 1 ),
( 25 , 2 , 1 , '2019-09-15 19:18:07' , 1 );
Run the query (either one):
SELECT
tab.login_user_id AS id,
COALESCE(cs.user_login, '---------') AS name,
COALESCE(cs.user_id, 0) AS lid,
tab.dtm AS tme
FROM
cs_users cs
RIGHT JOIN
(
SELECT login_user_id, MAX(login_dtm) AS dtm
FROM cs_login_log
-- WHERE clauses as applicable
GROUP BY login_user_id
) tab
ON cs.user_id = tab.login_user_id
-- WHERE clauses as applicable
ORDER BY tab.login_user_id;
Result:
id name lid tme
1 cornerstone 1 2019-09-02 15:49:20
2 claire 2 2019-09-15 19:18:07
3 ben 3 2019-09-02 16:12:14
5 --------- 0 2019-09-02 16:12:33
Note the name of '---------' and lid of 0 for the user with login_user_id of 5 in the cs_login_log table. Of course, these placeholders are there because there is nobody with a user_id of 5 in the cs_users table.
In the second fiddle, I've left in the rn (ROW_NUMBER()) field - yet again, not strictly necessary, but it helps to show my train of thought as I constructed the query - obviously, delete as applicable!
Next time you ask a question, you might get better responses if you include your tables as DDL (CREATE TABLE foo (...);) and your table data as DML (INSERT INTO foo VALUES (...);), or you could construct a fiddle (just make sure to put the fiddle contents here also) - help us to help you! There are some articles on how to ask questions on my profile; you might want to take a look. p.s. welcome to the forum!