Question

I am trying to LEFT JOIN each user's most recent login date so that I can sort by it, but for some reason only the last row of the result set gets a value.

These are my table layouts

cs_users

+---------+-------------+
| user_id | user_login  |
+---------+-------------+
|       1 | cornerstone |
|       2 | claire      |
|       3 | ben         |
+---------+-------------+

cs_login_log

+----------+---------------+-----------------+---------------------+--------------+
| login_id | login_user_id | login_user_type |      login_dtm      | login_status |
+----------+---------------+-----------------+---------------------+--------------+
|        2 |             1 |               1 | 2019-09-02 15:23:38 |            1 |
|        6 |             1 |               1 | 2019-09-02 15:49:20 |            1 |
|        8 |             2 |               1 | 2019-09-02 16:10:14 |            1 |
|        9 |             3 |               1 | 2019-09-02 16:12:14 |            1 |
|       10 |             5 |               1 | 2019-09-02 16:12:33 |            1 |
|       25 |             2 |               1 | 2019-09-15 19:18:07 |            1 |
+----------+---------------+-----------------+---------------------+--------------+

The code I've tried so far is as follows:

Attempt 1

SELECT u.user_id,
u.user_login,
l.login_dtm
FROM cs_users AS u
LEFT JOIN (
  SELECT * FROM cs_login_log WHERE login_user_type = '1' AND login_status = '1'
) l ON l.login_user_id = u.user_id
WHERE u.user_status = '1'
ORDER BY user_login ASC LIMIT 25

Attempt 2

SELECT u.user_id,
u.user_login,
l.login_dtm
FROM cs_users AS u
LEFT JOIN (
  SELECT login_user_id, login_dtm, MAX(login_dtm) FROM cs_login_log WHERE login_user_type = '1' AND login_status = '1'
) l ON l.login_user_id = u.user_id
WHERE u.user_status = '1'
ORDER BY user_login ASC LIMIT 25

But for some reason in all of the results it only returns the most recent login of the last row!

Returned data

+---------+-------------+---------------------+
| user_id | user_login  |      login_dtm      |
+---------+-------------+---------------------+
|       3 | ben         | null                |
|       2 | claire      | null                |
|       1 | cornerstone | 2019-09-02 15:23:38 |
+---------+-------------+---------------------+

I'm only new to this kind of JOIN so no idea if I'm even anywhere near being close? Basically I want to be able to get a list of my users and their most recent successful login timestamp.


Solution

You're on the right track - except that you need a RIGHT JOIN and not a LEFT one - that's IF you want to include the person with the login_user_id of 5 in your results. If not, a LEFT JOIN will indeed work.

Firstly, the RIGHT JOIN. You're either looking for this (works on version 5.6 - see fiddle here):

SELECT 
  tab.login_user_id AS id,
  COALESCE(cs.user_login, '---------') AS name,
  COALESCE(cs.user_id, 0) AS lid, 
  tab.dtm AS tme
FROM
  cs_users cs
RIGHT JOIN
(
  SELECT login_user_id, MAX(login_dtm) AS dtm
  FROM cs_login_log
  -- WHERE clauses as applicable
  GROUP BY login_user_id
) tab
ON cs.user_id = tab.login_user_id
-- WHERE clauses as applicable
ORDER BY tab.login_user_id;

or something like this which works for version 8 up (fiddle).

SELECT 
  cs.user_id AS id, 
  COALESCE(cs.user_login, '--------') AS name,
  tab.ltm AS tme,
  tab.rn
FROM
  cs_users cs
RIGHT JOIN
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY csl.login_user_id 
                     ORDER BY csl.login_dtm DESC) AS rn,
    csl.login_user_id AS id,
    csl.login_dtm AS ltm
  FROM cs_login_log csl
  -- WHERE clauses as applicable
) tab
ON cs.user_id = tab.id
WHERE tab.rn = 1
-- AND whatever other clauses are applicable
ORDER BY tab.id;

This second query uses the (not strictly necessary in this case) ROW_NUMBER() Analytic (aka Window) function. These functions are very powerful and well worth getting to know and I would urge you to explore this fiddle and at least "play with" an instance of version 8 on your own personal laptop if you don't have it in work.
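To see what ROW_NUMBER() is doing, it can help to run the inner query on its own. This is only a sketch against the six sample rows from the question (version 8 up, no WHERE filters, and the outer ORDER BY is added purely for readability):

```sql
-- Number each user's logins, newest first (needs MySQL 8.0 or later).
SELECT
  csl.login_user_id AS id,
  csl.login_dtm     AS ltm,
  ROW_NUMBER() OVER (PARTITION BY csl.login_user_id
                     ORDER BY csl.login_dtm DESC) AS rn
FROM cs_login_log csl
ORDER BY id, rn;

-- With the sample data this should give:
-- id  ltm                  rn
-- 1   2019-09-02 15:49:20  1
-- 1   2019-09-02 15:23:38  2
-- 2   2019-09-15 19:18:07  1
-- 2   2019-09-02 16:10:14  2
-- 3   2019-09-02 16:12:14  1
-- 5   2019-09-02 16:12:33  1
```

Every row with rn = 1 is that user's latest login, which is why the outer query filters on tab.rn = 1.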

Also the COALESCE function is not strictly necessary here - but it can be a great help for the likes of reporting, making them much more aesthetically pleasing.

Result:

id  name        lid tme
1   cornerstone 1   2019-09-02 15:49:20
2   claire      2   2019-09-15 19:18:07
3   ben         3   2019-09-02 16:12:14
5   ---------   0   2019-09-02 16:12:33

You can use EXPLAIN EXTENDED to see which one works better for you. I have no running MySQL instances, only fiddles, so I can't do that sort of performance analysis myself. From version 8.20 there will be serious enhancements to this area of MySQL's functionality, so it's well worth an upgrade.
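A rough sketch of how that comparison might be run, using the aggregated subquery as the example. I haven't run this on a real server; as far as I know the EXTENDED keyword is only accepted up to 5.7 and was dropped in version 8, where plain EXPLAIN gives the same information, so use whichever form matches your version:

```sql
-- MySQL 5.6 / 5.7: EXTENDED adds extra detail to the plan output.
EXPLAIN EXTENDED
SELECT login_user_id, MAX(login_dtm) AS dtm
FROM cs_login_log
GROUP BY login_user_id;

-- Run straight after the EXPLAIN: shows the statement as the optimizer rewrote it.
SHOW WARNINGS;

-- MySQL 8.0: same idea, without the EXTENDED keyword.
-- EXPLAIN
-- SELECT login_user_id, MAX(login_dtm) AS dtm
-- FROM cs_login_log
-- GROUP BY login_user_id;
```

Run it once per candidate query and compare the reported access types and row estimates.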

On the other hand, if you're only looking for the most recent login times for the three users in the cs_users table, then all you have to do is simply turn the RIGHT JOIN into a LEFT one (see the fiddle here).

SELECT 
  tab.login_user_id AS id,
  COALESCE(cs.user_login, '---------') AS name,
  COALESCE(cs.user_id, 0) AS lid, 
  tab.dtm AS tme
FROM
  cs_users cs
LEFT JOIN
(
  SELECT login_user_id, MAX(login_dtm) AS dtm
  FROM cs_login_log
  WHERE login_user_type = '1' AND login_status = '1'
  GROUP BY login_user_id
) tab
ON cs.user_id = tab.login_user_id
WHERE cs.user_status = 1  -- no data provided!
ORDER BY tab.login_user_id;

Result:

id  name        lid tme
1   cornerstone 1   2019-09-02 15:49:20
2   claire      2   2019-09-15 19:18:07
3   ben         3   2019-09-02 16:12:14

As noted in the SQL, I have no data provided for the cs_users field user_status (my working assumption is that it's 1 for everybody).

The key to understanding this problem is to realise that there is an ENORMOUS bug in all versions of MySQL prior to (AFAIK) 5.7 (I'm sure that all versions of 8 don't suffer from it - at least by default). See the above fiddle also.

There is a (super-)variable called sql_mode which is (more or less) one massive hack! There are various bits and pieces you can add or subtract from this variable - it's basically a long line of different sub-variables. One of these is ONLY_FULL_GROUP_BY which isn't on by default in version 5.6 - the one that you're using. This means that queries like this:

SELECT 
  login_user_id, 
  login_dtm, 
  MAX(login_dtm) 
FROM cs_login_log 
WHERE login_user_type = '1' 
  AND login_status = '1'

give a result with your data instead of raising an error:

login_user_id  login_dtm            MAX(login_dtm)
1              2019-09-02 15:23:38  2019-09-15 19:18:07

Try running that query on PostgreSQL (a decent server!) and you'll get this (fiddle):

ERROR:  column "cs_login_log.login_user_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2:   login_user_id, 

So, the reason that your (second) SQL didn't work as you'd hoped was because your internal query only returned one data point and therefore you could only join to that one record and you got NULL for the rest - to be expected. When you GROUP BY the login_user_id (as @Akina also suggested) you get a result for all the users, which is what you want.
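Applied to your second attempt, the fix might look something like the sketch below. The alias last_login is just a name I've made up for illustration, and I'm assuming user_status exists in your real table as in your question. Since you said you wanted to sort by the most recent date, I've ordered by that first:

```sql
SELECT u.user_id,
       u.user_login,
       l.last_login
FROM cs_users AS u
LEFT JOIN (
  -- One row per user: their latest successful login.
  SELECT login_user_id, MAX(login_dtm) AS last_login   -- last_login: illustrative alias
  FROM cs_login_log
  WHERE login_user_type = 1 AND login_status = 1
  GROUP BY login_user_id
) l ON l.login_user_id = u.user_id
WHERE u.user_status = 1            -- column taken from the question, not in my test DDL
ORDER BY l.last_login DESC,        -- newest first; in MySQL, NULLs sort last with DESC
         u.user_login ASC
LIMIT 25;
```

Users who have never logged in keep a NULL last_login and end up at the bottom of the list.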

My advice is to add ONLY_FULL_GROUP_BY to the sql_mode of your server and you won't be caught by this again! It bit me in the ass when I was starting out on my SQL journey also! :-)
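A minimal sketch of how that could be done for the current session only. Treat it as a starting point rather than a recipe: a server-wide change needs SET GLOBAL (with suitable privileges) or an entry in the server's option file, which I haven't shown here.

```sql
-- Inspect the mode currently in force for this session.
SELECT @@SESSION.sql_mode;

-- Append ONLY_FULL_GROUP_BY for this session.
-- NULLIF turns an empty sql_mode into NULL, which CONCAT_WS skips,
-- so there is no stray leading comma.
SET SESSION sql_mode =
  CONCAT_WS(',', NULLIF(@@SESSION.sql_mode, ''), 'ONLY_FULL_GROUP_BY');
```

With the mode switched on, the subquery from your second attempt should be rejected with an error instead of quietly returning a single row.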

What I did was create and populate the tables:

CREATE TABLE cs_users
(
  user_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_login VARCHAR (20) NOT NULL
);

and

CREATE TABLE cs_login_log
(
  login_id INTEGER NOT NULL,
  login_user_id INTEGER NOT NULL,
  login_user_type INTEGER NOT NULL,
  login_dtm DATETIME NOT NULL,
  login_status INTEGER NOT NULL
);

populate:

INSERT INTO cs_users (user_login)
VALUES
('cornerstone'), ('claire'), ('ben');

and

INSERT INTO cs_login_log
VALUES
(        2 ,             1 ,               1 , '2019-09-02 15:23:38' ,            1 ),
(        6 ,             1 ,               1 , '2019-09-02 15:49:20' ,            1 ),
(        8 ,             2 ,               1 , '2019-09-02 16:10:14' ,            1 ),
(        9 ,             3 ,               1 , '2019-09-02 16:12:14' ,            1 ),
(       10 ,             5 ,               1 , '2019-09-02 16:12:33' ,            1 ),
(       25 ,             2 ,               1 , '2019-09-15 19:18:07' ,            1 );
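The DDL above has no user_status column, so the LEFT JOIN variant that filters on cs.user_status would fail against these tables as written. One way to bring the test tables in line with my working assumption (everybody has status 1) is sketched below; the column type and default are guesses on my part, not something taken from your schema:

```sql
-- Assumption: user_status is an integer flag and 1 means "active".
-- Existing rows pick up the default, so all three users get status 1.
ALTER TABLE cs_users
  ADD COLUMN user_status INTEGER NOT NULL DEFAULT 1;
```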

Run the query (either one):

SELECT 
  tab.login_user_id AS id,
  COALESCE(cs.user_login, '---------') AS name,
  COALESCE(cs.user_id, 0) AS lid, 
  tab.dtm AS tme
FROM
  cs_users cs
RIGHT JOIN
(
  SELECT login_user_id, MAX(login_dtm) AS dtm
  FROM cs_login_log
  -- WHERE clauses as applicable (must come before the GROUP BY)
  GROUP BY login_user_id
) tab
ON cs.user_id = tab.login_user_id
-- WHERE clauses as applicable
ORDER BY tab.login_user_id;

Result:

id  name        lid tme
1   cornerstone 1   2019-09-02 15:49:20
2   claire      2   2019-09-15 19:18:07
3   ben         3   2019-09-02 16:12:14
5   ---------   0   2019-09-02 16:12:33

Note the name of '---------' and lid of 0 for the user with login_user_id of 5 in the cs_login_log table. Of course, these placeholders are there because there is nobody with a user_id of 5 in the cs_users table.

In the second fiddle, I've left in the rn (ROW_NUMBER()) field - yet again, not strictly necessary, but it helps to show my train of thought as I constructed the query - obviously, delete as applicable!

Next time you ask a question, you might get better responses if you include your tables as DDL (CREATE TABLE foo (...);) and your table data as DML (INSERT INTO foo VALUES (...);), or you could construct a fiddle (just make sure to put the fiddle stuff here also) - help us to help you! There are some articles on how to ask questions here on my profile, you might want to take a look? p.s. welcome to the forum!

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange